Tag cloud seems to be wrong
The tag cloud is wrong (regarding post count) http://www.bibsonomy.org/search/clustering
Comments (19)
-
-
- changed status to duplicate
Duplicate of #2439.
-
reporter But 30 for dblp seems to be the wrong number, and clustering should be around 1700, since /tag/clustering has 1700 posts.
-
removed old rawsearch and replaced it with search (modified search was used by the database queries)
getPosts used rawsearch while getTags used search (which added boolean operators to the query)
addresses #2573; reopening #2573
→ <<cset ebef8c5242d1>>
-
- changed status to open
-
- changed component to search
- changed title to Tag cloud seems to be wrong
- changed milestone to 3.5
- edited description
-
One problem was that the methods for retrieving posts and tags did not use the same query: getPosts used the raw query (clustering methods), while getTags used the query with boolean operators added (+clustering +methods).
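
To illustrate why the two query forms give different counts, here is a minimal sketch. It is a toy model and not the actual search code: the posts are made up, and it assumes that terms without an operator are OR-ed (Lucene's default) while a leading "+" marks a term as required.

```python
# Toy model of the mismatch: the same user input sent as two different queries.
# Assumption: bare terms are optional (OR-ed), "+term" is required.

def parse(query):
    """Split a query string into (required, optional) term sets."""
    required, optional = set(), set()
    for term in query.split():
        if term.startswith("+"):
            required.add(term[1:])
        else:
            optional.add(term)
    return required, optional


def matches(words, query):
    """Return True if a post (given as a set of words) matches the query."""
    required, optional = parse(query)
    if not required.issubset(words):
        return False
    # Without required terms, at least one optional term has to match.
    return bool(required) or bool(optional & words)


# Hypothetical posts, each reduced to the set of words it contains.
posts = [
    {"clustering", "methods"},
    {"clustering", "dblp"},
    {"methods", "survey"},
    {"dblp"},
]

raw = [p for p in posts if matches(p, "clustering methods")]
boolean = [p for p in posts if matches(p, "+clustering +methods")]

print(len(raw), len(boolean))
```

In this toy data set the raw query matches three posts and the boolean query only one, so a post list built from one form and a tag cloud built from the other cannot agree.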
-
The limit and offset handling for search is still broken: the tag cloud is computed only on limit post results (limit being the search parameter). Currently we compute the tag cloud from the tags of the 1000 most recent posts that matched the query. Do we want to change this behaviour? (Note: this handling was not changed when switching from Lucene to Elasticsearch.) I don't know how well computing the tag cloud on all results would perform. Maybe we could use the aggregation function of Elasticsearch. @jaeschke Do you have any experience with aggregations in Elasticsearch?
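
For reference, a rough sketch of what such an aggregation request could look like. It only builds the JSON body of a search request and reads a hand-written sample response; it does not talk to a cluster. The field name "tags", the aggregation name "tag_cloud", and the use of a query_string query are assumptions, not the project's actual mapping or query code.

```python
import json


def build_tag_cloud_request(user_query, max_tags=100):
    """Build a search body that counts tags over ALL matching posts."""
    return {
        # size 0: return no hits, only the aggregation result.
        "size": 0,
        "query": {
            "query_string": {"query": user_query, "default_operator": "AND"}
        },
        "aggs": {
            # Assumes "tags" is indexed as a keyword (not analyzed) field.
            "tag_cloud": {"terms": {"field": "tags", "size": max_tags}}
        },
    }


def tag_counts(response):
    """Turn the terms-aggregation buckets into a {tag: post count} dict."""
    buckets = response["aggregations"]["tag_cloud"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


body = build_tag_cloud_request("clustering")
print(json.dumps(body, indent=2))

# Hand-written sample in the shape of a terms-aggregation response;
# the counts are made up for illustration.
sample_response = {
    "aggregations": {
        "tag_cloud": {
            "buckets": [
                {"key": "clustering", "doc_count": 12},
                {"key": "dblp", "doc_count": 7},
            ]
        }
    }
}
counts = tag_counts(sample_response)
print(counts)
```

With this approach the counts would no longer depend on limit and offset, since the aggregation runs over every post that matches the query. Note that terms aggregation counts can be approximate on an index with several shards, so this would need testing on realistic data.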
-
No, unfortunately, not yet. It sounds interesting and like a good option. For single-tag queries we can, of course, use the database which provides the related tags. For more than two tags, however, we have to rely on Elasticsearch.
-
reporter Even for single-tag queries, the numbers can differ, as tags can appear in the description of other posts as well. The best way would be to use Elasticsearch. I suggest loading the tag cloud lazily with Ajax. This would allow the page to load very quickly and the tag cloud to follow with some delay.
-
Yes, that's possible. We have/had AJAX loading of the tag cloud implemented anyway (for the case where a user changed the number of shown tags, etc.). What I don't understand is the remark "tags can be in description of other posts as well". Which description and which posts are meant? Normally, the tag cloud shows all tags of all posts of the user.
-
reporter OK, "tags" is the wrong term; it is rather words that can be found in the description field. The tags of such a post should be counted as well.
-
Yes, that makes sense indeed.
-
- changed milestone to 3.6
-
- changed milestone to 3.6.0
-
- changed milestone to 3.7.0
-
no system for testing aggregation -> next release
-
- changed milestone to 3.8.0
-
- removed responsible
the size is correct, dblp with ~1000 dominates the cloud.