Tag cloud seems to be wrong

Issue #2573 open
Andreas Hotho created an issue

The tag cloud shows wrong post counts: http://www.bibsonomy.org/search/clustering

Comments (19)

  1. Andreas Hotho reporter

    But 30 for dblp seems to be the wrong number and clustering should be something like 1700 as /tag/clustering has 1700 posts.

  2. Daniel Zoller

    removed old rawsearch and replaced it with search (modified search was used by the database queries)

    getPosts used rawsearch while getTags used search (which added boolean operators to the query)

    addresses #2573; reopening #2573

    → <<cset ebef8c5242d1>>

  3. Daniel Zoller
    • changed status to open

  4. Daniel Zoller

    One problem was that the methods for retrieving posts and tags did not use the same query:

    getPosts used the query "clustering methods", while getTags used "+clustering +methods".
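A minimal sketch of the mismatch described above, assuming the fix amounts to building one normalized query string and passing it to both lookups. The helper names `to_boolean_query` and `build_queries` are hypothetical and are not taken from the BibSonomy code base.

```python
def to_boolean_query(raw_query: str) -> str:
    """Prefix each whitespace-separated term with '+' so every term is required.

    Terms that already carry a '+' or '-' operator are left unchanged,
    which makes the function safe to apply twice.
    """
    terms = raw_query.split()
    return " ".join(t if t.startswith(("+", "-")) else "+" + t for t in terms)


def build_queries(raw_query: str) -> dict:
    """Return the query string for the post lookup and for the tag lookup.

    Both entries come from the same normalized string, so the post list
    and the tag cloud are computed over the same set of matches.
    """
    normalized = to_boolean_query(raw_query)
    return {"posts": normalized, "tags": normalized}
```

With this in place, `build_queries("clustering methods")` yields "+clustering +methods" for both lookups, so the two result sets can no longer diverge because of the query string alone.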

  5. Daniel Zoller

    The limit and offset handling for search is still broken: the tag cloud is computed only on the posts within the limit. Currently we compute it from the tags of the 1000 most recent posts that matched the query. Do we want to change this behaviour? (Note: this handling was not changed when switching from Lucene to Elasticsearch.) I don't know how well computing the tag cloud on all results would perform. Maybe we could use the aggregation function of Elasticsearch.

    @jaeschke Do you have any experience with aggregations in Elasticsearch?
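A rough sketch of what the aggregation idea could look like, assuming the index stores each post's tags in a keyword-typed field. The field name `tags`, the aggregation name `tag_cloud`, and both helper functions are assumptions for illustration, not part of the existing code. The request body uses a `terms` aggregation with `size: 0`, so Elasticsearch counts tags over all matching posts and returns no hits.

```python
def build_tag_cloud_request(query: str, max_tags: int = 100) -> dict:
    """Build a search request body that aggregates tags over ALL matches.

    "size": 0 suppresses the hit list; only the aggregation is returned.
    The field name "tags" is an assumption about the index mapping.
    """
    return {
        "size": 0,
        "query": {
            "query_string": {"query": query, "default_operator": "AND"}
        },
        "aggs": {
            "tag_cloud": {"terms": {"field": "tags", "size": max_tags}}
        },
    }


def parse_tag_cloud(response: dict) -> dict:
    """Map each tag to its post count from a terms-aggregation response."""
    buckets = response["aggregations"]["tag_cloud"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The body would be sent to the index's `_search` endpoint. Whether this is fast enough on the full result set would still need to be measured, as noted above.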

  6. Robert Jäschke

    No, unfortunately, not yet. It sounds interesting and like a good option. For single-tag queries we can, of course, use the database which provides the related tags. For more than two tags, however, we have to rely on Elasticsearch.

  7. Andreas Hotho reporter

    Even for single-tag queries, numbers can be different as tags can be in description of other posts as well. The best way would be to use Elasticsearch. I suggest loading the tag cloud lazily with Ajax. This would let the page load very quickly, with the tag cloud appearing after a short delay.

  8. Robert Jäschke

    Yes, that's possible. We have/had AJAX loading of the tag cloud implemented anyway (for the case where a user changed the number of shown tags, etc.). What I don't understand is the remark "tags can be in description of other posts as well". Which description and which posts are meant? Normally, the tag cloud shows all tags of all posts of the user.

  9. Andreas Hotho reporter

    OK, "tags" is the wrong term; I mean words that can be found in the description field. The tags of those posts should be counted as well.
