BaGLAMa: Data quality / reliability

Issue #11 new
Beat Estermann created an issue

I have some doubts regarding the quality of the data. To help track related problems, here are a few observations I made a couple of weeks ago while inspecting the stats for "Media contributed by Zentralbibliothek Zürich":

  • In the English version: The Wikipedia page “Zurich” had between 76’978 and 95’922 monthly page views from April to July 2013. Then the lemma was changed to “Zürich”, and the monthly page views dropped to 12’944 in August 2013 before recovering to 53’430 the following month. Overall, values seem to be lower since the article was moved to the new lemma. – Is there something wrong with the way page view statistics are provided for pages that were accessed through a redirect?

  • In the English version: The Wikipedia page “Zürich” doesn’t appear in the monthly page view statistics for January 2014, although the picture still appears in the article. – Is there an explanation for why it was not counted this time?
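
One way to test the redirect hypothesis would be to add up the monthly views of every title that leads to the same article (here “Zurich” and “Zürich”) and see whether the combined series still shows the dip. The following is only a minimal sketch: the function name and the `{title: {"YYYY-MM": views}}` layout are assumptions made for illustration, not BaGLAMa's actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def combined_monthly_views(
    views_by_title: Mapping[str, Mapping[str, int]],
    titles: Iterable[str],
) -> Dict[str, int]:
    """Sum monthly views over several titles that lead to the same article.

    views_by_title maps a page title to {"YYYY-MM": views}. This layout is
    hypothetical and chosen only for the sketch.
    """
    totals: Dict[str, int] = defaultdict(int)
    for title in titles:
        # A title with no recorded views simply contributes nothing.
        for month, views in views_by_title.get(title, {}).items():
            totals[month] += views
    # Return a plain dict ordered by month for easier comparison.
    return dict(sorted(totals.items()))
```

If the combined series for both titles is smooth across August 2013, the drop is probably caused by views being split between the old and the new title.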

My impression is that artefacts such as the ones described above could easily result in errors of ±25%. That’s quite a lot, and we should probably come to grips with these artefacts, or at least try to provide a reasonable estimate of the error, if we want to provide our GLAM partners with reliable statistics.
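
To make the size of such an artefact concrete, the figures quoted above for the English article can be compared against the April to July range. This is a rough sketch, not an established error model; `relative_change` is a hypothetical helper, and the only inputs are the numbers already listed in this issue.

```python
def relative_change(observed: int, reference: int) -> float:
    """Return the change of `observed` relative to `reference` as a fraction."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (observed - reference) / reference


# Monthly page views reported above for the English article.
baseline_low, baseline_high = 76_978, 95_922  # range for April to July 2013
august, september = 12_944, 53_430

for label, value in (("August", august), ("September", september)):
    # Compare against both ends of the April-July range.
    vs_low = relative_change(value, baseline_low)
    vs_high = relative_change(value, baseline_high)
    print(f"{label} 2013: {vs_low:+.1%} to {vs_high:+.1%} vs. April-July")
```

This only measures the deviation for a single page. How much of it carries over into the aggregated statistics depends on that page's share of the total views.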
