[support request] option to not update references if only the date changed

Issue #104 new
Egon Willighagen created an issue

Is there an option to not overwrite references if only the date changed? Right now, if I run my PBB-based bot, it will update the reference if the date changed. An example diff:

https://www.wikidata.org/w/index.php?title=Q118551&diff=prev&oldid=370957915

That makes sense if I run the bot once a month or so, but not when I run it regularly.

It would be great if, besides append and overwrite, there were an option to ignore these kinds of changes and not write to Wikidata. Is there a way for me to do this with the current code? If not, please consider this a feature request.

Comments (4)

  1. Sebastian Burgstaller

    I will not be able to finish implementing a solution before early next week. What I can do at this point is introduce a workaround which allows for not overwriting refs, but this will require you to change your code as soon as I have implemented the final solution.

  2. Sebastian Burgstaller

    If you would like to only update items which have changed and therefore require an update, you could use the fastrun mode of PBB_Core, see an example at the bottom of this readme.

  3. Egon Willighagen reporter

    Short term versus long term solutions needing me to update code is fine!

    There is no immediate urgency. Regarding the 'fastrun', I will play with that. The definition of 'updated' may need clarification. An updated "retrieved" date is also an update: it basically says, yes, in the latest version this statement is still correct. So, fastrun may actually be what I'm looking for, but a granular, tunable system sounds like the way forward. That aside, will 'fastrun' tell me if an item is not updated (at some level)? I like to count the number of changes it made, so that I can tell my bot: change (max) X items in this run. So I need to detect the non-changes and not count them.

    (BTW, appreciating your prompt replies!)

  4. Sebastian Burgstaller

    The purpose of the fastrun mode is to speed up bot run times for very large numbers of items: it only checks whether a value or qualifier of a statement changed and ignores the references completely. Apart from the substantial speedup of bot runs, the reason is that Wikidata users have repeatedly complained to us that we make updates to items where the only change is the 'retrieved on' date. So if the references should be updated for a whole corpus of data, a full run needs to be performed, because in such cases every item needs to be written anyway and no speed gains can be achieved.
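
As an illustration only, the comparison described above (look at value and qualifiers, ignore references) might be sketched like this. The dict layout and the name `statement_changed` are assumptions made for this example; they are not PBB_Core's internal data structures or API.

```python
def statement_changed(existing, new):
    """Return True if the value or the qualifiers differ.

    Both arguments are plain dicts in a hypothetical layout:
    {"value": ..., "qualifiers": {...}, "references": {...}}.
    The "references" entry is deliberately never inspected, so a
    changed 'retrieved on' date alone does not count as a change.
    """
    if existing["value"] != new["value"]:
        return True
    return existing.get("qualifiers", {}) != new.get("qualifiers", {})


# Same value and qualifiers, only the reference date differs.
on_wikidata = {"value": "C6H6", "qualifiers": {},
               "references": {"retrieved": "2016-08-01"}}
from_source = {"value": "C6H6", "qualifiers": {},
               "references": {"retrieved": "2016-09-01"}}
only_date_differs = statement_changed(on_wikidata, from_source)
```

Here `only_date_differs` is False, so a bot using such a check would skip the write for this statement.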

    To determine whether a true write was performed, check self.require_write of the WDItemEngine instance after a write call: if it is False, no write has been performed to the actual item in Wikidata. To complete this, I will try to implement the diff method by early next week.

    Thanks, I try to react as fast as possible, but this week is quite busy. -sebastian
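
The counting approach discussed in comments 3 and 4 could look roughly like the sketch below. It relies only on what the thread states: after a write call, `require_write` being False means nothing was written. The helper `run_with_budget` and the stand-in class `FakeItem` are hypothetical and exist only to make the example self-contained; a real bot would pass WDItemEngine instances and a real login object.

```python
def run_with_budget(items, login, max_changes):
    """Write items until max_changes real writes have happened.

    Each item is expected to offer write(login) and a require_write
    attribute, as described for WDItemEngine in this thread.
    Returns the number of items that were actually changed.
    """
    changed = 0
    for item in items:
        if changed >= max_changes:
            break
        item.write(login)
        # Per the thread: False after write() means no write was performed.
        if item.require_write:
            changed += 1
    return changed


class FakeItem:
    """Hypothetical stand-in for a WDItemEngine instance (not the real class)."""

    def __init__(self, needs_update):
        self.require_write = needs_update
        self.written = False

    def write(self, login):
        # Only touch the item when something actually changed.
        if self.require_write:
            self.written = True


items = [FakeItem(True), FakeItem(False), FakeItem(True), FakeItem(True)]
changes_made = run_with_budget(items, login=None, max_changes=2)
```

With these four stand-in items and a budget of 2, the loop stops before the last item, and the unchanged second item does not count toward the budget.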
