'None' authority resolution

Issue #19 new
Former user created an issue

Hi Thomas,
I came across some strange behavior in the program. I'm using the API, and sometimes get a 'None' authority in the 'Matched Name' field of the resolved-names.csv output. The final 'Name' and 'Authority' fields are OK.
We looked into this matter a bit and narrowed down the cases in which we get a 'None' authority in the matched name field to two cases:
1. When the original name has no authority. This is totally acceptable and makes perfect sense.
2. When a fuzzy match is performed in the species name (not in the authority). For example, the original name was 'Adonis distortus Ten.', the matched name was 'Adonis distorta None' and the final resolved name was 'Adonis distorta Ten.'. Why would the intermediate match include the None?
In both cases, this only happens for part of the names.
When looking into the code, we explored the options of the 'select' function in collection.py. It seems that when we change the 'prefer_accepted' parameter from 'all' to 'noauth', we no longer get a 'None' authority. However, that doesn't really make sense to me, and I want to keep the 'all' behavior for all matches, not just the ones with no authority.
Could you please explain why a None authority would ever be inserted, and what should be done to avoid it?
Please see the attached files for examples of the two cases.
Thanks,
Lior

Comments (10)

  1. Thomas Kluyver

    Hi Lior,

    Can you describe the steps you're using that lead to that result, or better still, attach a script that demonstrates it using your example files? I'm having trouble replicating the problem on my machine.

    Thanks

  2. Thomas Kluyver

    By the way, I also noticed there's trailing blank space after the authority names in your sample file, e.g. "Adonis distortus Ten. ". I'm surprised that the code doesn't already ignore that, but it does appear to make a difference. I wonder if it could have caused what you're seeing.
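
    For illustration, a minimal sketch of normalizing whitespace before names are handed to any matcher. `clean_name` is a hypothetical helper, not part of taxonome, and whether the trailing blank actually causes the 'None' authority is not established here.

```python
def clean_name(raw):
    """Strip leading/trailing blanks and collapse internal runs of whitespace.

    Hypothetical helper for illustration; not a taxonome function.
    """
    # str.split() with no arguments splits on any whitespace run and
    # discards empty pieces, so joining with one space normalizes the name.
    return " ".join(raw.split())


cleaned = clean_name("Adonis distortus Ten. ")
```

    Applying something like this when reading the input file would rule whitespace in or out as a factor.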

  3. Former user Account Deleted

    Hi,
    Sorry for the delay. It took me some time to put together a simple script that you can easily run to reproduce my problem.
    So this is a basic version of what I'm running. The code is a bit complex, and I'm not sure I need all the imports, but it works fine for me and 'None' authorities are produced. Just run it on the files that I provided (you can remove the spaces, although I don't think they actually matter).
    Hope that helps. Let me know if you need any other details.
    Thanks!

  4. Thomas Kluyver

    Ah, 'Matched name' is a field that you're adding in your own code. You're defining it as the newname value from the last name_transform event that had a newname and was not one of 'synonymy' or 'preferring accepted name'. If that event is a fuzzy matching event, then newname is a plain string, not a Name object, so there is no authority field. In that case, the matching code generates a list of possibly matching Name objects (which isn't recorded by an event), and the 'preferring accepted name' event is where it decides which of those to pick. Possibly you should allow a 'preferring accepted name' event to set the matched name?
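
    The selection logic described above can be sketched roughly as follows. This is a simplified model, not taxonome's actual classes: `Name`, `Event`, the `kind` field and the "fuzzy match" label are hypothetical stand-ins, and only the two skipped event names come from the discussion.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Name:
    """Hypothetical stand-in for a name object carrying an authority."""
    plain: str
    authority: Optional[str] = None


@dataclass
class Event:
    """Hypothetical stand-in for a name_transform event."""
    kind: str
    newname: Union[Name, str, None] = None


SKIPPED = {"synonymy", "preferring accepted name"}


def matched_name(events, skipped=SKIPPED):
    """Return newname from the last event that has one and is not skipped."""
    for ev in reversed(events):
        if ev.newname is not None and ev.kind not in skipped:
            return ev.newname
    return None


def describe(newname):
    """Format as 'name authority'; a plain string has no authority attribute."""
    plain = getattr(newname, "plain", newname)
    authority = getattr(newname, "authority", None)
    return f"{plain} {authority}"


events = [
    Event("fuzzy match", "Adonis distorta"),  # plain string, no authority
    Event("preferring accepted name", Name("Adonis distorta", "Ten.")),
]

with_skip = describe(matched_name(events))
without_skip = describe(matched_name(events, skipped={"synonymy"}))
```

    With the default skip set, the fuzzy-match string is picked and formats as 'Adonis distorta None'; allowing the 'preferring accepted name' event through yields 'Adonis distorta Ten.' instead.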

    By the way, I finished up the SQLite store for taxa, and made a new release - I hope it's useful!

  5. Former user Account Deleted

    Thanks!
    Indeed there was a bug in our code. We located and fixed it, so we don't get 'None' anymore.
    We would most definitely be interested to start using SQLite, but it may take some time for us to integrate it into our pipelines.
    Speaking of additions to upcoming releases, do you think it would be possible to make The Plant List one of the 'ready to use' databases in taxonome?
    Thanks again!

  6. Thomas Kluyver

    I've just gone and looked at The Plant List's CSV download format, and remembered why I haven't already integrated it. As far as I can tell, synonyms in the CSV do not have any reference to the accepted name for that taxon, so it's impossible to build a synonymy table from it. I've got a vague memory that I emailed them to try to get that fixed, but I might have imagined that.

  7. Former user Account Deleted

    This is not exactly true. I'm using TPL as the reference for all my name resolutions. Notice the Accepted ID column in their CSVs. It can be used to look up the accepted name for each synonym.
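
    As a rough sketch of that idea: only the 'Accepted ID' column is taken from the description above. The 'ID' and 'Name' column names, the `build_synonymy` helper and the sample rows are invented for illustration and may not match the real TPL download.

```python
import csv
import io

# Invented sample data; real TPL files have more columns and real IDs.
SAMPLE = """ID,Name,Accepted ID
acc-1,Examplea prima,
syn-1,Examplea primus,acc-1
syn-2,Examplea primaria,acc-1
"""


def build_synonymy(csv_file):
    """Map each synonym's name to the name of its accepted record."""
    rows = list(csv.DictReader(csv_file))
    # First pass: index every record by its own ID.
    by_id = {row["ID"]: row["Name"] for row in rows}
    synonymy = {}
    # Second pass: rows with a non-empty Accepted ID are synonyms.
    for row in rows:
        accepted_id = row["Accepted ID"]
        if accepted_id:
            synonymy[row["Name"]] = by_id.get(accepted_id)
    return synonymy


table = build_synonymy(io.StringIO(SAMPLE))
```

    Two passes are used so that a synonym can appear in the file before the accepted record it points to.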

  8. Thomas Kluyver

    Aha, well spotted. That's not in their documentation of the format. I wonder if I did send the email, and the column got added in response to that.

    When I get some spare time, I'll add a wrapper around that, then. Or if you want to have a go at it, see the existing wrappers in taxonome.services. The API is a bit odd because there are different things we can do with different services, but I'm happy to provide pointers.
