Matching with the API

Issue #12 new
Former user created an issue

Hi,
I'm trying to resolve a list of names with taxonome through the Python API, and I'd like to know the best approach.
At first I used the code below, but the problem was that the "original name" column was sometimes changed and wasn't the exact string I provided as input. Next, I tried a different tracker: instead of "tracker.CSVListMatches(f)" I used "tracker.CSVTaxaTracker(f,["Id"])" and used the "Id" column to match my original rows.
The problem is that with this tracker I don't get the score. Also, looking at the code, the two trackers look a little different, so I wondered which one I should use.

Thanks,

Here is the code snippet:

Comments (4)

  1. Former user Account Deleted
    • edited description

    trackers = []
    files = []
    f = open(mappings_file, "w", encoding='utf-8', newline='')
    files.append(f)
    trackers.append(tracker.CSVListMatches(f))

    f = open(log_file, "w", encoding='utf-8', newline='')
    files.append(f)
    trackers.append(tracker.CSVTracker(f))

    with open(input_filename, encoding='utf-8', errors='ignore') as f:
        input_taxa = load_taxa(f, namefield=namefield, authfield=authfield)

    run_match_taxa(input_taxa, taxonset, tracker=trackers, nameselector=name_selector.NameSelector())
    
  2. Thomas Kluyver

    (Just formatting the code)

    trackers = []
    files = []
    f = open(mappings_file, "w", encoding='utf-8', newline='')
    files.append(f)
    trackers.append(tracker.CSVListMatches(f))
    
    f = open(log_file, "w", encoding='utf-8', newline='')
    files.append(f)
    trackers.append(tracker.CSVTracker(f))
    
    with open(input_filename, encoding='utf-8', errors='ignore') as f:
        input_taxa = load_taxa(f, namefield=namefield, authfield=authfield)
    
    run_match_taxa(input_taxa, taxonset, tracker=trackers, nameselector=name_selector.NameSelector())
    
  3. Thomas Kluyver

    If you use iter_taxa instead of load_taxa, then it should work through the records from your input file in strict order, so the rows in the CSVListMatches output file will correspond to the input file.
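Once the output rows are known to be in input order, the ids can be reattached purely by position. The following is a minimal sketch using only the standard library; the function name `attach_ids`, the column names, and the file layouts are assumptions for illustration, not taxonome's actual output format.

```python
import csv
import io

def attach_ids(input_file, matches_file, id_field="Id"):
    """Copy the id of input row N onto matches row N (position-based join)."""
    input_rows = list(csv.DictReader(input_file))
    match_rows = list(csv.DictReader(matches_file))
    if len(input_rows) != len(match_rows):
        raise ValueError("row counts differ, so rows cannot be paired by position")
    return [{id_field: src[id_field], **match}
            for src, match in zip(input_rows, match_rows)]

# Tiny in-memory example; the column names are made up for illustration.
input_csv = io.StringIO("Id,Name\n7,Vicia faba\n9,Pisum sativum\n")
matches_csv = io.StringIO(
    "Original name,Matched name,Score\n"
    "Vicia faba,Vicia faba,1.0\n"
    "Pisum sativum,Pisum sativum,1.0\n"
)
rows = attach_ids(input_csv, matches_csv)
```

The length check is deliberate: if the two files ever disagree on row count, pairing by position would silently attach the wrong ids.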

    load_taxa loads the taxa into an unordered collection, which you then iterate over. Actually, my newer implementation does preserve order, but it's best not to rely on that, because it may, for example, remove duplicate taxa.
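To see why removed duplicates matter even when order is preserved, here is a small illustration with plain Python containers. It does not use taxonome's own collection class; `dict.fromkeys` merely stands in for any order-preserving container that drops repeated entries.

```python
names = ["Vicia faba", "Pisum sativum", "Vicia faba"]  # third row repeats the first

# Order-preserving de-duplication, standing in for a collection that drops duplicates.
unique_names = list(dict.fromkeys(names))

# Pairing by position no longer works: 3 input rows versus 2 stored taxa.
rows_align = len(unique_names) == len(names)
```

With one duplicate in the input, the stored collection is one entry short, so output row N can no longer be assumed to belong to input row N.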

    Another approach would be to write a specific tracker. You could copy CSVListMatches, but record the id field in place of the original name.
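The output side of such a tracker might look roughly like the sketch below. Everything in it is hypothetical: `IdMatchWriter`, its `record` method, and the column names are invented for illustration and are not taxonome's tracker interface. A real version would put the same row-writing logic into whichever methods CSVListMatches implements.

```python
import csv
import io

class IdMatchWriter:
    """Hypothetical CSV writer keyed on an id field instead of the original name."""

    def __init__(self, fileobj, id_field="Id"):
        self.writer = csv.writer(fileobj)
        self.writer.writerow([id_field, "Matched name", "Score"])

    def record(self, taxon_id, matched_name, score):
        # In a real tracker this would be called from whichever hook
        # CSVListMatches uses to write its rows (not reproduced here).
        self.writer.writerow([taxon_id, matched_name, score])

out = io.StringIO(newline="")
w = IdMatchWriter(out)
w.record("7", "Vicia faba", 0.95)
lines = out.getvalue().splitlines()
```

Keying each row on the id keeps the score in the output while still letting every row be traced back to the original input row, even if the name string was altered during matching.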
