Can't get the mzIdentML schema for version

Issue #21 resolved
Anonymous created an issue

I'm trying to use the mzid module to open mzIdentML 1.2 files, but it says that it is not possible to get the schema, and for that reason the Python program is unable to extract the information that I need.

Is pyteomics only suitable for mzIdentML 1.1 files?

Comments (8)

  1. Esteban

    Hello. This is the code that I am trying to implement. It's really simple:

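    import pandas as pd  # mzid.DataFrame needs pandas installed
    from pyteomics import mzid
    
    # Build a table of identifications and keep only the peptide sequences
    pand = mzid.DataFrame('example.mzid')
    sequences = pand['PeptideSequence'].tolist()
    
    print(sequences)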
    

    I only want the sequences from the file. When I run the program on the example file available on your website, it works really well, but when I tried a different file I got this error:

    Can't get the mzIdentML schema for version `1.2.0` from <http://www.psidev.info/files/mzIdentML1.2.0.xsd> at the moment.
    Using defaults for 1.1.0.
    You can disable reading the schema by specifying `read_schema=False`.
    Traceback (most recent call last):
      File "mz.py", line 3, in <module>
        pand = mzid.DataFrame('example2.mzid')
      File "/usr/local/lib/python2.7/dist-packages/pyteomics/mzid.py", line 345, in DataFrame
        prot_descr.append(d['protein description'])
    KeyError: 'protein description'
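    The `1.2.0` in the warning corresponds to the `version` attribute that mzIdentML files declare on their root `MzIdentML` element. To check which version a given file declares before parsing it, here is a minimal sketch using only the standard library. The function name `mzidentml_version` and the sample header are hypothetical, and this does not reproduce how pyteomics itself looks up the schema.

```python
import io
import xml.etree.ElementTree as ET


def mzidentml_version(fileobj):
    """Return the `version` attribute of the root element, or None.

    `fileobj` is a binary file-like object, e.g. open('example2.mzid', 'rb').
    Iteration stops at the first start tag, so large files are not parsed in full.
    """
    for _event, elem in ET.iterparse(fileobj, events=("start",)):
        # The first "start" event is always the root element.
        return elem.get("version")
    return None


# Hypothetical minimal mzIdentML 1.2 header, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2" id="example" version="1.2.0">
</MzIdentML>"""

print(mzidentml_version(io.BytesIO(SAMPLE)))  # prints 1.2.0
```

    With a real file: `with open('example2.mzid', 'rb') as f: print(mzidentml_version(f))`.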
    
  2. Lev Levitsky repo owner

    Thank you! I'll take a closer look, but I can already see that the problem is in the mzid.DataFrame function and not in the parser itself.
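    The traceback ends in `d['protein description']`, which raises `KeyError` as soon as one entry has no such key, presumably because this file lacks a protein description for at least one protein. The sketch below only illustrates that failure mode and the usual way around it (`dict.get` with a default); it is not the actual pyteomics fix, and the helper name `protein_descriptions` and the sample entries are hypothetical.

```python
def protein_descriptions(db_sequences, default=None):
    """Collect 'protein description' values, tolerating entries without one."""
    # d['protein description'] would raise KeyError when the key is absent;
    # d.get(...) returns `default` instead.
    return [d.get('protein description', default) for d in db_sequences]


# Hypothetical parsed entries: the second one has no description.
entries = [
    {'accession': 'P1', 'protein description': 'Example protein 1'},
    {'accession': 'P2'},
]
print(protein_descriptions(entries))  # prints ['Example protein 1', None]
```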

    Meanwhile, if all you need is a list of sequences, you don't necessarily need to build a DataFrame. You can create the list like this:

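    from pyteomics import mzid
    
    # Each item is one SpectrumIdentificationResult; take the sequence of its first SpectrumIdentificationItem
    with mzid.read('example2.mzid') as f:
        sequences = [item['SpectrumIdentificationItem'][0]['PeptideSequence'] for item in f]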
    

    You can also retrieve any other information from the file in this way.
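    The comprehension above keeps only the first `SpectrumIdentificationItem` of each result. If every candidate peptide is wanted, a minimal sketch is to loop over all items, assuming (as that snippet does) that each result is a dict with a `SpectrumIdentificationItem` list whose entries carry `PeptideSequence`. The helper `all_sequences` and the sample data are hypothetical; with a real file, the open reader can be passed directly, e.g. `with mzid.read('example2.mzid') as f: seqs = all_sequences(f)`.

```python
def all_sequences(results, unique=False):
    """Return PeptideSequence from every SpectrumIdentificationItem.

    `results` is any iterable of dicts shaped like the items that
    mzid.read() yields (assumption based on the snippet above).
    """
    sequences = [
        sii['PeptideSequence']
        for result in results
        for sii in result.get('SpectrumIdentificationItem', [])
        if 'PeptideSequence' in sii
    ]
    if unique:
        # dict.fromkeys keeps the first occurrence of each sequence, in order.
        sequences = list(dict.fromkeys(sequences))
    return sequences


# Hypothetical results standing in for a parsed file.
results = [
    {'SpectrumIdentificationItem': [{'PeptideSequence': 'PEPTIDE', 'rank': 1},
                                    {'PeptideSequence': 'PEPTLDE', 'rank': 2}]},
    {'SpectrumIdentificationItem': [{'PeptideSequence': 'PEPTIDE', 'rank': 1}]},
]
print(all_sequences(results))               # ['PEPTIDE', 'PEPTLDE', 'PEPTIDE']
print(all_sequences(results, unique=True))  # ['PEPTIDE', 'PEPTLDE']
```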
