Error related to pandas library

Issue #1 resolved
Jose Manuel Molero created an issue

We are using the latest version of pandas, but we get the following error:

Loading data.
/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/vispr/results/target.py:28: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  self.df.sort("p-value", inplace=True)
Traceback (most recent call last):
  File "/cm/shared/apps/mageck-vispr/2015/bin/vispr", line 6, in <module>
    sys.exit(main())
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/vispr/cli.py", line 190, in main
    init_server(*args.config, port=args.port)
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/vispr/cli.py", line 40, in init_server
    app.screens.add(config, parentdir=os.path.dirname(path))
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/vispr/results/__init__.py", line 28, in add
    self.screens[screen] = Screen(config, parentdir=parentdir)
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/vispr/results/__init__.py", line 83, in __init__
    posterior_efficiency=get_path(config["sgrnas"].get("results", None)))
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/vispr/results/rna.py", line 43, in __init__
    posterior_efficiency.columns = ["gene", "eff"]
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/pandas/core/generic.py", line 2257, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:44611)
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/pandas/core/generic.py", line 424, in _set_axis
    self._data.set_axis(axis, labels)
  File "/cm/shared/apps/mageck-vispr/2015/lib/python3.4/site-packages/pandas/core/internals.py", line 2460, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 13 elements, new values have 2 elements

EDIT: Package updated using miniconda3

conda list
# packages in environment at /cm/shared/apps/miniconda3:
#
appdirs                   1.4.0                    py35_0    bioconda
conda                     3.19.0                   py35_0    defaults
conda-env                 2.4.5                    py35_0    defaults
cutadapt                  1.9.1                    py35_0    bioconda
docutils                  0.12                     py35_0    defaults
fastqc                    0.11.4                        2    bioconda
flask                     0.10.1                   py35_1    defaults
itsdangerous              0.24                     py35_0    defaults
java-jdk                  8.0.45                        0    bioconda
jinja2                    2.8                      py35_0    defaults
libgfortran               1.0                           0    defaults
mageck                    0.5.2                    py35_0    bioconda
mageck-vispr              0.4.5                    py35_0    bioconda
markupsafe                0.23                     py35_0    defaults
numpy                     1.10.2                   py35_0    defaults
openblas                  0.2.14                        3    defaults
openssl                   1.0.2d                        0    defaults
pandas                    0.17.1              np110py35_0    defaults
pip                       7.1.2                    py35_0    defaults
pycosat                   0.6.1                    py35_0    defaults
pycrypto                  2.6.1                    py35_0    defaults
python                    3.5.1                         0    defaults
python-dateutil           2.4.2                    py35_0    defaults
pytz                      2015.7                   py35_0    defaults
pyyaml                    3.11                     py35_1    defaults
readline                  6.2                           2    defaults
requests                  2.9.0                    py35_0    defaults
scikit-learn              0.17                np110py35_1    defaults
scipy                     0.16.1              np110py35_0    defaults
setuptools                19.1.1                   py35_0    defaults
six                       1.10.0                   py35_0    defaults
snakemake                 3.5.4                    py35_1    bioconda
sqlite                    3.8.4.1                       1    defaults
tk                        8.5.18                        0    defaults
vispr                     0.4.8                    py35_0    bioconda
werkzeug                  0.11.3                   py35_0    defaults
wheel                     0.26.0                   py35_1    defaults
xz                        5.0.5                         0    defaults
yaml                      0.1.6                         0    defaults
zlib                      1.2.8                         0    defaults

EDIT:

Configuration file used:

assembly: hg38
experiment: myexperiment2
fastqc:
  1_1_YM155:
  - qc/1_1_YM155_0/trim5_1_1_D704_fastqc/fastqc_data.txt
  1_1_reference:
  - qc/1_1_reference_0/trim5_1_1_D702_fastqc/fastqc_data.txt
  1_1_time0:
  - qc/1_1_time0_0/trim5_1_1_D701_fastqc/fastqc_data.txt
sgrnas:
  annotation: sgrnas.bed
  counts: count/all.count_normalized.txt
  mapstats: count/all.countsummary.txt
  results: test/myexperiment2.sgrna_summary.txt
species: homo_sapiens
targets:
  genes: true
  results: test/myexperiment2.gene_summary.txt

Any suggestions? Thanks in advance and regards,

Comments (27)

  1. Johannes Köster

    Hi Jose!
    Thanks for using MAGeCK-VISPR. The mageck-vispr conda package needs Python >=3.3. From the error message I infer that you tried to install it with Python 2.7. I recommend reinstalling with the Python 3 version of Miniconda. Alternatively, you can create a new environment with Python 3. Which approach is best for you depends on your level of computational experience.

    Your error suggests that something in your input data for VISPR is unexpected. Can you post the command you used? Also, was the input generated with mageck-vispr or created manually?

    I have edited your post to correctly format the output.
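
For anyone hitting the same installation question, here is a minimal sketch of a version check against the Python >=3.3 requirement mentioned above. The function name `python_is_supported` is made up for illustration and is not part of mageck-vispr.

```python
import sys

# Minimum version taken from the comment above (Python >= 3.3).
MIN_VERSION = (3, 3)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Check the interpreter that runs this script.
supported_here = python_is_supported()
```

Running this with the interpreter that conda puts on the PATH shows whether that environment meets the stated requirement.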

  2. Jose Manuel Molero reporter

    Thanks,

    I will try reinstalling both Miniconda and MAGeCK-VISPR using Python 3.

    Regarding the error, this was the input:

    vispr server myexperiment2.vispr.yaml

    This was executed by a user of the cluster I manage; I will ask him for more details.

    Thanks and best regards.

    PS: Sorry about the format of my previous post

  3. Johannes Köster

    Something in your setup is weird. Did you ever manage to install mageck-vispr via conda? I ask because your conda installation still has Python 2.7 installed. You can switch to Python 3 with

    conda install python=3
    

    After that, try to reinstall mageck-vispr with

    conda install --channel bioconda mageck-vispr
    
  4. Jose Manuel Molero reporter

    Post updated:

    The input used in the case of the error message was generated with mageck-vispr. The program runs without error until I issue the final command:

    vispr server experiment.vispr.yaml

  5. Johannes Köster

    Ok, it looks like everything is now installed correctly and with the most up-to-date version. Sorry for bugging you before; your description still contained some older output that suggested Python 2.7 was installed. The error you see comes from vispr. Can you give me the first few lines of test/myexperiment2.sgrna_summary.txt?

    As a quick workaround, you can comment out the line results: test/myexperiment2.sgrna_summary.txt.
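
A minimal sketch of that edit, assuming the config is handled as plain text. The helper name `comment_out` is made up for illustration; it only shows the edit itself and makes no claim about whether the workaround resolves the error in a given VISPR version.

```python
def comment_out(config_text, needle):
    """Prefix every line containing `needle` with '#', keeping its indentation."""
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if needle in line and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            out.append(indent + "#" + stripped)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Fragment of the config shown in the issue description.
config = (
    "sgrnas:\n"
    "  annotation: sgrnas.bed\n"
    "  results: test/myexperiment2.sgrna_summary.txt\n"
    "targets:\n"
    "  genes: true\n"
    "  results: test/myexperiment2.gene_summary.txt\n"
)

# Match on the file name so the targets 'results:' line stays untouched.
patched = comment_out(config, "sgrna_summary.txt")
```

Matching on the file name rather than on `results:` matters here, because the config contains a second `results:` entry under `targets`.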

  6. Johannes Köster

    Mhm, this looks quite unexpected. @davidliwei did you change something in the format of the sgrna_summary file? What columns can be expected with which mode of mageck?

  7. Johannes Köster

    Ok, then I know how to handle this in VISPR. @jmlero I will provide a fix tomorrow. Until then, commenting out as shown above should work for you. Sorry for the inconvenience!
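
For context on the traceback: the ValueError reports that the table has 13 columns while the code assigns 2 names, and pandas raises this whenever the number of new labels differs from the number of columns. Below is a minimal sketch for inspecting a summary file's header before loading it. It assumes the file is tab-separated; the column names in the sample are placeholders, not the real MAGeCK header, and this is not the fix that went into VISPR.

```python
import csv
import io


def header_columns(tsv_text):
    """Return the column names from the first line of tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return next(reader)


def check_column_count(tsv_text, expected):
    """Return (matches, found) for the header's column count."""
    found = len(header_columns(tsv_text))
    return found == expected, found


# Placeholder table with 13 columns, mirroring the count in the error message.
sample = (
    "\t".join("col%d" % i for i in range(1, 14)) + "\n"
    + "\t".join(["0"] * 13) + "\n"
)

matches, found = check_column_count(sample, expected=2)
```

Here `matches` is False and `found` is 13, which is the same disagreement the error message describes.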

  8. Johannes Köster

    Hi, can you also show me your mageck-vispr config file? The file that is stored next to the Snakefile in your working directory.

  9. Enrico Girardi

    Hi, I am also working with mageck-vispr together with Jose. Thanks for all the help. Here is the mageck-vispr config file used to generate the error above.

    # General configuration:
    
    # Path to library design file (csv format, columns: id, sequence, gene)
    library: /scratch/users/egirardi/libraries/mageck_indexes/gRNA_ID-sequence-gene.csv
    # Species to use for linkouts in VISPR (e.g. mus_musculus, homo_sapiens, ...)
    species: homo_sapiens
    # Genome assembly to use for linkouts in VISPR (e.g. hg19, hg38, mm9, mm10, ...)
    assembly: hg38
    
    # Configuration of knockout target display in VISPR
    targets:
        # if screening genes, set this to true for proper linkouts to GeneMANIA and Ensembl in VISPR
        genes: true
        # file with genes to hide per default in VISPR (optional, one gene per line)
        #controls: ribosomal_genes.txt
    
    # Configuration of sgRNAs
    sgrnas:
        # estimate sgRNA knockout efficiency during EM-procedure of MAGeCK-MLE
        update-efficiency: false
        # trim the 5 prime end to get rid of barcode sequences in the reads
        trim-5: 0
        # specify the length of the sgRNAs (without PAM sequence)
        len: 20
        # sequencing adapter that shall be removed from reads before processing with MAGeCK (optional)
        #adapter: ACGGCTAGCTGA
    
    # Configuration of samples
    samples:
        # The following sample information was inferred from the given FASTQ files.
        # Adjust it according to your needs (e.g. providing descriptive sample names and grouping replicates together).
        1_1_treated:
            - ../../trim_5prime/1_1/trim5_1_1_D704.fastq
        1_1_reference:
            - ../../trim_5prime/1_1/trim5_1_1_D702.fastq
        1_1_time0:
            - ../../trim_5prime/1_1/trim5_1_1_D701.fastq
    
    # Configuration of experiments.
    # An experiment defines the comparison that shall be analyzed with MAGeCK.
    # You can define as many experiments as you want.
    # You can define both MAGeCK-RRA or MAGeCK-MLE experiments.
    experiments:
        # provide a descriptive name for your experiment (it will show up in VISPR)
        #"myexperiment1":
            # This is a MAGeCK-MLE experiment.
            # Here, a design matrix has to be given (see http://liulab.dfci.harvard.edu/Mageck for details).
            # Sample names in the design matrix must refer to the samples defined above.
         #   designmatrix: path/to/designmatrix.txt
        "myexperiment2":
            # This is a MAGeCK-RRA experiment.
            # You must specify treatment and control samples.
            # The sample names must refer to the samples defined above in the
            # samples section.
            treatment:
                - 1_1_treated
            control:
                - 1_1_reference
                - 1_1_time0
    

    I also tried the workaround suggested but it gave the same error.

    In the meantime I also ran another RRA experiment with the 0.4.8 version, and vispr completed successfully. The only problem is that the Results table does not show any entries (see attached picture, Screenshot_vispr.png), even though the tables are available in the results folder. Many thanks.

  10. Johannes Köster

    Regarding your first error: can you please re-create the vispr config file with

    snakemake -R vispr
    

    This will re-run the rule that creates that config file. I suggest this because the bug should already have been fixed in the latest mageck-vispr version. Hence I guess that you updated mageck-vispr but are still using the vispr config created by the old version, which contained the bug.

    Regarding your second error:
    Can you show me the vispr output in the corresponding terminal?

  11. Enrico Girardi

    Reformatted output, sorry:

    As you suggested, the re-created vispr config file worked fine this time, thanks! The vispr output though has the same issue as above with the missing genes in the Results tab.

    The mageck-vispr output for the second error above is:

    Provided cores: 4
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1   all
        1   annotate_sgrnas
        8   fastqc
        1   mageck_count
        1   mageck_rra
        1   vispr
        13
    rule mageck_count:
        input: ../../5_trim/high_MTX_2_1_5-3trim.fastq, ../../5_trim/noMTX_2_2_5-3trim.fastq, ../../5_trim/noMTX_2_1_5-3trim.fastq, ../../5_trim/time0_2_2_5-3trim.fastq, ../../5_trim/time0_2_1_5-3trim.fastq, ../../5_trim/IC50_MTX_2_1_5-3trim.fastq, ../../5_trim/IC50_MTX_2_2_5-3trim.fastq, ../../5_trim/high_MTX_2_2_5-3trim.fastq, /scratch/users/egirardi/libraries/mageck_indexes/SLC_library_gRNA_ID-sequence-gene.csv
        output: results/count/all.count.txt, results/count/all.count_normalized.txt, results/count/all.countsummary.txt
        log: logs/mageck/count/all.log
    rule fastqc:
        input: ../../5_trim/noMTX_2_1_5-3trim.fastq
        output: results/qc/noMTX_2_0
        log: logs/fastqc/noMTX_2_0.log
    rule fastqc:
        input: ../../5_trim/time0_2_1_5-3trim.fastq
        output: results/qc/time0_2_0
        log: logs/fastqc/time0_2_0.log
    rule fastqc:
        input: ../../5_trim/time0_2_2_5-3trim.fastq
        output: results/qc/time0_2_1
        log: logs/fastqc/time0_2_1.log
    Analysis complete for noMTX_2_1_5-3trim.fastq
    Analysis complete for time0_2_2_5-3trim.fastq
    Analysis complete for time0_2_1_5-3trim.fastq
    1 of 13 steps (8%) done
    rule fastqc:
        input: ../../5_trim/high_MTX_2_1_5-3trim.fastq
        output: results/qc/high_MTX_2_1
        log: logs/fastqc/high_MTX_2_1.log
    Analysis complete for high_MTX_2_1_5-3trim.fastq
    2 of 13 steps (15%) done
    rule fastqc:
        input: ../../5_trim/IC50_MTX_2_1_5-3trim.fastq
        output: results/qc/IC50_MTX_2_0
        log: logs/fastqc/IC50_MTX_2_0.log
    Analysis complete for IC50_MTX_2_1_5-3trim.fastq
    3 of 13 steps (23%) done
    rule fastqc:
        input: ../../5_trim/high_MTX_2_2_5-3trim.fastq
        output: results/qc/high_MTX_2_0
        log: logs/fastqc/high_MTX_2_0.log
    Analysis complete for high_MTX_2_2_5-3trim.fastq
    4 of 13 steps (31%) done
    rule fastqc:
        input: ../../5_trim/IC50_MTX_2_2_5-3trim.fastq
        output: results/qc/IC50_MTX_2_1
        log: logs/fastqc/IC50_MTX_2_1.log
    Analysis complete for IC50_MTX_2_2_5-3trim.fastq
    5 of 13 steps (38%) done
    rule fastqc:
        input: ../../5_trim/noMTX_2_2_5-3trim.fastq
        output: results/qc/noMTX_2_1
        log: logs/fastqc/noMTX_2_1.log
    6 of 13 steps (46%) done
    rule annotate_sgrnas:
        input: /scratch/users/egirardi/libraries/mageck_indexes/SLC_library_gRNA_ID-sequence-gene.csv
        output: annotation/sgrnas.bed
        log: logs/annotation/sgrnas.log
    Analysis complete for noMTX_2_2_5-3trim.fastq
    7 of 13 steps (54%) done
    8 of 13 steps (62%) done
    9 of 13 steps (69%) done
    rule mageck_rra:
        input: results/count/all.count.txt
        output: results/test/IC50_vs_time0.gene_summary.txt, results/test/IC50_vs_time0.sgrna_summary.txt
        log: logs/mageck/test/IC50_vs_time0.log
    10 of 13 steps (77%) done
    11 of 13 steps (85%) done
    rule vispr:
        input: annotation/sgrnas.bed, results/test/IC50_vs_time0.gene_summary.txt, results/count/all.count_normalized.txt, results/count/all.countsummary.txt, results/test/IC50_vs_time0.sgrna_summary.txt, results/qc/high_MTX_2_1, results/qc/noMTX_2_1, results/qc/noMTX_2_0, results/qc/time0_2_1, results/qc/time0_2_0, results/qc/IC50_MTX_2_0, results/qc/IC50_MTX_2_1, results/qc/high_MTX_2_0
        output: results/IC50_vs_time0.vispr.yaml
    12 of 13 steps (92%) done
    localrule all:
        input: results/IC50_vs_time0.vispr.yaml
    13 of 13 steps (100%) done
    

    The output from the vispr server command is the standard one:

    [egirardi@n001 results]$ vispr server IC50_vs_time0.vispr.yaml
    Loading data.
    Starting server.

    Open: go to http://127.0.0.1:5000 in your browser.
    Note: Safari and Internet Explorer are currently unsupported.
    Close: hit Ctrl-C in this terminal.
    
  12. Johannes Köster

    Interesting, this seems to be a browser issue then. Do you have JavaScript disabled in your browser?

  13. Enrico Girardi

    It seems so: running vispr in an updated version of Firefox solved the issue. Many thanks for all your help!

  14. Enrico Girardi

    Apologies for yet another post, but I am running into some additional issues with vispr.

    I did another RRA experiment (but the behaviour is the same with MLE experiments) and successfully completed the analysis. However, the vispr server command gives me this error:

    [egirardi@n001 results]$ vispr server test.vispr.yaml 
    Loading data.
    Traceback (most recent call last):
      File "/cm/shared/apps/miniconda3/bin/vispr", line 6, in <module>
        sys.exit(main())
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/vispr/cli.py", line 190, in main
        init_server(*args.config, port=args.port)
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/vispr/cli.py", line 40, in init_server
        app.screens.add(config, parentdir=os.path.dirname(path))
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/vispr/results/__init__.py", line 28, in add
        self.screens[screen] = Screen(config, parentdir=parentdir)
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/vispr/results/__init__.py", line 93, in __init__
        for sample, paths in config["fastqc"].items()
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/vispr/results/fastqc.py", line 44, in __init__
        self.gc_content = pd.concat(self.gc_content)
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 812, in concat
        copy=copy)
      File "/cm/shared/apps/miniconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 845, in __init__
        raise ValueError('No objects to concatenate')
    ValueError: No objects to concatenate
    

    Here is the test.vispr.yaml:

    assembly: hg38
    experiment: test
    fastqc:
      E680_treated:
      - qc/E680_treated_0/E680_treated_01_5-3trim_20bp_fastqc/fastqc_data.txt
      E680_Untreated:
      - qc/E680_Untreated_0/E680_Untreated_5-3trim_20bp_fastqc/fastqc_data.txt
    sgrnas:
      annotation: sgrnas.bed
      counts: count/all.count_normalized.txt
      mapstats: count/all.countsummary.txt
    species: homo_sapiens
    targets:
      genes: true
      results: test/test.gene_summary.txt
    

    Any idea what the problem could be? Other experiments using different datasets work just fine. Thanks!

  15. Johannes Köster

    Looks like your fastqc reports don't contain GC content information. I have just released a new version of VISPR that fixes the resulting error and simply does not display the corresponding plot.

    Thanks a lot for your reports, you are really helping to make VISPR and MAGeCK-VISPR more user friendly and robust!
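
To check a report for this condition by hand, here is a minimal sketch that looks for a module's data rows in the text of a fastqc_data.txt file. It assumes the usual FastQC text layout, where each module starts with a `>>Name<TAB>status` line and ends with `>>END_MODULE`; the function name `module_rows` and the sample text are made up for illustration, and this is not VISPR's code.

```python
def module_rows(fastqc_text, module_name):
    """Return the data rows of one FastQC module, or [] if absent or empty."""
    rows = []
    inside = False
    for line in fastqc_text.splitlines():
        if line.startswith(">>END_MODULE"):
            inside = False
        elif line.startswith(">>"):
            # Module header: name and status separated by a tab.
            inside = line[2:].split("\t")[0] == module_name
        elif inside and line and not line.startswith("#"):
            rows.append(line.split("\t"))
    return rows


# Illustrative report that has no GC content module at all.
sample = "\n".join([
    "##FastQC\t0.11.4",
    ">>Basic Statistics\tpass",
    "#Measure\tValue",
    "Filename\tsample.fastq",
    ">>END_MODULE",
])

gc_rows = module_rows(sample, "Per sequence GC content")
has_gc_content = len(gc_rows) > 0
```

An empty result for every report of a sample would leave nothing to pass to `pd.concat`, which matches the "No objects to concatenate" error in the traceback above.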

  16. Enrico Girardi

    Thank you Johannes, that works perfectly. Fantastic work, really loving vispr for visualizing all the data!
