iphop v1.3.2 not compatible with pandas >= 2.0
I ran iphop v1.3.2 recently and came upon this error:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Skip] Skipping computation of blastn against microbial genomes...
[1/3/Skip] Skipping blast parsing...
[2/1/Skip] Skipping computation of blastn against CRISPR...
[2/2/Skip] Skipping crispr parsing...
[3/1/Skip] Skipping computation of WIsH scores...
[3/2/Skip] Skipping WIsH parsing...
[4/1/Skip] Skipping computation of VHM s2 similarities...
[4/2/Skip] Skipping VHM parsing...
[5/1/Skip] Skipping computation of PHP scores...
[5/2/Skip] Skipping PHP parsing...
[6/1/Skip] Skipping RaFAH...
[6/2/Skip] Skipping RaFAH parsing...
[6.5/1/Skip] Skipping diamond search against RaFAH refs...
[6.5/2/Skip] Skipping calculation of AAI to RaFAH refs...
[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
[7/2] Loading corresponding host taxonomy...
[7/3] Link matching genomes to representatives and filter out redundant / useless matches...
Filtering blast data
### Welcome to iPHoP ###
Traceback (most recent call last):
File "/fs/project/PAS1117/modules/iPHoP/1.1.0/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 102, in main
dataprep.aggregate(args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 40, in aggregate
store_filtered = filter_hits(args,store,store_filtered,host_info)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 215, in filter_hits
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'
I checked on Stack Overflow: `append` was deprecated in pandas 1.4 and removed entirely in pandas 2.0.
I was able to solve it by loading an earlier version of pandas (1.5.3), but wanted to flag it in case anyone else hits the same issue.
Here is the corrected output:
[7/3] Link matching genomes to representatives and filter out redundant / useless matches...
Filtering blast data
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:204: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
tmp = df.groupby(['Virus','Repr','Host']).sum().reset_index() ## For each genome, we take the sum of all hits
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:215: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
Filtering crispr data
Filtering wish data
Filtering vhm data
May be worth changing "append" to "concat" in future versions, or explicitly pinning pandas<2.0, if this issue becomes more common.
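For reference, a minimal sketch of the change suggested above: `DataFrame.append` was removed in pandas 2.0, so the accumulation step shown in the traceback would need `pandas.concat` instead. The variable names mirror the traceback, but the data is invented and this is not the actual iPHoP code.

```python
# Sketch only: illustrates replacing the removed DataFrame.append with
# pandas.concat. Names follow the traceback (store_filtered, df); data is fake.
import pandas as pd

store_filtered = pd.DataFrame(columns=["Virus", "Repr", "Host"])
df = pd.DataFrame({"Virus": ["v1"], "Repr": ["r1"], "Host": ["h1"]})

# pandas < 2.0 only (removed in 2.0):
# store_filtered = store_filtered.append(df)

# Works on both pandas 1.x and 2.x:
store_filtered = pd.concat([store_filtered, df], ignore_index=True)
```

The alternative workaround is the one James used: pin the dependency (e.g. `pandas<2.0` in the package requirements) until the code is updated.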
Cheers,
James
Comments (15)
-
reporter Hi Simon,
I didn’t realize until after I posted this, but the version I was running was a little bit broken: I loaded a module Ben Bolduc created for 1.1.0, but it reproducibly loaded 1.3.2 instead, so there may have been other issues with that installation. Please disregard my comment! We re-installed v1.3.2 and I’m currently running it, but will update again if I still have the same issue or am successful.
Sincerely,
James -
repo owner Good to know, thanks for the update ! And let me know if everything works with the new install (fingers crossed… !).
-
reporter Hi Simon,
My run didn’t finish so I tried restarting it and ended up with this error message:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Skip] Skipping computation of blastn against microbial genomes...
[1/3/Skip] Skipping blast parsing...
[2/1/Skip] Skipping computation of blastn against CRISPR...
[2/2/Skip] Skipping crispr parsing...
[3/1/Run] Running (recoded)WIsH...
### Welcome to iPHoP ###
Traceback (most recent call last):
File "/fs/project/PAS1117/modules/iPHoP/1.3.2/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 87, in main
wish.run_and_parse_wish(args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/wish.py", line 44, in run_and_parse_wish
run_rewish(args["fasta_file"],args["wishrawresult"],args["rewish_db_dir"],args["wish_negfit"],args["tmp"],threads_tmp,n_host_by_phage)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/wish.py", line 106, in run_rewish
utility.clean_files_in_dir(tmp_dir,logger)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/utility.py", line 28, in clean_files_in_dir
os.path.remove(file)
AttributeError: module 'posixpath' has no attribute 'remove'
I’ll try running it from scratch again and see if that fixes it.
-James
-
repo owner Right, this looks like a bug similar to https://bitbucket.org/srouxjgi/iphop/issues/52/utilitypy-osremove-instead-of-ospathremove but should not be a major issue, because it’s only happening when iPHoP sees a partial folder and tries to clean it up. You can clean it up yourself by removing “wish_results/”, “wish.cmd”, and “wishparsed.csv” (if it exists) from the Wdir/ folder, then re-run (or re-run from scratch with longer walltime like you did).
-
reporter Update! I re-ran it with 24hr walltime and it timed out at 17:04:01:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running (recoded)WIsH...
### Welcome to iPHoP ###
[3/1/Run] Running WIsH extra database...
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/wish.py:181: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object. To preserve the previous behavior, use >>> .groupby(..., group_keys=False) To adopt the future behavior and silence this warning, use >>> .groupby(..., group_keys=True)
final_output = final_output.sort_values(by='LL',ascending=False).groupby('Virus').apply(lambda x: x.nlargest(n=n_hostbyphage,columns='LL',keep='all')).reset_index(drop=True)
[3/2/Run] Get relevant WIsH hits...
[4/1/Run] Running VHM s2 similarities...
[4/2/Run] Get relevant VHM hits...
[5/1/Run] Running PHP...
[5/2/Run] Get relevant PHP hits...
[6/1/Run] Running RaFAH...
[6/2/Run] Get relevant RaFAH scores...
RaFAH results were empty this may be ok, but is still unusual, so you may want to check the rafah log (Wdir/rafah.log)
[6.5/1/Run] Running Diamond comparison to RaFAH references...
[6.5/2/Run] Get AAI distance to RaFAH refs...
Traceback (most recent call last):
File "/fs/project/PAS1117/modules/iPHoP/1.3.2/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 95, in main
aai_to_ref.run_and_parse_aai(args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/aai_to_ref.py", line 30, in run_and_parse_aai
get_aai_results(faa_file,db_info,args["aai_out"],args["aai_parsed"])
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/aai_to_ref.py", line 56, in get_aai_results
for index, r in enumerate(SeqIO.parse(faa, 'fasta')):
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/Bio/SeqIO/__init__.py", line 605, in parse
return iterator_generator(handle)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 223, in __init__
super().__init__(source, mode="t", fmt="Fasta")
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 45, in __init__
self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: '../phageHostPrediction/vOTU_iphop_67_target/Wdir/rafah_out/Full_CDS_Prediction.faa'
Not sure how to interpret this error, would love some help :)
-
repo owner Looks like RaFAH failed in its first step (cds prediction), which is often due to a problem with the Perl installation. We should know more by looking into “Wdir/rafah.log”, but typically this is a BioPerl issue or an issue with some libraries that Perl is using and are sometimes not correctly linked.
-
reporter Thanks, yeah it turns out we didn’t have the Bio/SeqIO module installed. Hopefully that’ll fix it! I’ll report back once we’ve made the fix.
-James
-
reporter Hi Simon,
After 6.5 hr of walltime, the program errored out again, this time with a Python ValueError:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Skip] Skipping computation of blastn against microbial genomes...
[1/3/Skip] Skipping blast parsing...
[2/1/Skip] Skipping computation of blastn against CRISPR...
[2/2/Skip] Skipping crispr parsing...
[3/1/Skip] Skipping computation of WIsH scores...
[3/2/Skip] Skipping WIsH parsing...
[4/1/Skip] Skipping computation of VHM s2 similarities...
[4/2/Skip] Skipping VHM parsing...
[5/1/Skip] Skipping computation of PHP scores...
[5/2/Skip] Skipping PHP parsing...
[6/1/Run] Running RaFAH...
[6/2/Run] Get relevant RaFAH scores...
[6.5/1/Run] Running Diamond comparison to RaFAH references...
[6.5/2/Run] Get AAI distance to RaFAH refs...
[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
[7/2] Loading corresponding host taxonomy...
[7/3] Link matching genomes to representatives and filter out redundant / useless matches...
Filtering blast data
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:204: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
tmp = df.groupby(['Virus','Repr','Host']).sum().reset_index() ## For each genome, we take the sum of all hits
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:215: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
Filtering crispr data
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:247: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
Filtering wish data
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:267: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
Filtering vhm data
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:286: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
Filtering PHP data
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:306: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
store_filtered = store_filtered.append(df) ## Append should work directly now that we have matched all column and names nicely
[7/4] Write the matrices for TensorFlow...
Starting to built the matrices for TensorFlow
Loading trees
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:147: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for virus, df_all_hits in store_filtered.groupby(['Virus']):
Processing data for virus STM_0716_E_M_CoA1_E069_E065_IS1_megahit_k121_1004995||full
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:157: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for host_pivot, tmp in df_all_hits.groupby(['Repr']):
Processing data for virus STM_0716_E_M_CoA1_E069_E065_IS1_megahit_k121_1011211||full
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:157: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
...
Processing data for virus STM_0716_E_M_E069_megahit_k121_999936||full
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep.py:157: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for host_pivot, tmp in df_all_hits.groupby(['Repr']):
[7.5] Aggregating all results and formatting for RF...
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py:112: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
tmp = df_blast.groupby(['Virus','Repr','Host']).sum().reset_index() ## For each Repr, we take the sum of all hits
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py:125: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for virus, all_labels in df_labels.groupby(['Virus']):
/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py:149: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for obs, obs_info in all_labels.groupby(['Observation_n']):
### Welcome to iPHoP ###
Traceback (most recent call last):
File "/fs/project/PAS1117/modules/iPHoP/1.3.2/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 104, in main
dataprep_rf.aggregate_rf(args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 35, in aggregate_rf
compute_matrices(df_blast,df_crispr,df_labels,args)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 154, in compute_matrices
selected_blast = selected_blast.sort_values(by = ["Dist","N match","Id %"], ascending = ["False","False","False"])
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 6878, in sort_values
ascending = validate_ascending(ascending)
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/pandas/util/_validators.py", line 457, in validate_ascending
return [validate_bool_kwarg(item, "ascending", **kwargs) for item in ascending]
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/pandas/util/_validators.py", line 457, in <listcomp>
return [validate_bool_kwarg(item, "ascending", **kwargs) for item in ascending]
File "/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/pandas/util/_validators.py", line 261, in validate_bool_kwarg
raise ValueError(
ValueError: For argument "ascending" expected type bool, received type str.
This seems to happen in the validation step, so I’m assuming it’s likely a Python-related issue? When I load our iphop module, it gives me Python 3.8.15 with the following packages and versions (in case it’s helpful to know):
Package Version
---------------------------- ------------
absl-py 1.4.0
aiohttp 3.8.5
aiosignal 1.3.1
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
biopython 1.81
blinker 1.6.2
boltons 23.0.0
Bottleneck 1.3.7
Brotli 1.0.9
cached-property 1.5.2
cachetools 5.3.1
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.2.0
click 8.1.4
colorama 0.4.6
conda 23.7.2
conda-package-handling 2.0.2
conda_package_streaming 0.8.0
cryptography 39.0.0
flatbuffers 2.0.7
frozenlist 1.4.0
gast 0.4.0
google-auth 2.22.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.56.0
h5py 3.9.0
idna 3.4
importlib-metadata 6.8.0
iphop 1.3.2
joblib 1.3.1
jsonpatch 1.32
jsonpointer 2.0
keras 2.7.0
Keras-Preprocessing 1.1.2
libclang 16.0.0
libmambapy 0.24.0
mamba 0.24.0
Markdown 3.4.3
MarkupSafe 2.1.3
multidict 6.0.4
numexpr 2.8.4
numpy 1.23.5
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 23.1
pandas 1.5.3
pip 23.1.2
platformdirs 3.10.0
pluggy 1.2.0
pooch 1.7.0
protobuf 3.19.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycosat 0.6.4
pycparser 2.21
PyJWT 2.8.0
pyOpenSSL 23.2.0
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2023.3
pyu2f 0.1.5
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
scikit-learn 0.22.2.post1
scipy 1.10.1
setuptools 68.0.0
six 1.16.0
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.7.0
tensorflow-decision-forests 0.2.2
tensorflow-estimator 2.7.0
tensorflow-io-gcs-filesystem 0.32.0
termcolor 2.3.0
threadpoolctl 3.1.0
toolz 0.12.0
tqdm 4.65.0
typing_extensions 4.7.1
tzdata 2023.3
urllib3 2.0.3
Werkzeug 2.3.6
wheel 0.40.0
wrapt 1.15.0
wurlitzer 3.0.3
yarl 1.9.2
zipp 3.16.0
zstandard 0.19.0
Sincerely,
James -
repo owner Hi James,
I’m not sure what is happening, but unfortunately it seems like iPHoP is being run on a more modern version of python than what it’s designed for (hence all the warnings, plus the error). We can try to work around the current error (although others may come up later, and it will require you to tinker with the python scripts, sorry).
Right now, to fix the current error, you would need to edit the file “/users/PAS1573/riddell26/.local/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py“, and change l. 154: “ascending = ["False","False","False"]” into “ascending = [False,False,False]”.
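A minimal reproduction of why this edit works: newer pandas validates each entry of `ascending` and rejects strings (note that the string `"False"` would be truthy anyway, so even without validation it would sort the wrong way). Column names follow the traceback; the data itself is invented.

```python
# Minimal reproduction of the ValueError above, plus the fix: pass real
# booleans to ascending, not the strings "False". Data is fake.
import pandas as pd

df = pd.DataFrame({"Dist": [2, 1], "N match": [5, 9], "Id %": [90.0, 99.0]})

# Broken: strings are not bools, so pandas' validate_ascending raises
try:
    df.sort_values(by=["Dist", "N match", "Id %"],
                   ascending=["False", "False", "False"])
except ValueError:
    pass  # ValueError: For argument "ascending" expected type bool ...

# Fixed, as in the dataprep_rf.py l. 154 edit suggested above:
out = df.sort_values(by=["Dist", "N match", "Id %"],
                     ascending=[False, False, False])
```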
Let me know how this goes (fingers crossed !)
PS: Note that another potential option may be to run from a container, like Docker, as it may help use the “expected” python version and remove all these errors? Just something to keep in mind.
-
reporter Hi Simon,
Reviving this again as I have been slowly working through it.
I am using iphop 1.3.3 and downloaded the Aug_2023_pub_rw database, but when running iphop, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: '/fs/project/PAS1117/modules/sequence_dbs/iPHoP/Aug_2023_pub_rw/db_infos/gtdbtk.ar122.decorated.tree'
I checked my $DB and it contains gtdbtk.ar53.decorated.tree, but not the ar122 decorated tree.
Is this a simple fix by renaming ar53 to ar122, or is there another way to get ar122? I didn’t see it in the tar archive.
Sincerely,
James
-
repo owner Hi James,
Sorry yes, this is a known issue with the latest GTDB, you can simply copy “gtdbtk.ar53.decorated.tree” to “gtdbtk.ar122.decorated.tree” and it should work.
Best,
Simon
-
reporter Hi Simon,
I got the custom database to build! Here is a summary of the steps I took:
- Installed iPHoP 1.3.3 and made sure I had the correct versions of python and all dependencies
- Installed the iPHoP Aug_2023_pub_rw database
- Renamed Aug_2023_pub_rw/db_infos/gtdbtk.ar53.decorated.tree to gtdbtk.ar122.decorated.tree
- Renamed Aug_2023_pub_rw/db/GTDBtk_and_newrepr_s2_mat.pkl to GTDBtkr202_and_newrepr_s2_mat.pkl
- Installed gtdbtk-2.3.2 and used de_novo_wf to build the gtdbtk directory for the new genomes
- Ran iphop add_to_db
Thanks again for all the help debugging!
Sincerely,
James
-
repo owner Great, thanks for the update and for the step-by-step explanation !
-
repo owner - changed status to closed
-
repo owner Hi James,
Thanks for reporting and for also providing the solution :-) What is weird is that the bioconda release of iPHoP specifically requests “pandas 1.3.*” (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/iphop/meta.yaml). Were you running a custom installation, or did you install via bioconda and the version requirements are not working as I expect them to?
Thanks,
Simon