Disk space consumption

Issue #58 resolved
Former user created an issue

Hi, I'm running into a storage problem while running kma in the CCmetagen pipeline. I have 34 GB of Nanopore reads and I need to map them against a database of around 150 GB. I use a 14 TB SSD scratch space and 1 TB of memory, but I run into a disk space error after ~40 minutes of computing. Is this expected behavior?

If this is not solvable, is there a way to safely join kma results if I run smaller jobs on several chunks of reads?

Thank you very much

Comments (4)

  1. ptlcc

    This is a known issue on large(r) datasets with large(r) databases. You should be able to avoid it by setting the “-tmp” option and giving it a suitable directory in which to store the temporary files; a local SSD would be optimal.

    Best,
    Philip
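
    For illustration, such an invocation could look like the sketch below. The read, output, and database names are placeholders, not taken from this thread; only the -tmp option itself is what the comment above describes.

        # Point kma's temporary files at a local SSD scratch directory;
        # note the trailing '/' on the -tmp argument.
        kma -i nanopore_reads.fq -o results/sample1 \
            -t_db db/ncbi_nt -tmp /local_ssd/kma_tmp/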

  2. Marek Valt

    Thank you for your answer, Philip. A follow-up question: what is a suitable -tmp path? I'm using a cluster for the computation, but every option I give it runs into an “Invalid output directory specified.” error.

    Thank you very much

    Marek

  3. ptlcc

    Hi Marek

    Any directory, i.e. the argument should end in a '/' and the directory should exist with rw permissions. If used without an argument, it will use the output filename to create the tmp files.

    Best,
    Philip
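
    To make that concrete, a minimal sketch (the /scratch/kma_tmp path and the filenames are placeholders):

        # The tmp directory must already exist with rw permissions,
        # and the -tmp argument must end in '/'.
        mkdir -p /scratch/kma_tmp
        kma -i nanopore_reads.fq -o results/sample1 \
            -t_db db/ncbi_nt -tmp /scratch/kma_tmp/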

  4. Marek Valt

    Hi Philip,
    Everything works now; I was missing the trailing “/”.

    Thank you for your quick answers.

    Marek
