v0.4 network limited reading of fingerprint-derivatives?

Issue #92 on hold
Jacob Boes created an issue

I am experiencing a crippling slowdown waiting for fingerprint-derivatives.db to be read onto one of the nodes on our server. The fingerprint-derivatives.db file is approximately 29 GB in size, and the log file of one such job (running for about 5 days now) is attached for reference.

Interestingly, this does not occur when running the same training process on my local machine. In that case, the fingerprint-derivatives.db file takes a few hours to read into memory.

So, I suspect the slowdown is due to traffic across the server's network, which is very heavy at the moment. I'm not sure what the best way to manage this is, but any advice would be appreciated. This is a critical issue for continuing on to larger training sets.

Comments (6)

  1. andrew_peterson repo owner

    I notice you are using v0.4, which we are not focusing our development efforts on. Have you tried this with v0.5? (Note that the database formats are not the same between the two, so you will have to start the script from scratch.) v0.5 uses Python's shelve format, though, so it should be easy to debug with a simple script outside of Amp.

  2. Jacob Boes reporter

    I've moved my fingerprint files directly to the node for the time being, but I will try v0.5 on my next run. I'll keep you posted. Thanks for the feedback.

  3. andrew_peterson repo owner

    Oops, I see you had "v0.4" right in your issue title. Did moving it to the node speed it up successfully?

  4. andrew_peterson repo owner

    I just changed this to on hold while you monitor it.

    Data storage and loading is a difficult issue. If you have good ideas for fast ways to store and load the data, please let us know!

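The two suggestions in the thread (copying the fingerprint files onto the node, and debugging a shelve file with a simple script outside of Amp) might be combined in a sketch like the one below. It uses only the Python standard library. The function names, the "fingerprints" file name, and the "image-N" keys are hypothetical and chosen for illustration; the actual layout of Amp's v0.5 files is not described in this thread. The whole directory is copied rather than a single file because, depending on the dbm backend, a shelve may be stored as several files on disk.

```python
import os
import shelve
import shutil
import tempfile
import time


def stage_directory(src_dir, scratch_root):
    """Copy a directory to node-local scratch space; return the copy's path."""
    name = os.path.basename(os.path.normpath(src_dir))
    dst_dir = os.path.join(scratch_root, name)
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    return dst_dir


def time_shelve_read(path):
    """Load every entry of a shelve file; return (entry count, seconds)."""
    start = time.perf_counter()
    count = 0
    with shelve.open(path, flag="r") as db:
        for key in db:
            _ = db[key]  # forces the value to be read and unpickled
            count += 1
    return count, time.perf_counter() - start


def demo():
    """Build a tiny stand-in shelve, stage it locally, and time the read."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a directory on the shared (network) file system.
        shared = os.path.join(tmp, "shared", "data")
        os.makedirs(shared)
        with shelve.open(os.path.join(shared, "fingerprints")) as db:
            for i in range(3):
                db["image-%d" % i] = [float(i)] * 4  # hypothetical entries

        # Stand-in for the node-local scratch disk.
        local = stage_directory(shared, os.path.join(tmp, "scratch"))
        return time_shelve_read(os.path.join(local, "fingerprints"))


if __name__ == "__main__":
    count, seconds = demo()
    print("read %d entries in %.4f s" % (count, seconds))
```

Running time_shelve_read once against the shared path and once against the staged copy would give a rough indication of whether the network file system is the bottleneck, as suspected in the issue description.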