Neighborlist database is huge, option not to save DB

Issue #26 duplicate
Zachary Ulissi created an issue

The neighborlists are quite fast to calculate (much faster than fingerprints), but take up a huge amount of space relative to the fingerprints. The neighborlist database reaches many GB quite quickly.

Maybe there should be an option to just calculate those on the fly and not use a DB, or not save them to a database after?
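
To make the request concrete, here is a minimal sketch of the "compute on the fly, optionally save" behaviour. It is not the package's actual code: `compute_neighbors` and `NeighborlistProvider` are hypothetical names, the neighbor search is brute-force and non-periodic, and a plain dict stands in for the database.

```python
import math


def compute_neighbors(positions, cutoff):
    """Brute-force neighbor list for a non-periodic set of points.

    Entry i of the result holds the indices of all atoms within
    `cutoff` of atom i, excluding i itself.
    """
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


class NeighborlistProvider:
    """Hypothetical wrapper: recompute on demand, persist only if asked."""

    def __init__(self, cutoff, store=None):
        self.cutoff = cutoff
        # `store` is any dict-like object standing in for the DB.
        # Passing None means neighborlists are never saved.
        self.store = store

    def get(self, key, positions):
        # Reuse a saved neighborlist when a store exists and has the key.
        if self.store is not None and key in self.store:
            return self.store[key]
        result = compute_neighbors(positions, self.cutoff)
        if self.store is not None:
            self.store[key] = result
        return result
```

With `store=None` nothing is written, so the cost is a recalculation per image on each run; passing a dict-like store restores the current save-and-reload behaviour.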

Comments (4)

  1. Zachary Ulissi reporter

    The fingerprint-derivatives are even larger, but both are much larger than the fingerprint database. For example, a system with ~1000 images has a 65 MB fingerprint file, an 800 MB neighborlist file, and a 4 GB derivatives file. Going to 50k images makes the neighborlist file far too large to be loaded and unloaded each time (e.g. the first 5-10 min of startup could be spent loading that file). Fingerprint-derivatives can be avoided by turning off force training, but there's no way to disable saving of the neighborlists.
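
    As a rough illustration of the scaling, the sizes quoted above for ~1000 images can be extrapolated to 50k images. This sketch assumes file size grows linearly with the number of images, which is an assumption and not something measured here; under it the neighborlist file would be on the order of 40 GB.

```python
# Sizes reported in this comment for ~1000 images, in MB.
reported_mb = {
    "fingerprints": 65,
    "neighborlists": 800,
    "fingerprint-derivatives": 4000,
}
reported_images = 1000
target_images = 50_000


def extrapolate_mb(size_mb, n_images, n_target):
    """Scale a file size linearly with image count (an assumption)."""
    return size_mb * n_target / n_images


# Estimated sizes in GB at the target image count.
estimates_gb = {
    name: extrapolate_mb(mb, reported_images, target_images) / 1000
    for name, mb in reported_mb.items()
}

for name, gb in estimates_gb.items():
    print(f"{name}: ~{gb:.2f} GB")
```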

  2. Alireza Khorshidi

    I just looked back at an earlier training run. For 1500 periodic images of 4 species with the default cutoff of 6.5 Å, the neighborlists file is ~137 MB, the fingerprints file is ~144 MB, and the fingerprint-derivatives file is ~10 GB. So you may have either a large cutoff or a dense system, and it might be fine to decrease your cutoff.
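
    To give a feel for how much a smaller cutoff could help, here is a back-of-the-envelope sketch. It assumes a roughly uniform atomic density, so the neighbor count per atom grows with the cutoff sphere volume, and it assumes the neighborlist file size is proportional to that count. The 5.0 Å value is only an illustrative choice, not a recommendation.

```python
import math


def neighbors_per_atom(cutoff, number_density):
    """Expected neighbors inside a sphere of radius `cutoff`.

    Assumes a uniform number density (atoms per cubic angstrom).
    """
    return 4.0 / 3.0 * math.pi * cutoff**3 * number_density


def relative_size(new_cutoff, old_cutoff):
    """Ratio of neighbor counts for two cutoffs at fixed density.

    The density cancels, leaving a cubic dependence on the cutoff.
    """
    return (new_cutoff / old_cutoff) ** 3


# Hypothetical comparison: default 6.5 angstrom vs. a reduced 5.0 angstrom.
ratio = relative_size(5.0, 6.5)
print(f"neighbor count at 5.0 relative to 6.5: {ratio:.2f}")
```

    Under these assumptions, going from 6.5 Å to 5.0 Å would cut the neighbor count to a bit under half.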
