Performance issues for large datasets

Issue #662 new
Broes De Cat created an issue

E.g., reachability instances of ASP 2013 do not fit in memory due to inefficient set<vector<>> storage of enumerated tables.

Comments (4)

  1. Willem Van Onsem

    Where are these memory structures located? Perhaps I can take a look at it. However I seriously doubt that collections yield such overhead. In C++ Set and Vector have a small memory overhead. Perhaps the elements themselves are stored inefficient (for instance booleans can be stored as bitvectors)

  2. Broes De Cat reporter

    For the concrete problem, the instances are rediculously large. => 200+ Mb of facts "edge(string,string)" strings are stored by pointer and tuples (here arity 2) are stored as a vector. So we are storing a set of millions of vectors of size 2. In that case, the additional storage required for each vector turns out to be too large.

    This concrete problem is located in structure/StructureComponents.hpp => EnumeratedInternalPredTable.

    Note: I'm not sure we really want to do something about this. 1) It is only one benchmark where we notice it and not one of our own applications. 2) solving it at the above location would probably not be sufficient in general.

  3. Willem Van Onsem

    Are the same strings stored by two pointers pointing to the same data, or are these two separate records?

    Java for instance uses a weak flyweight pattern to reduce memory overhead of strings. The code is executed each time a string is created (without the programmer noticing it).

    Using such pattern can yield increase of performance (due to more efficient use of cache and for instance fast string comparisons) and reduces memory overhead. The price is however paid when a string is created.

  4. Broes De Cat reporter

    We implement our own type of flyweight for strings, any 2 equal strings have the same pointer.

  5. Log in to comment