git statistics do too much work making them slow

Issue #630 resolved
Jameson Nash
created an issue

In lib/celerylib/, every call to len(cs.changed) and len(cs.added) triggers a complete extraction of that git commit's files into memory, instead of just computing the length. Avoiding this can dramatically speed up statistics computation and reduce memory usage (in my repo, someone committed a large number of large files at several points).

E.g., I tested this by adding an additional method and changing the call sites:

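    def len_added(self):
        # Root commit (no parents): every file in the tree counts as added.
        if not self.parents:
            return len(list(self._get_file_nodes()))
        # Otherwise count the added paths directly.
        return len(self._get_paths_for_status('added'))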

Presumably added() could instead return a lazy AddedFileNodesGenerator object; I was just not certain of this.
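
To illustrate the lazy-object idea, here is a minimal, self-contained sketch. It is not the project's actual AddedFileNodesGenerator; the class name LazyFileNodes, the load_node callback, and the sample paths are all hypothetical. The point is only that len() can be answered from the list of paths, while the expensive node construction is deferred until someone iterates.

```python
class LazyFileNodes:
    """Hypothetical stand-in for a lazy node collection (not the project's class)."""

    def __init__(self, paths, load_node):
        self._paths = list(paths)    # cheap: path strings only
        self._load_node = load_node  # expensive: called per node, on demand

    def __len__(self):
        # Answer the count from the paths; no node is built here.
        return len(self._paths)

    def __iter__(self):
        # Build each node only when the caller actually iterates.
        for path in self._paths:
            yield self._load_node(path)


loaded = []  # records which paths were actually loaded


def load_node(path):
    # Stand-in for reading a file's contents out of the commit.
    loaded.append(path)
    return {"path": path, "content": "<contents of %s>" % path}


added = LazyFileNodes(["a.txt", "b.bin", "c.dat"], load_node)
count = len(added)               # touches only the path list
loaded_after_len = list(loaded)  # snapshot: still empty at this point
nodes = list(added)              # iteration is what triggers the loads
```

With something along these lines, the statistics code could keep calling len(cs.added) unchanged, assuming the returned object supports len() without materializing the nodes.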

Comments (2)
