In lib/celerylib/tasks.py, every call to len(cs.changed) and len(cs.added) triggers a complete extraction of that git commit into memory instead of just computing the length. Avoiding this can dramatically speed up statistics computation and reduce memory usage (in my repo, someone committed a large number of large files at several points).
For example, I tested this by adding an additional method and changing the call sites:
    @LazyProperty
    def len_added(self):
        if not self.parents:
            return len(list(self._get_file_nodes()))
        return len(self._get_paths_for_status('added'))
Presumably added() could instead return a lazy AddedFileNodesGenerator object; I was just not certain of this.
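To illustrate the lazy-object idea, here is a minimal standalone sketch, not the actual vcs code. The class name LazyFileNodes and the fake_load_node helper are hypothetical and only stand in for whatever the backend uses to read a file from a commit. The point is that len() only counts paths, while file contents are loaded one at a time during iteration.

```python
class LazyFileNodes:
    """Hypothetical lazy collection: knows its paths, loads nodes on demand."""

    def __init__(self, paths, load_node):
        self._paths = list(paths)      # cheap: path strings only
        self._load_node = load_node    # expensive callable, invoked per path

    def __len__(self):
        # Counting never touches file contents.
        return len(self._paths)

    def __iter__(self):
        # Nodes are loaded one at a time, only when iterated.
        for path in self._paths:
            yield self._load_node(path)


# Stand-in for an expensive blob read; records which paths were loaded.
loaded = []

def fake_load_node(path):
    loaded.append(path)
    return {"path": path, "content": b"x" * 10}


added = LazyFileNodes(["a.txt", "b.bin", "c.iso"], fake_load_node)
count = len(added)               # no file content is read here
loaded_after_len = list(loaded)  # snapshot of loads triggered by len()
first = next(iter(added))        # loads only the first file
```

With something along these lines, the call sites in tasks.py could keep using len(cs.added) unchanged, assuming the real property returned such an object instead of a fully built list.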