Anonymous committed eede7b7

diffcore-rename: cache file deltas

We find rename candidates by computing a fingerprint hash of
each file, and then comparing those fingerprints. There are
inherently O(n^2) comparisons, so it pays in CPU time to
hoist the (rather expensive) computation of the fingerprint
out of that loop (or to cache it once we have computed it).
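
As a rough illustration of why caching pays off, the sketch below counts how often the expensive step runs while every source is compared against every destination. Everything in it is hypothetical: `toy_file`, `toy_fingerprint`, `get_fingerprint`, and `count_matches` are illustrative names, and the hash is a simple stand-in, not the fingerprint git computes.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a file; not git's struct diff_filespec. */
struct toy_file {
	const char *data;
	unsigned long fingerprint;
	int have_fingerprint;
};

static int fingerprint_calls; /* counts runs of the expensive step */

/* Toy "expensive" fingerprint: a multiplicative hash over the bytes. */
static unsigned long toy_fingerprint(const char *data)
{
	unsigned long h = 5381;

	fingerprint_calls++;
	while (*data)
		h = h * 33 + (unsigned char)*data++;
	return h;
}

/* Compute the fingerprint on first use, then reuse the cached value. */
static unsigned long get_fingerprint(struct toy_file *f)
{
	if (!f->have_fingerprint) {
		f->fingerprint = toy_fingerprint(f->data);
		f->have_fingerprint = 1;
	}
	return f->fingerprint;
}

/* Compare all pairs; return the number of equal fingerprints. */
static int count_matches(struct toy_file *src, size_t nsrc,
			 struct toy_file *dst, size_t ndst)
{
	int matches = 0;

	for (size_t i = 0; i < nsrc; i++)
		for (size_t j = 0; j < ndst; j++)
			if (get_fingerprint(&src[i]) == get_fingerprint(&dst[j]))
				matches++;
	return matches;
}
```

With 3 sources and 2 destinations the loop body runs 6 times and asks for 12 fingerprints, but the cache limits the expensive computation to one run per file, so 5 in total.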

Previously, we didn't keep the filespec information around
because then we had the potential to consume a great deal of
memory. However, rather than keeping all of the filespec
data, we can keep just the fingerprint.

This patch implements and uses diff_free_filespec_data_large
to accomplish that goal. We also have to change
estimate_similarity not to needlessly repopulate the
filespec data when we already have the hash.
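
The two-tier free and the "populate only if the hash is missing" check can be sketched in miniature as follows. This is a simplified model under stated assumptions, not the code from this patch: `toy_filespec` and all `toy_*` functions are hypothetical, the fingerprint is reduced to a single hash value, and the field names `data` and `cnt_data` are borrowed from the diff below only for readability.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a filespec: a large blob plus a small fingerprint. */
struct toy_filespec {
	char *data;              /* large: full file contents */
	size_t size;
	unsigned long *cnt_data; /* small: cached fingerprint */
};

/* Load the contents; returns 0 on success, -1 on allocation failure. */
static int toy_populate(struct toy_filespec *s, const char *contents)
{
	if (s->data)
		return 0;
	s->size = strlen(contents);
	s->data = malloc(s->size + 1);
	if (!s->data)
		return -1;
	memcpy(s->data, contents, s->size + 1);
	return 0;
}

/* Drop only the large buffer; the fingerprint survives. */
static void toy_free_data_large(struct toy_filespec *s)
{
	free(s->data);
	s->data = NULL;
}

/* Drop everything, reusing the "large" helper. */
static void toy_free_data(struct toy_filespec *s)
{
	toy_free_data_large(s);
	free(s->cnt_data);
	s->cnt_data = NULL;
}

/* Make sure a fingerprint exists, loading contents only when it is missing. */
static int toy_ensure_fingerprint(struct toy_filespec *s, const char *contents)
{
	if (s->cnt_data)
		return 0; /* already cached: do not reload the contents */
	if (toy_populate(s, contents))
		return -1;
	s->cnt_data = malloc(sizeof(*s->cnt_data));
	if (!s->cnt_data)
		return -1;
	*s->cnt_data = 0;
	for (size_t i = 0; i < s->size; i++)
		*s->cnt_data = *s->cnt_data * 31 + (unsigned char)s->data[i];
	return 0;
}
```

After `toy_free_data_large()` the contents are gone but a second call to `toy_ensure_fingerprint()` returns immediately without reloading them, which is the behavior the commit message describes for `estimate_similarity`.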

Practical tests showed a 4.5x speedup for a 10% increase in memory usage.

Signed-off-by: Jeff King
Signed-off-by: Junio C Hamano

Files changed (3)

 	return 0;
-void diff_free_filespec_data(struct diff_filespec *s)
+void diff_free_filespec_data_large(struct diff_filespec *s)
 	if (s->should_free)
 		s->should_free = s->should_munmap = 0;
 		s->data = NULL;
+void diff_free_filespec_data(struct diff_filespec *s)
+	diff_free_filespec_data_large(s);
 	s->cnt_data = NULL;


 	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
-	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
+	if ((!src->cnt_data && diff_populate_filespec(src, 0))
+		|| (!dst->cnt_data && diff_populate_filespec(dst, 0)))
 		return 0; /* error but caught downstream */
 			m->score = estimate_similarity(one, two,
 			m->name_score = basename_same(one, two);
-			diff_free_filespec_data(one);
+			diff_free_filespec_data_large(one);
 		/* We do not need the text anymore */
-		diff_free_filespec_data(two);
+		diff_free_filespec_data_large(two);
 	/* cost matrix sorted by most to least similar pair */
 extern int diff_populate_filespec(struct diff_filespec *, int);
 extern void diff_free_filespec_data(struct diff_filespec *);
+extern void diff_free_filespec_data_large(struct diff_filespec *);
 extern int diff_filespec_is_binary(struct diff_filespec *);
 struct diff_filepair {