Angel Ezquerra  committed e8ada1a

filedata: add new binary detection heuristic, based on the max line length

Displaying files with very long lines is very expensive (Scintilla takes a long
time to display them). Most text files do not have very long lines. Binary
files, on the other hand, can have very long "lines" because they often contain
few '\n' or '\r' characters.

This patch adds a new heuristic that tries to detect binary files that do not
contain any null characters. The heuristic splits the file data into 'lines'
(using Python's splitlines method) and checks whether any line is longer than
100000 characters. If so, the file is quite likely a binary file.
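
As a rough, standalone illustration of the heuristic described above (not the
code in filedata.py), the check can be sketched as follows. The names
MAX_LINE_LENGTH, exceeds_max_line_length and classify are hypothetical, and the
sketch assumes Python 3 bytes input rather than the str data the patch works on.

```python
# Illustrative sketch only; names are hypothetical, not TortoiseHg API.
MAX_LINE_LENGTH = 100000  # threshold quoted in the commit message


def exceeds_max_line_length(data: bytes, maxlength: int = MAX_LINE_LENGTH) -> bool:
    """Return True if any line in data is longer than maxlength bytes."""
    # Fast path: data no longer than the limit cannot contain a longer line.
    if len(data) <= maxlength:
        return False
    # bytes.splitlines() splits on '\n', '\r' and '\r\n'.
    return any(len(line) > maxlength for line in data.splitlines())


def classify(data: bytes) -> str:
    """Hypothetical helper: label data 'binary', 'maybe-binary' or 'text'."""
    if b"\0" in data:
        # A null byte is treated as a definite sign of binary content.
        return "binary"
    if exceeds_max_line_length(data):
        # No null byte, but one very long line: probably binary.
        return "maybe-binary"
    return "text"
```

With this sketch, a buffer containing a null byte is labelled 'binary', a
single run of 100001 bytes without a line break is labelled 'maybe-binary', and
ordinary multi-line text is labelled 'text'.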

Because this may yield false positives, the file view shows the following
message when this situation is detected:

File may be binary (maximum line length exceeded)

It may be nice to also make the maximum line length configurable.

  • Parent commits d3cad1c
  • Branches stable

Files changed (1)

File tortoisehg/hgqt/filedata.py

             self.error = hglib.tounicode(str(e))
 
     def checkMaxDiff(self, ctx, wfile, maxdiff, status):
+        self.error = None
         p = _('File or diffs not displayed: ')
         try:
             fctx = ctx.filectx(wfile)
             self.error = p + _('File is larger than the specified max size.\n'
                                'maxdiff = %s KB') % (maxdiff // 1024)
             return None
+
+        def exceedsMaxLineLength(data, maxlength=100000):
+            if len(data) < maxlength:
+                return False
+            for line in data.splitlines():
+                if len(line) > maxlength:
+                    return True
+            return False
+
         try:
             data = fctx.data()
             if '\0' in data or ctx.isStandin(wfile):
                 self.error = p + _('File is binary')
+            elif exceedsMaxLineLength(data):
+                self.error = p + \
+                    _('File may be binary (maximum line length exceeded)')
+            if self.error:
                 if status != 'A':
                     return None