Ned Batchelder committed af0593c

A utility to create Unicode ranges for classes of characters needed for full JavaScript lexing. Not used yet.

Files changed (1)

+import unicodedata
+
+classes = {
+    'letters': "Lu Ll Lt Lm Lo Nl",
+    'combining': "Mn Mc",
+    'digit': "Nd",
+    'connector': "Pc",
+    }
+
+cat_to_class = {}
+ranges = {}
+
+for klass, cats in classes.items():
+    for cat in cats.split():
+        cat_to_class[cat] = klass
+    ranges[klass] = []
+
+# Walk the whole Basic Multilingual Plane, U+0000 through U+FFFF inclusive.
+for i in range(0x10000):
+    cat = unicodedata.category(unichr(i))
+    try:
+        klass = cat_to_class[cat]
+    except KeyError:
+        continue
+    r = ranges[klass]
+    if r and r[-1][1] == i-1:
+        r[-1][1] = i
+    else:
+        r.append([i, i])
+
+for k, r in ranges.items():
+    reg = "["
+    for a,b in r:
+        if a == b:
+            reg += r"\u%04x" % a
+        else:
+            reg += r"\u%04x-\u%04x" % (a, b)
+    reg += "]"
+    print "%s = %s" % (k, reg)
+
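For readers on Python 3, the same idea can be sketched as below. This is an illustrative port, not the committed code: the names `build_ranges`, `char_class_body`, and `identifier` are hypothetical, and the final step (combining the classes into an identifier pattern) is an assumption about how the ranges might eventually be used, loosely following the ECMAScript identifier grammar. It leaves out the zero-width joiner characters and `\u` escapes inside identifiers.

```python
import re
import unicodedata

# Same category groupings as the script in this commit.
CLASSES = {
    "letters": "Lu Ll Lt Lm Lo Nl",
    "combining": "Mn Mc",
    "digit": "Nd",
    "connector": "Pc",
}


def build_ranges(classes, limit=0x10000):
    """Return {class_name: [(first, last), ...]} for code points below limit."""
    cat_to_class = {
        cat: name for name, cats in classes.items() for cat in cats.split()
    }
    ranges = {name: [] for name in classes}
    for cp in range(limit):
        name = cat_to_class.get(unicodedata.category(chr(cp)))
        if name is None:
            continue
        runs = ranges[name]
        if runs and runs[-1][1] == cp - 1:
            runs[-1] = (runs[-1][0], cp)  # extend the current run
        else:
            runs.append((cp, cp))  # start a new run
    return ranges


def char_class_body(runs):
    """Format runs as the inside of a regex character class (\\uXXXX escapes)."""
    parts = []
    for first, last in runs:
        if first == last:
            parts.append(r"\u%04x" % first)
        else:
            parts.append(r"\u%04x-\u%04x" % (first, last))
    return "".join(parts)


ranges = build_ranges(CLASSES)
letters = char_class_body(ranges["letters"])
rest = "".join(
    char_class_body(ranges[name]) for name in ("combining", "digit", "connector")
)

# Assumed usage: a letter, "$" or "_" first, then letters, marks, digits,
# or connector punctuation.
identifier = re.compile(r"[$_%s][$_%s%s]*" % (letters, letters, rest))

print(bool(identifier.fullmatch("caf\u00e9_1")))
print(bool(identifier.fullmatch("1abc")))
```

Returning tuples from a function rather than printing directly keeps the range data reusable, for example to emit the character classes in whatever syntax the target lexer expects.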