Anonymous committed baea9ec Draft

Skip unicode-internal codec test and add documentation
After much study, it is apparent that the unicode-internal codec exposes CPython
implementation details unhelpfully. Our best course is to equate it to UTF-32BE, skipping
the test until Jython has a UTF-32BE codec. Note that Python 3.3 deprecates the codec anyway.
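To make "equate it to UTF-32BE" concrete, the Java sketch below encodes each code point of a string as one 4-byte integer. With big-endian order the result is UTF-32BE; with little-endian order it resembles what a UCS-4 build of CPython produced for unicode-internal on a little-endian machine, which is the architecture dependence the message calls unhelpful. The class and method names are hypothetical and are not part of Jython.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch (not Jython code): one 4-byte unit per code point.
 * BIG_ENDIAN gives UTF-32BE; LITTLE_ENDIAN mimics a UCS-4 CPython build's
 * unicode-internal output on a little-endian machine.
 */
public class ByteOrderSketch {

    static byte[] encodeUcs4(String s, ByteOrder order) {
        int n = s.codePointCount(0, s.length());
        ByteBuffer buf = ByteBuffer.allocate(4 * n).order(order);
        // codePoints() joins surrogate pairs, so each element is one full code point.
        s.codePoints().forEach(buf::putInt);
        return buf.array();
    }

    static byte[] encodeUtf32BE(String s) {
        return encodeUcs4(s, ByteOrder.BIG_ENDIAN);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00"; // 'A' followed by U+1F600 (a surrogate pair in Java)
        System.out.println(hex(encodeUtf32BE(s)));                       // 000000410001f600
        System.out.println(hex(encodeUcs4(s, ByteOrder.LITTLE_ENDIAN))); // 4100000000f60100
    }
}
```

The same string yields different bytes depending only on byte order, whereas a fixed UTF-32BE mapping gives one answer everywhere.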

Files changed (3)

Lib/test/test_codecs.py

         for uni, puny in punycode_testcases:
             self.assertEqual(uni, puny.decode("punycode"))
 
+@unittest.skipIf(test_support.is_jython, "FIXME: equates to UTF-32BE in Jython")
 class UnicodeInternalTest(unittest.TestCase):
     def test_bug1251300(self):
         # Decoding with unicode_internal used to not correctly handle "code
                 self.assertEqual(4, ex.start)
                 self.assertEqual(8, ex.end)
             else:
-                self.fail()
+                self.fail("UnicodeDecodeError not raised")
 
     def test_decode_callback(self):
         if sys.maxunicode > 0xffff:
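One of the skipped tests expects a malformed 4-byte unit to be reported with `start` 4 and `end` 8. A minimal sketch of a strict UTF-32BE decoder, assuming Python's convention that the error range is the half-open byte interval of the offending unit, shows where such numbers come from for an input like `00 00 00 00 00 11 11 00` (chosen here for illustration), whose second unit 0x00111100 lies above U+10FFFF. `DecodeError` is a hypothetical stand-in for `UnicodeDecodeError`, and this sketch does not reject surrogate code points.

```java
/**
 * Hypothetical sketch (not Jython code) of a strict UTF-32BE decoder that reports
 * the half-open byte range [start, end) of the first bad unit, as Python's
 * UnicodeDecodeError does. Surrogate code points are not rejected here.
 */
public class Utf32DecodeSketch {

    /** Hypothetical stand-in for UnicodeDecodeError's start/end attributes. */
    static final class DecodeError extends Exception {
        final int start;
        final int end;

        DecodeError(String reason, int start, int end) {
            super(reason + " at bytes [" + start + ", " + end + ")");
            this.start = start;
            this.end = end;
        }
    }

    static String decodeUtf32BE(byte[] b) throws DecodeError {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        for (; i + 4 <= b.length; i += 4) {
            int cp = ((b[i] & 0xff) << 24) | ((b[i + 1] & 0xff) << 16)
                    | ((b[i + 2] & 0xff) << 8) | (b[i + 3] & 0xff);
            // A negative int means the unsigned value is >= 0x80000000, also out of range.
            if (cp < 0 || cp > 0x10FFFF) {
                throw new DecodeError("code point not in range(0x110000)", i, i + 4);
            }
            sb.appendCodePoint(cp);
        }
        if (i < b.length) {
            // Leftover bytes that cannot form a whole 4-byte unit.
            throw new DecodeError("truncated data", i, b.length);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bad = {0, 0, 0, 0, 0, 0x11, 0x11, 0};
        try {
            decodeUtf32BE(bad);
            System.out.println("no error");
        } catch (DecodeError e) {
            System.out.println(e.getMessage()); // code point not in range(0x110000) at bytes [4, 8)
        }
    }
}
```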

src/org/python/core/codecs.java

 /*
- * Copyright 2000 Finn Bock
+ * Copyright (c) 2012 Jython Developers. Original Java version copyright 2000 Finn Bock
  *
  * This program contains material copyrighted by: Copyright (c) Corporation for National Research
  * Initiatives. Originally written by Marc-Andre Lemburg (mal@lemburg.com).
 import org.python.core.util.StringUtil;
 
 /**
- * Contains the implementation of the builtin codecs.
+ * This class implements the codec registry and utility methods supporting codecs, such as those
+ * providing the standard replacement strategies ("ignore", "backslashreplace", etc.). The _codecs
+ * module relies heavily on apparatus implemented here, and therefore so does the Python
+ * <code>codecs</code> module (in <code>Lib/codecs.py</code>). It corresponds approximately to
+ * CPython's <code>Python/codecs.c</code>.
+ * <p>
+ * The class also contains the inner methods of the standard Unicode codecs, available for
+ * transcoding of text at the Java level. These are also exposed through the <code>_codecs</code>
+ * module. In CPython, the implementations are found in <code>Objects/unicodeobject.c</code>.
  *
  * @since Jython 2.0
  */
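The class comment mentions the standard replacement strategies. As a rough illustration of what "ignore", "replace" and "backslashreplace" do when encoding to ASCII, here is a self-contained Java sketch. It is not Jython's API: the method names are hypothetical, and `IllegalArgumentException` stands in for the `UnicodeEncodeError` that a "strict" encode would raise.

```java
/**
 * Hypothetical sketch (not Jython's API) of the "strict", "ignore", "replace" and
 * "backslashreplace" strategies, applied to encoding text as ASCII.
 */
public class ErrorHandlerSketch {

    static String encodeAscii(String s, String errors) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out.append((char) cp); // encodable: copy through unchanged
                continue;
            }
            switch (errors) {
                case "ignore":
                    break; // drop the unencodable code point
                case "replace":
                    out.append('?'); // Python's replacement character when encoding
                    break;
                case "backslashreplace":
                    out.append(backslashEscape(cp));
                    break;
                default: // treat anything else as "strict"
                    throw new IllegalArgumentException(String.format(
                            "'ascii' codec can't encode U+%04X at index %d",
                            cp, i - Character.charCount(cp)));
            }
        }
        return out.toString();
    }

    /** Python-style escape: backslash-x, -u or -U followed by 2, 4 or 8 hex digits. */
    static String backslashEscape(int cp) {
        if (cp < 0x100) {
            return String.format("\\x%02x", cp);
        } else if (cp < 0x10000) {
            return String.format("\\u%04x", cp);
        } else {
            return String.format("\\U%08x", cp);
        }
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \u20ac";
        System.out.println(encodeAscii(s, "ignore"));           // "caf "
        System.out.println(encodeAscii(s, "replace"));          // "caf? ?"
        System.out.println(encodeAscii(s, "backslashreplace"));
    }
}
```

The last line of `main` prints `caf\xe9 \u20ac`: the escape width grows with the code point, which is why the handler has to see whole code points rather than UTF-16 `char`s.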

src/org/python/modules/_codecs.java

 /*
- * Copyright 2000 Finn Bock
+ * Copyright (c) 2012 Jython Developers. Original Java version copyright 2000 Finn Bock
  *
  * This program contains material copyrighted by: Copyright (c) Corporation for National Research
  * Initiatives. Originally written by Marc-Andre Lemburg (mal@lemburg.com).
 import org.python.core.codecs;
 import org.python.expose.ExposedType;
 
+/**
+ * This class corresponds to the Python _codecs module, which in turn lends its functions to the
+ * codecs module (in Lib/codecs.py). It exposes the implementing functions of several codec families
+ * called out in the Python codecs library Lib/encodings/*.py, where it is usually claimed that they
+ * are bound "as C functions". In this context "C" should be read as "compiled", not as implying
+ * a particular implementation language. Actual transcoding methods often come from the related
+ * {@link codecs} class.
+ */
 public class _codecs {
 
     public static void register(PyObject search_function) {
     }
 
     private static PyTuple decode_tuple_str(String s, int len) {
-        // XXX should this be PyUnicode(s) ?
         return new PyTuple(new PyString(s), Py.newInteger(len));
     }
 
         return codecs.calcNewPosition(size, replacement) - 1;
     }
 
+    /* --- ascii Codec ---------------------------------------------- */
     public static PyTuple ascii_decode(String str) {
         return ascii_decode(str, null);
     }
     }
 
     /* --- UnicodeInternal Codec ------------------------------------------ */
+    // XXX Should deprecate unicode-internal codec and delegate to UTF-32BE (when we have one)
+    /*
+     * This codec is supposed to deal with an encoded form equal to the internal representation of
+     * the unicode object considered as bytes in memory. This was confusing in CPython as it varied
+     * with machine architecture (width and endianness). In Jython, the most compatible choice
+     * would be UTF-32BE since unicode objects report their length as if UCS-4 and
+     * sys.byteorder=='big'. The codec is deprecated in v3.3 as irrelevant, or impossible, in view
+     * of the flexible string representation (which Jython emulates in its own way).
+     *
+     * See http://mail.python.org/pipermail/python-dev/2011-November/114415.html
+     */
     public static PyTuple unicode_internal_encode(String str) {
         return unicode_internal_encode(str, null);
     }
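The XXX note proposes delegating unicode-internal to UTF-32BE, and the comment argues this is the most compatible choice because Jython reports string length as if UCS-4. The sketch below, with hypothetical names, shows both ideas together: an encoder returning an (output, consumed) pair in the spirit of the tuples the `_codecs` functions return (compare `decode_tuple_str`), where "consumed" counts code points rather than Java `char`s, and an `unicodeInternalEncode` that simply forwards to it. It assumes nothing about how Jython's real `unicode_internal_encode` is written.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch (not Jython code): unicode-internal delegating to UTF-32BE. */
public class InternalDelegationSketch {

    /** Stand-in for the (output, consumed) tuple that codec functions return. */
    static final class Result {
        final byte[] bytes;
        final int consumed;

        Result(byte[] bytes, int consumed) {
            this.bytes = bytes;
            this.consumed = consumed;
        }
    }

    /** UTF-32BE: exactly 4 big-endian bytes per code point. */
    static Result utf32BEEncode(String s) {
        int n = s.codePointCount(0, s.length()); // Python-level len(), counted as if UCS-4
        ByteBuffer buf = ByteBuffer.allocate(4 * n); // ByteBuffer defaults to big-endian
        s.codePoints().forEach(buf::putInt);
        return new Result(buf.array(), n);
    }

    /** The delegation proposed in the XXX note: unicode-internal is just UTF-32BE. */
    static Result unicodeInternalEncode(String s) {
        return utf32BEEncode(s);
    }

    public static void main(String[] args) {
        String s = "x\uD801\uDC00"; // 'x' then U+10400, stored as a surrogate pair
        Result r = unicodeInternalEncode(s);
        System.out.println(s.length());     // 3 (UTF-16 chars)
        System.out.println(r.consumed);     // 2 (code points)
        System.out.println(r.bytes.length); // 8
    }
}
```

Because UTF-32BE spends exactly four bytes per code point, the byte length divided by four always equals the length a Jython unicode object reports, regardless of how many surrogate pairs the underlying Java string holds.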