Pull requests

#61 Merged
Source repository: shimizukawa/sphinx-multibyte-filename-fork
Source branch: default
Destination repository: birkenfeld/sphinx
Destination branch: default

support multibyte filename handling. (Issue #703)

Author
  1. Takayuki Shimizukawa
Reviewers
Description

support multibyte filename handling. https://bitbucket.org/birkenfeld/sphinx/issue/703

concept:

  • Use Unicode for file names and file paths on Python 2 (Python 3 already does): the os module detects unicode arguments and encodes them with sys.getfilesystemencoding() automatically.

changes:

  • sys.argv parameters (source and destination paths) are converted to unicode.
  • Side-effect avoidance: ZipFile does not detect a unicode arcname on Python 2.4 and 2.5, so on those versions the arcname is explicitly encoded as UTF-8.
  • toctree refs hold reference names as unicode.
  • The path class now derives from unicode instead of str, so that paths are held as unicode.
  • Added a test; tested on Python 2.5, 2.6, 2.7, 3.1, 3.2, and 3.3.
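
The first change in the list (converting command-line paths to unicode) could look roughly like the sketch below. This is only an illustration, not the code from this pull request: the helper name to_unicode_path and its encoding parameter are hypothetical, and the example forces UTF-8 so that the result does not depend on the local filesystem encoding.

```python
import sys


def to_unicode_path(arg, encoding=None):
    """Return ``arg`` as a text (unicode) string.

    Hypothetical helper, not the pull request's actual code: byte strings
    are decoded with the filesystem encoding, text strings pass through.
    """
    if encoding is None:
        # Fall back to UTF-8 if the interpreter reports no encoding.
        encoding = sys.getfilesystemencoding() or "utf-8"
    if isinstance(arg, bytes):
        return arg.decode(encoding)
    return arg


# A directory name as it might arrive as bytes in sys.argv on Python 2.
raw = u"\u6587\u66f8".encode("utf-8")
srcdir = to_unicode_path(raw, encoding="utf-8")
```

Decoding once at the command-line boundary means the rest of the program only ever sees text paths, which is the idea stated under "concept" above.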

Comments (11)

      1. Jon Waltman

        I think this might need more testing as well. The changes are fairly significant and some unexpected unicode issues will likely arise.

        Having looked at it a bit more...

        Changing the path class seems like the wrong approach for testing. It doesn't involve the argv parameter handling and unicode conversion. What about creating a new document set using the normal functions and then calling sphinx-build in a subprocess or something?

        In cmdline.abspath(): bytes is undefined in python2.5

        On python3.1, lots of errors like this:

        ======================================================================
        ERROR: test_config.test_core_config
        ----------------------------------------------------------------------
        Traceback (most recent call last):
          File "/home/jon/sphinx-dev/sphinx/.tox/py31/lib/python3.1/site-packages/nose/case.py", line 198, in runTest
            self.test(*self.arg)
          File "/home/jon/sphinx-dev/sphinx/.tox/py31/tests/util.py", line 188, in deco
            app = TestApp(*args, **kwargs)
          File "/home/jon/sphinx-dev/sphinx/.tox/py31/tests/util.py", line 171, in __init__
            freshenv, warningiserror, tags)
          File "/home/jon/sphinx-dev/sphinx/.tox/py31/lib/python3.1/site-packages/sphinx/application.py", line 102, in __init__
            confoverrides or {}, self.tags)
          File "/home/jon/sphinx-dev/sphinx/.tox/py31/lib/python3.1/site-packages/sphinx/config.py", line 216, in __init__
            code = compile(source, config_file_enc, 'exec')
        TypeError: compile() argument 2 must be string, not bytes
        

        Was the original problem limited to the conf.py compile call and when creating the epub zip file?

        The filename is just used for tracebacks in compile, right? Doesn't Sphinx prevent these from being shown anyway by catching syntax errors and raising a ConfigError with a simplified error message? Would it be a big issue to just "replace" the offending bytes?

        The config file's name "conf.py" is hard-coded in the Sphinx app and it's not configurable. When it's being read and compiled, it's located in the current working directory. Couldn't you keep it all ASCII by using compile(source, os.path.basename(config_file), "exec")?
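
The basename suggestion in the comment above can be sketched as follows. This is a minimal illustration, not Sphinx's actual config loader: the function name compile_config, the sample source string, and the sample directory name are all assumptions made for the example.

```python
import os


def compile_config(source, config_file):
    """Compile config source, labelling the code object with the base name.

    Hypothetical helper: only "conf.py" is passed to compile(), so the
    filename argument stays ASCII even if the directory name is not.
    """
    return compile(source, os.path.basename(config_file), "exec")


# A config file living in a directory with a non-ASCII name.
config_path = os.path.join(u"\u6587\u66f8", "conf.py")
code = compile_config("project = 'demo'\n", config_path)

namespace = {}
exec(code, namespace)
```

The trade-off is that tracebacks would show only "conf.py" rather than the full path, which matters little if syntax errors are already reported through a simplified message.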

        1. Takayuki Shimizukawa author

          Changing the path class seems like the wrong approach for testing. It doesn't involve the argv parameter handling and unicode conversion. What about creating a new document set using the normal functions and then calling sphinx-build in a subprocess or something?

          OK, I'll try that.

          1. Takayuki Shimizukawa author

            I remembered that the path class needs to derive from unicode because the sphinx.application.Application class needs to receive unicode *dir values. In this change, cmdline.py does that.

        2. Takayuki Shimizukawa author

          Was the original problem limited to the conf.py compile call and when creating the epub zip file?

          No.

          The original problem was that Sphinx can't recognize 'multibyte-directory-name' and 'multibyte-filename'. Sphinx currently limits the directory and file names of a document to ASCII. But many people want to write documents, directory names, and file names in their own language (multibyte in Japan).

          This patch changes Sphinx to hold all directory and file names as unicode objects. Sphinx also holds toctree entries as unicode objects so that it can recognize multibyte names.

          "epub zip (before Python 2.6)" and "conf.py compile" were affected by the change. They need ASCII-encoded directory and file names.

          The impact of the change is large because it changes how file names are held inside Sphinx. So I agree that it must be tested by many people before the official release.

          1. Jon Waltman

            In cmdline.abspath(): bytes is undefined in python2.5

            bytes is imported from sphinx.util.pycompat, which was implemented 2 years ago.

            I got an exception somewhere from bytes... but you're right, it wasn't an issue there. Sorry.

            The original problem was that Sphinx can't recognize 'multibyte-directory-name' and 'multibyte-filename'.

            That's what I figured.

            But many people want to write documents, directory names, and file names in their own language (multibyte in Japan).

            As an English-speaking, ASCII-centric American, I don't really have this issue... :)

            I agree this is an important feature to get working though.

            I haven't run into an issue yet actually using it, so it's off to a good start.
