Change Pipelines default image locale to C.UTF-8

Issue #13085 open
Sigurdur Birgisson (staff) created an issue

To be able to handle Unicode characters, I want the default build environment to use a better locale. Right now it uses the POSIX locale; switching to en_US.UTF-8 would help. For example, running locale in a pipeline step:

pipelines:
  branches:
    master:
      - step:
          script:
            - locale

produces this output:

+ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Workarounds

One workaround is to create your own Docker image, as described here: https://answers.atlassian.com/questions/39140980/how-do-i-create-a-docker-image-for-bitbucket-pipelines

The Dockerfile for such an image could look something like this:

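# Some smart base image (gcc is Debian-based)
FROM gcc

# Whatever else your project needs beyond what the base image provides

# Set the locale (assumes a Debian-based image; Debian's locale-gen reads /etc/locale.gen)
RUN apt-get update \
    && apt-get install -y --no-install-recommends locales \
    && sed -i 's/^# *\(en_US.UTF-8 UTF-8\)/\1/' /etc/locale.gen \
    && locale-gen \
    && rm -rf /var/lib/apt/lists/*
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US:en
ENV LC_ALL=en_US.UTF-8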

An even easier workaround is to set these environment variables in your build script or in the Pipelines settings.
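For the build-script route, here is a minimal POSIX sh sketch. The helper name pick_utf8_locale is hypothetical and not a Pipelines feature. It reads locale names (for example the output of locale -a) and prefers the language-neutral C.UTF-8, falling back to en_US.UTF-8. It assumes the image ships the locale utility. On glibc systems, locale -a typically lists normalized names such as C.utf8, and glibc accepts either spelling in LC_ALL.

```shell
#!/bin/sh
# Hypothetical helper: read locale names (one per line) on stdin and print the
# preferred UTF-8 locale. C.UTF-8 is tried first, then en_US.UTF-8; matching is
# case-insensitive and accepts both the "UTF-8" and "utf8" spellings.
# Returns 1 and prints nothing if neither locale is listed.
pick_utf8_locale() {
    available=$(cat)
    for wanted in 'C\.utf-\{0,1\}8' 'en_US\.utf-\{0,1\}8'; do
        match=$(printf '%s\n' "$available" | grep -i -x "$wanted" | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}

# Usage near the top of a build script: export the chosen locale only if one exists.
if loc=$(locale -a 2>/dev/null | pick_utf8_locale); then
    export LANG="$loc" LC_ALL="$loc"
fi
```

Because the sketch only exports variables when a matching locale is installed, it leaves the environment unchanged on images that ship neither locale.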

Comments (6)

  1. Nick Coghlan

    I came to report the same problem, but I would suggest a different resolution: use the C.UTF-8 locale, since this scenario is exactly what it's for (i.e. properly handling UTF-8-encoded text without making any other locale-specific assumptions).

  2. Matt Ryall (staff)
    • changed status to open

    We'd like to see additional information on specific problems that this causes. If you hit issues with locale/character encoding in your pipeline, please provide information here.

    Opening this issue to consider it for future work.

  3. Nick Coghlan

    For me, the main issue is the fact that having the legacy "C" locale configured tells tools like Python 3 that they should use ASCII to interface with the operating system for things like filesystem paths and environment variables. Armin Ronacher has a decent write-up of the problems this can cause in the click documentation: http://click.pocoo.org/5/python3/#python-3-surrogate-handling

    While I have some changes in the works to tell Python 3 to coerce the C locale to C.UTF-8 instead (as discussed at http://bugs.python.org/issue28180), a more immediate solution is for infrastructure providers to configure C.UTF-8 as their default rather than relying on components to either override or ignore the locale setting.
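    To check whether a build step is running under the legacy locale described above, a script can inspect the character map of the active locale. The POSIX sh sketch below is an illustration only, not a Pipelines feature. The helper name is_utf8_charmap is hypothetical, and the sketch relies on locale charmap as provided by glibc-based images. Under the POSIX/C locale, glibc typically reports ANSI_X3.4-1968 (ASCII).

```shell
#!/bin/sh
# Hypothetical preflight helper: succeed only if the given character map name
# is UTF-8. Pass it the output of `locale charmap`.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use near the top of a build script: warn (on stderr) when the active
# locale is not UTF-8, e.g. under the default POSIX locale.
charmap=$(locale charmap 2>/dev/null)
if ! is_utf8_charmap "$charmap"; then
    echo "warning: locale charmap is '${charmap:-unknown}', not UTF-8; consider LC_ALL=C.UTF-8" >&2
fi
```

    The sketch only warns. Replacing the echo with exit 1 would instead fail the step as soon as a non-UTF-8 locale is detected.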

  4. Matt Ryall (staff)

    Thanks for your patience on this issue. We're planning a few updates to the default image in the coming months, so I'll see if we can get this one in as well (cc @Raul Gomis).

    Agree that setting LC_ALL=C.UTF-8 seems like the correct fix to use UTF-8 without selecting a specific language. We'll go with that if we do it.

    Also a reminder that there are two good workarounds for this issue in the meantime:

    • Use or build a proper Docker image for your build environment. Our default image isn't updated frequently, so if you want the latest tool versions, you should use another image.
    • Alternatively, include LC_ALL (and others as needed) in your pipeline environment variables, either via Pipelines settings or as an export line in your build script.