Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Problem

Extended UTF-8 or accented characters can cause unexpected behaviour in Bitbucket Data Center. For example, a branch whose name contains such characters can trigger errors similar to the following in <Bitbucket-home>/mesh/log/atlassian-mesh.log:

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <repo path>/1052/refs/heads/大家好
    at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:145)
    at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:69)
    at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:280)
    at java.base/java.io.File.toPath(File.java:2290)
    at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:437)
    at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:433)
    at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveBranch(RawGitAgent.java:585)
    at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveHead(RawGitAgent.java:222)
    at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:297)
    at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:293)
    at com.atlassian.stash.internal.repository.DefaultRefService.getDefaultBranch(DefaultRefService.java:191)
    ...

Diagnosis

Environment

  • Bitbucket hosted on Windows or macOS is unaffected.

  • Impacts Bitbucket Server / Data Center 6.0+ installed on Linux servers:

    • The Bitbucket application is running on Java 11 or later.

    • The LANG environment variable is set to a non-UTF-8 locale.

      OR

      The LC_CTYPE environment variable is set to a non-UTF-8 locale.
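
The locale conditions above can be checked with a short script. The following is a minimal sketch, not an official Atlassian tool: the function names effective_ctype and is_utf8_locale are hypothetical, and it assumes the standard POSIX precedence in which LC_ALL overrides LC_CTYPE, which in turn overrides LANG. Run it as the user that starts Bitbucket.

```shell
# Hypothetical helper: print the locale value that decides character encoding.
# POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-}"
    fi
}

# Hypothetical helper: succeed if the locale name carries a UTF-8 codeset
# suffix, written either as ".UTF-8" or ".utf8" (case-insensitive).
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale "$(effective_ctype)"; then
    echo "UTF-8 locale in effect: $(effective_ctype)"
else
    echo "non-UTF-8 locale in effect: '$(effective_ctype)'"
fi
```

An empty or unset result (reported as '') means the process falls back to the default POSIX locale, which is not UTF-8.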

Cause

Java 11 and later no longer support setting sun.jnu.encoding to UTF-8 through a JVM argument to control how file paths are encoded. The JVM silently ignores the argument, so it has no effect.
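
To see which value the JVM actually ends up with, the java launcher's -XshowSettings:properties option can be used to list system properties. The sketch below is illustrative only: the helper name jnu_encoding is hypothetical, and it assumes the listing prints each property as "key = value". Run it with the same Java installation, user and environment that Bitbucket uses.

```shell
# Hypothetical helper: extract the sun.jnu.encoding value from the property
# listing printed by `java -XshowSettings:properties` (assumed "key = value").
jnu_encoding() {
    sed -n 's/^[[:space:]]*sun\.jnu\.encoding = //p'
}

# The settings listing is written to stderr, so merge it into stdout
# before filtering. Skipped quietly if no java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
    java -XshowSettings:properties -version 2>&1 | jnu_encoding
fi
```

If the printed value is not UTF-8, the locale seen by the JVM is the likely reason, and the steps in the Solution section apply.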

Solution

  1. Set LANG to a UTF-8 locale:

    1. If Bitbucket is running as a service, set LANG="en_US.UTF-8" in /etc/init.d/atlbitbucket, where it will be honoured.

    2. Set LANG="en_US.UTF-8" in the environment of the user that starts Bitbucket.

  2. If this does not work, check the value of the LC_CTYPE environment variable; it should also be en_US.UTF-8.

    $ env | grep LC_CTYPE   # If you did not set this variable explicitly, this command returns nothing.
    LC_CTYPE=en_US.UTF-8

    $ locale   # Use this command to check whether all the locale settings are set to UTF-8.
    LANG=en_US.UTF-8
    LANGUAGE=
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
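
The commands in the steps above show the locale of your own shell, which may differ from the environment the running service inherited. On Linux, one way to inspect the environment of an already-running process is to read /proc/<pid>/environ. The following is a sketch under that assumption: the helper name locale_vars_from_environ is hypothetical, and you would supply the PID of the Bitbucket Java process yourself.

```shell
# Hypothetical helper: print the locale-related variables found in a
# NUL-separated environment dump such as /proc/<pid>/environ (Linux only).
locale_vars_from_environ() {
    tr '\0' '\n' < "$1" | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' || true
}

# Example: inspect the current shell's own process. Replace $$ with the PID
# of the Bitbucket Java process to check the service itself.
if [ -r "/proc/$$/environ" ]; then
    locale_vars_from_environ "/proc/$$/environ"
fi
```

If the output contains no LANG or LC_CTYPE entry with a UTF-8 value after a restart, the setting is probably not reaching the service's start-up environment.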
Updated on April 11, 2025
