Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Problem
Extended UTF-8 or accented characters can cause unexpected behaviour in Bitbucket Data Center. For example, a branch whose name contains such characters can trigger errors similar to the following in <Bitbucket-home>/mesh/log/atlassian-mesh.log:
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <repo path>/1052/refs/heads/大家好
at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:145)
at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:69)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:280)
at java.base/java.io.File.toPath(File.java:2290)
at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:437)
at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:433)
at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveBranch(RawGitAgent.java:585)
at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveHead(RawGitAgent.java:222)
at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:297)
at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:293)
at com.atlassian.stash.internal.repository.DefaultRefService.getDefaultBranch(DefaultRefService.java:191)
...
Diagnosis
Environment
Bitbucket hosted on Windows or macOS is unaffected.
The issue affects Bitbucket Server / Data Center 6.0+ installed on Linux servers when:
Bitbucket is running on Java 11 or later, and
the LANG environment variable is set to a non-UTF-8 locale, OR
the LC_CTYPE environment variable is set to a non-UTF-8 locale.
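The locale conditions above can be checked from a shell on the Bitbucket host. The following is a minimal sketch, not an official Atlassian script: the helper names is_utf8_locale and effective_ctype are hypothetical, and the LC_ALL, LC_CTYPE, LANG precedence follows the standard POSIX locale rules. Run it as the user that starts Bitbucket, since that is the environment the JVM inherits.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a locale name declares a UTF-8 codeset,
# for example en_US.UTF-8 or en_US.utf8.
is_utf8_locale() {
  case "$1" in
    *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the locale that governs character encoding.
# POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
  printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

ctype=$(effective_ctype)
if is_utf8_locale "$ctype"; then
  echo "Character locale '$ctype' is UTF-8"
else
  echo "Character locale '${ctype:-<unset>}' is not UTF-8"
fi
```

A "not UTF-8" result on a Linux host running Java 11 or later matches the affected environment described above.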
Cause
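Java 11 does not support setting sun.jnu.encoding to UTF-8 through a JVM argument in order to encode file paths as UTF-8. The argument is silently ignored and has no effect. The JVM instead takes the file path encoding from the process locale, so a non-UTF-8 LANG or LC_CTYPE leaves such characters unmappable.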
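One way to see which encodings the JVM resolved at startup is to print its system properties. The sketch below assumes a java binary on the PATH and uses the standard -XshowSettings:properties launcher option. The function names filter_encoding_props and show_jvm_encodings are hypothetical, and the values printed depend on the host and Java version, so none are shown here.

```shell
#!/bin/sh
# Hypothetical helper: keep only the encoding-related lines from the
# property listing that the JVM prints.
filter_encoding_props() {
  grep -E 'sun\.jnu\.encoding|file\.encoding'
}

# Hypothetical helper: start a JVM with any extra arguments given, and
# print the encodings it resolved. -XshowSettings writes to stderr,
# hence the 2>&1 redirection.
show_jvm_encodings() {
  java "$@" -XshowSettings:properties -version 2>&1 | filter_encoding_props
}

if command -v java >/dev/null 2>&1; then
  show_jvm_encodings || echo "no encoding properties found" >&2
else
  echo "java not found on PATH; skipping check" >&2
fi
```

Calling show_jvm_encodings once with no arguments and once with -Dsun.jnu.encoding=UTF-8, under the same locale as the Bitbucket process, lets you compare the reported values on your own Java version.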
Solution
Update LANG to a UTF-8 locale:
If Bitbucket is running as a service, set LANG="en_US.UTF-8" in /etc/init.d/atlbitbucket, where it will be honoured.
Set LANG="en_US.UTF-8" in the environment of the user that starts Bitbucket.
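As an illustration of the service case, the lines below sketch what the change might look like. Where exactly they go inside /etc/init.d/atlbitbucket is an assumption (somewhere before the command that starts Bitbucket), and the locale -a check is an optional addition rather than part of the instructions above.

```shell
# Sketch of lines for /etc/init.d/atlbitbucket; the exact position in the
# script is an assumption.
LANG="en_US.UTF-8"
export LANG   # export so the JVM started by this script inherits the value

# Optional sanity check (an addition, not from the original instructions):
# warn if the locale does not appear to be installed on this host.
if command -v locale >/dev/null 2>&1; then
  if ! locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    echo "warning: en_US.UTF-8 is not listed by 'locale -a'" >&2
  fi
fi
```

After editing the script, restart the Bitbucket service so that the new environment reaches the JVM process.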
If this does not work, check the value of the LC_CTYPE environment variable. It should also be en_US.UTF-8.
$ env | grep LC_CTYPE   # If you did not set this configuration explicitly, this command returns nothing.
LC_CTYPE=en_US.UTF-8

$ locale   # Use this command to check that all the locale settings are set to UTF-8
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=