Uploading PDFs containing different Unicode Prime symbols results in "Malformed input or input contains unmappable characters" errors

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Uploading PDFs containing different Unicode Prime symbols results in "Malformed input or input contains unmappable characters" errors

Environment

Java 17

Diagnosis

When you attach or upload a PDF containing Unicode prime symbols (e.g. ' `→) and save the page, Confluence shows a pink error box with the following message:

Error rendering macro 'view-file' Malformed input or input contains unmappable characters: <FILE-NAME>
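As a rough illustration of what "unmappable characters" means, the sketch below strictly encodes a string containing U+2032 (PRIME) with US-ASCII and with UTF-8. The class name, method name and sample file name are hypothetical, and the sketch assumes the failing text is a string such as an attachment name; it does not reproduce Confluence's own code path.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PrimeEncodingDemo {
    // Hypothetical file name containing U+2032 PRIME, used only for illustration.
    static final String SAMPLE_NAME = "report\u2032.pdf";

    // Returns true if every character of the text can be encoded in the given charset.
    static boolean encodesCleanly(String text, Charset charset) {
        try {
            charset.newEncoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown when a character has no representation in the charset.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("US-ASCII: " + encodesCleanly(SAMPLE_NAME, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + encodesCleanly(SAMPLE_NAME, StandardCharsets.UTF_8));
    }
}
```

The prime symbol has no ASCII representation, so the strict US-ASCII encoding fails while the UTF-8 encoding succeeds.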

Cause

Java picks up the wrong encoding (ANSI_X3.4-1968, which is US-ASCII) even though LANG=en_US.UTF-8 is set. The Java runtime environment properties show this:

<java-runtime-environment>
  <confluence.child-macro.max-depth>4</confluence.child-macro.max-depth>
  <java.specification.version>17</java.specification.version>
  <sun.jnu.encoding>ANSI_X3.4-1968</sun.jnu.encoding>
  ...
  <file.encoding>ANSI_X3.4-1968</file.encoding>
  ...
  <native.encoding>ANSI_X3.4-1968</native.encoding>
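The same three properties can also be read directly from a JVM. The following is a minimal sketch with a hypothetical class name. It reports the values for the JVM it runs in, so it is only meaningful when started as the same user and with the same environment as Confluence.

```java
public class EncodingProperties {
    // The encoding-related system properties listed in the runtime environment above.
    static final String[] KEYS = {"sun.jnu.encoding", "file.encoding", "native.encoding"};

    // Formats one property as "key = value", with a placeholder when it is absent.
    static String describe(String key) {
        return key + " = " + System.getProperty(key, "(not set)");
    }

    public static void main(String[] args) {
        for (String key : KEYS) {
            System.out.println(describe(key));
        }
    }
}
```

If any of these report ANSI_X3.4-1968 instead of UTF-8, the JVM started with an ASCII locale.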

Background

💡Similar to Bitbucket KB: Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error

ℹ️ To make the fix persistent for Confluence on Java 17, set the locale with the LC_ALL variable instead of LANG.

With only LANG=en_US.UTF-8 exported, Java appears to ignore the setting and still reports ASCII as its default charset:

@HKGGFCQWPG java % export LANG=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: US-ASCII
Default Charset by InputStreamReader: ASCII
Default Charset: US-ASCII

✔️ With LC_ALL=en_US.UTF-8 exported, on the other hand, Java consistently uses UTF-8 as required:

@HKGGFCQWPG java % export LC_ALL=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: UTF-8
Default Charset by InputStreamReader: UTF8
Default Charset: UTF-8
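The source of getcharset.java is not included in this article. The following is a hypothetical sketch of a comparable check, not the original file: it prints the JVM default charset and the encoding an InputStreamReader picks when no charset is given. InputStreamReader.getEncoding() returns historical names, which is why values such as ASCII and UTF8 can appear instead of US-ASCII and UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class GetCharset {
    // Canonical name of the JVM default charset, for example "UTF-8" or "US-ASCII".
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    // Encoding name reported by a reader created without an explicit charset.
    static String readerEncodingName() {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default Charset: " + defaultCharsetName());
        System.out.println("Default Charset by InputStreamReader: " + readerEncodingName());
    }
}
```

Saved as a single source file, it can be run with the java launcher in the same way as the commands above, once after exporting LANG and once after exporting LC_ALL, to compare the results.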

Solution

Set the LC_ALL=en_US.UTF-8 variable in the setenv.sh file or in the profile of the user that runs Confluence, then restart the application.

Updated on April 24, 2025
