Non-English characters, umlauts, and diaereses are missing or appear as boxes in Confluence Data Center PDF export

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

When a Confluence page is exported to PDF, non-English characters, umlauts, and diaereses may be missing, appear as boxes, or appear garbled in the PDF content.

Environment

Confluence Data Center installation on a Linux Server.

Diagnosis

  1. The following JVM encoding properties are already set to UTF-8:

    1. -Dsun.jnu.encoding=UTF-8

    2. -Dfile.encoding=UTF-8

  2. The database encoding is set correctly.

  3. The issue does not occur when the PDF conversion sandbox process for Confluence Data Center is disabled with the following option:

    • -Dpdf.export.sandbox.disable=true
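The first diagnosis check can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name has_utf8_encoding is hypothetical and not part of Confluence. It takes a JVM command line as a single string and reports whether both encoding properties are set to UTF-8.

```shell
# Hypothetical helper (not part of Confluence): succeeds only if the given
# JVM command line sets both encoding properties to UTF-8.
has_utf8_encoding() {
  # Pad with spaces so each flag is matched as a whole argument.
  case " $1 " in
    *" -Dfile.encoding=UTF-8 "*) ;;
    *) return 1 ;;
  esac
  case " $1 " in
    *" -Dsun.jnu.encoding=UTF-8 "*) ;;
    *) return 1 ;;
  esac
  return 0
}
```

On a Linux node, the command line of the running Confluence JVM can be read with ps -o args= -p followed by the process ID, and the result passed to the function as one quoted argument. This only inspects the main Confluence JVM; it says nothing about the options used by the sandbox process.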

Cause

  1. The PDF conversion process in Confluence Data Center is controlled by a separate sandbox process, even on a DC instance running on a single node.

  2. In some cases, this sandbox process does not pick up the encoding that is set for Confluence, so the encoding values must be passed to it explicitly.

Solution

  1. Add the following parameter to the setenv.sh file for each node and restart Confluence.

    • CATALINA_OPTS="-Dconversion.sandbox.java.options=-Xmx512m,-Xss2m,-Dsun.jnu.encoding=UTF-8,-Dfile.encoding=UTF-8 ${CATALINA_OPTS}"
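To double-check the edit before restarting, the sandbox options can be pulled back out of the resulting value. The following is a minimal sketch, assuming a POSIX shell; the function name sandbox_options is hypothetical and not part of Confluence. It prints each comma-separated sandbox JVM option on its own line.

```shell
# Hypothetical helper (not part of Confluence): list the JVM options that
# -Dconversion.sandbox.java.options would hand to the sandbox process.
sandbox_options() {
  # $1 is deliberately unquoted so the option string splits on whitespace.
  for opt in $1; do
    case "$opt" in
      -Dconversion.sandbox.java.options=*)
        # Drop the property name, then print one option per line.
        printf '%s\n' "${opt#-Dconversion.sandbox.java.options=}" | tr ',' '\n'
        ;;
    esac
  done
}
```

Calling sandbox_options "$CATALINA_OPTS" after the assignment above should list -Dsun.jnu.encoding=UTF-8 and -Dfile.encoding=UTF-8 alongside the two memory options. If either is missing, the line in setenv.sh is likely mistyped.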

Updated on March 17, 2025
