SAXException error when running the Content Anonymizer for Confluence

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Atlassian may request an XML backup to troubleshoot bugs in Confluence. To keep customer data from leaking, the Content Anonymizer tool can be used to clean the backup data (entities.xml). However, some special characters can cause a SAXException during cleaning.

For example, a special character (code 55357, the first UTF-16 code unit of many emoji, such as a smiling face) caused the error below.

$ java -jar confluence-export-cleaner-1.1-jar-with-dependencies.jar entities.xml cleaned.xml
2021-04-14 21:40:12,157 INFO Starting to clean export file 'entities.xml'. This may take a few minutes.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXException: Cannot output character with code 55357 in the encoding UTF-8' within a CDATA section
javax.xml.transform.TransformerException: Cannot output character with code 55357 in the encoding UTF-8' within a CDATA section

Cause

The anonymizer tool cannot handle certain special characters (such as a smiling-face emoji) in the Confluence backup file (entities.xml).
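
To make the number in the error message concrete: 55357 is 0xD83D, which is a UTF-16 high surrogate and not a complete character by itself. The sketch below shows that an emoji such as U+1F600 (chosen here only as an illustration) is stored in Java as two UTF-16 code units, the first of which is 55357. The class and method names are hypothetical, and this is not code from the anonymizer.

```java
class SurrogateDemo {
    // Returns the UTF-16 code units of a string as decimal integers.
    static int[] codeUnits(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i);
        }
        return units;
    }

    public static void main(String[] args) {
        // U+1F600 is outside the Basic Multilingual Plane, so Java
        // represents it as a surrogate pair (two char values).
        String emoji = new String(Character.toChars(0x1F600));
        int[] units = codeUnits(emoji);
        System.out.println(units[0] + " " + units[1]);                  // prints "55357 56832"
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // prints "true"
    }
}
```

A serializer that writes each code unit separately cannot encode a lone surrogate in UTF-8, which is consistent with the exception shown in the Summary. The anonymizer's internals are not confirmed here.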

Solution

If entities.xml is small, the special characters can be removed manually in a text editor.

However, if the file is too large to edit directly, the following command can be used to strip them.

java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
  • Then run the anonymizer tool on the cleaned file (entities-clean.xml).
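
For readers who want to see what this kind of cleaning step involves, here is a minimal sketch. It assumes the goal is to drop every UTF-16 code unit that is not a valid XML 1.0 character in the Basic Multilingual Plane. It is not the source of atlassian-xml-cleaner, whose exact rules are not described in this article; the class name SurrogateStripper and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class SurrogateStripper {
    // True for code units in the BMP part of the XML 1.0 Char production.
    // Surrogates (0xD800-0xDFFF) and most control characters are rejected.
    static boolean isKept(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Streams the input one code unit at a time, so large files
    // do not need to fit in memory.
    static void strip(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (isKept(c)) {
                out.write(c);
            }
        }
    }

    // Convenience wrapper for short strings.
    static String stripString(String s) {
        StringWriter out = new StringWriter();
        try {
            strip(new StringReader(s), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the input file; output goes to stdout.
        try (Reader in = new BufferedReader(new InputStreamReader(
                 new FileInputStream(args[0]), StandardCharsets.UTF_8));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                 System.out, StandardCharsets.UTF_8))) {
            strip(in, out);
        }
    }
}
```

On Java 11 or later this could be run as java SurrogateStripper.java entities.xml > entities-clean.xml. Note the trade-off: the sketch removes all characters outside the Basic Multilingual Plane, including emoji that are valid XML, so the cleaned file loses that content.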

Reference

The tool for cleaning special characters was originally written for Jira. For details, see Removing invalid characters from XML backups.

Updated on April 8, 2025
