'Incorrect string value' error thrown when restoring XML backup in Confluence

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

When attempting to restore an XML backup in Confluence, the process stops and an error is thrown.

One of the following errors appears in atlassian-confluence.log:

logExceptions Incorrect string value: '\xF0\x9F\x98\x80</...' for column 'BODY' at row 1

Or:

Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8D\xBA ...' for column 'BODY' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:998)

Or:

An invalid XML character (Unicode: 0xffff) was found in the CDATA section
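
To confirm which of these errors a failed restore hit, the log can be searched for the messages quoted above. This is a minimal sketch: the function name find_restore_errors is hypothetical (it is not part of Confluence), and the log path is passed in because its location depends on your installation.

```shell
# Hypothetical helper: print every log line (prefixed with its line number)
# that contains one of the error messages quoted in this article.
# $1 is the path to atlassian-confluence.log.
find_restore_errors() {
  grep -nE "Incorrect string value|An invalid XML character" "$1"
}

# Example usage:
#   find_restore_errors /path/to/atlassian-confluence.log
```

The output shows whether the restore failed on a 4-byte character, on an invalid XML character such as 0xffff, or on both.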

Cause

The XML backup contains a character that the database cannot store, for example a 4-byte UTF-8 character such as the emoji in the log excerpts above. MySQL rejects these characters because of a MySQL bug. There is also an improvement request for Confluence to handle 4-byte UTF-8 characters gracefully.

A bug has also been raised for Confluence regarding the 0xFFFF error.

Workaround

Remove the invalid characters from the XML backup:

  1. Download atlassian-xml-cleaner-0.1.jar from Removing invalid characters from XML backups

  2. Open a command prompt and go to the directory that contains the XML backup. If the backup is a ZIP file, extract it first. This example uses entities.xml.

  3. Run the cleaner as shown:

    $ java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml

    ℹ️ Invalid characters can also exist in the plugin data, which is stored in the activeObjectsBackupRestoreProvider.pdata file. Clean that file with the cleaner as well. The file is included in the ZIP file but is sometimes not visible until the ZIP file is extracted.

  4. This will create a copy of entities.xml as entities-clean.xml with the invalid characters removed.

  5. Copy the entities-clean.xml file into another directory, rename it back to entities.xml and create a new ZIP with the entities.xml file.

🚩 Note: The ZIP file should contain the Attachments folder as well. If this folder is not included, the attachments will show up as broken links or broken images.

  6. Import the new ZIP file.

If you are on a Linux server, the commands below will do the trick:

# this will recreate entities.xml in the current directory:
unzip <path>/Confluence-backup.zip entities.xml

# fix the entities file, saving its output to a new file (entities-clean.xml):
java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml

# rename the original entities file
mv entities.xml entities-original.xml

# rename the fixed entities file to the expected name
mv entities-clean.xml entities.xml

# update the zip file with the new entities.xml file
zip -u <path>/Confluence-backup.zip entities.xml

ℹ️ For reference:

If you are seeing an error specifically with 0xffff as the affected character, please use this perl command to fix the file:

perl -i -pe 's/\xef\xbf\xbf//g' entities.xml

And if experiencing the error with 0xfffe, use the below perl command:

perl -i -pe 's/\xef\xbf\xbe//g' entities.xml

If you are running Windows and the above Perl command doesn't work, here's a PowerShell script to fix the problem:

$yourfile = "PATH_TO_THE_XML\entities.xml"
$outputfile = "PATH_TO_SAVE_NEW_XML\entities_clean.xml"
Get-Content -Path $yourfile | Out-File $outputfile -Encoding utf8

Updated on April 24, 2025
