com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Symptoms

Warnings appear in the Confluence log when indexing or reindexing a CSV file attachment.

WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922)
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
    at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103)
    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
    ...

Cause

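In Confluence versions before 3.5.11, an uploaded CSV file was stored with the Content-Type application/vnd.ms-excel, so ExcelTextExtractor is used to index it. Because a CSV file is plain text rather than an Excel file, ExcelTextExtractor cannot read it and logs a warning. This is the "Invalid header signature" message shown in the Symptoms section: the extractor found plain text where it expected the binary header of an Excel document.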

This issue was fixed in Confluence 3.5.11, so newly uploaded CSV files no longer trigger these warnings. CSV files uploaded before the fix still carry the incorrect Content-Type, so the warnings continue to appear whenever Confluence re-indexes them.

Resolution

Atlassian Support Offerings

The following approach involves SQL queries and is beyond the scope of Atlassian Support Offerings. Please note that Atlassian does not support direct database INSERT, UPDATE or DELETE queries, as they can easily lead to data integrity problems. Atlassian will not be held liable for any errors or other unexpected events resulting from the use of the following SQL queries.

Back up your Database

Always back up your data before making any modifications to the database.

  • Use the SQL script below (it may need slight adjustment for the syntax of your DBMS) to correct the Content-Type of old CSV files:

    UPDATE CONTENTPROPERTIES
    SET STRINGVAL = 'text/csv'
    WHERE PROPERTYNAME = 'MEDIA_TYPE'
    AND PROPERTYID IN (
      SELECT PROPERTYID
      FROM CONTENTPROPERTIES
      WHERE CONTENTID IN (
        SELECT CONTENTID
        FROM CONTENT
        WHERE CONTENTTYPE = 'ATTACHMENT'
        AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')
      )
    );
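
Before running the UPDATE, it may help to preview which rows it would touch. The read-only query below is a minimal sketch that uses only the tables and columns named in the script above. The aliases c and cp are illustrative, and the JOIN form assumes that CONTENTID uniquely identifies a row in CONTENT. As with the UPDATE, the syntax may need adjusting for your DBMS.

```sql
-- Preview only: lists the MEDIA_TYPE properties of CSV attachments
-- that the UPDATE statement would modify. No data is changed.
SELECT c.CONTENTID,
       c.TITLE,
       cp.PROPERTYID,
       cp.STRINGVAL      -- current Content-Type stored for the attachment
FROM CONTENTPROPERTIES cp
JOIN CONTENT c ON c.CONTENTID = cp.CONTENTID
WHERE cp.PROPERTYNAME = 'MEDIA_TYPE'
  AND c.CONTENTTYPE = 'ATTACHMENT'
  AND (c.TITLE LIKE '%.csv' OR c.TITLE LIKE '%.CSV');
```

Rows whose STRINGVAL is application/vnd.ms-excel are the ones carrying the incorrect Content-Type; rows that already show text/csv are left with the same value by the UPDATE.
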
Updated on April 8, 2025
