com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Symptoms
Warnings like the following appear in the Confluence log when indexing or reindexing a CSV file attachment:
WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922)
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103)
at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
...
Cause
In Confluence versions before 3.5.11, an uploaded CSV file was stored with the Content-Type application/vnd.ms-excel, so ExcelTextExtractor is used to index it. Because the file is not an Excel file, ExcelTextExtractor cannot read it and logs a warning message.
This issue is fixed in Confluence 3.5.11 and later, so newly uploaded CSV files no longer trigger these warnings. CSV files uploaded before the fix still carry the incorrect Content-Type, so the warnings still occur for them whenever Confluence performs a re-index.
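The two hexadecimal values in the warning can be decoded to confirm this cause. The following is a minimal sketch, assuming the extractor reads the first 8 bytes of the file as a little-endian 64-bit integer; the helper names signature_bytes and looks_like_ole2 are hypothetical and not part of Confluence.

```python
import struct

# First 8 bytes of a legacy binary Excel (OLE2 compound) file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def signature_bytes(logged_value: int) -> bytes:
    """Turn a 64-bit value from the log back into the 8 bytes on disk.

    Assumption: the value was read as a little-endian unsigned long.
    """
    return struct.pack("<Q", logged_value)


def looks_like_ole2(first_bytes: bytes) -> bool:
    """Return True if the data starts with the OLE2 signature."""
    return first_bytes[:8] == OLE2_MAGIC


# Values copied from the warning shown in the Symptoms section.
expected = signature_bytes(0xE11AB1A1E011CFD0)
actual = signature_bytes(0x6D453B74726F6853)

print(expected.hex())           # d0cf11e0a1b11ae1
print(actual)                   # b'Short;Em'
print(looks_like_ole2(actual))  # False
```

Under that assumption, the "read" value is simply the first eight characters of the attachment, which is plain semicolon-delimited text rather than a binary Excel header.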
Resolution
Atlassian Support Offerings
The following approach involves SQL queries and is beyond Atlassian Support Offerings. Please note that Atlassian does not support direct database INSERT, UPDATE or DELETE queries, as they can easily lead to data integrity problems. Atlassian will not be held liable for any errors or other unexpected events resulting from the use of the following SQL queries.
Back up your Database
Always back up your data before making any modifications to the database.
Use the SQL script below (it may need slight adjustment depending on the syntax of your DBMS) to correct the Content-Type of old CSV files:
UPDATE CONTENTPROPERTIES
SET STRINGVAL = 'text/csv'
WHERE PROPERTYNAME = 'MEDIA_TYPE'
  AND PROPERTYID IN (
    SELECT PROPERTYID
    FROM CONTENTPROPERTIES
    WHERE CONTENTID IN (
      SELECT CONTENTID
      FROM CONTENT
      WHERE CONTENTTYPE = 'ATTACHMENT'
        AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')
    )
  );
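Before running an UPDATE like this against a real database, it can help to preview which rows the predicate matches and to compare that count with the number of rows actually changed. The sketch below rehearses the idea against an in-memory SQLite database. The two tables are a hypothetical stand-in containing only the columns the statement touches, with made-up rows; they are not the real Confluence schema, and SQLite may behave differently from your DBMS.

```python
import sqlite3

# Hypothetical stand-in for the Confluence tables: only the columns the
# article's statement touches, with made-up rows. Not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CONTENT (CONTENTID INTEGER PRIMARY KEY, CONTENTTYPE TEXT, TITLE TEXT);
CREATE TABLE CONTENTPROPERTIES (
    PROPERTYID INTEGER PRIMARY KEY, PROPERTYNAME TEXT,
    STRINGVAL TEXT, CONTENTID INTEGER);
INSERT INTO CONTENT VALUES
    (1, 'ATTACHMENT', 'report.csv'),
    (2, 'ATTACHMENT', 'budget.xls'),
    (3, 'PAGE', 'notes.csv');
INSERT INTO CONTENTPROPERTIES VALUES
    (10, 'MEDIA_TYPE', 'application/vnd.ms-excel', 1),
    (11, 'MEDIA_TYPE', 'application/vnd.ms-excel', 2),
    (12, 'MEDIA_TYPE', 'text/plain', 3);
""")

# Same predicate as the article's UPDATE, shared by preview and update.
WHERE = """
    PROPERTYNAME = 'MEDIA_TYPE' AND PROPERTYID IN (
        SELECT PROPERTYID FROM CONTENTPROPERTIES WHERE CONTENTID IN (
            SELECT CONTENTID FROM CONTENT
            WHERE CONTENTTYPE = 'ATTACHMENT'
              AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')))
"""

# Step 1: preview the rows that would be changed.
preview = conn.execute(
    "SELECT PROPERTYID, STRINGVAL FROM CONTENTPROPERTIES WHERE" + WHERE
).fetchall()

# Step 2: run the update inside a transaction and keep it only if the
# number of changed rows matches the preview.
cur = conn.execute(
    "UPDATE CONTENTPROPERTIES SET STRINGVAL = 'text/csv' WHERE" + WHERE
)
updated = cur.rowcount
if updated == len(preview):
    conn.commit()
else:
    conn.rollback()

after = conn.execute(
    "SELECT PROPERTYID, STRINGVAL FROM CONTENTPROPERTIES ORDER BY PROPERTYID"
).fetchall()
print(preview)
print(updated)
print(after)
```

In this toy data only the property row for report.csv is rewritten; the .xls attachment and the page titled notes.csv are left alone because they fail the title or CONTENTTYPE filter.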