Document Contents Are Not Searchable

Platform Notice: Data Center Only - This article only applies to Atlassian apps on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Symptoms

The following errors appear in the logs:

2012-06-29 14:41:00,327 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: My_PDF_Examplem.pdf v.2 (8912924) admin)
com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
        at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
...
Caused by: java.io.IOException: Error: Expected an integer type, actual=''
        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1112)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:45)
        ... 30 more
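To find out which attachments are affected, you can scan the application log (typically atlassian-confluence.log under the Confluence home directory) for the WARN line shown above. The following is a minimal sketch that assumes the line keeps the exact "Error indexing attachment (Attachment: ...)" wording from this example; other Confluence versions may phrase it differently. The class and method names are hypothetical and not part of any Atlassian tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: lists attachments that failed content extraction,
 * assuming the "Error indexing attachment" WARN format shown in this article.
 */
public class FailedAttachmentScanner {

    // Captures everything between "(Attachment: " and the final ")" on the line,
    // e.g. "My_PDF_Examplem.pdf v.2 (8912924) admin".
    private static final Pattern FAILED_ATTACHMENT =
            Pattern.compile("Error indexing attachment \\(Attachment: (.+)\\)\\s*$");

    /** Returns the attachment descriptions found in the given log lines. */
    public static List<String> findFailedAttachments(List<String> logLines) {
        List<String> failed = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FAILED_ATTACHMENT.matcher(line);
            if (m.find()) {
                failed.add(m.group(1));
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the log file, e.g. <confluence-home>/logs/atlassian-confluence.log
        // readAllLines decodes as UTF-8 and fails on invalid bytes.
        Path log = Path.of(args[0]);
        for (String attachment : findFailedAttachments(Files.readAllLines(log))) {
            System.out.println(attachment);
        }
    }
}
```

Each printed entry contains the file name, version, attachment ID and author as they appear in the log, which is enough to locate the attachment in Confluence.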

Cause

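Confluence is unable to extract and index the contents of some attachments. In the example above, the PDF parser (PDFBox) fails while reading the document ("Expected an integer type"), which suggests the file is corrupt or malformed. Alternatively, Confluence may be running out of memory (OOM) during the indexing task.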
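If you suspect a corrupt file, a quick structural check on a downloaded copy of the attachment can help. The sketch below is a heuristic only, not the check Confluence or PDFBox performs: it assumes a complete PDF starts with the "%PDF-" header and contains the "%%EOF" end-of-file marker near its end, so a missing marker usually points to a truncated or damaged file. A file that passes may still fail to parse. The class name and the 1024-byte search window are illustrative choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Heuristic sketch: flags PDF bytes that lack the "%PDF-" header or a
 * "%%EOF" marker near the end. Passing does NOT prove the PDF is parseable.
 */
public class PdfSanityCheck {

    // How many bytes from the end of the file to search for "%%EOF" (assumed window).
    private static final int EOF_SEARCH_WINDOW = 1024;

    public static boolean looksLikeCompletePdf(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < header.length) {
            return false;
        }
        // The header is expected at the very start of the file.
        for (int i = 0; i < header.length; i++) {
            if (data[i] != header[i]) {
                return false;
            }
        }
        int start = Math.max(0, data.length - EOF_SEARCH_WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so binary data is safe to scan.
        String tail = new String(data, start, data.length - start, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path to a local copy of an attachment.
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Path.of(arg));
            System.out.println((looksLikeCompletePdf(data) ? "OK       " : "SUSPECT  ") + arg);
        }
    }
}
```

Files reported as SUSPECT are good candidates for re-uploading from an intact source; if all files pass, the OOM explanation becomes more likely.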

Workaround

Disable indexing of attachments by following the instructions in How to disable indexing of attachments. Confluence will then stop indexing the contents of attachments, so those contents will no longer appear in search results. Attachment titles, however, will still be indexed and searchable.
After that is done, rebuild the content indexes from scratch.

Updated on September 25, 2025
