Content of larger Office files not searchable

Platform Notice: Data Center Only - This article only applies to Atlassian apps on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

For Office documents (*.xlsx, *.pptx, *.docx) that exceed certain thresholds, Confluence can't extract text and make it available in searches. Some of the attachment size limitations for indexing are outlined here :

Configuring Attachment Size

As per the knowledge base article linked above, the process of extracting attachment content for indexing is memory intensive and can cause out of memory errors when large files are uploaded. The size limit here is a safeguard built into Confluence to prevent this happening.

Diagnosis

Place the below classes in DEBUG

com.atlassian.confluence.internal.index.attachment
com.atlassian.confluence.internal.index
com.atlassian.confluence.search.lucene
com.atlassian.bonnie.search.extractor

After reproducing the issue, we see below entries where Confluence is complaining about Document being too big for text extraction which explains why the contents of these files are not searchable.

2021-03-06 18:25:56,021 DEBUG [attachment-text-extraction-worker-1] [internal.index.attachment.DefaultAttachmentExtractedTextManager] getContent Can't read extracted text of attachment 884741
2021-03-06 18:25:56,055 WARN [attachment-text-extraction-worker-1] [confluence.impl.hibernate.ConfluenceHibernateTransactionManager] doRollback Performing rollback. Transactions:
  ->[com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply]: PROPAGATION_REQUIRES_NEW,ISOLATION_DEFAULT (Session #674982162)
 -- referer: http://localhost:8090/pages/resumedraft.action?draftId=884737&draftShareId=ca5fced8-89dd-4fff-8660-4aa3d2903ce3& | url: /rest/documentConversion/latest/conversion/thumbnail/results | traceId: e9c077ee7d66b117 | userName: admin
2021-03-06 18:25:56,056 DEBUG [Caesium-1-1] [search.lucene.extractor.AttachmentExtractedTextExtractor] addFields Error when extracting text for 884741
java.util.concurrent.CompletionException: java.lang.RuntimeException: com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of PowerPoint document: Document too big for text extraction, bailing out
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of PowerPoint document: Document too big for text extraction, bailing out
    at com.atlassian.confluence.extra.officeconnector.index.AbstractAttachmentExtractor.extract(AbstractAttachmentExtractor.java:33)
    at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.lambda$extract$1(DelegatingAttachmentTextExtractor.java:35)
    at java.util.Optional.flatMap(Optional.java:241)
    at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.extract(DelegatingAttachmentTextExtractor.java:35)
    at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:70)
    at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)

We also have a known issue : CONFSERVER-58824 - Failed to extract index from Word doc file smaller than 16MB.

Solution

You can override the thresholds by setting a Java sysprop and restarting your instance.

⚠️ Note: We have used the value 6mb as an example, be sure to test this in your lower instance prior to rolling this onto Production. Also, please be mindful when increasing the size here, as shared earlier in this comment, text extraction is a powerful operation and resource intensive.

Extension	System prop key	Example
.docx	`officeconnector.textextract.word.docxmaxsize`	`-Dofficeconnector.textextract.word.docxmaxsize=6052413`
.xlsx	`officeconnector.excel.extractor.maxlength`	`-Dofficeconnector.excel.extractor.maxlength=6052413`
.pptx	`officeconnector.powerpoint.extractor.maxlength`	`-Dofficeconnector.powerpoint.extractor.maxlength=6052413`

Updated on September 25, 2025

Was this helpful?

It wasn't accurateIt wasn't clearIt wasn't relevant

Atlassian Support

Content of larger Office files not searchable

Problem

Diagnosis

Solution

Still need help?