Content of larger Office files not searchable

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

For Office documents (*.xlsx, *.pptx, *.docx) that exceed certain thresholds, Confluence can't extract the text, so their content does not appear in search results. Some of the attachment size limits for indexing are outlined here:

As per the knowledge base article linked above, extracting attachment content for indexing is memory intensive and can cause out-of-memory errors when large files are uploaded. The size limit is a safeguard built into Confluence to prevent this from happening.

Diagnosis

  • Set the following packages to DEBUG logging:

com.atlassian.confluence.internal.index.attachment
com.atlassian.confluence.internal.index
com.atlassian.confluence.search.lucene
com.atlassian.bonnie.search.extractor

  • After reproducing the issue, the logs contain entries like the following, in which Confluence reports that the document is too big for text extraction. This is why the contents of these files are not searchable.

2021-03-06 18:25:56,021 DEBUG [attachment-text-extraction-worker-1] [internal.index.attachment.DefaultAttachmentExtractedTextManager] getContent Can't read extracted text of attachment 884741
2021-03-06 18:25:56,055 WARN [attachment-text-extraction-worker-1] [confluence.impl.hibernate.ConfluenceHibernateTransactionManager] doRollback Performing rollback. Transactions:
  ->[com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply]: PROPAGATION_REQUIRES_NEW,ISOLATION_DEFAULT (Session #674982162)
 -- referer: http://localhost:8090/pages/resumedraft.action?draftId=884737&draftShareId=ca5fced8-89dd-4fff-8660-4aa3d2903ce3& | url: /rest/documentConversion/latest/conversion/thumbnail/results | traceId: e9c077ee7d66b117 | userName: admin
2021-03-06 18:25:56,056 DEBUG [Caesium-1-1] [search.lucene.extractor.AttachmentExtractedTextExtractor] addFields Error when extracting text for 884741
java.util.concurrent.CompletionException: java.lang.RuntimeException: com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of PowerPoint document: Document too big for text extraction, bailing out
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of PowerPoint document: Document too big for text extraction, bailing out
    at com.atlassian.confluence.extra.officeconnector.index.AbstractAttachmentExtractor.extract(AbstractAttachmentExtractor.java:33)
    at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.lambda$extract$1(DelegatingAttachmentTextExtractor.java:35)
    at java.util.Optional.flatMap(Optional.java:241)
    at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.extract(DelegatingAttachmentTextExtractor.java:35)
    at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:70)
    at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
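On a busy instance it can be tedious to find these entries by eye. The following is a minimal Python sketch, not an Atlassian tool, that scans log lines for the "Error when extracting text for <id>" message followed closely by "Document too big for text extraction" and collects the affected attachment IDs. The function name and the `lookahead` parameter are hypothetical, and the sketch assumes the log layout matches the excerpt above.

```python
import re

# Matches the DEBUG line that names the attachment, e.g.
# "... addFields Error when extracting text for 884741"
ATTACHMENT_RE = re.compile(r"Error when extracting text for (\d+)")
TOO_BIG_MARKER = "Document too big for text extraction"


def find_oversized_attachments(log_lines, lookahead=5):
    """Return attachment IDs whose extraction error mentions the size limit.

    The marker must appear on the line that names the attachment or within
    `lookahead` lines after it. A new "Error when extracting text" line
    resets the search, so a marker is never credited to an older attachment.
    """
    found = []
    current_id = None
    remaining = 0
    for line in log_lines:
        match = ATTACHMENT_RE.search(line)
        if match:
            current_id = match.group(1)
            remaining = lookahead + 1  # this line plus the next `lookahead` lines
        if current_id is not None and remaining > 0:
            if TOO_BIG_MARKER in line and current_id not in found:
                found.append(current_id)
            remaining -= 1
    return found
```

To use it, pass any iterable of lines, for example an open file handle on the Confluence application log. The IDs it returns are the same attachment IDs shown in the log messages.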

Solution

You can override the thresholds by setting the relevant Java system property and restarting your instance.

⚠️ Note: We have used a value of roughly 6 MB (6052413) as an example. Be sure to test this in a lower (non-production) environment before rolling it out to Production. Also, be careful when increasing the limit: as noted earlier in this article, text extraction is resource intensive and can cause out-of-memory errors.

Extension   System property key                              Example
---------   ----------------------------------------------   --------------------------------------------------------
.docx       officeconnector.textextract.word.docxmaxsize     -Dofficeconnector.textextract.word.docxmaxsize=6052413
.xlsx       officeconnector.excel.extractor.maxlength        -Dofficeconnector.excel.extractor.maxlength=6052413
.pptx       officeconnector.powerpoint.extractor.maxlength   -Dofficeconnector.powerpoint.extractor.maxlength=6052413
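To avoid typos when setting several limits at once, the flags can be generated rather than typed by hand. The following Python sketch builds the `-D` arguments from the property keys listed above and passes the limit values through unchanged. The helper names are hypothetical. The `CATALINA_OPTS` line reflects the common convention of adding JVM arguments in `setenv.sh`, which is an assumption here, so check how system properties are configured for your own installation.

```python
# System property keys, copied from the list above.
PROPERTY_KEYS = {
    ".docx": "officeconnector.textextract.word.docxmaxsize",
    ".xlsx": "officeconnector.excel.extractor.maxlength",
    ".pptx": "officeconnector.powerpoint.extractor.maxlength",
}


def build_jvm_flags(limits):
    """Turn {extension: limit} into a list of -D flags.

    `limits` maps ".docx", ".xlsx" or ".pptx" to a positive integer,
    which is written into the flag exactly as given.
    """
    flags = []
    for extension, limit in limits.items():
        if extension not in PROPERTY_KEYS:
            raise ValueError(f"No known property for {extension!r}")
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError(f"Limit for {extension} must be a positive integer")
        flags.append(f"-D{PROPERTY_KEYS[extension]}={limit}")
    return flags


def catalina_opts_line(limits):
    """Format the flags as a setenv.sh-style assignment (assumed convention)."""
    joined = " ".join(build_jvm_flags(limits))
    return 'CATALINA_OPTS="{} ${{CATALINA_OPTS}}"'.format(joined)
```

For example, `build_jvm_flags({".pptx": 6052413})` produces the same flag shown in the .pptx example above. Only raise the limits for the file types that actually need it, since each increase allows more memory-intensive extraction.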

Updated on May 22, 2025
