Periodic index snapshot job and full re-index running at the same time breaks both

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

If the periodic index snapshot job is running when a full foreground re-index is started, both can fail, because their actions are incompatible and both need to acquire a lock on the index. The failure occurs at the very beginning of the full re-index job, within the first minute.

Environment

Jira Data Center. Verified in at least 8.20.10 and 9.12.11.

Diagnosis

When this happens, both of the messages below appear in the log. (If you see only one of them, the cause is something else.)

2024-07-15 20:01:11,826-0500 JiraTaskExecutionThread-1 INFO testuser [c.a.j.r.v2.index.ReindexResource] Re-indexing finished

2024-07-15 20:01:11,827-0500 JiraTaskExecutionThread-1 ERROR testuser [c.a.jira.task.TaskManagerImpl] Task 'Jira Indexing' failed.

and

2024-07-15 20:01:04,334-0500 Caesium-1-4 ERROR anonymous JIRA Index Snapshot Service [c.a.j.index.ha.SnapshotDeletionPolicyContributionStrategy] Exception thrown during closure of commit point in changes, proceeding...

com.atlassian.jira.util.RuntimeIOException: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /local/jira-home/caches/indexesV2/changes/write.lock

...

2024-07-15 20:01:04,340-0500 Caesium-1-4 ERROR anonymous JIRA Index Snapshot Service [c.a.jira.service.ServiceRunner] An error occurred while trying to run service 'JIRA Index Snapshot Service'. java.io.FileNotFoundException: /local/jira-home/caches/indexesV2/comments/_1gfz5_Lucene70_0.dvd (No such file or directory)

com.atlassian.jira.util.RuntimeIOException: java.io.FileNotFoundException: /local/jira-home/caches/indexesV2/comments/_1gfz5_Lucene70_0.dvd (No such file or directory)

at com.atlassian.jira.index.ha.backup.CloseableBackupBuilder.addToBackup(CloseableBackupBuilder.java:72)

...

at com.atlassian.jira.index.ha.IndexSnapshotService.run(IndexSnapshotService.java:39)

...

at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)

...
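As a rough aid for this check, the following Python sketch scans log lines for both signatures. The two search strings are copied from the excerpts above; the function names and the idea of passing in a log file path are illustrative assumptions, not part of any Atlassian tooling.

```python
# Signature strings copied from the log excerpts in this article.
REINDEX_FAILED = "Task 'Jira Indexing' failed."
SNAPSHOT_FAILED = (
    "An error occurred while trying to run service "
    "'JIRA Index Snapshot Service'"
)


def has_both_signatures(lines):
    """Return True only if both failure messages appear in the given lines."""
    seen_reindex = False
    seen_snapshot = False
    for line in lines:
        if REINDEX_FAILED in line:
            seen_reindex = True
        if SNAPSHOT_FAILED in line:
            seen_snapshot = True
    return seen_reindex and seen_snapshot


def check_log_file(path):
    """Hypothetical helper: run the check against a log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return has_both_signatures(handle)
```

Calling check_log_file with the path to the node's application log returns True only when both messages are present. The sketch does not compare timestamps, so confirm manually that the two errors occurred within about a minute of each other.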

Cause

This is caused by two jobs being incompatible with each other:

The index snapshot job

A default scheduled job creates an index snapshot every day at 2 am. The snapshot is a zipped copy of the current state of the index, stored on the filesystem to allow fast index recovery if it is ever needed. (see Backup and recovering your index in https://confluence.atlassian.com/adminjiraserver/search-indexing-938847710.html)

While the job accesses an index document file, a write lock is placed on that file to prevent anything from writing to it while the snapshot is being taken.

Full foreground re-index

When a foreground full re-indexing is performed, the existing index is deleted and a new one is built from scratch. To prevent anything from writing to the index while this is happening, a lock is put on the index.

When the index snapshot job begins it first gets a list of the files to copy. If the snapshot job has not finished by the time the full re-index begins and it manages to get hold of the index lock to continue, it then tries to copy files that have now been deleted by the full re-index job and fails.

If a full re-index is started while the index snapshot job is running, the snapshot job may grab the index write lock during the very short window in which the full re-index job has to release it.

The result is:

  • The snapshot job tries to copy files that have now been deleted by the full re-index job and fails

  • The full re-index job cannot get the index lock held by the snapshot job and fails

  • You may end up with an orphaned index write lock file at [JIRA-HOME-DIR]/caches/indexesV2/changes/write.lock
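The stale file list problem described above can be illustrated with a toy simulation. This is only a sketch on temporary files, with made-up file names, and it does not touch Lucene or any real Jira index: one step lists the files to copy, a second step deletes them (standing in for the full re-index), and the copy step then fails on every listed file.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_stale_snapshot():
    """Return the names of files the simulated snapshot could not copy."""
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp) / "index"
        backup_dir = Path(tmp) / "backup"
        index_dir.mkdir()
        backup_dir.mkdir()
        # Hypothetical index files, created only for this demonstration.
        for name in ("segment_a", "segment_b"):
            (index_dir / name).write_text("data")

        # Step 1: the "snapshot job" records the list of files to copy.
        files_to_copy = sorted(index_dir.iterdir())

        # Step 2: the "full re-index" deletes the existing index files.
        for path in list(index_dir.iterdir()):
            path.unlink()

        # Step 3: the "snapshot job" continues with its stale list.
        failures = []
        for path in files_to_copy:
            try:
                shutil.copy(path, backup_dir / path.name)
            except FileNotFoundError:
                failures.append(path.name)
        return failures
```

Every file in the stale list fails to copy, which mirrors the FileNotFoundException raised by the snapshot service in the Diagnosis section.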

This behavior is a known bug, tracked as JRASERVER-74524. At the time of writing (July 2024), no fix is available.

Solution

If this leaves you with a broken index, a quick fix is to stop the affected node, delete the files under '/local/jira-home/caches/indexesV2/' (that is, [JIRA-HOME-DIR]/caches/indexesV2/) on that node, and restart it. The node then rebuilds its index from the latest snapshot from another node.
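The deletion step can be sketched in Python as shown here. The helper name and its dry-run default are assumptions made for illustration; the directory to pass in is the indexesV2 directory of the stopped node. Run it only after the node has been stopped, and review the dry-run output before deleting anything.

```python
import shutil
from pathlib import Path


def clear_index_dir(index_dir, dry_run=True):
    """List, and optionally delete, everything directly under index_dir.

    With dry_run=True (the default) nothing is removed; the function only
    returns the names of the entries that would be deleted.
    """
    root = Path(index_dir)
    entries = sorted(root.iterdir())
    for entry in entries:
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-index directory and its files
        else:
            entry.unlink()  # remove a plain file or a symlink
    return [entry.name for entry in entries]
```

For example, clear_index_dir("/local/jira-home/caches/indexesV2") returns the entries that would be removed, and passing dry_run=False actually removes them. The directory itself is left in place.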

Workaround

Avoid running a full foreground re-index while the index snapshot job is running (by default, daily at 2 am).
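If re-indexes are started from a script, a small guard can refuse to proceed close to the snapshot time. The sketch below is one possible approach: the 2 am default comes from this article, while the one-hour safety margin and the function name are assumptions, since the duration of the snapshot job depends on the size of the index.

```python
from datetime import datetime, time, timedelta


def is_outside_snapshot_window(now, snapshot_time=time(2, 0),
                               margin=timedelta(hours=1)):
    """Return True if `now` is at least `margin` away from any snapshot run.

    `now` is a naive datetime in the same time zone as the Jira scheduler.
    """
    today_run = datetime.combine(now.date(), snapshot_time)
    # Check yesterday's, today's, and tomorrow's run so that margins
    # crossing midnight are handled as well.
    nearby_runs = (
        today_run - timedelta(days=1),
        today_run,
        today_run + timedelta(days=1),
    )
    return all(abs(now - run) >= margin for run in nearby_runs)
```

A wrapper script could call is_outside_snapshot_window(datetime.now()) and start the re-index only when it returns True. If the snapshot schedule has been changed from its default, pass the actual time as snapshot_time.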

Updated on April 2, 2025
