Periodic index snapshot job and full re-index running at the same time breaks both
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
If the periodic index snapshot job is running when a full foreground re-index is started, both can fail, because their actions are incompatible and both need to acquire a lock on the index. The failure occurs at the very beginning of the full re-index job, within its first minute.
Environment
Jira Data Center. Verified in at least 8.20.10 and 9.12.11.
Diagnosis
When this happens, both of the messages below appear in the log. (If you see only one of them, the cause is something else.)
|
---|
and
|
---|
Cause
This is caused by two jobs being incompatible with each other:
The index snapshot job
A default scheduled job creates an index snapshot every day at 2 am. The snapshot is a zipped copy of the current state of the index, stored on the filesystem to allow fast index recovery if needed (see "Backup and recovering your index" in https://confluence.atlassian.com/adminjiraserver/search-indexing-938847710.html).
While the index document files are being accessed, a write lock is held on them so that nothing can write to a file while it is being copied into the snapshot.
Full foreground re-index
When a full foreground re-index is performed, the existing index is deleted and a new one is built from scratch. To prevent anything from writing to the index while this is happening, a lock is put on the index.
When the index snapshot job begins, it first gets a list of the files to copy. If the snapshot job has not finished by the time the full re-index begins, and it manages to get hold of the index lock to continue, it then tries to copy files that the full re-index job has already deleted, and fails.
If a full re-index is started while the index snapshot job is running, the snapshot job may grab the index write lock during the very short window in which the full re-index job has to release it.
The result is:
The snapshot job tries to copy files that have now been deleted by the full re-index job and fails
The full re-index job cannot get the index lock held by the snapshot job and fails
You may end up with an orphaned index write lock file at
[JIRA-HOME-DIR]/caches/indexesV2/changes/write.lock
This is a known bug, JRASERVER-74524. At the time of writing (July 2024), no fix is available yet.
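To check whether the write lock file mentioned above is present on a node, a minimal sketch along the following lines could be used. The function name `find_index_write_lock` is hypothetical, and the Jira home directory has to be supplied by the caller. The check only reports whether the file exists. Deciding whether it is actually orphaned (for example, because no indexing is in progress) is left to the administrator.

```python
from pathlib import Path
from typing import Optional


def find_index_write_lock(jira_home: str) -> Optional[Path]:
    """Return the path of the index write lock file if it exists, else None.

    The relative path is the one given in this article:
    [JIRA-HOME-DIR]/caches/indexesV2/changes/write.lock
    """
    lock = Path(jira_home) / "caches" / "indexesV2" / "changes" / "write.lock"
    return lock if lock.is_file() else None


if __name__ == "__main__":
    import sys

    # Usage: python check_lock.py <path to the Jira home directory>
    found = find_index_write_lock(sys.argv[1])
    print(f"write lock present: {found}" if found else "no write lock file found")
```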
Solution
If this leaves you with a broken index, a quick fix is to stop the affected node, delete the files under [JIRA-HOME-DIR]/caches/indexesV2/ (for example, '/local/jira-home/caches/indexesV2/') on that node, and restart it. On startup, the node rebuilds its index from the latest snapshot taken from another node.
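The delete step could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `clear_local_index` and its `dry_run` parameter are hypothetical, the node must already be stopped before the function is called with `dry_run=False`, and stopping and restarting the node is not covered here.

```python
import shutil
from pathlib import Path
from typing import List


def clear_local_index(jira_home: str, dry_run: bool = True) -> List[Path]:
    """Delete everything under <jira_home>/caches/indexesV2/ on this node.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the entries that would be removed. The indexesV2 directory
    itself is kept in both cases.
    """
    index_dir = Path(jira_home) / "caches" / "indexesV2"
    if not index_dir.is_dir():
        raise FileNotFoundError(f"index directory not found: {index_dir}")

    entries = sorted(index_dir.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove a sub-directory and its contents
            else:
                entry.unlink()  # remove a plain file or a symlink
    return entries
```

Running it once with the default `dry_run=True` and reviewing the returned list before deleting anything is the safer order of operations.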
Workaround
Avoid running a full foreground re-index while the index snapshot job is running (by default, daily at 2 am).
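A simple guard can be put in front of any manually triggered or scripted re-index. The sketch below assumes the default 2 am schedule described in this article. The function name `near_snapshot_job` and the one-hour safety margin are assumptions, not values from Jira. The margin should be adjusted to how long the snapshot job actually takes in your environment.

```python
from datetime import datetime, time, timedelta

# Default schedule described in this article: daily at 2 am (server time).
SNAPSHOT_TIME = time(2, 0)


def near_snapshot_job(now: datetime, margin: timedelta = timedelta(hours=1)) -> bool:
    """Return True if `now` is within `margin` of the daily snapshot time.

    The one-hour default margin is an assumption chosen for illustration.
    """
    today = datetime.combine(now.date(), SNAPSHOT_TIME, tzinfo=now.tzinfo)
    # Check yesterday's, today's, and tomorrow's run so that margins
    # reaching across midnight are handled correctly.
    candidates = (today - timedelta(days=1), today, today + timedelta(days=1))
    return any(abs(now - run) <= margin for run in candidates)


if __name__ == "__main__":
    if near_snapshot_job(datetime.now()):
        print("Too close to the index snapshot job; postpone the full re-index.")
    else:
        print("Outside the snapshot window.")
```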