Index Replication Jira Data Center Troubleshooting
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Jira keeps all copies of the indexes up to date automatically. The synchronization aims for eventual consistency and is not synchronous, which means there can be some delay before index changes are seen on the other nodes in the cluster. For instance, during busy hours or during an issue/project import, a temporary gap in the index count between nodes is expected, though the gap should close over time. For details on how nodes keep the index in sync, see Keeping Lucene Index Synchronised.
Solution
Where indexes are stored
Local Home
As in Jira Server, the indexes are stored in the Jira home (the Jira local home), which is local to each node and contains the index files. It is recommended that they be placed on the fastest available disk, within the “Excellent” or “OK” grades; see Disk Access Speed for reference.
Shared Home
Jira Data Center introduced the Jira shared home, which is shared among the nodes in the same cluster and contains the Index Snapshot ZIP files. This shared location can be on an NFS filesystem, with sufficient read/write permissions for the user running the Jira application on each node. Its location is configured in cluster.properties.
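For reference, a minimal cluster.properties might look like the following sketch (the node ID and path are illustrative; use the values for your environment):

```
# cluster.properties — one file per node, placed in the node's local home
# jira.node.id must be unique per node
jira.node.id = node1
# jira.shared.home must point at the shared filesystem visible to every node
jira.shared.home = /data/jira/sharedhome
```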
Common Problems
The following is a list of common problems found with index replication in Jira Data Center.
Health-check failing on Index Replication
Inconsistency of search results between the nodes in the same cluster. Examples:
Issue "A" appears on Node-1 but not on Node-3
Searching "Project Taco" returns 100 issues on Node-1 but only 89 issues on Node-2
Gadget returning different results between nodes
Sending Information to Support
Please raise a case at https://getsupport.atlassian.com and provide the following information:
Describe the behavior you are seeing, with reference to the “Common Problems” above.
Produce Thread Dumps with the command provided in Generating a thread dump.
for i in $(seq 6); do top -b -H -p $Jira_PROCESSID -n 1 > app_cpu_usage.`date +%s`.txt; kill -3 $Jira_PROCESSID; sleep 10; done
ℹ️ In the example above, you would replace $Jira_PROCESSID with the Process ID of Jira.
Generate Jira support zip of all nodes in the cluster.
Share the screenshot result of the Jira Data Center Health Check.
Share the output of http://<node-url>/rest/api/2/index/summary from each node.
Use the indexsummary script in the section below to gather the data.
indexsummary script
Update nodeurl.conf and list the direct URL of each node (bypassing any load balancer):

nodeurl.conf

node1=
node2=
node3=
node4=

Run ./index.sh -m to gather Index summary data for each node.

index.sh
#!/usr/local/bin/bash
#!/bin/bash
#
# A script to curl against IndexSummary REST API and parse the data to a readable format
#
# Author: Zulfadli Noor Sazali
# Created: 2019-02-19
# Updated: 2019-08-19
# Version: 1.4
#-----------
#
# Pre-requisite:
# - Command "index.sh -a" requires Bash 4+
# - JQ and CURL
#
#-----------
# IMPORTANT !!!
# Update the node URL in nodeurl.conf
# Content example:
# node1=http://localhost:47134/j7134
# node2=http://localhost:47135/j7134
# node3=http://localhost:47136/j7134
# node4=
#
# Once added, please save the file
#-----------

#----------
# Script starts
#----------
ERR=0
MAN=0
ANALYZ=0

while getopts ":am" opt; do
  case $opt in
    a)
      if [ -z "${BASH_VERSINFO}" ] || [ -z "${BASH_VERSINFO[0]}" ] || [ ${BASH_VERSINFO[0]} -lt 4 ]; then
        echo "Option -a requires Bash version >= 4"; exit 1;
      fi
      echo "Analysing Index Summary data..." >&2
      ANALYZ=1
      ;;
    m)
      echo "Manual Index Summary gathering..." >&2
      MAN=1
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
  esac
done

declare -A reportTime nodeId countInDatabase countInIndex countInArchive replicationQueues queueSize
declare -a nodeurl
declare f i

#----------
# Check if JQ and CURL is installed
#----------
if ! which curl >/dev/null; then echo -e " You do not have \"curl\" installed (or not found in PATH)."; ERR=1; fi
if ! which jq >/dev/null; then echo -e " You do not have \"jq\" installed (or not found in PATH). Please run ./index.sh -m"; ERR=1; fi
if [ "$ERR" = "1" ]; then exit 1; fi

# Ask for Jira admin credentials
if [ "$ANALYZ" != "1" ]; then
  echo "Jira Administrator login credentials:"
  read -p 'Username: ' uservar
  read -p 'Password: ' -s passvar
  echo
fi

#----------
# Check URL
#----------
getNodeUrlfromConfig (){
  while IFS="=" read -r value var; do
    if [ -z "$var" ]; then
      echo ""
    else
      node+=("$var")
    fi
  done < nodeurl.conf
  nodeurlCheck
}

nodeurlCheck (){
  # Checks if node[0] is empty
  if [ -z "${node[0]}" ]; then
    echo "Please update the node URL for each node in the cluster by editing nodeurl.conf"
    exit
  elif [ "$ANALYZ" != "1" ]; then
    echo "The script will check the indexes count on below nodes:"
    for i in "${!node[@]}"; do echo ${node[$i]}; done
  else
    echo "Starting analyzing"
  fi
}

#----------
# Generating Index summary
#----------
indexStoreNoJq () {
  echo " "
  echo Time: `date '+%Y-%m-%d %H:%M:%S'`
  for i in "${!node[@]}"
  do
    content="$(curl -u $uservar:$passvar -s "${node[$i]}/rest/api/2/index/summary")"
    echo "$content" >> indexsummary.node$i.txt
  done
}

indexSumOnScreen () {
  for i in "${!node[@]}"
  do
    ## Run the curl against all nodes
    content="$(curl -u $uservar:$passvar -s "${node[$i]}/rest/api/2/index/summary")"
    echo "$content" > temp.out

    ## Store the data to vars using jq
    nodeId=`cat temp.out | jq '.nodeId'`
    countInDatabase=`cat temp.out | jq '.issueIndex .countInDatabase'`
    countInIndex=`cat temp.out | jq '.issueIndex .countInIndex'`
    countInArchive=`cat temp.out | jq '.issueIndex .countInArchive'`
    replicationQueues=`cat temp.out | jq '.replicationQueues | keys_unsorted[]'`
    queueSize=`cat temp.out | jq '.replicationQueues | .[].queueSize'`

    ## Index Summary
    printf "%-10s %-20s %-20d %-20d %-10s %-10s\n" $nodeId $countInDatabase $countInIndex $countInArchive " " " "
    printf "%-10s %-20s %-20s %-20s %-10s %-10s\n" " " " " " " " " $replicationQueues
    printf "%-10s %-20s %-20s %-20s %-10s %-10s\n" " " " " " " " " $queueSize
  done
}

indexReadPerNode () {
  ### Printing information stored
  for i in "${!node[@]}" #file content
  do
    echo '-----'
    printf "%-10s %-30s %-20s %-20s %-20s %-20s\n" "NODE" "reportTime" "countInDatabase" "countInIndex" "countInArchive" "queueSize"
    for (( fn=0; fn<$f; fn++ )); do #file number count in repeatStart
      ## Index Summary
      printf "%-10s %-30s %-20d %-20d %-20d %-10s %-10s\n" ${nodeId[$fn,$i]} ${reportTime[$fn,$i]} ${countInDatabase[$fn,$i]} ${countInIndex[$fn,$i]} ${countInArchive[$fn,$i]} " " " "
      printf "%-10s %-30s %-20s %-20s %-20s %-10s %-10s\n" " " " " " " " " " " ${replicationQueues[$fn,$i]}
      printf "%-10s %-30s %-20s %-20s %-20s %-10s %-10s\n" " " " " " " " " " " ${queueSize[$fn,$i]}
    done
  done
}

storingIndexPerNode () {
  for i in "${!node[@]}" #file content
  do
    #### Storing data
    reportTime[$f,$i]=`jq '.reportTime' <<< "${node[$i]}"`
    nodeId[$f,$i]=`jq '.nodeId' <<< "${node[$i]}"`
    countInDatabase[$f,$i]=`jq '.issueIndex .countInDatabase' <<< "${node[$i]}"`
    countInIndex[$f,$i]=`jq '.issueIndex .countInIndex' <<< "${node[$i]}"`
    countInArchive[$f,$i]=`jq '.issueIndex .countInArchive' <<< "${node[$i]}"`
    replicationQueues[$f,$i]=`jq '.replicationQueues | keys_unsorted[]' <<< "${node[$i]}"`
    queueSize[$f,$i]=`jq '.replicationQueues | .[].queueSize' <<< "${node[$i]}"`
  done
}

#----------
# Run in sequence
#----------
repeatStart () {
  if [ "$MAN" = "1" ]; then
    for i in $(seq 10); do
      indexStoreNoJq
      sleep 10;
    done
  elif [ "$ANALYZ" = "1" ]; then
    f=0
    for filez in indexsummary.*.txt; do
      echo "Processing $filez"
      IFS=$'\n' read -d '' -r -a node < $filez
      storingIndexPerNode
      f=$((f+1))
    done
    indexReadPerNode
  else
    ## loop 10 times, every 10 seconds
    for i in $(seq 10); do
      echo " "
      echo Time: `date '+%Y-%m-%d %H:%M:%S'`
      echo "-----"
      printf "%-10s %-20s %-20s %-20s %-20s\n" "NODE" "countInDatabase" "countInIndex" "countInArchive" "queueSize"
      indexSumOnScreen
      sleep 10;
    done
  fi
}

#----------
# Run Script
#----------
getNodeUrlfromConfig
repeatStart

# cleanup temp
rm -f temp.out
Run and display results on screen:

$ ./index.sh

This automatically retrieves the data and parses it to a readable format in the terminal.

Run and store indexsummary information per node:

$ ./index.sh -m

This stores the data to indexsummary.node#.txt files.

Run to parse the gathered indexsummary data:

$ ./index.sh -a

Share the indexsummary.node#.txt files generated with support.
Index Summary REST Endpoint
The Index Summary endpoint /rest/api/2/index/summary provides a more detailed index status for each node. It gives insight into where a particular node stands relative to the database and to the other nodes.
Index Summary gives an example of the JSON output and details what each value represents.
As a starting point, the following values are to be noted:
countInDatabase and countInIndex should match, or be continuously increasing toward a match
lastConsumedOperation and lastOperationInQueue should match, or be continuously increasing toward a match
queueSize should not be increasing drastically
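As a quick sketch, the database/index gap can be eyeballed with jq from a saved response. The JSON below is a made-up sample; on a real node, save the output of /rest/api/2/index/summary to summary.json instead:

```shell
# Sample index summary response (illustrative values only)
cat > summary.json <<'EOF'
{"nodeId":"node1","issueIndex":{"countInDatabase":1000,"countInIndex":995,"countInArchive":10}}
EOF

# Print the counts and the gap between database and index
jq -r '"\(.nodeId): countInDatabase=\(.issueIndex.countInDatabase) countInIndex=\(.issueIndex.countInIndex) gap=\(.issueIndex.countInDatabase - .issueIndex.countInIndex)"' summary.json
# → node1: countInDatabase=1000 countInIndex=995 gap=5
```

A small but steadily shrinking gap is normal; a gap that keeps growing is worth investigating.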
Database Validation Endpoint
Index replication communication happens between the nodes and the database. The following checks verify that the nodes are continuously sending heartbeats to the database:
Check the CLUSTERNODE table to confirm the node is registered in the cluster and that nodes with ACTIVE status have a recent timestamp. Many inactive nodes may cause a delay in index replication.
Check the CLUSTERNODEHEARTBEAT table to confirm the node is responding to the heartbeat message.
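As an illustration, the checks above might look like the following queries (table and column names can vary slightly by Jira version and database; verify against your schema before running):

```sql
-- Nodes registered in the cluster and their state
SELECT node_id, node_state FROM clusternode;

-- Latest heartbeat per node (heartbeat_time is a millisecond epoch timestamp)
SELECT node_id, heartbeat_time FROM clusternodeheartbeat;
```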
Index Replication on Re-index
A re-indexing, whether foreground, background, or project, is performed on the node where it is triggered. Once it completes, here is how it gets replicated across the other nodes (based on Jira 7.13.1):
Foreground (locked) re-indexing
On completion of a foreground re-indexing, an Index Snapshot is created and stored in the Jira shared home directory. The other nodes are informed of the completion of the re-indexing process via the database: a row is written to the replicatedindexoperation table with an operation column value of FULL_REINDEX_END. They then copy the Index Snapshot from the shared home to their local home and unpack the Lucene indexes.
A temporary directory called JiraIndexRestore is created in the node's local home. It holds the snapshot copied from the shared home while it is unpacked into the local indexes, and is removed afterwards.
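To confirm that snapshots are being produced, it can help to list the snapshot archives in the shared home. This is a sketch with an assumed path (substitute the jira.shared.home value from your cluster.properties; index snapshots are typically kept under export/indexsnapshots):

```shell
# Hypothetical shared home location; replace with your own
SHARED_HOME="${SHARED_HOME:-/data/jira/sharedhome}"
SNAP_DIR="$SHARED_HOME/export/indexsnapshots"

# List snapshot archives, newest first; an empty or stale listing suggests
# snapshot creation or replication is not happening
ls -lt "$SNAP_DIR" 2>/dev/null || echo "No snapshot directory at $SNAP_DIR"
```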
Background re-indexing
Background re-indexing follows the same process: an Index Snapshot is created and stored in the Jira shared home directory, and the other nodes are informed of the completion of the re-indexing process via the replicatedindexoperation table with an operation column value of BACKGROUND_REINDEX_END.
Project re-indexing
The project re-indexing process is different, as it does not create an Index Snapshot. The node where the project re-indexing is triggered informs the other nodes that a project re-indexing needs to be done via the replicatedindexoperation table with an operation column value of PROJECT_REINDEX. Each of the other nodes then triggers its own local project re-indexing operation.
Using Jira Statistics for troubleshooting
Since Jira 8.12, periodic Jira stats log entries for DBR (document-based replication) are written to atlassian-jira.log and can be used to understand the replication operations. Look for entries with [JIRA-STATS] [DBR] in the logs.
More details about Jira stats are available in Troubleshooting performance with Jira Stats.
Also check Document-based replication in Jira Data Center for more details on DBR and the metrics in the logs.
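As a sketch, the relevant entries can be pulled out with grep. The log line below is illustrative only, not the exact DBR stats format; on a real node, grep the actual <local-home>/log/atlassian-jira.log:

```shell
# Write one illustrative line to a scratch log file for demonstration
cat > atlassian-jira.log <<'EOF'
2024-01-01 10:00:00,000 INFO [JIRA-STATS] [DBR] replication stats for the last period
EOF

# Filter the periodic DBR statistics entries (-F = fixed string, no regex)
grep -F "[JIRA-STATS] [DBR]" atlassian-jira.log
```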
Enable Additional Logging for Index Replication
To get detailed logging of what Jira is doing, add the following loggers on Jira's Logging and Profiling page:
Logging for Replication communication
level: DEBUG
package: com.atlassian.jira.index.ha.DefaultReplicatedIndexManager
Logging for Local indexing
level: DEBUG
package: com.atlassian.jira.issue.index.DefaultIndexManager
Logging for Scheduler for indexing needs
level: DEBUG
package: com.atlassian.jira.index.ha.DefaultNodeReindexService
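Loggers added on the Logging and Profiling page do not survive a restart. As a sketch for Jira versions that still use log4j 1.x configuration, the same loggers can be made persistent in log4j.properties (verify the appender names and syntax for your Jira version before editing):

```
log4j.logger.com.atlassian.jira.index.ha.DefaultReplicatedIndexManager = DEBUG, console, filelog
log4j.additivity.com.atlassian.jira.index.ha.DefaultReplicatedIndexManager = false
```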