Lexorank balance is very slow in Jira Data Center
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
LexoRank rebalancing is very slow or has stopped altogether, yet there are no errors in the logs.
Environment
Jira Data Center with multiple nodes.
Diagnosis
Set the logging level to DEBUG for the package com.atlassian.greenhopper.service.lexorank on the Logging and Profiling settings page.
Wait a few minutes.
Check atlassian-greenhopper.log for entries like the ones below:
2022-06-21 22:29:06,641 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] Index replication on node node1 is behind node node3 for 31 seconds. (Based on replicated operation id: 553857270)
2022-06-21 22:29:06,645 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] Index replication on node node1 is behind node node2 for 23 seconds. (Based on replicated operation id: 553857469)
2022-06-21 22:29:06,645 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] For at least one node index replication is behind current node for more than threshold=30 seconds. Balancing is terminating. It will resume once index replication lag for all nodes will be within a threshold.
2022-06-21 22:29:06,645 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] Balancing has been backed off because there are some nodes that are lagging behind with index recovery
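To see how far each node typically lags, the DEBUG entries above can be summarised with a short script. The following is a minimal sketch in Python, assuming the message format matches the excerpt shown in this article; the function name max_lag_per_node and the sample list are illustrative only and are not part of Jira.

```python
import re

# Matches the DEBUG message format shown in the log excerpt in this article.
# Assumption: the wording of the message is the same in your Jira version.
LAG_RE = re.compile(
    r"Index replication on node (?P<current>\S+) is behind node (?P<other>\S+) "
    r"for (?P<seconds>\d+) seconds"
)


def max_lag_per_node(lines):
    """Return {node_name: highest reported lag in seconds} for the given log lines."""
    worst = {}
    for line in lines:
        match = LAG_RE.search(line)
        if match is None:
            continue  # not a replication-lag entry
        node = match.group("other")
        lag = int(match.group("seconds"))
        worst[node] = max(worst.get(node, 0), lag)
    return worst


# Sample messages taken from the log excerpt above (timestamps shortened).
SAMPLE_LINES = [
    "DEBUG ServiceRunner Index replication on node node1 is behind node node3 for 31 seconds.",
    "DEBUG ServiceRunner Index replication on node node1 is behind node node2 for 23 seconds.",
    "DEBUG ServiceRunner Balancing has been backed off because there are some nodes that are lagging behind",
]

print(max_lag_per_node(SAMPLE_LINES))
```

In practice the lines would be read from atlassian-greenhopper.log rather than from a hard-coded list. The highest values reported over a representative period give a rough idea of how the lag compares with the configured threshold.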
Cause
Jira stops the LexoRank balance job whenever the index replication delay for at least one node exceeds a threshold (30 seconds by default).
This is designed to prevent severe index replication issues while the job is running, and was introduced with the fix for JSWSERVER-15703 - LexoRank Rebalance can cause index replication delays in JIRA Datacenter.
A side effect is that in large environments where the index replication delay is consistently above 30 seconds, the job sits idle most of the time.
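The back-off decision can be pictured with a simplified model. This is only a sketch based on the log messages in this article, not Jira's actual implementation; the function should_back_off and the constant name are hypothetical.

```python
# Default threshold stated in this article: 30 seconds, expressed in milliseconds.
DEFAULT_THRESHOLD_MS = 30_000


def should_back_off(lag_seconds_by_node, threshold_ms=DEFAULT_THRESHOLD_MS):
    """Return True if at least one node lags by more than the threshold.

    Hypothetical model of the check described in the log message
    "For at least one node index replication is behind current node
    for more than threshold=30 seconds".
    """
    return any(lag * 1000 > threshold_ms for lag in lag_seconds_by_node.values())


# Lag values taken from the log excerpt in this article.
lags = {"node3": 31, "node2": 23}

print(should_back_off(lags))          # default threshold of 30 seconds
print(should_back_off(lags, 60_000))  # a higher threshold of 60 seconds
```

With the lag values from the log excerpt, a single node at 31 seconds is enough to pause balancing under the default threshold, while the same values would not trigger a back-off under a 60-second threshold.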
Solution
Increase the delay threshold so that the job can run while Jira is operating under normal conditions.
Add the JVM parameter below with the desired value in milliseconds (60 seconds in the example) and restart the nodes.
-Djira.agile.lexorank.balancing.backoff.threshold=60000
Alternatively, troubleshoot the index replication to reduce its delay across the nodes.