Lexorank balance is very slow in Jira Data Center

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Lexorank rebalancing is very slow or stops altogether, but there are no errors in the logs.

Environment

Jira Data Center with multiple nodes.

Diagnosis

  • Set the logging level of the package com.atlassian.greenhopper.service.lexorank to DEBUG on the Logging and Profiling settings page

  • Wait for a few minutes

  • Check for entries like the ones below in atlassian-greenhopper.log

2022-06-21 22:29:06,641 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] Index replication on node node1 is behind node node3 for 31 seconds. (Based on replicated operation id: 553857270)
2022-06-21 22:29:06,645 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] Index replication on node node1 is behind node node2 for 23 seconds. (Based on replicated operation id: 553857469)
2022-06-21 22:29:06,645 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] For at least one node index replication is behind current node for more than threshold=30 seconds. Balancing is terminating. It will resume once index replication lag for all nodes will be within a threshold.
2022-06-21 22:29:06,645 Caesium-1-2 DEBUG ServiceRunner [service.lexorank.balance.LexoRankBalancingService] Balancing has been backed off because there are some nodes that are lagging behind with index recovery
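Once DEBUG logging has been active for a while, you can summarize these entries across a large log file. The Python sketch below is a hypothetical helper and not an Atlassian tool. Its regular expression assumes the exact wording of the sample entries above, which may differ between Jira Software versions. The script name lexorank_lag.py is illustrative.

```python
import re
import sys
from collections import defaultdict

# Matches the DEBUG lag lines from the sample entries. The wording is taken
# from that sample and may differ between Jira Software versions (assumption).
LAG_RE = re.compile(
    r"Index replication on node (?P<lagging>\S+) is behind node (?P<ahead>\S+) "
    r"for (?P<seconds>\d+) seconds"
)
BACKOFF_MARKER = "Balancing has been backed off"


def summarize_lag(lines):
    """Return ({lagging node: max lag in seconds}, number of back-off events)."""
    max_lag = defaultdict(int)
    backoffs = 0
    for line in lines:
        match = LAG_RE.search(line)
        if match:
            node = match.group("lagging")
            max_lag[node] = max(max_lag[node], int(match.group("seconds")))
        if BACKOFF_MARKER in line:
            backoffs += 1
    return dict(max_lag), backoffs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lexorank_lag.py /path/to/atlassian-greenhopper.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        lags, backoffs = summarize_lag(log)
    for node, seconds in sorted(lags.items()):
        print(f"{node}: max replication lag {seconds}s")
    print(f"Balancing back-off events: {backoffs}")
```

A node whose maximum lag stays above 30 seconds, combined with a steadily growing back-off count, matches the situation this article describes.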

Cause

Jira has a mechanism that stops the Lexorank balancing job when the index replication lag between nodes exceeds a threshold (30 seconds by default). Balancing resumes once the lag for all nodes is back within the threshold.

This safeguard prevents the job from causing severe index replication issues while it runs. It was introduced as part of the fix for JSWSERVER-15703 - LexoRank Rebalance can cause index replication delays in JIRA Datacenter.

A side effect is that in large environments, where the index replication lag is consistently above 30 seconds, the job stays idle most of the time.
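The following sketch illustrates why the job ends up idle. It is a simplified, hypothetical model of the back-off rule and not Jira's implementation. The strict "greater than" comparison is inferred from the log wording "more than threshold=30 seconds". The lag samples are invented for illustration.

```python
# Hypothetical model of the back-off rule described above; not Jira's code.
DEFAULT_THRESHOLD_MS = 30_000  # default threshold: 30 seconds


def should_back_off(lags_seconds, threshold_ms=DEFAULT_THRESHOLD_MS):
    """True if any observed replication lag is strictly above the threshold."""
    return any(lag * 1000 > threshold_ms for lag in lags_seconds)


def share_of_runs(lag_samples, threshold_ms=DEFAULT_THRESHOLD_MS):
    """Fraction of checks in which balancing would be allowed to proceed."""
    allowed = sum(1 for lags in lag_samples if not should_back_off(lags, threshold_ms))
    return allowed / len(lag_samples)


# Invented lag samples (seconds, one list per check) for an environment whose
# normal lag usually sits just above 30 seconds.
samples = [[31, 23], [35, 28], [25, 20], [40, 33]]
print(share_of_runs(samples))          # 0.25 with the default threshold
print(share_of_runs(samples, 60_000))  # 1.0 with a 60-second threshold
```

In this model, a lag that hovers slightly above the default allows balancing on only one check in four. A threshold above the normal lag lets every check proceed.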

Solution

Increase the delay threshold, so the job is able to run when Jira is operating in normal conditions.

Add the following JVM parameter on each node, with the threshold expressed in milliseconds (60000, that is 60 seconds, in the example below), and then restart the nodes.

-Djira.agile.lexorank.balancing.backoff.threshold=60000
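The property value is in milliseconds, so it is easy to set by mistake. The sketch below formats the flag from a value in seconds. It also includes a hypothetical rule of thumb for choosing that value: peak lag observed under normal conditions (for example, from the DEBUG entries above) multiplied by some headroom, then rounded up to 10 seconds. The helper names and the 1.5 headroom factor are illustrative assumptions, not Atlassian guidance.

```python
import math

# Property name as given in this article; its value is in milliseconds.
PROPERTY = "jira.agile.lexorank.balancing.backoff.threshold"


def threshold_flag(seconds):
    """Format the JVM flag for a threshold given in seconds (60 -> 60000 ms)."""
    if seconds <= 0:
        raise ValueError("threshold must be positive")
    return f"-D{PROPERTY}={round(seconds * 1000)}"


def suggest_threshold_seconds(observed_lags, headroom=1.5):
    """Hypothetical rule of thumb: peak normal-condition lag times headroom,
    rounded up to the next multiple of 10 seconds."""
    peak = max(observed_lags)
    return math.ceil(peak * headroom / 10) * 10


print(threshold_flag(60))  # -Djira.agile.lexorank.balancing.backoff.threshold=60000
print(threshold_flag(suggest_threshold_seconds([31, 23])))  # -Djira.agile.lexorank.balancing.backoff.threshold=50000
```

Setting the threshold too high weakens the protection against the replication issues described in JSWSERVER-15703. For that reason, this sketch keeps it only moderately above the normal lag.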

Alternatively, troubleshoot index replication to reduce the lag across the nodes.

Updated on March 14, 2025
