Troubleshoot False Positives in Jira DC Cluster Cache Replication Health Check
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
In a multi-node cluster setup, one node periodically reports that the other nodes are not replicating, resulting in Health Check failure messages for Cluster Cache Replication.
Environment
Any Jira Data Center version 8.x, 9.x or 10.x
Diagnosis
The atlassian-jira.log file of at least one of the nodes will contain messages like:
2025-03-28 12:40:12,749+0100 HealthCheck:thread-2 WARN [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node <nodename> does not seem to replicate its cache
2025-03-28 12:40:12,758+0100 Caesium-1-3 ERROR ServiceRunner [c.a.t.healthcheck.concurrent.SupportHealthCheckProcess] Health check 'Cluster-Cache-Replication' failed with severity 'critical': 'The Node <nodename> is not replicating'
This message usually means that the node mentioned is not replicating information to the cluster: it exists in the database but not in the replication cache.
However, the scheduled job for the cache replication completes successfully, as can be verified by entries in the atlassian-jira.log file like:
2025-03-28 12:40:30,554+0100 localq-stats-0 INFO [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [scheduled] Running cache replication queue stats for: 20 queues...
2025-03-28 12:40:30,555+0100 localq-stats-0 INFO [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [scheduled] ... done running cache replication queue stats for: 20 queues.
2025-03-28 14:25:30,554+0100 localq-stats-0 INFO [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [scheduled] Running cache replication queue stats for: 20 queues...
2025-03-28 14:25:30,555+0100 localq-stats-0 INFO [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [scheduled] ... done running cache replication queue stats for: 20 queues.
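The two kinds of log entries above can be counted side by side to see whether the health check warnings occur while the scheduled queue stats job keeps completing. The following is a minimal sketch, not an Atlassian tool: the regular expressions are derived only from the sample lines in this article and may need adjusting for other Jira versions, and the function name summarize_replication_log is hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the sample log lines in this article (assumption:
# the message wording is the same in your Jira version).
NOT_REPLICATING = re.compile(
    r"^(?P<ts>\S+ \S+) .*ClusterReplicationHealthCheck\] "
    r"Node (?P<node>\S+) does not seem to replicate its cache"
)
LOCALQ_DONE = re.compile(
    r"^(?P<ts>\S+ \S+) .*\[LOCALQ\] \[scheduled\] \.\.\. done running "
    r"cache replication queue stats"
)


def summarize_replication_log(lines):
    """Count health check warnings per node and completed LOCALQ stats runs."""
    warnings = Counter()
    localq_runs = 0
    for line in lines:
        match = NOT_REPLICATING.search(line)
        if match:
            # One warning per node name mentioned by the health check.
            warnings[match.group("node")] += 1
        elif LOCALQ_DONE.search(line):
            # A "... done running" line marks a finished stats run.
            localq_runs += 1
    return dict(warnings), localq_runs
```

The function accepts any iterable of lines, so an open file handle for atlassian-jira.log can be passed directly. Warnings for a node alongside a steadily growing count of completed LOCALQ runs match the pattern described in this article.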
Refer to the article Cluster Cache Replication health check fails in Jira Data Center to rule out any valid causes, such as:
Check if the network connection between the nodes shows any issues.
Check if the name resolution for both Nodes works fine.
Check that the cluster.properties file has correct entries for ehcache.listener.hostName.
Check that the clusternode table contains only the IP addresses of the nodes of the cluster and nothing invalid.
Check the Clustering page in ⚙ > System; the application status of the "problematic node" might be empty.
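Some of the checks above can be scripted. The following is a minimal sketch under stated assumptions: it reads key=value pairs from the text of a cluster.properties file, tests whether a host name resolves from the current node, and flags values that are not plain IP addresses (for example, values you have exported from clusternode yourself). All function names are hypothetical, and the sketch does not query the Jira database.

```python
import ipaddress
import socket


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolves(hostname):
    """Return True if the local resolver can resolve the host name."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def invalid_ip_entries(values):
    """Return the entries that are not valid IPv4 or IPv6 addresses."""
    bad = []
    for value in values:
        try:
            ipaddress.ip_address(value.strip())
        except ValueError:
            bad.append(value)
    return bad
```

A typical use would be to call parse_properties on the file contents, look up ehcache.listener.hostName with props.get (the property may be absent), and pass the result to resolves on each node. An empty list from invalid_ip_entries means every supplied value parsed as an IP address.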
Cause
This seems to be a cosmetic issue, as Cache Replication has completed successfully.
Solution
Verify the Atlassian Troubleshooting and Support Tools app version
Check whether the Atlassian Troubleshooting and Support Tools app is running the latest version. If an update is available, install it:
Navigate to Administration ⚙ > Manage Apps > Manage Apps.
Search for Atlassian Troubleshooting and Support Tools.
Expand the entry.
If available, select Update to install the newer version.
Restart the Jira Instance
Run a rolling restart of the Jira nodes. After the restart, the health checks should stop complaining about this node.
Providing data to Atlassian Support
If the rolling restart did not resolve the issue, please reach out to Atlassian Support. To help the Atlassian support team investigate the issue faster, please attach a support zip from each node to the ticket.