Cluster panic is triggered in Confluence Data Center when a node rejoins the cluster
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Cluster panic is triggered in Confluence Data Center when a node rejoins the cluster. Nothing is written to atlassian-confluence.log except a warning that Hazelcast is terminating forcefully.
The problem occurs after the following sequence of actions:
All nodes have joined the cluster gracefully (e.g. nodes 1, 2, and 3)
One node is taken out of the cluster by being shut down (e.g. node 3)
This leaves nodes 1 and 2 in the cluster. Once node 3 starts again and rejoins the cluster, nodes 1 and 2 go into panic mode
Hazelcast terminates forcefully on some or all of the nodes
Logging similar to the following appears in atlassian-confluence.log:
2021-01-27 22:08:43,894 WARN [hz.ShutdownThread] [com.hazelcast.instance.Node] log [xxx.xxx.xxx.xxx]:5801 [confluenceCluster] [3.8.6] Terminating forcefully...
Cause
The nodes are failing to communicate consistently over multicast. Network diagnostic tools such as omping report no errors while the nodes are in the cluster, but multicast communication breaks down at the point when Hazelcast terminates on one of the nodes.
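Multicast connectivity between the nodes can be checked with omping, run on every node at the same time. A minimal sketch, assuming three nodes with the placeholder hostnames below; a healthy network shows 0% loss on both the unicast and multicast response lines:

# Run simultaneously on every cluster node (hostnames are examples)
omping -c 10 node1.example.com node2.example.com node3.example.com

Intermittent loss on the multicast lines, even while the cluster otherwise appears healthy, is consistent with the behaviour described above.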
Workaround
For Confluence Data Center versions 5.9 and above, the workaround is to switch the cluster from multicast to unicast (TCP/IP):
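A minimal sketch of the change, using the clustering properties documented for Confluence Data Center and placeholder IP addresses: stop Confluence on every node, edit confluence.cfg.xml in each node's local home directory, then restart the nodes one at a time.

<!-- confluence.cfg.xml on every node: switch the join type from multicast to tcp_ip -->
<property name="confluence.cluster.join.type">tcp_ip</property>
<!-- List the IP addresses of all cluster nodes (example values) -->
<property name="confluence.cluster.peers">10.0.0.1,10.0.0.2,10.0.0.3</property>

Multicast-only properties such as confluence.cluster.address can then be removed. Apply the change on every node before bringing the cluster back up.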
Confluence Data Center versions prior to 5.9 do not have the option to use unicast, so this workaround is not applicable to them. However, a similar issue was addressed in versions 5.8.5 and above by CONFSERVER-39396 - Node rejoining cluster can cause cluster panic, which configures the cluster safety cache to flush its value on merge.