Cluster panic is triggered in Confluence Data Center when a node rejoins the cluster

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Cluster panic is triggered in Confluence Data Center when a node rejoins the cluster. Nothing is written to atlassian-confluence.log apart from a warning that Hazelcast is terminating forcefully.

The problem occurs after the following sequence of events:

  • All nodes are running normally in the cluster (e.g. nodes 1, 2, and 3)

  • One node is taken out of the cluster by being shut down (e.g. node 3)

  • This leaves nodes 1 and 2 in the cluster. Once node 3 starts again and joins the cluster, nodes 1 and 2 go into panic mode

  • Hazelcast terminates forcefully on some or all nodes

Logging similar to the following appears in atlassian-confluence.log:

2021-01-27 22:08:43,894 WARN [hz.ShutdownThread] [com.hazelcast.instance.Node] log [xxx.xxx.xxx.xxx]:5801 [confluenceCluster] [3.8.6] Terminating forcefully...
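To compare when and where Hazelcast terminated across nodes, it can help to pull these warnings out of each node's log. The following is a minimal sketch, assuming the log layout matches the sample line above; the function and type names are hypothetical, and the IP address in the usage example is a placeholder. Adjust the pattern if your log4j layout differs.

```python
import re
from typing import Iterable, NamedTuple


class ForcedTermination(NamedTuple):
    timestamp: str
    member: str  # host:port of the Hazelcast member that logged the warning
    cluster: str
    hazelcast_version: str


# ASSUMPTION: matches the layout of the sample WARN line in this article, e.g.
# "... [com.hazelcast.instance.Node] log [10.0.0.3]:5801 [confluenceCluster] [3.8.6] Terminating forcefully..."
_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+WARN\s+"
    r".*\[com\.hazelcast\.instance\.Node\].*?"
    r"\[(?P<host>[^\]]+)\]:(?P<port>\d+)\s+"
    r"\[(?P<cluster>[^\]]+)\]\s+\[(?P<version>[^\]]+)\]\s+"
    r"Terminating forcefully"
)


def find_forced_terminations(lines: Iterable[str]) -> list[ForcedTermination]:
    """Return one record per Hazelcast forced-termination warning in the given log lines."""
    hits = []
    for line in lines:
        m = _PATTERN.search(line)
        if m:
            hits.append(
                ForcedTermination(
                    timestamp=m["ts"],
                    member=f"{m['host']}:{m['port']}",
                    cluster=m["cluster"],
                    hazelcast_version=m["version"],
                )
            )
    return hits


if __name__ == "__main__":
    # Hypothetical log path; run this on each node and compare the timestamps.
    with open("atlassian-confluence.log", encoding="utf-8", errors="replace") as f:
        for hit in find_forced_terminations(f):
            print(hit)
```

Lining up the timestamps from each node against the time the rejoining node started can show whether the forced termination followed the rejoin, as described above.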

Cause

The nodes cannot communicate consistently over multicast. Network diagnostic tools such as Omping show no communication errors while the nodes are in the cluster, but communication breaks down when Hazelcast terminates on one of the nodes.

Workaround

The workaround for Confluence Data Center 5.9 and later is to switch cluster node discovery from multicast to unicast.
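The switch is made in each node's confluence.cfg.xml. The sketch below shows the idea in Python, assuming the property names `confluence.cluster.join.type` (set to `tcp_ip`) and `confluence.cluster.peers` (a comma-separated list of node IP addresses) as used in Atlassian's node-discovery documentation; verify both names and values for your Confluence version before relying on it. The function name and the peer addresses are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: property names taken from Atlassian's node-discovery docs for
# Confluence 5.9+; confirm them for your version before use.
JOIN_TYPE = "confluence.cluster.join.type"
PEERS = "confluence.cluster.peers"


def switch_to_unicast(cfg_xml: str, peer_ips: list[str]) -> str:
    """Return confluence.cfg.xml text with TCP/IP (unicast) discovery configured."""
    root = ET.fromstring(cfg_xml)
    props = root.find("properties")
    if props is None:
        raise ValueError("no <properties> element found")

    def set_prop(name: str, value: str) -> None:
        # Update the property if it already exists, otherwise append it.
        for prop in props.findall("property"):
            if prop.get("name") == name:
                prop.text = value
                return
        new = ET.SubElement(props, "property", {"name": name})
        new.text = value

    set_prop(JOIN_TYPE, "tcp_ip")
    set_prop(PEERS, ",".join(peer_ips))
    # Other cluster properties (e.g. the multicast address) are left untouched.
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops XML comments and the XML declaration, so treat the output as something to review and apply by hand rather than a file to overwrite blindly.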

Confluence Data Center versions prior to 5.9 do not have the option to use unicast, so the workaround is not applicable. However, a similar issue has been addressed for versions 5.8.5 and above: CONFSERVER-39396 - Node rejoining cluster can cause cluster panic. Configure cluster safety cache to flush value on merge.

Updated on April 15, 2025
