Nodes in the Bitbucket Data Center abruptly leave the cluster

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

After running for some time, one or more nodes in a Bitbucket Data Center cluster drop out of the cluster without being shut down.

Environment

Bitbucket Data Center 8.9.5

Diagnosis

  • The atlassian-bitbucket.log file on one of the nodes will contain the following error stack trace.

    2024-01-25 14:32:36,133 WARN [hz.hazelcast.InvocationMonitorThread] c.h.s.i.o.impl.InvocationMonitor [172.XX.XX.7]:5701 [<Hazelcast-Group-Name>] [3.12.13] MonitorInvocationsTask delayed 93449 ms
    2024-01-25 14:33:17,416 WARN [hz.hazelcast.cached.thread-4] c.a.s.i.h.OptimisticOutOfMemoryHandler OutOfMemoryError occurred attempting to continue operating as normal
    java.lang.OutOfMemoryError: Java heap space
    2024-01-25 14:33:17,416 ERROR [httpclient-io:thread-1] o.a.h.i.n.c.InternalHttpAsyncClient I/O reactor terminated abnormally
    org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:359)
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
        at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
        at java.base/java.lang.Thread.run(Thread.java:840)
    Caused by: java.lang.OutOfMemoryError: Java heap space
    2024-01-25 14:33:17,417 ERROR [httpclient-io:thread-1] o.a.h.i.n.c.InternalHttpAsyncClient I/O reactor terminated abnormally
    org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:359)
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
        at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
        at java.base/java.lang.Thread.run(Thread.java:840)
    Caused by: java.lang.OutOfMemoryError: Java heap space

  • The other node(s) will have the following events in their atlassian-bitbucket.log file.

    2024-01-25 14:32:03,010 WARN [hz.hazelcast.cached.thread-2] c.h.i.c.impl.ClusterHeartbeatManager [172.XX.XX.6]:5701 [<Hazelcast-Group-Name>] [3.12.13] Suspecting Member [172.XX.XX.7]:5701 - dfb29ffe-ffdc-49a4-b79b-c5dc065f5d1b because it has not sent any heartbeats since 2024-01-25 14:30:58.217. Now: 2024-01-25 14:32:02.145, heartbeat timeout: 60000 ms, suspicion level: 1.00
    2024-01-25 14:32:03,010 WARN [hz.hazelcast.cached.thread-2] c.h.i.cluster.impl.MembershipManager [172.XX.XX.6]:5701 [<Hazelcast-Group-Name>] [3.12.13] Member [172.XX.XX.7]:5701 - dfb29ffe-ffdc-49a4-b79b-c5dc065f5d1b is suspected to be dead for reason: Suspecting Member [172.XX.XX.7]:5701 - dfb29ffe-ffdc-49a4-b79b-c5dc065f5d1b because it has not sent any heartbeats since 2024-01-25 14:30:58.217. Now: 2024-01-25 14:32:02.145, heartbeat timeout: 60000 ms, suspicion level: 1.00
    2024-01-25 14:32:03,019 INFO [hz.hazelcast.event-4] c.a.s.i.c.HazelcastClusterService Node '/172.XX.XX.7:5701 (node2)' was REMOVED from the cluster. Updated cluster: [/172.XX.XX.6:5701 master this name='node1' uuid='70ff0ac5-9712-4c18-a05a-95bf50fc44f0' vm-id='855cfaec-fc0a-40b6-85cb-2459005b1980']
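
When several nodes are involved, it can be quicker to scan each node's atlassian-bitbucket.log for both signatures than to read the files by hand. The following is a minimal sketch, assuming Python 3 is available on the machine where the logs are inspected. The function name `scan_log` and the marker strings are illustrative; they are copied from the excerpts in this article and are not part of any Atlassian tool.

```python
# Substrings copied from the log excerpts in this article (assumption:
# your Bitbucket/Hazelcast version words these messages the same way).
MARKERS = {
    "oom": "java.lang.OutOfMemoryError: Java heap space",
    "heartbeat_suspicion": "Suspecting Member",
    "node_removed": "was REMOVED from the cluster",
}


def scan_log(lines):
    """Count how many log lines contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

`scan_log` accepts any iterable of lines, so an open file handle for a node's atlassian-bitbucket.log can be passed directly. A non-zero `oom` count on one node together with `heartbeat_suspicion` and `node_removed` hits on the others matches the pattern described above.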

Cause

The first stack trace occurs while Bitbucket synchronizes an external directory whose user and group pool is too large for the heap size currently configured for Bitbucket. With the heap exhausted, the Hazelcast thread on that node cannot send heartbeats to the other nodes. Once the heartbeat timeout expires (60000 ms in the log above), the other nodes mark this node as dead and remove it from the cluster.
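
The timing can be read directly from the ClusterHeartbeatManager message: the gap between the last heartbeat and "Now" is compared with the heartbeat timeout. The sketch below, assuming Python 3, extracts those three values from a suspicion line and returns the gap in milliseconds. The function name `heartbeat_gap` and the regular expression are illustrative and based only on the message format shown in this article.

```python
import re
from datetime import datetime, timedelta

# Timestamp format used inside the suspicion message, e.g. 2024-01-25 14:30:58.217
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}"
SUSPICION = re.compile(
    rf"has not sent any heartbeats since (?P<last>{TS})\. "
    rf"Now: (?P<now>{TS}), heartbeat timeout: (?P<timeout>\d+) ms"
)


def heartbeat_gap(line):
    """Return (gap_ms, timeout_ms) for a suspicion line, or None if it does not match."""
    m = SUSPICION.search(line)
    if m is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    last = datetime.strptime(m["last"], fmt)
    now = datetime.strptime(m["now"], fmt)
    # Integer division by one millisecond avoids floating-point rounding.
    gap_ms = (now - last) // timedelta(milliseconds=1)
    return gap_ms, int(m["timeout"])
```

For the suspicion line in the Diagnosis section this returns `(63928, 60000)`: the node had been silent for about 64 seconds, just over the 60-second timeout.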

Solution

Solution 1:

Limit the number of users and groups that Bitbucket synchronizes by setting an appropriate filter in the external directory's configuration in Bitbucket.
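
What such a filter looks like depends on the directory type. As an illustration only, the sketch below assumes the external directory is LDAP or Active Directory and that users should be restricted to members of a single group. The `memberOf` attribute is an Active Directory convention and may not exist on other directory servers; the helper name `restrict_to_group`, the base filter, and the group DN are all hypothetical.

```python
def restrict_to_group(base_filter, group_dn):
    """Combine an existing LDAP filter with a memberOf clause using AND."""
    # Escape characters that are reserved inside an LDAP filter value
    # (backslash first, so later escapes are not escaped again).
    escaped = (
        group_dn.replace("\\", r"\5c")
        .replace("*", r"\2a")
        .replace("(", r"\28")
        .replace(")", r"\29")
    )
    return f"(&{base_filter}(memberOf={escaped}))"


# Hypothetical values; replace with the filter and group DN from your directory.
example = restrict_to_group(
    "(objectCategory=Person)",
    "cn=bitbucket-users,ou=groups,dc=example,dc=com",
)
```

Here `example` holds `(&(objectCategory=Person)(memberOf=cn=bitbucket-users,ou=groups,dc=example,dc=com))`, which could serve as a starting point for the user filter. Test any filter against the directory before relying on it, since an overly narrow filter removes users from Bitbucket on the next synchronization.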

Solution 2:

Increase the heap size so that Bitbucket can handle a large number of users and groups during directory synchronization.

  • For further information on increasing the heap size in Bitbucket, refer to the Set heap size for Java on Bitbucket Server and Data Center article.

  • We do not recommend increasing the heap to a very high value as it may constrain the amount of memory available for Git processes, which may result in poor performance.

Updated on March 17, 2025
