Troubleshoot Bitbucket service failure due to JFR logging

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Bitbucket experiences periodic downtime. The logs give no specific pointers to the cause, other than nodes being removed from the cluster.

2024-04-23 13:33:57,802 INFO [hz.hazelcast.event-5] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster. Updated cluster:
2024-04-23 12:30:53,558 INFO [hz.hazelcast.event-5] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster. Updated cluster:
2024-04-23 14:00:36,387 INFO [hz.hazelcast.event-4] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster. Updated cluster:
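Because node removals are the only visible symptom, it can help to put them in chronological order before comparing them with other logs. The following Python sketch is a hypothetical helper (not an Atlassian tool) that extracts the timestamp and node address from HazelcastClusterService "REMOVED" entries. The regular expression assumes the log layout shown in the excerpt and may need adjusting for other formats.

```python
from __future__ import annotations

import re
from datetime import datetime

# Hypothetical pattern for HazelcastClusterService removal entries in
# atlassian-bitbucket.log, based on the excerpt in this article:
# <date> <time>,<ms> <LEVEL> [<thread>] <logger> Node '<address>' was REMOVED from the cluster
REMOVAL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \[[^\]]*\] "
    r"\S*HazelcastClusterService Node '([^']+)' was REMOVED from the cluster",
    re.MULTILINE,
)


def find_node_removals(log_text: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, node address) pairs for every node removal, oldest first."""
    removals = [
        # %f accepts the 3-digit millisecond field used by these log entries
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), node)
        for stamp, node in REMOVAL_RE.findall(log_text)
    ]
    return sorted(removals)
```

For example, passing the contents of atlassian-bitbucket.log to `find_node_removals` returns the removals in time order, which makes it easier to see whether each one lines up with a JVM shutdown on that node.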

Environment

  • Bitbucket 8.14.1 and above.

Diagnosis

While diagnosing this issue, we found:

  • No evidence of the service being killed externally, for example by the OOM killer.

  • No evidence of the service being gracefully shut down.

  • The atlassian-bitbucket.log on the affected nodes only shows them leaving the cluster, with no further logging.

2024-04-22 15:08:41,425 WARN [hz.hazelcast.cached.thread-1] c.h.n.t.TcpIpConnectionErrorHandler [10.x.x.x]:5701 [GTE-bitbucket-cluster] [3.12.13] Removing connection to endpoint [10.x.x.x]:5701 Cause => java.net.SocketException {Connection refused to address /10.x.x.x:5701}, Error-Count: 5
2024-04-22 15:08:41,433 INFO [hz.hazelcast.event-4] c.a.s.i.c.HazelcastClusterService Node '/10.X.X.X' was REMOVED from the cluster. Updated cluster:

  • The launcher.log, however, which captures the JVM's stdout/stderr output, records the events leading up to the JVM shutting down.

00:43:18.352 [main] INFO com.atlassian.security.java8.serialfilter.DeserializationFilterConfigurator - Global serial filter set to JDK 8 DeserializationFilter
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
2024-02-08 15:26:05,572 analyticsEventProcessor:thread-1 ERROR Unable to write to stream /var/atlassian/application-data/bitbucket/analytics-logs/4bf3c3d2538000f525a05d096b011ad0.11a7eaa7c55bcb6a00e072075c02dd06.atlassian-analytics.log for appender rolling org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream /var/atlassian/application-data/bitbucket/analytics-logs/4bf3c3d2538000f525a05d096b011ad0.11a7eaa7c55bcb6a00e072075c02dd06.atlassian-analytics.log
    at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:252)
    ...
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.logging.log4j.cor
20:07:30.372 [main] INFO com.atlassian.security.serialfilter.DeserializationFilterConfigurator - Global serial filter set to JDK 11 DeserializationFilter
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
[194741.694s][error][jfr,system] Failed to write to jfr stream because no space left on device
[194741.695s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM...

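The launcher.log does not timestamp every entry (the JFR messages, for example, show JVM uptime in seconds rather than a date and time), but the entries at the end of the file generally reflect the most recent events.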

  • Look for warnings in the Bitbucket Mesh sidecar logs stating that SCM caching has been disabled. A surge of clone requests can fill the SCM cache until the $BITBUCKET_HOME volume runs out of space. Once that disk is full, JFR recordings can no longer be written to it, which can cause the Bitbucket JVM to shut down.

WARN [git-hook:thread-10492] scm_clone 6IMAQT6Sx1118x85072912x271 *11MBM8Ex1118x123828767x264 127.0.0.1 "HostingService/UploadPack" (>2 <5) c.a.util.contentcache.ContentCache 6ae44603db17a6a15f71-69315: Caching has been temporarily disabled because there is not enough free space on /var/atlassian/bitbucket/home/caches/pack/6ae44603db17a6a15f71-69315 (905216 bytes free)
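The signatures described in this section (the JFR shutdown messages in launcher.log and the "caching disabled" warning in the Mesh sidecar logs) can be checked together. This is a minimal Python sketch; `scan_logs` and `DiskFullEvidence` are hypothetical names, and the matched strings are copied from the excerpts in this article, so other versions may word them differently.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Signatures copied from the launcher.log and Mesh sidecar excerpts in this article.
JFR_SHUTDOWN = "An irrecoverable error in Jfr. Shutting down VM"
JFR_WRITE_FAILED = "Failed to write to jfr stream because no space left on device"
NO_SPACE = "no space left on device"  # compared case-insensitively
CACHE_DISABLED_RE = re.compile(
    r"Caching has been temporarily disabled because there is not enough free space "
    r"on (\S+) \((\d+) bytes free\)"
)


@dataclass
class DiskFullEvidence:
    jfr_shutdown: bool                      # JFR shut the JVM down
    jfr_write_failed: bool                  # JFR could not write its stream
    no_space_errors: int                    # "No space left on device" mentions
    cache_disabled: List[Tuple[str, int]]   # (cache path, bytes free) per warning


def scan_logs(launcher_log: str, mesh_log: str = "") -> DiskFullEvidence:
    """Search log text for the disk-full signatures described in the Diagnosis section."""
    return DiskFullEvidence(
        jfr_shutdown=JFR_SHUTDOWN in launcher_log,
        jfr_write_failed=JFR_WRITE_FAILED in launcher_log,
        no_space_errors=launcher_log.lower().count(NO_SPACE),
        cache_disabled=[
            (path, int(free)) for path, free in CACHE_DISABLED_RE.findall(mesh_log)
        ],
    )
```

A node whose launcher.log sets both JFR flags, while the Mesh logs show caching being disabled with very little free space, matches the pattern described in this article.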

Cause

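  • Java Flight Recorder (JFR) files fill the disk that holds the default JFR location, <BITBUCKET_HOME>/log/jfr. Other data on $BITBUCKET_HOME, such as SCM cache files, can contribute to filling the same disk. Once JFR can no longer write to disk ("No space left on device"), it reports an irrecoverable error and shuts down the JVM.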
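To confirm whether JFR recordings are what is consuming the disk, and to pick the largest recordings to open for analysis, something like the following Python sketch can be used. It assumes recordings use the .jfr extension and that you pass your actual <BITBUCKET_HOME>/log/jfr path; `jfr_usage` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def jfr_usage(jfr_dir: str) -> Tuple[int, int, List[Tuple[Path, int]]]:
    """Return (bytes used by *.jfr files, bytes free on that volume, files largest first)."""
    root = Path(jfr_dir)
    # Assumption: recordings carry the .jfr extension; subdirectories are included.
    files = [(p, p.stat().st_size) for p in root.rglob("*.jfr") if p.is_file()]
    files.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _, size in files)
    free = shutil.disk_usage(root).free  # free space on the volume holding jfr_dir
    return total, free, files
```

For example, `jfr_usage("/var/atlassian/application-data/bitbucket/log/jfr")` would apply if that is your home directory. A large total alongside very little free space points at JFR; a small total points at other data on the volume.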

Solution

  • The long-term solution is to analyse the JFR files with a tool such as JDK Mission Control to understand why so much data is being recorded.

Workarounds

  • Ensure there is sufficient disk space for the JFR files. The space required for the recordings is calculated as jfr.recording.max_size * jfr.recording.files_to_remain.

  • Set up a separate volume for the SCM cache files so they don't fill up the storage used by $BITBUCKET_HOME, as recommended in Scaling Bitbucket Data Center > Caching > Considerations.

  • Change the location of the JFR files to a larger data store, such as NFS storage.

  • Change the size limit, the number of JFR files kept, or how long they are retained, as described in the JFR diagnostics guide.

  • As a temporary workaround, disable JFR logging in the Troubleshooting and support tools > Diagnostic settings view.
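The sizing formula in the first workaround can be turned into a quick check against a volume's free space. In this Python sketch, `jfr_space_required` and `jfr_fits` are hypothetical helpers; the caller must supply jfr.recording.max_size already converted to bytes, and the 20% headroom is an arbitrary assumption rather than an Atlassian recommendation.

```python
import shutil


def jfr_space_required(max_size_bytes: int, files_to_remain: int) -> int:
    """jfr.recording.max_size * jfr.recording.files_to_remain, in bytes."""
    if max_size_bytes < 0 or files_to_remain < 0:
        raise ValueError("JFR size and file count must be non-negative")
    return max_size_bytes * files_to_remain


def jfr_fits(path: str, max_size_bytes: int, files_to_remain: int,
             already_used_bytes: int = 0, headroom: float = 0.2) -> bool:
    """Return True if the volume holding `path` can hold the full JFR budget.

    already_used_bytes: space that existing recordings already occupy on that
    volume; it is counted as available, assuming old recordings are deleted
    as new ones are written.
    headroom: hypothetical safety margin (20% by default).
    """
    needed = jfr_space_required(max_size_bytes, files_to_remain) * (1 + headroom)
    available = shutil.disk_usage(path).free + already_used_bytes
    return available >= needed
```

With hypothetical values of 100 MiB for jfr.recording.max_size and 10 for jfr.recording.files_to_remain, the budget is 100 × 1024² × 10 = 1,048,576,000 bytes (about 1 GiB), before any headroom is added.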

Updated on March 13, 2025
