Mesh pods go offline after restart in Bitbucket Data Center on Kubernetes

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Remote Bitbucket Mesh nodes hosted on Kubernetes pods using the Helm charts go offline whenever they are restarted.

Environment

Bitbucket 8.x onwards.

Kubernetes

Diagnosis

In your Kubernetes cluster, list the pods, including the Mesh pods:

ubuntu@ip-10-xxx-xx-xxx:~$ kubectl get all -n bitbucket
NAME                                  READY   STATUS    RESTARTS        AGE
pod/bitbucket-0                       1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-1                       1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-mesh-0                  1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-mesh-1                  1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-mesh-2                  1/1     Running   1 (9m11s ago)   38h
pod/nfs-server-nfs-server-example-0   1/1     Running   6 (9m11s ago)   11d
pod/postgres15-postgresql-0           1/1     Running   6 (9m11s ago)   11d
  • Log in to one of the Mesh pods:

kubectl exec -it pod/bitbucket-mesh-0 -n bitbucket -- bash
  • Check the permissions of the files in the /var/atlassian/application-data/mesh/config directory:


ℹ️ Here /var/atlassian/application-data/mesh is the Mesh home directory. It could be different in your setup.
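
On a healthy node, the control-plane.pem key is readable and writable only by its owner. A hypothetical listing (the numeric owner, group, size, and timestamp are illustrative):

bash-5.1$ ls -l /var/atlassian/application-data/mesh/config/control-plane.pem
-rw-------. 1 2003 2003 1704 Apr  8 09:55 /var/atlassian/application-data/mesh/config/control-plane.pem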

  • If the Mesh pods are restarted for any reason, the permissions on the control-plane.pem file change from 600 to 660. In other words, they change from -rw------- to -rw-rw----.
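
After a restart, the same hypothetical listing shows the group gaining read and write access:

bash-5.1$ ls -l /var/atlassian/application-data/mesh/config/control-plane.pem
-rw-rw----. 1 2003 2003 1704 Apr  8 10:05 /var/atlassian/application-data/mesh/config/control-plane.pem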

  • This prevents the Mesh pods from coming up after the restart, and the Bitbucket Mesh UI therefore shows all the Mesh nodes as OFFLINE.

(Image: Bitbucket Mesh UI showing all Mesh nodes as OFFLINE)
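
From the cluster side, you can also check the logs of an affected Mesh pod for the startup failure (standard kubectl commands; the log output itself is not shown here):

kubectl logs pod/bitbucket-mesh-0 -n bitbucket
kubectl logs pod/bitbucket-mesh-0 -n bitbucket --previous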

Cause

The issue seems to be caused by the OpenShift platform and the underlying CSI storage driver.

  • When the container starts and a volume is declared, the Pod YAML includes a securityContext section:

securityContext:
  seLinuxOptions:
    level: 's0:c27,c19'
  • When we deploy Bitbucket on OpenShift, OpenShift runs the pods with a restricted Security Context Constraint (SCC), as shown in the Helm chart values below:

openshift:
  # -- When set to true, the containers will run with a restricted Security Context Constraint (SCC).
  # See: https://docs.openshift.com/container-platform/4.14/authentication/managing-security-context-constraints.html
  # This configuration property unsets pod's SecurityContext, nfs-fixer init container (which runs as root),
  # and mounts server configuration files as ConfigMaps.
  #
  runWithRestrictedSCC: true
  • This means we don't set anything related to the security context in the Helm chart; OpenShift does that automatically. Checking the file and folder permissions in the Bitbucket Mesh home directory, you might notice unusual user IDs, such as 1000740000, which are specific to OpenShift.

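For example, a numeric listing of the Mesh config directory might look like this (hypothetical; the size and timestamp are illustrative):

bash-5.1$ ls -ln /var/atlassian/application-data/mesh/config
-rw-rw----. 1 1000740000 1000740000 1704 Apr  8 10:05 control-plane.pem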
  • Every namespace is assigned a range of these user IDs. The main difference between OpenShift and vanilla Kubernetes or EKS is this security model: containers can't run as root or as a fixed user of your choosing; they run as an unprivileged user with one of these randomly assigned IDs. This matches what we see below in the Pod YAML:

securityContext:
  seLinuxOptions:
    level: 's0:c27,c19'
  fsGroup: 1000740000
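
On OpenShift, you can see where both the SELinux level and the group ID come from by inspecting the namespace annotations. The namespace name is from our example, and the values shown are illustrative:

oc get namespace bitbucket -o yaml | grep 'sa.scc'

# Illustrative output:
#   openshift.io/sa.scc.mcs: s0:c27,c19
#   openshift.io/sa.scc.supplemental-groups: 1000740000/10000
#   openshift.io/sa.scc.uid-range: 1000740000/10000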
  • Before the container starts, Kubernetes mounts the volume and changes its group ownership so that any process in the fsGroup has read and write access to it. This is how the otherwise unprivileged user can write to the declared volumes and nothing else, as the sketch below illustrates:
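
Here is a minimal pod sketch (hypothetical; not part of the Bitbucket Helm charts) showing the fsGroup mechanism in isolation:

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo
spec:
  securityContext:
    # Before the container starts, the volume below is recursively
    # chowned to this GID; on a persistent volume, pre-existing files
    # are also made group-readable/writable (600 becomes 660).
    fsGroup: 1000740000
  containers:
    - name: demo
      image: alpine
      # The listing shows /data group-owned by GID 1000740000.
      command: ["sh", "-c", "ls -lnd /data && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}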

  • It's the CSI storage driver that makes the files writable for fsGroup: 1000740000. It's not the Mesh application doing it, and it's not the Helm chart, because the container is started first and only then given its configuration.

  • In summary, the issue is caused by a combination of OpenShift's security practices and the underlying CSI storage driver.

Solution

Add an init container that runs before the Mesh container in your Mesh pods by appending the following snippet to your values.yaml file. The init container resets the permissions on the control-plane.pem file to the required 600:

additionalInitContainers:
  - name: chmod
    image: alpine
    command: ["/bin/sh"]
    # Restore owner-only permissions on the Mesh control-plane key.
    args: ["-c", "chmod 600 /var/atlassian/application-data/mesh/config/control-plane.pem || true"]
    volumeMounts:
      - name: mesh-home
        mountPath: /var/atlassian/application-data/mesh
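
After updating values.yaml, roll the change out with helm upgrade and verify the result once the pods are back. The release name, chart reference, and namespace below are assumptions; substitute your own:

# Release name, chart reference, and namespace are assumptions.
helm upgrade bitbucket-mesh atlassian-data-center/bitbucket-mesh \
  -f values.yaml -n bitbucket

# Once the pod is Running, confirm the key is back to 600 (-rw-------):
kubectl exec -it pod/bitbucket-mesh-0 -n bitbucket -- \
  ls -l /var/atlassian/application-data/mesh/config/control-plane.pem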
Updated on April 8, 2025
