Mesh pods go offline after restart in Bitbucket Data Center on Kubernetes
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Remote Bitbucket Mesh nodes hosted on Kubernetes pods and deployed via Helm charts go offline whenever the pods are restarted.
Environment
Bitbucket 8.x onwards.
Kubernetes
Diagnosis
In your Kubernetes cluster, list the pods including the Mesh pods:
ubuntu@ip-10-xxx-xx-xxx:~$ kubectl get all -n bitbucket
NAME                                  READY   STATUS    RESTARTS        AGE
pod/bitbucket-0                       1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-1                       1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-mesh-0                  1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-mesh-1                  1/1     Running   1 (9m11s ago)   38h
pod/bitbucket-mesh-2                  1/1     Running   1 (9m11s ago)   38h
pod/nfs-server-nfs-server-example-0   1/1     Running   6 (9m11s ago)   11d
pod/postgres15-postgresql-0           1/1     Running   6 (9m11s ago)   11d
Log in to one of the Mesh pods:
kubectl exec -it pod/bitbucket-mesh-0 -n bitbucket -- bash
Check the permissions of the files in the /var/atlassian/application-data/mesh/config directory.

ℹ️ Here /var/atlassian/application-data/mesh is the Mesh home directory. It could be different in your setup.
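For example, run the following from inside the pod; on a healthy node the key file shows -rw------- (600). The ownership, size, and timestamp in the sample output are placeholders:
ls -l /var/atlassian/application-data/mesh/config/control-plane.pem
-rw-------. 1 mesh mesh 3272 Jan 10 10:15 /var/atlassian/application-data/mesh/config/control-plane.pem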
If for some reason the Mesh pods are restarted, the permissions on the control-plane.pem file change from 600 to 660, that is, from -rw------- to -rw-rw----. This causes the Mesh nodes to fail on startup after the restart, so the Bitbucket Mesh UI shows all the Mesh nodes as OFFLINE.
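To confirm the failed startup from the pod side, check the Mesh pod logs (the exact error text varies by Mesh version):
kubectl logs pod/bitbucket-mesh-0 -n bitbucket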

Cause
The issue appears to be caused by the OpenShift platform and the underlying Container Storage Interface (CSI) driver.
When the container starts and a volume is declared, the Pod YAML includes a securityContext section:
securityContext:
  seLinuxOptions:
    level: 's0:c27,c19'
When we deploy Bitbucket on OpenShift, the pods run with a restricted Security Context Constraint (SCC), as shown in this values.yaml snippet:
openshift:
  # -- When set to true, the containers will run with a restricted Security Context Constraint (SCC).
  # See: https://docs.openshift.com/container-platform/4.14/authentication/managing-security-context-constraints.html
  # This configuration property unsets pod's SecurityContext, nfs-fixer init container (which runs as root), and mounts server
  # configuration files as ConfigMaps.
  #
  runWithRestrictedSCC: true
This means the Helm chart does not set anything related to the security context; OpenShift does that automatically. Checking the file and folder permissions in the Bitbucket Mesh home directory, you might notice unusual user IDs, such as 1000740000, which are specific to OpenShift.

Every namespace is assigned a range of these user IDs. The main security difference between OpenShift and vanilla Kubernetes or EKS is that containers can't run as root or as an arbitrary fixed user; they run as an unprivileged user with one of these randomly assigned IDs. This matches what we see below from the Pod YAML:
securityContext:
  seLinuxOptions:
    level: 's0:c27,c19'
  fsGroup: 1000740000
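You can confirm the range assigned to your namespace through its annotations. The bitbucket namespace name and the range value in the sample output are examples:
oc describe namespace bitbucket | grep uid-range
openshift.io/sa.scc.uid-range: 1000740000/10000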
Before the container starts, Kubernetes mounts the volume and changes its group ownership so that anyone in this fsGroup has read and write access to it. That is also why this unprivileged user can't write to anything except the declared volumes.
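Seen from inside a Mesh pod after a restart, the remapped ownership and widened permissions look like this (the uid, size, and date are illustrative):
ls -ln /var/atlassian/application-data/mesh/config/control-plane.pem
-rw-rw----. 1 1000740000 1000740000 3272 Jan 10 10:15 /var/atlassian/application-data/mesh/config/control-plane.pem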
It's the storage interface that makes the files writable for fsGroup: 1000740000. It's not the Mesh application doing it, and it's not the Helm chart, because the container is started first and then given its configuration. In summary, the issue is caused by a combination of OpenShift security practices and the underlying CSI driver in use.
Solution
Add an init container that runs before the main container in your Mesh pods by appending the following to your values.yaml file. It resets the permissions on the control-plane.pem file to the required 600. Below is a sample snippet you can use:
additionalInitContainers:
  - name: chmod
    image: alpine
    command: ["/bin/sh"]
    # Restore owner-only permissions on the control-plane key; '|| true' keeps
    # the init container from failing if the file does not exist yet.
    args: ["-c", "chmod 600 /var/atlassian/application-data/mesh/config/control-plane.pem || true"]
    volumeMounts:
      - name: mesh-home
        mountPath: /var/atlassian/application-data/mesh
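After updating values.yaml, apply the change and verify that the key file keeps its 600 permissions across a restart. The release name, chart reference, and StatefulSet name below are examples; substitute the ones from your deployment:
helm upgrade bitbucket-mesh atlassian-data-center/bitbucket-mesh -n bitbucket -f values.yaml
kubectl rollout status statefulset/bitbucket-mesh -n bitbucket
kubectl exec pod/bitbucket-mesh-0 -n bitbucket -- ls -l /var/atlassian/application-data/mesh/config/control-plane.pem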