REPLICA_STATE_MISSING error on Remote Mesh Nodes of Bitbucket Data Center

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

The remote Mesh nodes of a Bitbucket Data Center instance log the error REPLICA_STATE_MISSING during a replica repair.

Environment

Bitbucket Data Center 8.x and above

Mesh 2.x and above

Diagnosis

During a replica repair on the remote Mesh nodes, the following errors appear in atlassian-mesh.log:

2024-07-30 06:59:29,022 DEBUG [repair:thread-2] SJO4LJTEx419x738239747x21 c.a.bitbucket.mesh.repair.RepairGate Claimed slot for p/0027/h/3cdca3995e8add2042b1/r/4365 (available = 4, underRepair = [p/0027/h/8fb1a2720bb454a094ad/r/3947, p/0027/h/fbc6445afe3dbc0636e5/r/3440, p/0027/h/e8ee5d38f61ce207c213/r/1262, p/0027/h/3cdca3995e8add2042b1/r/4365, p/0027/h/af655d16ffb66e0bf5e3/r/5809, p/0027/h/dd46659dcecfd9caf023/r/4799])
2024-07-30 06:59:29,022 DEBUG [grpc-client:thread-3885] SJO4LJTEx419x738239728x13 c.a.b.m.r.DefaultReplicaStateRegistry [p/0027/h/e8ee5d38f61ce207c213/r/1262] Ignoring stale observation for p/0027/h/e8ee5d38f61ce207c213/r/1262; current: [ReplicaStateObservation{force=false, state=INCONSISTENT, timestamp=1711106342, version=1}], observed [ReplicaStateObservation{force=false, state=UNKNOWN, timestamp=4919401167, version=0}]
2024-07-30 06:59:29,023 INFO [repair:thread-4] - c.a.b.m.repair.DefaultRepairManager Node bitbucket-staging-mesh-2@3 did not have a consistent replica of p/0027/h/6231b5df4f80da052931/r/5419. Trying next replica..
2024-07-30 06:59:29,024 DEBUG [grpc-client:thread-3866] SJO4LJTEx419x738239738x17 c.a.b.mesh.repair.RepairTarget [p/0027/h/8fb1a2720bb454a094ad/r/3947] Received source state: MISSING (Metadata: , Content: )
2024-07-30 06:59:29,023 DEBUG [repair:thread-6] SJO4LJTEx419x738239749x20 c.a.b.mesh.repair.RepairTarget [p/0027/h/e9aa5ef13b117dc303ab/r/1559] Starting repair
2024-07-30 06:59:29,023 INFO [repair:thread-1] - c.a.b.m.repair.DefaultRepairManager Node bitbucket-staging-mesh-2@3 did not have a consistent replica of p/0027/h/06ac36768862ba70559c/r/5553. Trying next replica..
2024-07-30 06:59:29,024 WARN [grpc-client:thread-3866] SJO4LJTEx419x738239738x17 c.a.b.mesh.repair.RepairTarget [p/0027/h/8fb1a2720bb454a094ad/r/3947] Cannot repair because the source is not up to date (REPLICA_STATE_MISSING)
2024-07-30 06:59:29,023 DEBUG [grpc-client:thread-3870] SJO4LJTEx419x738239735x14 c.a.b.mesh.repair.RepairTarget [p/0027/h/fbc6445afe3dbc0636e5/r/3440] Received source state: MISSING (Metadata: , Content: )
2024-07-30 06:59:29,024 WARN [repair:thread-1] - c.a.b.m.repair.DefaultRepairManager [p/0027/h/06ac36768862ba70559c/r/5553] No up-to-date replica is available for repair (tried 2 nodes). Retrying in 30s (attempt 2/25)
2024-07-30 06:59:29,024 DEBUG [repair:thread-1] - c.a.b.m.repair.DefaultRepairManager [p/0027/h/06ac36768862ba70559c/r/5553] Repair failed
com.atlassian.bitbucket.mesh.git.exception.RepositoryRepairFailedException: Could not repair p/0027/h/06ac36768862ba70559c/r/5553
    at com.atlassian.bitbucket.mesh.repair.DefaultRepairManager$RepairTask.repair(DefaultRepairManager.java:299)
    at com.atlassian.bitbucket.mesh.repair.DefaultRepairManager$RepairTask.run(DefaultRepairManager.java:203)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
2024-07-30 06:59:29,024 WARN [grpc-client:thread-3870] SJO4LJTEx419x738239735x14 c.a.b.mesh.repair.RepairTarget [p/0027/h/fbc6445afe3dbc0636e5/r/3440] Cannot repair because the source is not up to date (REPLICA_STATE_MISSING)
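When many repositories are affected, it can help to summarize which ones hit the error before checking them one by one. The following is a minimal Python sketch, not an Atlassian tool: it assumes the WARN message has exactly the wording shown in the excerpt above, and the function name affected_repositories is hypothetical.

```python
import re
from collections import Counter

# Matches the RepairTarget WARN line from the log excerpt and captures the
# repository path printed in square brackets just before the message.
# Assumption: the message wording is identical to the excerpt above.
MISSING_SOURCE = re.compile(
    r"\[(p/[^\]\s]+)\] Cannot repair because the source is not up to date "
    r"\(REPLICA_STATE_MISSING\)"
)


def affected_repositories(log_lines):
    """Count REPLICA_STATE_MISSING warnings per repository path."""
    counts = Counter()
    for line in log_lines:
        match = MISSING_SOURCE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, open atlassian-mesh.log and pass the file object to affected_repositories; the returned Counter maps each logged repository path to the number of failed repair attempts found in the file.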

Cause

Bitbucket automatically detects when a repository has fewer replicas on the Mesh nodes than the replication factor requires, and assigns the repository to a Mesh node as a missing replica. That node then attempts to repair the repository automatically, either from another Mesh node or, if a Mesh migration is in progress, from the sidecar. In this case, the source node used for the repair itself reports the replica state as MISSING, so the repair cannot complete.

Solution

  • If the "missing replica state" errors occur during a migration where the source is the sidecar and the target is one of the remote Mesh nodes, migrate the affected repository/hierarchy back to the Mesh sidecar and then retry the migration.

  • If the error occurs during the general repair process on the remote Mesh nodes, check the replica state of an affected repository on all Mesh nodes using the API below, and then use a second REST endpoint to perform a custom repair.

    • API to check the replica state on all the remote mesh nodes:

      curl -k -X GET --location "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas" \
        -H "Accept: application/json" \
        --basic --user admin:password

      [{"node":{"id":1,"lastSeenDate":1722518675544,"name":"Node1","rpcId":"1","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT","observedVersion":41},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"MISSING","observedVersion":41,"version":41}},{"node":{"id":2,"lastSeenDate":1722518684866,"name":"Node2","rpcId":"2","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}},{"node":{"id":3,"lastSeenDate":1722518684874,"name":"Node3","rpcId":"3","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}}]

      In the above example, the nodeState of Node1 reports a replicaState of MISSING, while Node2 and Node3 report CONSISTENT.

    • API to initiate a repair of Node1 (node ID 1 in the URL path) from another node whose replicaState is CONSISTENT (here Node2, passed as sourceNodeId=2):

      curl -k -X POST --location "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas/1/repair?sourceNodeId=2" \
        --user admin:password \
        -H 'Content-type: application/json'

      {"success":true}

    • Check the replica state once more. Node1 now reports CONSISTENT:

      curl -k -X GET --location "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas" \
        -H "Accept: application/json" \
        --basic --user admin:password

      [{"node":{"id":1,"lastSeenDate":1722518706849,"name":"Node1","rpcId":"1","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT","observedVersion":41},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}},{"node":{"id":2,"lastSeenDate":1722518684866,"name":"Node2","rpcId":"2","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}},{"node":{"id":3,"lastSeenDate":1722518684874,"name":"Node3","rpcId":"3","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}}]
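The check-then-repair steps above can be sketched in Python for cases where several nodes or repositories need attention. This is an illustrative sketch under stated assumptions, not an Atlassian-provided script: the helper names (split_replicas, repair_url, plan_repairs) are hypothetical, only the JSON fields visible in the sample response are read, and choosing the first consistent node as the source is an arbitrary simplification. The code only builds the repair URLs; the POST itself would still be sent with curl as shown above.

```python
from urllib.parse import quote


def split_replicas(replicas):
    """Return (needs_repair, usable_sources) as lists of node IDs.

    Reads nodeState.replicaState, the field that shows MISSING for Node1
    in the sample response. A node counts as a usable source only if it is
    CONSISTENT, AVAILABLE and not offline.
    """
    needs_repair, sources = [], []
    for entry in replicas:
        node = entry["node"]
        state = entry.get("nodeState", {}).get("replicaState")
        online = node.get("state") == "AVAILABLE" and not node.get("offline", False)
        if state == "CONSISTENT":
            if online:
                sources.append(node["id"])
        else:
            needs_repair.append(node["id"])
    return needs_repair, sources


def repair_url(base_url, project, repo, target_node_id, source_node_id):
    """Build the repair endpoint URL in the shape used by the curl example."""
    return (
        f"{base_url.rstrip('/')}/rest/ui/latest/admin/git/mesh/troubleshooting"
        f"/projects/{quote(project, safe='')}/repos/{quote(repo, safe='')}"
        f"/replicas/{target_node_id}/repair?sourceNodeId={source_node_id}"
    )


def plan_repairs(base_url, project, repo, replicas):
    """List one repair URL per non-consistent node, or [] if no source exists."""
    needs_repair, sources = split_replicas(replicas)
    if not sources:
        return []  # no consistent, reachable replica to repair from
    return [
        repair_url(base_url, project, repo, target, sources[0])
        for target in needs_repair
    ]
```

Parse the output of the GET request with json.loads and pass the resulting list to plan_repairs. For the sample response above, this yields a single URL that repairs node 1 from node 2, matching the manual POST request.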

Updated on April 14, 2025
