REPLICA_STATE_MISSING error on Remote Mesh Nodes of Bitbucket Data Center
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
The remote Mesh nodes of a Bitbucket Data Center instance throw the REPLICA_STATE_MISSING error during a replica repair.
Environment
Bitbucket Data Center 8.x and above
Mesh 2.x and above
Diagnosis
During a replica repair on the remote Mesh nodes, errors like the following appear in the atlassian-mesh.log file:
2024-07-30 06:59:29,022 DEBUG [repair:thread-2] SJO4LJTEx419x738239747x21 c.a.bitbucket.mesh.repair.RepairGate Claimed slot for p/0027/h/3cdca3995e8add2042b1/r/4365 (available = 4, underRepair = [p/0027/h/8fb1a2720bb454a094ad/r/3947, p/0027/h/fbc6445afe3dbc0636e5/r/3440, p/0027/h/e8ee5d38f61ce207c213/r/1262, p/0027/h/3cdca3995e8add2042b1/r/4365, p/0027/h/af655d16ffb66e0bf5e3/r/5809, p/0027/h/dd46659dcecfd9caf023/r/4799])
2024-07-30 06:59:29,022 DEBUG [grpc-client:thread-3885] SJO4LJTEx419x738239728x13 c.a.b.m.r.DefaultReplicaStateRegistry [p/0027/h/e8ee5d38f61ce207c213/r/1262] Ignoring stale observation for p/0027/h/e8ee5d38f61ce207c213/r/1262; current: [ReplicaStateObservation{force=false, state=INCONSISTENT, timestamp=1711106342, version=1}], observed [ReplicaStateObservation{force=false, state=UNKNOWN, timestamp=4919401167, version=0}]
2024-07-30 06:59:29,023 INFO [repair:thread-4] - c.a.b.m.repair.DefaultRepairManager Node bitbucket-staging-mesh-2@3 did not have a consistent replica of p/0027/h/6231b5df4f80da052931/r/5419. Trying next replica..
2024-07-30 06:59:29,024 DEBUG [grpc-client:thread-3866] SJO4LJTEx419x738239738x17 c.a.b.mesh.repair.RepairTarget [p/0027/h/8fb1a2720bb454a094ad/r/3947] Received source state: MISSING (Metadata: , Content: )
2024-07-30 06:59:29,023 DEBUG [repair:thread-6] SJO4LJTEx419x738239749x20 c.a.b.mesh.repair.RepairTarget [p/0027/h/e9aa5ef13b117dc303ab/r/1559] Starting repair
2024-07-30 06:59:29,023 INFO [repair:thread-1] - c.a.b.m.repair.DefaultRepairManager Node bitbucket-staging-mesh-2@3 did not have a consistent replica of p/0027/h/06ac36768862ba70559c/r/5553. Trying next replica..
2024-07-30 06:59:29,024 WARN [grpc-client:thread-3866] SJO4LJTEx419x738239738x17 c.a.b.mesh.repair.RepairTarget [p/0027/h/8fb1a2720bb454a094ad/r/3947] Cannot repair because the source is not up to date (REPLICA_STATE_MISSING)
2024-07-30 06:59:29,023 DEBUG [grpc-client:thread-3870] SJO4LJTEx419x738239735x14 c.a.b.mesh.repair.RepairTarget [p/0027/h/fbc6445afe3dbc0636e5/r/3440] Received source state: MISSING (Metadata: , Content: )
2024-07-30 06:59:29,024 WARN [repair:thread-1] - c.a.b.m.repair.DefaultRepairManager [p/0027/h/06ac36768862ba70559c/r/5553] No up-to-date replica is available for repair (tried 2 nodes). Retrying in 30s (attempt 2/25)
2024-07-30 06:59:29,024 DEBUG [repair:thread-1] - c.a.b.m.repair.DefaultRepairManager [p/0027/h/06ac36768862ba70559c/r/5553] Repair failed
com.atlassian.bitbucket.mesh.git.exception.RepositoryRepairFailedException: Could not repair p/0027/h/06ac36768862ba70559c/r/5553
at com.atlassian.bitbucket.mesh.repair.DefaultRepairManager$RepairTask.repair(DefaultRepairManager.java:299)
at com.atlassian.bitbucket.mesh.repair.DefaultRepairManager$RepairTask.run(DefaultRepairManager.java:203)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2024-07-30 06:59:29,024 WARN [grpc-client:thread-3870] SJO4LJTEx419x738239735x14 c.a.b.mesh.repair.RepairTarget [p/0027/h/fbc6445afe3dbc0636e5/r/3440] Cannot repair because the source is not up to date (REPLICA_STATE_MISSING)
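To get a quick list of the replicas that are failing to repair, the log can be filtered for the REPLICA_STATE_MISSING warnings. A minimal sketch, run from the directory containing atlassian-mesh.log (the exact log location depends on your Mesh node installation):

# List the distinct replica paths (p/<partition>/h/<hierarchy>/r/<repository id>)
# that reported REPLICA_STATE_MISSING, with a count per replica.
grep "REPLICA_STATE_MISSING" atlassian-mesh.log \
  | grep -o "p/[0-9]*/h/[0-9a-f]*/r/[0-9]*" \
  | sort | uniq -c | sort -rn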
Cause
Bitbucket automatically detects when a repository has fewer consistent replicas on the Mesh nodes than its replication factor and marks the replica on the affected Mesh node as missing. That node then attempts to repair the repository automatically, using either the other Mesh nodes or the sidecar (if a Mesh migration is still in progress) as the source. In this case, the source node chosen for the repair reports its own replica state as MISSING, so the repair fails with REPLICA_STATE_MISSING.
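Before attempting any repair, it can help to confirm that all remote Mesh nodes are registered, online, and AVAILABLE, since an offline node cannot serve as a repair source. A minimal sketch using the Mesh node administration REST resource; the exact path is an assumption and may differ between Bitbucket versions, so verify it against the REST documentation for your version:

# List all registered Mesh nodes and their reported state.
curl -k -X GET --location "https://{BITBUCKET_URL}/rest/api/latest/admin/git/mesh/nodes" \
  -H "Accept: application/json" \
  --basic --user admin:password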
Solution
If the "missing replica state" errors are happening during migration where the source is the sidecar and the target is one of the remote mesh nodes, the next step would be to remigrate that repository/hierarchy back to the mesh sidecar and then try the migration again.
If the error is happening during the general repair process on the remote mesh nodes, we have to check the replica state from all mesh nodes for an affected repository using the below API and then use another REST end-point to perform a custom repair.
API to check the replica state on all the remote mesh nodes:
curl -k -X GET --location "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas" \
  -H "Accept: application/json" \
  --basic --user admin:password

[{"node":{"id":1,"lastSeenDate":1722518675544,"name":"Node1","rpcId":"1","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT","observedVersion":41},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"MISSING","observedVersion":41,"version":41}},
 {"node":{"id":2,"lastSeenDate":1722518684866,"name":"Node2","rpcId":"2","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}},
 {"node":{"id":3,"lastSeenDate":1722518684874,"name":"Node3","rpcId":"3","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}}]
In the example above, the nodeState.replicaState for Node1 is MISSING, while Node2 and Node3 report CONSISTENT.
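When there are many Mesh nodes, piping the same response through jq makes the per-node state easier to scan. A minimal sketch, assuming jq is installed on the machine running the check:

# Print one line per Mesh node: ID, name, the control plane's view, and the node's own replica state.
curl -sk "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas" \
  -H "Accept: application/json" \
  --basic --user admin:password \
  | jq -r '.[] | "\(.node.id) \(.node.name) controlPlane=\(.controlPlaneState.replicaState) node=\(.nodeState.replicaState)"'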
API to initiate a repair of Node1, using a node whose replicaState is CONSISTENT as the source (the 1 in the path is the ID of the node being repaired, and sourceNodeId=2 selects Node2 as the source):
curl -k -X POST --location "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas/1/repair?sourceNodeId=2" \
  --user admin:password \
  -H 'Content-type: application/json'

{"success":true}
Check the replica state once more. Now Node 1 is consistent.
curl -k -X GET --location "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{PROJECT_NAME}/repos/{REPO_NAME}/replicas" \
  -H "Accept: application/json" \
  --basic --user admin:password

[{"node":{"id":1,"lastSeenDate":1722518706849,"name":"Node1","rpcId":"1","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT","observedVersion":41},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}},
 {"node":{"id":2,"lastSeenDate":1722518684866,"name":"Node2","rpcId":"2","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}},
 {"node":{"id":3,"lastSeenDate":1722518684874,"name":"Node3","rpcId":"3","rpcUrl":"http://XX.X.XX.XXX:7777","state":"AVAILABLE","offline":false},"controlPlaneState":{"replicaState":"CONSISTENT"},"nodeState":{"contentHash":"7c7137cf3d917412c161d8ae129c9086de36b4c46a36f3845fb25353ff4e7bf7","metadataHash":"03cb0eaa36a59acff6a551cbb45b1557e31381e5bf92a6733417fa727bc30608","replicaState":"CONSISTENT","observedVersion":41,"version":41}}]
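If several repositories are affected, the check-and-repair calls can be scripted. A minimal sketch, assuming a plain-text file repos.txt (a hypothetical name) with one PROJECT_NAME/REPO_NAME pair per line, and that node ID 1 is being repaired from source node ID 2; adapt the node IDs to your own output from the replica-state check:

# Trigger a repair of node 1 from source node 2 for each project/repo listed in repos.txt.
while IFS=/ read -r project repo; do
  echo "Repairing ${project}/${repo} ..."
  curl -sk -X POST --location \
    "https://{BITBUCKET_URL}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/${project}/repos/${repo}/replicas/1/repair?sourceNodeId=2" \
    --user admin:password \
    -H 'Content-type: application/json'
  echo
done < repos.txt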