Missing commits in Bitbucket after a filesystem migration

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

After a migration to a new filesystem, commits appear as missing from the repositories.

The missing commits are usually merge commits and commits that belong to merged pull requests. They are missing from both the source and the target branch.

Diagnosis

When moving to a new filesystem, some customers use rsync and follow a workflow similar to this:

  • do an initial rsync

  • (optional) perform additional rsyncs. These can be scheduled or manual, and aim to reduce the downtime of the migration to the new filesystem.

  • perform a final rsync

In this sequence of commands, the rsync --delete option is not used, so only new files and changes to existing files are synchronized. Files that were copied by an earlier rsync and have since been removed from the source are not deleted from the target, and this causes unexpected conflicts in the Git repositories.
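One way to see this effect is to list the paths that exist on the target but no longer exist on the source. The following is a minimal sketch and not part of the original procedure: the function name list_target_only_files and both directory arguments are hypothetical, and it assumes both trees are reachable from the same host.

```shell
#!/bin/bash
# Sketch: list files present under the target directory but absent from the source.
# Assumptions: both trees are mounted on this host; file names contain no newlines.

list_target_only_files() {
  local src="$1" dst="$2" src_list dst_list
  src_list=$(mktemp)
  dst_list=$(mktemp)
  # Build sorted lists of relative file paths for each tree.
  (cd "$src" && find . -type f | LC_ALL=C sort) > "$src_list"
  (cd "$dst" && find . -type f | LC_ALL=C sort) > "$dst_list"
  # comm -13 prints only the lines unique to the second file (the target).
  LC_ALL=C comm -13 "$src_list" "$dst_list"
  rm -f "$src_list" "$dst_list"
}
```

Called as list_target_only_files <source_repositories_dir> <target_repositories_dir>, any refs/heads/... path in the output is a candidate for a leftover loose ref.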

Cause

At certain intervals, Bitbucket runs git pack-refs --all (see the git-pack-refs documentation), which packs the files containing the refs to the tips of the branches into the $REPOSITORY_HOME/packed-refs file.

The loose file containing the ref, stored as $REPOSITORY_HOME/refs/heads/<branch_name>, is removed as part of this process.

If the first rsync was performed before pack-refs ran, and a later rsync was performed after it without the --delete option, both the $REPOSITORY_HOME/refs/heads/<branch_name> file and the $REPOSITORY_HOME/packed-refs file will be present on the target filesystem.

When that happens, Git treats the hash in $REPOSITORY_HOME/refs/heads/<branch_name> as the tip of the branch, because a loose ref takes precedence over the packed-refs entry. That hash is now outdated, which leads to what is described as "missing commits".
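The behaviour can be reproduced in a throwaway repository. This is a sketch only: it assumes git is installed and that the repository uses the default file-based ref storage, and the temporary repository, the demo identity and the helper name g are made up for illustration. The branch name follows the example used later in this article.

```shell
#!/bin/bash
# Sketch: show a stale loose ref shadowing the packed-refs entry.
# Assumptions: git is on the PATH; default "files" ref storage is in use.

repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
g symbolic-ref HEAD refs/heads/rsync_testing

g commit -q --allow-empty -m "first commit"
old_hash=$(g rev-parse HEAD)     # the value an early rsync would copy as a loose ref
g commit -q --allow-empty -m "second commit"
new_hash=$(g rev-parse HEAD)

# Pack the refs: the tip moves into packed-refs and the loose file is removed.
g pack-refs --all

# Simulate a later rsync without --delete: the old loose file is still on the target.
mkdir -p "$repo/.git/refs/heads"
echo "$old_hash" > "$repo/.git/refs/heads/rsync_testing"

resolved=$(g rev-parse refs/heads/rsync_testing)
echo "packed-refs entry: $new_hash"
echo "git resolves to:   $resolved"
```

The resolved hash is the old one, so the second commit no longer appears in the branch history even though its objects are still in the repository.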

Solution

Important Notice:

The script provided on this page is not officially supported by Atlassian. We can't guarantee that it will not cause any side effects or unintended consequences. It is provided as-is, and users are responsible for any issues that may arise from its use.

⚠️ Precautionary Steps ⚠️:

1. Take a Backup:

Before running the script, ensure that you have taken a complete backup of your Bitbucket server environment. This is crucial to restore your system in case of any unexpected outcomes.

2. Turn Off Bitbucket:

It is recommended that you turn off the Bitbucket Server before running the script to avoid conflicts and ensure data integrity.

3. Test Environment:

Before executing the script in a production environment, it should be thoroughly tested in a non-production or test environment. This step is critical to verify the script's output and confirm that it works as intended without causing any disruptions.

The script is not a complete solution that handles all cases. It walks the filesystem and determines which unpacked (loose) references, represented by files on the filesystem, are older than the packed-refs file where the packed references are stored. When it finds such a file, it reports it. However, it checks only the timestamps of the files, not their contents. The script won't catch all cases if the timestamps of the files with unpacked references have been modified. That can happen if rsync is run without preserving timestamps, or if the file with the unpacked reference is updated for any reason.

This script or any other should be treated as a "last-resort" tool to use to salvage the Git repository if no other options are possible.

ℹ️ The correct approach to data migration is to always use tools that will make a 1:1 identical copy of the original file system - in the rsync case, it means using the "--delete" parameter, and preserving timestamps, ownerships and permissions.

By proceeding with the script, you acknowledge that you understand and accept the risks involved, and you agree to take full responsibility for any actions performed.

If Bitbucket has not been used yet on the new filesystem, the recommended action is to switch back to the previous filesystem or to perform the rsync again with the --delete option to remove all unnecessary refs on the target.

While the preferred option is to run rsync --delete before any new work has been performed, this may not be possible. The following script reviews the refs in each repository and identifies which ones are older than the packed-refs file. As written, it only reports the suspect refs. Uncommenting the rm line makes it remove them.
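A re-sync with the delete option could look like the sketch below. The function name sync_home and its arguments are hypothetical; the rsync options are standard ones (-a, --delete, --dry-run, --itemize-changes). It previews the changes by default and only modifies the target when the third argument is apply.

```shell
#!/bin/bash
# Sketch: make the target an exact copy of the source, with a dry-run preview first.
# Assumptions: the paths are placeholders; Bitbucket is stopped before the real run.

sync_home() {
  local src="$1" dst="$2" mode="${3:-dry-run}"
  # -a preserves timestamps and permissions (and ownership when run as root);
  # --delete removes files from the target that no longer exist on the source.
  local opts="-a --delete --itemize-changes"
  if [ "$mode" = "dry-run" ]; then
    opts="$opts --dry-run"
  fi
  # Trailing slashes copy the directory contents rather than the directory itself.
  # $opts is intentionally unquoted so that it splits into separate options.
  rsync $opts "$src/" "$dst/"
}
```

In the preview output, lines starting with *deleting mark files that exist only on the target, which is where leftover loose refs would show up.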

    #!/bin/bash
    # Script to check the repositories after rsync to find refs files that are older
    # than the latest packed-refs after a rsync without --delete.
    # Uncomment the rm -f "$ref" line once you have confirmed the list looks good.

    TMPOCFILE="/tmp/$$.oldcommits"
    # Stop if the repositories directory is not reachable, to avoid scanning the wrong place
    cd "$BITBUCKET_HOME/shared/data/repositories" || exit 1
    for rep in [0-9]*
    do
      cd "$rep" || continue
      echo Checking "$(pwd)" \[Repository $(grep project repository-config | awk '{print $3;}')/$(grep repository repository-config | awk '{print $3;}')]
      # List loose refs that are not newer than packed-refs and also have a packed-refs entry
      find refs -type f \! -newer packed-refs -print | while read -r ref
      do
        grep " $ref\$" packed-refs
      done > "$TMPOCFILE"
      if [ -s "$TMPOCFILE" ]
      then
        echo Found newer refs in packed-refs
        cat "$TMPOCFILE"
        awk '{print $2;}' "$TMPOCFILE" | while read -r ref
        do
          echo "$ref" outside of packed-refs looks old
          # rm -f "$ref"
        done
      fi
      cd ..
    done
    rm -f "$TMPOCFILE"

Removing the outdated loose refs should make the hidden references visible again. Alternatively, users can resubmit all the changes that they have made since the initial rsync.
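Because the script above compares only file timestamps, a complementary check is to ask Git whether a loose ref points to an ancestor of its packed-refs entry. The sketch below is read-only and is an assumption-laden illustration, not a supported tool: the function name find_stale_loose_refs is hypothetical, it expects the path of a repository directory as its argument, and it needs git on the PATH. A loose ref that is a strict ancestor of the packed entry is only suspicious; it can also be the legitimate result of a branch reset or force-push.

```shell
#!/bin/bash
# Sketch: report loose refs whose commit is a strict ancestor of the packed-refs entry.
# Assumptions: git is installed; $1 is a Git directory using loose refs and packed-refs.

find_stale_loose_refs() {
  local repo="$1" ref loose packed
  [ -f "$repo/packed-refs" ] || return 0
  (cd "$repo" && find refs -type f) | while read -r ref; do
    loose=$(cat "$repo/$ref")
    # packed-refs lines look like "<hash> <refname>"; header and peeled (^) lines never match.
    packed=$(awk -v r="$ref" '$2 == r { print $1 }' "$repo/packed-refs")
    [ -n "$packed" ] || continue
    [ "$loose" != "$packed" ] || continue
    # Exit status 0 means the loose hash is contained in the history of the packed hash.
    if git --git-dir="$repo" merge-base --is-ancestor "$loose" "$packed" 2>/dev/null; then
      echo "$ref loose=$loose packed=$packed"
    fi
  done
}
```

In normal operation a loose ref is the same as or newer than its packed entry, so an empty result is expected for a healthy repository.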

Step by step example

This section shows, step by step, what happens to the Git repository on the server.

Push a commit

When a commit is pushed to the rsync_testing branch in the repository, the $REPOSITORY_HOME/refs/heads/rsync_testing file is updated to contain the hash corresponding to the tip of the branch:

    cat refs/heads/rsync_testing
    d17c361edcee69f6b4c25f2230896c7ef1673480

If this is a new branch, the packed-refs file does not contain any entry for the rsync_testing branch:

    cat packed-refs | grep "refs/heads/rsync_testing"
    # no results

First Rsync

Both the $REPOSITORY_HOME/refs/heads/rsync_testing file and the $REPOSITORY_HOME/packed-refs file will match exactly the ones above in the target environment.

Push a second commit

The ref file now contains the new hash:

    cat refs/heads/rsync_testing
    f8d2ce4e12becfdbc875defb81a3041ee9f4e421

The packed-refs file still does not contain any references to the branch:

    cat packed-refs | grep "refs/heads/rsync_testing"
    # no results

Merge the pull request

When a pull request is merged, a garbage collection is scheduled and, if the minimum interval has passed, the refs are packed:

    cat refs/heads/rsync_testing
    # The refs have been packed so the file does not exist anymore

The packed-refs file now has an entry for the rsync_testing branch and this points to the most recent commit:

    cat packed-refs | grep "refs/heads/rsync_testing"
    7622bf84bdfb2fae299609a38b09e7bfdea22b00 refs/heads/rsync_testing

Second Rsync

The $REPOSITORY_HOME/packed-refs file is synchronized, while the $REPOSITORY_HOME/refs/heads/rsync_testing file is not, because it no longer exists on the source. Without --delete, the copy already on the target is left in place.

In the target environment, the refs/heads/rsync_testing file is still present and contains an outdated hash:

    cat refs/heads/rsync_testing
    cbc34bf8ea2548c37d853a934b8af0a58e436fe1

The packed-refs file contains the updated value:

    cat packed-refs | grep "refs/heads/rsync_testing"
    7622bf84bdfb2fae299609a38b09e7bfdea22b00 refs/heads/rsync_testing

Git gives precedence to the content of the loose ref (refs/heads/rsync_testing), so the history in the target environment shows the outdated commit as the tip of the branch and the newer commits appear to be missing.

[Image: Source environment - master branch]

[Image: Target environment - master branch]

[Image: Source environment - rsync_testing branch]

[Image: Target environment - rsync_testing branch]

Other Notes

Q: Does it matter if the source branch was deleted or not during the merge?

A: No, what is relevant is the status of the refs files and not the content of the branches. In the example, the source branch has not been deleted.

Example Script to detect inconsistencies using Push Logs

To identify repositories impacted by this event, you can analyze the entries in the push logs. These logs record the 'fromHash' and 'toHash' for each ref update event. Under normal circumstances, the 'fromHash' of a new entry should match the 'toHash' of the preceding entry.

If a branch has been unexpectedly reset to a different commit, possibly due to a filesystem migration, this could in theory be detected from its push logs.
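The chaining rule can be illustrated without a server. The sketch below is an illustration only: the function name check_hash_chain and its input format (one "fromHash toHash" pair per line, oldest entry first) are assumptions made for this example, and the hashes are shortened, made-up values.

```shell
#!/bin/bash
# Sketch: verify that each entry's fromHash equals the previous entry's toHash.
# Assumption: stdin holds one "fromHash toHash" pair per line, oldest entry first.

check_hash_chain() {
  awk '
    # Appending "" forces a string comparison even if a hash looks numeric.
    NR > 1 && ($1 "") != (prev "") {
      printf "Mismatch at entry %d: previous toHash %s, current fromHash %s\n", NR, prev, $1
      bad = 1
    }
    { prev = $2 }
    END { exit bad + 0 }
  '
}

# The third entry does not start where the second one ended.
printf '%s\n' "0000 aaaa" "aaaa bbbb" "9999 cccc" | check_hash_chain || echo "chain is broken"
# prints:
#   Mismatch at entry 3: previous toHash bbbb, current fromHash 9999
#   chain is broken
```

With the REST responses used in the script in this section, the pairs could presumably be produced with jq, for example jq -r '.values | reverse | .[] | "\(.refChange.fromHash) \(.refChange.toHash)"', on the assumption (shared with that script) that the API returns the newest entry first.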

Below is a sample script that looks for this discrepancy in the most recent n push log entries of a given branch, for each repository in a list.

    #!/bin/bash

    BASE_URL="https://bitbucket-1.com"
    PAGE_LIMIT=30
    REF_NAME="refs/heads/master"
    # Strip only the refs/heads/ prefix so branch names containing "/" stay intact
    BRANCH_NAME="${REF_NAME#refs/heads/}"

    # Credentials for basic authentication
    USERNAME="admin"
    PASSWORD="password"

    # Fetch ref change activities (push log entries); $1 = URL, $2 = page size
    fetch_ref_change_activities() {
        local url="$1&limit=$2"
        curl -s -u "$USERNAME:$PASSWORD" "$url"
    }

    # Fetch branch information; $1 = URL
    fetch_branch_info() {
        curl -s -u "$USERNAME:$PASSWORD" "$1"
    }

    # Compare hashes from the fetched activities, walking from the oldest entry to the newest
    compare_hashes() {
        local activities="$1"
        local previous_to_hash="" from_hash to_hash num_activities i
        echo "==================================================="
        num_activities=$(echo "$activities" | jq '.values | length')
        for (( i=num_activities-1; i>=0; i-- )); do
            from_hash=$(echo "$activities" | jq -r ".values[$i].refChange.fromHash")
            to_hash=$(echo "$activities" | jq -r ".values[$i].refChange.toHash")
            if [[ -n "$previous_to_hash" && "$previous_to_hash" != "$from_hash" ]]; then
                echo "Mismatch found: previous toHash '$previous_to_hash' does not match current fromHash '$from_hash' in the push log entry number $i"
            fi
            previous_to_hash="$to_hash"
        done
        # Store the latest toHash from the most recent activity
        LATEST_TO_HASH="$previous_to_hash"
        echo "Latest hash: $LATEST_TO_HASH"
        echo "Hash comparison completed."
    }

    # Main logic, iterating over each project and repository pair
    while IFS=' ' read -r PRJ_KEY REPO_SLUG; do
        echo ""
        echo "==================================================="
        echo "Processing project: $PRJ_KEY, repository: $REPO_SLUG"
        REF_CHANGE_API="$BASE_URL/rest/api/latest/projects/$PRJ_KEY/repos/$REPO_SLUG/ref-change-activities?ref=$REF_NAME"
        BRANCH_API="$BASE_URL/rest/api/latest/projects/$PRJ_KEY/repos/$REPO_SLUG/branches?filterText=$BRANCH_NAME"
        LATEST_TO_HASH=""

        response=$(fetch_ref_change_activities "$REF_CHANGE_API" "$PAGE_LIMIT")
        if [[ $? -ne 0 ]]; then
            echo "Failed to fetch data for $PRJ_KEY/$REPO_SLUG"
            exit 1
        fi
        compare_hashes "$response"
        echo ""

        # Fetch the branch information to compare latest commit hashes
        branch_response=$(fetch_branch_info "$BRANCH_API")
        if [[ $? -ne 0 ]]; then
            echo "Failed to fetch branch data for $PRJ_KEY/$REPO_SLUG"
            exit 1
        fi
        # filterText matches substrings, so pick the branch whose id is exactly $REF_NAME
        branch_latest_commit=$(echo "$branch_response" | jq -r --arg ref "$REF_NAME" '.values[] | select(.id == $ref) | .latestCommit')

        # Compare the latest commit hash from the branch with the latest toHash
        echo "Latest hash as per push logs $LATEST_TO_HASH"
        if [[ "$branch_latest_commit" == "$LATEST_TO_HASH" ]]; then
            echo "The latest commit in the branch matches the latest toHash from activities for $PRJ_KEY/$REPO_SLUG."
        else
            echo "Mismatch: The latest commit in the branch ($branch_latest_commit) does not match the latest toHash ($LATEST_TO_HASH) in Push logs for $PRJ_KEY/$REPO_SLUG."
        fi
    done < project_repo.txt

This script checks two scenarios to detect inconsistencies in the push logs.

Scenario 1

  • The script uses the Push Log API to retrieve the latest 'n' ($PAGE_LIMIT) entries for the $REF_NAME branch. It then walks through the entries in order, comparing the toHash of each previous entry with the fromHash of the current entry.

  • This covers situations where new commits have been made to the branch after the reset, assuming that the divergence occurred within the latest 'n' ($PAGE_LIMIT) entries.

Scenario 2

  • Next, the script uses the Push Log API and the Find Branch API to verify that the latest toHash in the push logs matches the latest commit on the $BRANCH_NAME branch.

  • This covers situations where no new commits have been made to the branch, and no new push log entries exist, since the reset.

Update the following variables in the script before running it (PAGE_LIMIT determines the number of push log entries to be checked):

    BASE_URL=""
    PAGE_LIMIT=30
    REF_NAME=""
    USERNAME=""
    PASSWORD=""

Additionally, the script expects a file named project_repo.txt in the directory where it is run, containing a list of project keys and repository slugs in the following format:

    NPROJ repo1
    NPROJ repo2
    NPROJ repo3

If everything goes well, you should see output of the script as follows:

    % ./push-log-check.sh

    ===================================================
    Processing project: NPROJ, repository: repo1
    ===================================================
    Latest hash: f55e42c501aff8364723f9349020399614737726
    Hash comparison completed.

    Latest hash as per push logs f55e42c501aff8364723f9349020399614737726
    The latest commit in the branch matches the latest toHash from activities for NPROJ/repo1.

    ===================================================
    Processing project: NPROJ, repository: repo2
    ===================================================
    Latest hash: 19bc028c43bdb57d14e72d7f3e4ace1974988ef1
    Hash comparison completed.

    Latest hash as per push logs 19bc028c43bdb57d14e72d7f3e4ace1974988ef1
    Mismatch: The latest commit in the branch (c879d339de8f966f52524d1c90b50a714d91532e) does not match the latest toHash (19bc028c43bdb57d14e72d7f3e4ace1974988ef1) in Push logs for NPROJ/repo2.

    ===================================================
    Processing project: NPROJ, repository: repo3
    ===================================================
    Mismatch found: previous toHash '383444187cc85a5e947b0abcc85e93672d680b55' does not match current fromHash 'e8843e5cb1604e7c5f4b5d45d6f8ce6af4984898' in the push log entry number 4
    Latest hash: 7fa333a194a0a9cabfb7028ea4b0f892620a8e01
    Hash comparison completed.

    Latest hash as per push logs 7fa333a194a0a9cabfb7028ea4b0f892620a8e01
    The latest commit in the branch matches the latest toHash from activities for NPROJ/repo3.

In the above example the inconsistencies are found in repo2 and repo3.

Please note that creating custom scripts falls outside the scope of our standard support offerings. The provided script is intended solely as a reference and is crafted on a best-effort basis. It has not been rigorously tested to account for every potential edge case, so it is crucial that you conduct your own diligence to customize and thoroughly test the script to fit your specific needs. This script is not covered under Atlassian Support, and therefore, we are unable to assist with troubleshooting or further customization. We recommend that you or your team take responsibility for ensuring it meets your requirements.

Updated on April 24, 2025
