How do I remove sensitive/unwanted content that was pushed to my Bitbucket Data Center instance?
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
The purpose of this article is to describe the steps that can be taken to remove sensitive or otherwise unwanted information that has been pushed to a repository hosted in Bitbucket Data Center.
Background
When sensitive content, such as an SSH key or password, has been pushed to a git repository and your team has added further commits since then, simply deleting the content in the latest commit is not enough: the information still exists in the repository's commit history.
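To illustrate why a follow-up deletion is not sufficient, here is a minimal sandbox sketch. It assumes git is installed; the repository name demo, the file name secret.txt and its contents are hypothetical and exist only in a temporary directory.

```shell
#!/bin/sh
# Sandbox demo: a file deleted in the latest commit is still readable from history.
# All names (demo, secret.txt) and the fake password are illustrative assumptions.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: the (fake) secret is added.
echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "Add config"

# Commit 2: the secret is deleted in a follow-up commit.
git rm -q secret.txt
git commit -q -m "Remove secret"

# The file is gone from the working tree, but the commits that touched it
# are still listed in the history...
git log --all --oneline -- secret.txt

# ...and its content can be read straight out of the older commit.
leaked=$(git show HEAD~1:secret.txt)
echo "Recovered from history: $leaked"
```

Anyone with a clone of the repository can run the same git show command, which is why the history itself has to be rewritten.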
As soon as the sensitive commit has been pushed, your team should treat this data as compromised. Any passwords or SSH keys should be changed immediately, as the sensitive information may already have been copied. In addition, any existing clones or forks that contain this commit are not affected by the steps below and will still hold the data.
What's more, rewriting history and force pushes can lead to undesirable results and unexpected behaviours in Bitbucket Data Center, which is why we generally discourage this practice if you can avoid it at all.
Solution
There are two methods you can use to remove this sensitive content from your repository's commit history:
The git command git filter-branch
BFG Repo-Cleaner
Both methods rewrite the history of the repository so that it appears as though the sensitive content was never committed in the first place.
Using git filter-branch
Running git filter-branch after storing changes using git stash will result in these stashed changes being unretrievable. Any stashed changes should be unstashed prior to running this command.
Clone the repository to your local machine.
Navigate into the repository's directory and execute the following command, being sure to replace 'PATH/TO/SENSITIVE/DATA' with the relative path (inside the clone of the repository) of the entire file you want to remove.
git filter-branch --force --index-filter "git rm --cached --ignore-unmatch PATH/TO/SENSITIVE/DATA" --prune-empty --tag-name-filter cat -- --all
NOTE: The slashes in PATH/TO/SENSITIVE/DATA must be "/" rather than "\" when running commands from a Windows client.
ℹ️ Though not strictly necessary, it's recommended you add the sensitive data to the repository's .gitignore file to ensure that it is not accidentally committed again.
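As a rough sketch of what reviewing the result can look like, the following sandbox runs the command above against a throwaway repository and then checks that no branch or tag still touches the removed path. The path config/secret.txt and the repository name are hypothetical. FILTER_BRANCH_SQUELCH_WARNING=1 is set only to skip the advisory pause that recent git versions add to filter-branch. The check uses --branches --tags rather than --all because filter-branch keeps a backup of the old history under refs/original until it is cleaned up.

```shell
#!/bin/sh
# Sandbox sketch: run filter-branch on a throwaway repository and verify the result.
# The path config/secret.txt is a hypothetical stand-in for PATH/TO/SENSITIVE/DATA.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir config
echo "password=hunter2" > config/secret.txt
echo "hello" > README.txt
git add .
git commit -q -m "Initial commit (accidentally includes a secret)"
echo "more" >> README.txt
git commit -q -am "Later work"

# Same command as in the step above, with the placeholder path filled in.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
  --index-filter "git rm --cached --ignore-unmatch config/secret.txt" \
  --prune-empty --tag-name-filter cat -- --all

# Optional: ignore the path so it is not committed again by accident.
echo "config/secret.txt" >> .gitignore
git add .gitignore
git commit -q -m "Ignore the secret file"

# Review: this should print nothing if the rewrite removed the path everywhere.
remaining=$(git log --oneline --branches --tags -- config/secret.txt)
echo "Commits still touching the path: ${remaining:-none}"
```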
After your team has reviewed the state of the local repository, run the following commands to force push the changes back up to Bitbucket to overwrite the repository's existing commit history.
git push origin --force --all
git push origin --force --tags
Any users who have cloned or forked this repository should be asked to git rebase any branches that contain the old repository history. It is important to rebase and not merge, as merging could result in the sensitive data being re-introduced into the now clean git history of the main repository.
Lastly, be sure to force all dereferenced objects in your local repository to be garbage collected using the commands:
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
⚠️ These commands should NOT be executed directly against the repository on the Bitbucket Data Center server, only against your local copy of the repository. Running git gc against Bitbucket's copy of the repository can result in serious data corruption.
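To see what the local cleanup achieves, the sandbox sketch below records the object ID of a fake secret before rewriting history, then checks whether that object still exists before and after the three cleanup commands. It runs only against a throwaway local repository; the names demo and secret.txt are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: confirm the old blob is only removed locally after the cleanup commands.
# Run against a throwaway local repository, never against Bitbucket's copy.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "password=hunter2" > secret.txt
echo "hello" > README.txt
git add .
git commit -q -m "Initial commit"

# Remember the object ID of the secret's blob so it can be checked later.
blob=$(git rev-parse HEAD:secret.txt)

FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
  --index-filter "git rm --cached --ignore-unmatch secret.txt" \
  --prune-empty --tag-name-filter cat -- --all

# Before cleanup the blob is still stored, kept alive by refs/original and the reflog.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# The cleanup commands from the step above.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```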
If your team has recently merged this unwanted change in a pull request, it is recommended that you contact Atlassian Support for the next steps, as Bitbucket will preserve many git objects on the server related to pull requests. The merged pull requests' diff may still contain sensitive/unwanted information. The steps that need to be taken to remove the data from these pull requests require manual changes to the external database that will vary depending on the situation. These changes can have adverse effects if they are not performed correctly, and it's for this reason that we encourage teams to reach out to support for additional help along these lines.
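For collaborators who need to rebase as described above, one common approach is git rebase --onto, which replays only the collaborator's own commits on top of the rewritten branch. The sandbox below is a sketch under stated assumptions, not a prescribed procedure: a local bare repository stands in for the Bitbucket remote, and the names main, feature, maintainer and collab are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: a collaborator moves a branch from the old history onto the rewritten one.
# origin.git stands in for the Bitbucket remote; all names are illustrative assumptions.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/main

# Maintainer pushes history that accidentally contains a secret.
git clone -q origin.git maintainer
cd maintainer
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
git symbolic-ref HEAD refs/heads/main
echo "password=hunter2" > secret.txt
echo "hello" > README.txt
git add .
git commit -q -m "Initial commit"
git push -q origin main
cd ..

# Collaborator branches off the old history.
git clone -q origin.git collab
cd collab
git config user.name "Collaborator"
git config user.email "collab@example.com"
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
cd ..

# Maintainer rewrites history and force pushes.
cd maintainer
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
  --index-filter "git rm --cached --ignore-unmatch secret.txt" \
  --prune-empty --tag-name-filter cat -- --all
git push -q origin --force --all
cd ..

# Collaborator: note the old upstream tip, fetch, then replay only their own commits.
cd collab
old_base=$(git rev-parse origin/main)
git fetch -q origin
git rebase -q --onto origin/main "$old_base" feature
feature_touches_secret=$(git log --oneline feature -- secret.txt)
echo "Commits on feature touching the secret: ${feature_touches_secret:-none}"
```

Recording the old upstream tip before fetching lets the rebase skip every commit from the old history, so the secret is not carried over onto the clean branch.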
Using BFG Repo-Cleaner
BFG Repo-Cleaner is an open-source tool that offers a simpler way of removing unwanted data from your repository's commit history when compared to using the git filter-branch command.
⚠️ It's important to note that the BFG Repo-Cleaner operates under the assumption that your latest commit is already clean - meaning that it will not perform any changes to the latest commit, but only the commits before it. It's recommended your team push up a new commit that removes the undesired/sensitive information, and that you ensure no code breakages in this clean commit prior to using the BFG Repo-Cleaner.
We recommend consulting the BFG Repo-Cleaner documentation for a full explanation of what's possible with this tool.
Here is an example of how you can use this tool to help remove data from your Bitbucket repository:
Clone down a local copy of the affected repository using the command git clone --mirror
ℹ️ We recommend backing up a copy of this bare repository prior to executing any changes using the BFG Repo-Cleaner
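The mirror clone and backup can be sketched as follows. A local bare repository stands in for the Bitbucket clone URL, and the directory names my-repo.git and my-repo.git.bak are hypothetical. In practice you would pass your repository's clone URL to git clone --mirror instead.

```shell
#!/bin/sh
# Sandbox sketch: create a mirror clone and back it up before making any changes.
# origin.git is a local stand-in for the Bitbucket clone URL (an assumption for the demo).
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare origin.git

# A mirror clone is a bare repository that copies every ref from the remote.
git clone -q --mirror origin.git my-repo.git

# Take a plain copy of the bare repository as a backup.
cp -R my-repo.git my-repo.git.bak

# A mirror clone is bare and marks its remote as a mirror, so a later
# "git push" from it updates all refs on the remote.
is_bare=$(git --git-dir=my-repo.git rev-parse --is-bare-repository)
is_mirror=$(git --git-dir=my-repo.git config remote.origin.mirror)
echo "bare: $is_bare, mirror: $is_mirror"
```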
ℹ️ It's also recommended that you set up bfg as an alias for java -jar bfg.jar after the bfg.jar file has been downloaded and moved to your directory.
Using the downloaded bfg.jar, here are some example commands you can run against the repository:
# Remove any files named 'sensitive_passwords.txt' or 'confidential_passwords.txt' from the repository's commit history
bfg --delete-files {sensitive,confidential}_passwords.txt
# Replace any entries listed in the file 'bobs_credit_cards_and_ssn.txt' with the text ***REMOVED*** wherever they occur in the repository.
# Don't worry, we're also unsure why this was pushed to a git repository.
bfg --replace-text bobs_credit_cards_and_ssn.txt
The full list of BFG Repo-Cleaner commands can be found in the tool's documentation.
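As an illustration of the file passed to --replace-text, the sketch below writes a small expressions file with one entry per line. The format shown (a bare literal, a literal with ==> and a custom replacement, and a regex: prefix) follows the examples published for BFG Repo-Cleaner, but please verify it against the documentation for the version you are using. The file name expressions.txt and all values are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: build an expressions file for "bfg --replace-text".
# File name and values are illustrative assumptions, not real secrets.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Line 1: literal string, replaced with the default ***REMOVED*** text.
# Line 2: literal string, replaced with the text after "==>".
# Line 3: regular expression, replaced with the text after "==>".
cat > expressions.txt <<'EOF'
hunter2
old-api-key-123==>NEW_PLACEHOLDER
regex:password=\w+==>password=
EOF

line_count=$(wc -l < expressions.txt | tr -d ' ')
echo "Wrote $line_count expressions to expressions.txt"

# Then, with Java and bfg.jar available, the file would be used like this
# (my-repo.git being the hypothetical mirror clone):
#   bfg --replace-text expressions.txt my-repo.git
```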
ℹ️ Though not strictly necessary, it's recommended you add the sensitive data to the repository's .gitignore file to ensure that it is not accidentally committed again.
After your team has reviewed the state of the local repository, force push the changes back up to Bitbucket to overwrite the repository's existing commit history.
git push origin --force
Any users who have cloned or forked this repository should be asked to git rebase any branches that contain the old repository history. It is important to rebase and not merge, as merging could result in the sensitive data being re-introduced into the now clean git history of the main repository.
Lastly, be sure to force all dereferenced objects in your local repository to be garbage collected using the commands:
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
⚠️ These commands should NOT be executed directly against the repository on Bitbucket - only your local copy of the repository. Running git gc against Bitbucket's copy of the repository can result in serious data corruption.
If your team has recently merged this unwanted change in a pull request, it is recommended that you contact Atlassian Support for the next steps, as Bitbucket will preserve many git objects on the server related to pull requests. The merged pull requests' diff may still contain sensitive/unwanted information. The steps that need to be taken to remove the data from these pull requests require manual changes to the external database that will vary depending on the situation. These changes can have adverse effects if they are not performed correctly, and it's for this reason that we encourage teams to reach out to support for additional help along these lines.
The BFG Repo-Cleaner is a third-party utility and is therefore outside of the Atlassian Support Offerings. Any issues arising from the usage of this utility will not be supported by Atlassian.