How to perform manual garbage collection on a repository in Bitbucket Server
Platform Notice: Data Center Only - This article only applies to Atlassian apps on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Bitbucket automatically manages garbage collection (GC) to maintain Git repositories and determines when to run GC operations as needed. In most cases, it is best to let Bitbucket manage GC. However, there may be specific situations when manual GC operations need to be performed, such as when reducing the repository size or removing secrets that have been unintentionally committed to the repositories.
This page covers the steps to manually run GC on a repository hosted in Bitbucket.
Solution
Bitbucket Garbage Collection Implementation
Before going into the procedures to manually perform GC, here’s a brief overview of Bitbucket’s GC implementation:
Bitbucket does not invoke Git’s built-in git gc command.
Instead, it calls lower-level Git operations when the right conditions are met. These operations include:
git pack-refs - moves individual “loose” references, such as branches and tags, into a single, more efficient file called packed-refs
git repack - moves loose Git objects into efficient, compressed pack files
git prune - removes unreachable Git objects
The commands covered on this page are git repack and git prune, because these are the ones that can significantly impact the size and objects retained in a Git repo.
Conditions for running repack and prune
The GC operations are triggered for a repo when the following conditions are met.
More than 50 pack files are present in the repo location on disk
More than 6700 loose objects (objects that have not been packed) are present in the repo location. This is approximated by checking if there are more than 27 objects in the <REPO_LOCATION>/objects/17 directory (6700 / 256 directories ≈ 26.2, rounded up to 27)
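The two thresholds above can also be checked by hand on disk. The following is a minimal sketch, not Bitbucket's own implementation: the function names are hypothetical, and it assumes the argument is the repository directory that contains the objects directory.

```shell
#!/bin/sh
# Hypothetical helpers; they only mirror the thresholds described above.

# Print the number of pack files in <repo>/objects/pack (0 if the directory is missing).
count_pack_files() {
  find "$1/objects/pack" -maxdepth 1 -type f -name '*.pack' 2>/dev/null | wc -l | tr -d ' '
}

# Print the number of loose objects in the sampled directory <repo>/objects/17.
count_sampled_loose_objects() {
  find "$1/objects/17" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Report how the repository compares with each threshold
# (50 pack files, 27 objects in the sampled directory).
report_gc_thresholds() {
  packs=$(count_pack_files "$1")
  loose=$(count_sampled_loose_objects "$1")
  if [ "$packs" -gt 50 ]; then
    echo "pack files: $packs (above 50)"
  else
    echo "pack files: $packs (not above 50)"
  fi
  if [ "$loose" -gt 27 ]; then
    echo "objects/17: $loose (above 27)"
  else
    echo "objects/17: $loose (not above 27)"
  fi
}
```

For example, report_gc_thresholds can be called with the repository path in the Bitbucket home directory as its only argument. The sketch only reads the directory listing and does not modify the repository.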
When the conditions have been met, the GC utility commands are called.
For repositories that do not have any forks, both repack and prune are called.
For repositories that have forks, only the repack command is called. Forks rely on Git alternates, which is a file referencing the parent repo’s objects. The prune command is not called because unreachable objects in the parent repo may still be needed by the downstream forks.
The repack and prune operations are subject to a default cooldown period of 16 hours. This means these commands cannot be called more than once within a 16-hour span.
How to perform GC manually?
With that overview of how Bitbucket performs GC, the following sections describe how to perform GC manually.
Check if the repository has forks
The first step is to check whether or not the repo has forks. If a repo has forks, prune cannot be called on the repo to avoid removing unreachable objects that the forks still need.
Check if a repo has forks using the Get repository forks API.
Sample
    curl --user <USERNAME>:<PASSWORD> -H "Content-Type: application/json" -X GET <BITBUCKET_BASE_URL>/rest/api/latest/projects/<PROJECT_KEY>/repos/<REPO_SLUG>/forks
The following is the result when the repository has no forks:
    {"size":0,"limit":25,"isLastPage":true,"values":[],"start":0}
Follow the next sections, based on whether the repository has forks.
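To script the fork check, the response can be tested for an empty values array. This is a minimal sketch: the function names are hypothetical, and it relies on a plain text match rather than a JSON parser, so a JSON-aware tool would be more robust where one is available.

```shell
#!/bin/sh
# Hypothetical helper: reads a "Get repository forks" JSON response on stdin and
# succeeds (exit status 0) when the "values" array is empty, i.e. no forks were returned.
forks_response_is_empty() {
  grep -Eq '"values"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'
}

# Hypothetical helper: reads the same response on stdin and prints which
# section of this page applies.
gc_section_for_response() {
  if forks_response_is_empty; then
    echo "no forks: repack and prune"
  else
    echo "has forks: repack only"
  fi
}

# Usage (placeholders as in the sample request above):
#   curl --user "$BB_USER:$BB_PASSWORD" "$BB_URL/rest/api/latest/projects/$KEY/repos/$SLUG/forks" \
#     | gc_section_for_response
# BB_USER, BB_PASSWORD, BB_URL, KEY and SLUG are assumed variable names.
```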
Repositories that have forks
In production, the following steps can require some time. For this reason, it is recommended to check the potential gain on a copy of the repository first.
It’s also advised to keep track of the time required to perform the full sequence of steps as the user who runs the Bitbucket process.
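One way to keep track of the time each step takes is a small wrapper around the commands. This is only a sketch with a hypothetical function name; the shell's own time keyword would work as well.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the given command, prints the elapsed wall-clock
# seconds to stderr, and returns the command's own exit status.
run_timed() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s ($*)" >&2
  return "$status"
}

# Usage, from inside the copy of the repository:
#   run_timed git fsck --no-dangling
```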
As noted earlier, only the repack command is used for repos that have forks.
Run the following commands as the user that runs the Bitbucket process
    cd <repository path in the Bitbucket home directory>
    cp -pr * /some/tmp/location
    cd /some/tmp/location
    du -h
    git fsck --no-dangling
    # repack objects, keep unreachable objects
    git repack -adfln --keep-unreachable --depth=20 --window=200
    du -h
If the gain is significant, plan for the required downtime and proceed with the next steps.
Another way to check the gain is to run git count-objects before and after running repack:
    git count-objects -v
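The before and after figures can be compared with a short script. The sketch below uses hypothetical function names and sums the size and size-pack fields that git count-objects -v reports in KiB.

```shell
#!/bin/sh
# Hypothetical helper: reads `git count-objects -v` output on stdin and prints the
# total on-disk object size in KiB (loose "size" plus packed "size-pack").
objects_size_kib() {
  awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Hypothetical helper: prints the percentage saved when going from $1 KiB to $2 KiB
# (integer arithmetic, rounded toward zero).
gain_percent() {
  if [ "$1" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Usage, from inside the copy of the repository:
#   before=$(git count-objects -v | objects_size_kib)
#   ...run the repack...
#   after=$(git count-objects -v | objects_size_kib)
#   echo "gain: $(gain_percent "$before" "$after")%"
```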
Perform repack on the repository itself
If there is a significant gain, the steps can then be performed in the production instance.
Generate a backup of Bitbucket - see: Data recovery and backups for reference
Stop Bitbucket
Run the following commands as the user that runs the Bitbucket process:
    cd <repository path in the Bitbucket home directory>
    du -h
    git fsck --no-dangling
    # repack objects, keep unreachable objects
    git repack -adfln --keep-unreachable --depth=20 --window=200
    du -h
Restart Bitbucket Server
Repositories that do not have forks
In production, the following steps can require some time. For this reason, it is recommended to check the potential gain on a copy of the repository first.
It’s also advised to keep track of the time required to perform the full sequence of steps as the user who runs the Bitbucket process.
For repos that do not have forks, the repack and prune commands can be used:
Run the following commands as the user that runs the Bitbucket process
    cd <repository path in the Bitbucket home directory>
    cp -pr * /some/tmp/location
    cd /some/tmp/location
    du -h
    git fsck --no-dangling
    # repack and unpack unreachable objects
    git repack -Adfln --depth=10 --window=200 --unpack-unreachable=72.hours.ago
    # prune unreachable objects
    git prune --expire=72.hours.ago
    du -h
If the gain is significant, plan for the required downtime and proceed with the next steps.
Another way to check the gain is to run git count-objects before and after running repack and prune:
    git count-objects -v
Note that only objects older than 72 hours (3 days) are removed by prune.
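Before pruning the copy, it can help to preview what would be removed. git prune accepts --dry-run, which lists each object it would remove as an object ID followed by its type. The sketch below, with a hypothetical function name, summarizes that listing per object type.

```shell
#!/bin/sh
# Hypothetical helper: reads `git prune --dry-run` output on stdin
# (one "<object-id> <type>" line per object) and prints a count per
# object type, sorted by type name.
summarize_prune_dry_run() {
  awk 'NF >= 2 { n[$2]++ } END { for (t in n) print t, n[t] }' | sort
}

# Usage, from inside the copy of the repository:
#   git prune --dry-run --expire=72.hours.ago | summarize_prune_dry_run
```

An empty result means no unreachable loose objects older than 72 hours were found, so prune would not remove anything.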
Perform repack and prune on the repository itself
If there is a significant gain, the steps can then be performed in the production instance.
Generate a backup of Bitbucket - see Data recovery and backups for reference
Stop Bitbucket
Run the following commands as the user that runs the Bitbucket process:
    cd <repository path in the Bitbucket home directory>
    du -h
    git fsck --no-dangling
    # repack and unpack unreachable objects
    git repack -Adfln --depth=10 --window=200 --unpack-unreachable=72.hours.ago
    # prune unreachable objects
    git prune --expire=72.hours.ago
    du -h
Restart Bitbucket Server
Note
If you are not able to stop the instance before you run the repack in production, do the following:
    touch app-info/gc.log.lock
Then run the repack, and remove the lock file once it has finished:
    rm app-info/gc.log.lock
This ensures that Bitbucket Server does not start its own repack while the manual repack is running.
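To avoid leaving the lock file behind if the repack fails, the three steps can be wrapped in one function. This is a minimal sketch with a hypothetical function name. It assumes app-info/gc.log.lock is relative to the repository directory passed as the first argument, and it does not clean up if the shell itself is killed by a signal.

```shell
#!/bin/sh
# Hypothetical wrapper: creates <repo>/app-info/gc.log.lock, runs the given command,
# removes the lock file, and exits with the command's status.
# The body runs in a subshell so its variables do not leak into the caller.
with_gc_lock() (
  repo_dir=$1
  shift
  lock="$repo_dir/app-info/gc.log.lock"
  touch "$lock" || exit 1
  "$@"
  status=$?
  rm -f "$lock"
  exit "$status"
)

# Usage (REPO is an assumed variable holding the repository path):
#   with_gc_lock "$REPO" git -C "$REPO" repack -adfln --keep-unreachable --depth=20 --window=200
```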