Repository size remains the same after deleting large files and running garbage collection (GC) on the remote
Platform Notice: Cloud Only - This article only applies to Atlassian products on the cloud platform.
Summary
You may encounter issues where the repository size remains the same or does not reduce even after deleting large files and running garbage collection on the remote repository.
Diagnosis
This issue can be caused by large file objects that are preserved in Git references on the remote repository. These references are preserved to improve the performance of complex diffs.
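To confirm whether deletions have actually reduced the remote repository's size, you can read the `size` field of the repository resource from the Bitbucket Cloud REST API before and after cleanup. Below is a minimal sketch; the credential placeholders are illustrative, and `repo_size_url`/`human_mb` are helper names introduced here, not part of any Bitbucket library:

```python
import requests
from requests.auth import HTTPBasicAuth

API_BASE = 'https://api.bitbucket.org/2.0/repositories/'

def repo_size_url(repository):
    """Build the API URL that returns only the repository's size field."""
    return API_BASE + repository + '?fields=size'

def human_mb(size_bytes):
    """Convert a byte count to megabytes, rounded to two decimals."""
    return round(size_bytes / (1024 * 1024), 2)

if __name__ == '__main__':
    username = '<your-bitbucket-username>'    # placeholder
    password = '<your-app-password>'          # placeholder
    repository = '<workspace-id/repo-name>'   # placeholder
    resp = requests.get(repo_size_url(repository),
                        auth=HTTPBasicAuth(username, password))
    resp.raise_for_status()
    print('Repository size: %s MB' % human_mb(resp.json()['size']))
```

Running this once before and once after the cleanup makes it easy to see whether the reported size actually dropped.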
Solution
The Bitbucket Cloud Support Team can help confirm the list of pull requests that need to be deleted in order to clear the repository's storage space. Before you approve those pull requests for deletion, you can use the Bitbucket Cloud REST API to back up their details based on the list provided by the support team.
Below is a sample Python script that exports pull request details, including author, state, ID, created date, source branch, destination branch, and description, into a CSV file. You can also add an API query to filter pull requests created after a given date. For example, the query `created_on > 2022-07-01T00:00:00-07:00` (URL-encoded as `created_on+%3E+2022-07-01T00%3A00%3A00-07%3A00`) matches all pull requests created after July 1, 2022. For more details, please refer to the API querying documentation.
import csv
import requests
from requests.auth import HTTPBasicAuth

# Login credentials (use an app password, not your account password)
username = '<please-key-in-your-bitbucket-username-here>'
password = '<please-key-in-your-app-password>'
repository = '<workspace-id/repo-name>'

# First page URL; the q= parameter filters pull requests created after 2022-07-01
next_page_url = 'https://api.bitbucket.org/2.0/repositories/%s/pullrequests?fields=next,values.id,values.created_on,values.state,values.author,values.source.branch.name,values.destination.branch.name,values.source.commit.hash,values.destination.commit.hash,values.description&q=created_on+%%3E+2022-07-01T00%%3A00%%3A00-07%%3A00&pagelen=20' % repository

with open('pr_stats.csv', 'a', newline='') as f:
    writer = csv.writer(f)  # csv.writer quotes descriptions that contain commas
    writer.writerow(["PR Author", "PR Status", "PR Number", "PR created date",
                     "PR Source Branch", "PR Destination Branch",
                     "PR Source Branch Commit", "PR Destination Branch Commit",
                     "PR Description"])
    # Keep fetching pages while there's a page to fetch
    while next_page_url is not None:
        response = requests.get(next_page_url, auth=HTTPBasicAuth(username, password))
        response.raise_for_status()
        page_json = response.json()
        # Parse pull requests from the JSON
        for pr in page_json['values']:
            writer.writerow([
                pr['author']['display_name'],
                pr['state'],
                pr['id'],
                pr['created_on'],
                pr['source']['branch']['name'],
                pr['destination']['branch']['name'],
                pr['source']['commit']['hash'],
                pr['destination']['commit']['hash'],
                pr['description'],
            ])
        next_page_url = page_json.get('next', None)
Sample 'pr_stats.csv' output:
PR Author | PR Status | PR Number | PR created date | PR Source Branch | PR Destination Branch | PR Source Branch Commit | PR Destination Branch Commit | PR Description |
XYX | OPEN | 968 | 2022-10-23T10:03:35.478284+00:00 | test | master | ff5evv8a5i9f | 019auieha046 | updated test changes |
ABC | OPEN | 967 | 2022-10-22T09:21:52.577095+00:00 | develop | release | e2e3hhade042 | 57009debfca0 | updated the release changes |
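Before asking support to delete anything, you may want to sanity-check the backup file. A minimal sketch (assuming the `pr_stats.csv` produced above; `count_backed_up_prs` is a helper name introduced here for illustration):

```python
import csv

def count_backed_up_prs(path='pr_stats.csv'):
    """Return the number of pull request rows in the CSV, excluding the header."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    return max(len(rows) - 1, 0)

if __name__ == '__main__':
    print('Backed up %d pull requests' % count_backed_up_prs())
```

If the count matches the list provided by the support team, the backup is complete.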
Once you have backed up the details of your pull requests, please inform the Bitbucket Cloud Support team to delete the pull requests. Once this is done, the large files associated with the Git references will be removed from the repository, which should reduce the repository size.