How to identify large files in a commit

Platform Notice: Cloud Only - This article only applies to Atlassian products on the cloud platform.

Summary

When a large commit is added to a repository, it can cause problems when loading commits or pull requests that contain it. The commit therefore needs to be located so that it can be reviewed, audited, or deleted.

Environment

Any repository with large commits requiring review, or experiencing issues loading pull requests or commits.

Diagnosis

The git show command can be used to identify the contents of the commit:

    git show --shortstat <Commit SHA>

Here's an example of the command's output:

    $ git show --shortstat b53b6ad36b550e362224ac5a5d53220a2d2f2281
    commit b53b6ad36b550e362224ac5a5d53220a2d2f2281
    Author: Some Name <some@email.com>
    Date: Thu Dec 22 16:47:41 2022 +0200
    Commit message
    17731 files changed, 2049444 insertions(+), 1 deletion(-)

In the example above, the commit contains a large number of file changes and insertions, which may cause the page to time out.
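When several commits need to be checked, the changed-file count can be pulled out of the summary line with a small helper. The sketch below is only an illustration: the function name files_changed is hypothetical, and it assumes the summary line has the same shape as in the example above.

```shell
# Print the number of changed files from a `git show --shortstat` summary
# line read on stdin, e.g. " 17731 files changed, 2049444 insertions(+), ...".
# files_changed is a hypothetical helper name, not a Git command.
files_changed() {
  awk '/files? changed/ { print $1 }'
}

# Possible usage (an empty --format= suppresses the commit header):
#   git show --shortstat --format= <Commit SHA> | files_changed
```

The result can then be compared against whatever threshold your team considers too large for a single commit.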

However, there are cases where the commit is not necessarily large, yet we still encounter errors when trying to load it on the UI. For example:

    $ git show --shortstat 11dfc6b47bcd624efce4585389772ce4ea69f78d
    commit 11dfc6b47bcd624efce4585389772ce4ea69f78d
    Author: Some Name <some@email.com>
    Date: Thu Dec 22 17:37:26 2022 +0530
    Some commit here
    122 files changed, 469 insertions(+), 458 deletions(-)

This is where we need to investigate the contents of the commit itself, as the likely reason is a specific file being too big.

Cause

When the commit is too large (either due to many small changes or one big one), it can result in an error while loading.

Solution

To identify if a specific file is too big within a commit we can use the following command:

    git ls-tree -r -l <COMMIT HASH> $(git diff-tree --no-commit-id --name-only -r <COMMIT HASH>) | sort -r -n -k 4

ℹ️ To limit the output, append | head -n 10 to show only the 10 largest files. Because the list is sorted in descending order, the largest files appear first.

This will output the list of files in the commit sorted by size:

    $ git ls-tree -r -l 11dfc6b47bcd624efce4585389772ce4ea69f78d $(git diff-tree --no-commit-id --name-only -r 11dfc6b47bcd624efce4585389772ce4ea69f78d) | sort -r -n -k 4 | head -n 10
    100644 blob e80f5d71185eaf0dda42d46cea874cce94b9e3c7 239731256 folder/file_1
    100644 blob 0f0ba2a3c6adacbdfdab0ce2beca272539abaac1 70786560 folder/file_2
    100644 blob e5d437dd38ea27785eb5c910b0a6ad77777f6116 53519872 folder/file_3
    100644 blob 871283433e85b978398e0e08d5ecd386037c092d 47230200 folder/file_4
    100644 blob 1950cfde66c12237abc504c395ed954086f4e778 25913104 folder/file_5
    100644 blob e022d263bd55c700e7ce3a3ff6c0b872a4eae3db 17956864 folder/file_6
    100644 blob b6e32b3265f02518f14c220157d30c9737981457 17543168 folder/file_7
    100644 blob 30c6aa1e2c3552fdde24e1821421b75b1fe7b1ff 12045336 folder/file_8
    100644 blob 211b70a927f11c688b461ba4693b226ef38eb7ce 11960698 folder/file_9
    100644 blob 0e497a089318ad4bdde2cb2aa900a8ca7c4cc1f8 10426816 folder/file_10

The fourth column shows the file size in bytes. Here, file_1 is 239731256 bytes, which is around 240 MB, so this file is likely the main contributor to the large commit size.
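Raw byte counts can be hard to read at a glance. As a rough sketch, the listing can be piped through a small filter that prints each size in decimal megabytes (1 MB = 1,000,000 bytes). The function name sizes_in_mb is hypothetical, and the filter assumes the git ls-tree -l layout shown above, where a tab separates the size from the path.

```shell
# Read `git ls-tree -r -l` lines on stdin and print "<size in MB><TAB><path>".
# sizes_in_mb is a hypothetical helper name, not a Git command.
sizes_in_mb() {
  awk -F '\t' '{
    # Before the tab: mode, type, object id, size. After it: the path.
    n = split($1, meta, " ")
    printf "%.1f MB\t%s\n", meta[n] / 1000000, $2
  }'
}

# Possible usage:
#   git ls-tree -r -l <COMMIT HASH> $(git diff-tree --no-commit-id --name-only -r <COMMIT HASH>) | sort -r -n -k 4 | sizes_in_mb
```

For the first row of the example output, this prints 239.7 MB next to folder/file_1.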

To prevent large files and commits from being added to a Git repository by mistake, implement safeguards such as a Git pre-commit hook, or standardize Git hooks across the repository. Doing so also helps keep the size of the repository manageable.
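One possible shape for such a pre-commit check is sketched below. It is a minimal illustration, not an official Atlassian hook: the function name check_staged_sizes and the byte limit passed to it are assumptions, and paths containing newlines or other unusual characters are not handled.

```shell
#!/bin/sh
# Hypothetical pre-commit check: fail when a staged file exceeds a byte limit.
# check_staged_sizes and its limit argument are illustrative assumptions.
check_staged_sizes() {
  max_bytes=$1
  oversized=$(
    # List staged paths that were added, copied, or modified.
    git diff --cached --name-only --diff-filter=ACM |
    while IFS= read -r path; do
      # ":<path>" names the staged (index) version of the file.
      size=$(git cat-file -s ":$path" 2>/dev/null) || continue
      if [ "$size" -gt "$max_bytes" ]; then
        printf '%s (%s bytes)\n' "$path" "$size"
      fi
    done
  )
  if [ -n "$oversized" ]; then
    printf 'Commit rejected: files larger than %s bytes:\n%s\n' \
      "$max_bytes" "$oversized" >&2
    return 1
  fi
  return 0
}
```

To try it as a hook, save the function in .git/hooks/pre-commit, make the file executable, and end the script with a call such as check_staged_sizes 10000000 || exit 1. The 10000000-byte limit is only an example value and should be adjusted to suit the repository.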

Updated on April 2, 2025
