Git Repository Indexing is Too Slow when Creating a New Branch or Tag

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

The time needed to index a Git repository has decreased significantly since Fisheye/Crucible 3.4.0, when improvements were made to how the Git manifest is indexed. More information can be found in Git manifest.

Problem

Indexing a new branch or tag in Git is too slow.

Diagnosis

Environment

  • Fisheye/Crucible version prior to 3.4.0

Cause

When Fisheye finds a new branch, it needs to build a manifest of the branch: for each file in the branch, the latest commit that affects that file. This information is used for various Fisheye operations such as the commit graph, EyeQL queries, and the branch activity display pages.

Fisheye generates the branch manifest by asking Git for the current tree using the git ls-tree command. Unfortunately, Git reports the tree in terms of each file's content hash, not the commit hash. Where the content hash of a given path is unique to a particular commit, Fisheye can quickly map the content hash to the commit hash and build the manifest. If, however, multiple commits have the same content for the same file path, Fisheye must determine which commit is the appropriate one to record in the manifest. This requires a relatively slow search of the file path's history (the commit ancestry).
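The lookup described above can be illustrated with a small toy model. This is only a sketch of the idea, not Fisheye's implementation: the commit IDs, blob hashes, file names, and the functions index_by_content and classify are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-path history: (commit_id, content_hash), oldest first.
# Commit c3 restores README.md to the content it had in c1.
history = {
    "README.md": [("c1", "blobA"), ("c2", "blobB"), ("c3", "blobA")],
    "main.c": [("c1", "blobX"), ("c2", "blobY")],
}

# Roughly what a tree listing provides for the branch tip:
# a path and its content hash, but no commit hash.
tree = {"README.md": "blobA", "main.c": "blobY"}


def index_by_content(history):
    """Map (path, content_hash) to every commit that produced that content."""
    index = defaultdict(list)
    for path, revisions in history.items():
        for commit_id, content_hash in revisions:
            index[(path, content_hash)].append(commit_id)
    return dict(index)


def classify(tree, index):
    """Separate paths resolved by direct lookup from those needing a history search."""
    resolved, ambiguous = {}, {}
    for path, content_hash in tree.items():
        commits = index.get((path, content_hash), [])
        if len(commits) == 1:
            resolved[path] = commits[0]   # unique content: fast mapping
        else:
            ambiguous[path] = commits     # same content in several commits
    return resolved, ambiguous


resolved, ambiguous = classify(tree, index_by_content(history))
print(resolved)   # {'main.c': 'c2'}
print(ambiguous)  # {'README.md': ['c1', 'c3']}
```

In this model only README.md would need the slow ancestry search, because two commits left it with identical content.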

Normally, the content of a file path is unique to each commit that affects the file, so the search is not needed.

Solution

Resolution

Fisheye provides the flag --Xenable-git-content-hash-resolving-heuristic which changes the behavior when there are multiple commits mapped to the same content hash on the same path. In this case Fisheye picks the most recent match. This should be the correct choice for the workflow in the majority of cases.
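The difference between the two strategies can be sketched with a toy model. This is not Fisheye's code: the commit data and function names are hypothetical, and treating "most recent" as "highest commit timestamp" is an assumption made for illustration.

```python
# Hypothetical commits: id -> (parent id or None, timestamp, {path: content_hash set by the commit})
commits = {
    "c1": (None, 100, {"README.md": "blobA"}),
    "c2": ("c1", 200, {"README.md": "blobB"}),
    "c3": ("c2", 300, {"README.md": "blobA"}),  # restores the c1 content
}


def candidates(path, content_hash):
    """All commits that left the path with the given content."""
    return [cid for cid, (_, _, changes) in commits.items()
            if changes.get(path) == content_hash]


def resolve_by_ancestry(tip, path, content_hash):
    """Slower strategy: walk the ancestry from the tip until a matching commit is found."""
    cid = tip
    while cid is not None:
        parent, _, changes = commits[cid]
        if changes.get(path) == content_hash:
            return cid
        cid = parent
    return None


def resolve_by_heuristic(path, content_hash):
    """Faster strategy: take the newest matching commit without walking the ancestry."""
    return max(candidates(path, content_hash), key=lambda cid: commits[cid][1])


# A branch or tag at c3: both strategies agree.
print(resolve_by_ancestry("c3", "README.md", "blobA"))  # c3
print(resolve_by_heuristic("README.md", "blobA"))       # c3

# A branch or tag at the older commit c1: the ancestry walk finds c1,
# while the heuristic still answers c3.
print(resolve_by_ancestry("c1", "README.md", "blobA"))  # c1
```

In this model the heuristic is correct whenever the new branch or tag sits at or after the newest matching commit, and can pick the wrong revision when it points at older history.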

Restart Fisheye with this command-line option and monitor the performance when new branches are pushed. This should have a significant impact on processing times. A possible side effect is incorrect parents: the list of parents could be wrong for any file revision, the wrong revision could be tagged (in the case of a tag), and the last modified date and revision shown in the directory tree could be incorrect.

Updated on April 2, 2025
