How to Prevent Google Web Crawlers from Indexing Bitbucket

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

A publicly available Bitbucket site appears in search results when users run web searches (on Google, Bing, etc.) with strings such as "<company_name> bitbucket" or "bitbucket <company_name>". Web crawlers index the Bitbucket site and add it to the search index.

Sometimes it is undesirable for the Bitbucket site to appear in search results and expose details such as the Bitbucket version. This article provides solutions to this problem.

Environment

  • Applicable to all Bitbucket Data Center versions.

  • Publicly available Bitbucket site.

Cause

By default, the built-in robots.txt response is empty, which allows the instance to be crawled. robots.txt can be accessed anonymously because that is how web crawlers read it to determine which parts of the site they may visit.
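You can observe this directly by requesting the endpoint without authentication. The following is a minimal sketch, assuming Python 3; bitbucket.example.com is a placeholder, so substitute your own instance's base URL:

import urllib.request

# Placeholder address; replace with your own Bitbucket base URL.
BASE_URL = "https://bitbucket.example.com"

# robots.txt is served anonymously, so no credentials are needed.
with urllib.request.urlopen(BASE_URL + "/robots.txt") as response:
    body = response.read().decode("utf-8")

# On a default installation the body is empty, which crawlers treat as
# "everything may be crawled". After placing robots.txt in
# $BITBUCKET_HOME/shared, the same request should return its directives.
print(body if body.strip() else "<empty response: site is fully crawlable>")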

Solution

Bitbucket Server 5.11 introduced the ability to configure robots.txt.

Administrators can create and place their robots.txt file in $BITBUCKET_HOME/shared. Adding the file to the shared home ensures it is preserved across upgrades and that, in Data Center installations, all cluster nodes return the same response.

For reference, the robots.txt content:

User-agent: *
Disallow: /

The “User-agent: *” line means that the rules that follow apply to all robots. The “Disallow: /” line blocks access to the entire website.

Together, these directives tell all robots and web crawlers that they may not access or crawl any part of your site.

The configuration in robots.txt determines the Allow and Disallow conditions for the various user agents (for example, Googlebot, Bingbot, Yandex Bot, Apple Bot, etc.).

There are multiple ways in which Allow, Disallow, and User-agent can be combined to achieve different outcomes, depending on your needs. How to configure robots.txt offers a deeper dive into the options.
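As an illustration, the following sketch blocks all crawlers from the entire site while letting Googlebot crawl everything except the REST API paths. The /rest/ path here is only an example; adjust the rules to match your instance and policy:

User-agent: Googlebot
Disallow: /rest/

User-agent: *
Disallow: /

Because a crawler honours only the most specific User-agent group that matches it, Googlebot follows the first group and ignores the blanket Disallow: / that applies to everyone else.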

Please note: there is currently no ability to configure and serve the robots.txt file for mirrors. A feature request is open for this:

BSERV-14273 - Provide Mirrors the ability to serve the robots.txt file

Updated on April 24, 2025
