Custom website connector robots.txt validation fails when robots.txt URL redirects

Platform Notice: Cloud Only - This article only applies to Atlassian apps on the cloud platform.

Summary

When configuring the Custom website connector with a URL that includes a context path (for example, https://example.com/documentation), robots.txt validation fails even though a robots.txt file appears to exist on the site.

This occurs when the top-level robots.txt URL (for example, https://example.com/robots.txt) responds with an HTTP redirect (e.g., 301) to a different URL (such as the base URL https://example.com/) instead of returning the actual robots.txt file located at https://example.com/documentation/robots.txt.

The Custom website connector currently only supports a standard, top-level robots.txt served directly at https://<host>/robots.txt. The request to that exact URL must succeed (2xx response with robots.txt content). If it redirects elsewhere, the connector treats this as a failure to retrieve robots.txt and validation fails.
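The pass/fail rule described above can be sketched as a simple status-code check. This is a minimal illustration of the behavior, not the connector's actual implementation; the function name is hypothetical:

```shell
# Hypothetical sketch of the validation rule: only a 2xx response to the
# top-level robots.txt request counts as success; any 3xx redirect fails.
check_robots_status() {
  case "$1" in
    2??) echo "PASS: robots.txt retrieved" ;;
    3??) echo "FAIL: redirected instead of serving robots.txt" ;;
    *)   echo "FAIL: unexpected status $1" ;;
  esac
}

check_robots_status 200   # PASS: robots.txt retrieved
check_robots_status 301   # FAIL: redirected instead of serving robots.txt
```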

Diagnosis

  1. Configure the Web (site crawler) connector to crawl https://example.com/documentation

  2. Open a terminal and query the top-level robots.txt URL of the target site. For example:

    curl --verbose https://example.com/robots.txt

  3. Observe the HTTP response headers. In this case, the response was:

    • HTTP/1.1 301 Moved Permanently (or similar redirect status)

    • Location: https://example.com/
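To pull just the status line and redirect target out of a header dump, something like the following works. The sample headers below reproduce the redirect case described above; in a live diagnosis you would capture them with curl instead (the headers.txt file name is an assumption):

```shell
# Reproduce the captured response headers from the redirect case (sample data).
# In a live diagnosis, capture them instead with:
#   curl -sI https://example.com/robots.txt > headers.txt
cat > headers.txt <<'EOF'
HTTP/1.1 301 Moved Permanently
Location: https://example.com/
EOF

status_line=$(head -n 1 headers.txt)                 # HTTP status line
redirect_target=$(grep -i '^Location:' headers.txt)  # where the request is sent
echo "$status_line"
echo "$redirect_target"
rm headers.txt
```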

Cause

Instead of returning the contents of robots.txt, the server redirects the request for the top-level https://example.com/robots.txt to the base URL. Because the connector only reads the top-level robots.txt and does not fall back to the copy under the context path, it treats the redirect as a failure to retrieve the file.

Solution

To resolve the issue, the site must serve a standard, top-level robots.txt file: the request to https://example.com/robots.txt must return the robots.txt content directly with a 2xx status. If the rules currently live under the context path (for example, at https://example.com/documentation/robots.txt), serve the same content at the top level rather than redirecting to the base URL.
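As an illustration only, a top-level robots.txt that exposes just the documentation tree might look like the following. The exact directives depend on what you want crawled; this is an assumption, not a required configuration:

```
# Served at https://example.com/robots.txt (2xx response, no redirect)
User-agent: *
Allow: /documentation/
Disallow: /
```

Under the Robots Exclusion Protocol (RFC 9309), the longer Allow rule takes precedence over the Disallow rule for paths under /documentation/.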

  1. Verify the fix from the command line:

    curl --verbose https://<host>/robots.txt

    Confirm that:

    • There is no 3xx redirect.

    • The response body contains the expected robots.txt directives.
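The second check can be scripted as a rough sanity test on the fetched body. This is a sketch: the looks_like_robots helper and the robots.sample file are assumptions, with robots.sample standing in for the body returned by curl:

```shell
# Sketch: decide whether a fetched robots.txt body looks valid, i.e. it
# contains at least one recognizable directive.
looks_like_robots() {
  grep -Eqi '^(User-agent|Disallow|Allow|Sitemap):' "$1"
}

# Sample body standing in for the output of:
#   curl -s -o robots.sample https://<host>/robots.txt
printf 'User-agent: *\nDisallow: /private/\n' > robots.sample

if looks_like_robots robots.sample; then
  echo "robots.txt body looks valid"
else
  echo "robots.txt body has no directives"
fi
rm robots.sample
```

A body with no directives at all (for example, an HTML page returned after a redirect) would fail this check.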

  2. Re-run connector validation:

    • In the Web (site crawler) connector configuration, re-run validation for the same site.

    • The connector should now successfully retrieve and validate robots.txt and proceed with crawling (subject to the rules defined in the file).

  3. If issues persist:

    • Capture the full HTTP exchange (headers and status) from https://<host>/robots.txt.

    • Provide these details to Atlassian Support so we can confirm whether the behavior still matches this known limitation.

Updated on December 17, 2025
