Anonymize Jira support zip data
Platform Notice: Data Center Only - This article only applies to Atlassian apps on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This KB covers how to anonymize the data collected when creating support zips to adhere to any data restrictions you may have.
Remove and verify sensitive data
The example script below removes email addresses and IP addresses from all files, and removes usernames that appear in known, common log patterns. It is expected to catch most sensitive data, but not necessarily all of it, so before sharing logs we still recommend checking the output for anything the script might have missed.
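To make the kind of substitution concrete, here is a minimal sketch that applies only the email and IP address patterns to a single line. The two patterns are copied from the script further down; the helper name `anonymize_line` and the sample log line are invented for illustration and are not part of the script.

```python
import re

# Patterns copied from the "IP address" and "Email address" rules in the script.
IP_PATTERN = re.compile(
    r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
    r'|(?:[A-Fa-f0-9]{1,4}:){1,6}:[A-Fa-f0-9]{1,4}'
    r'|(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}\b'
)
EMAIL_PATTERN = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')


def anonymize_line(line):
    """Hypothetical helper: apply the IP rule, then the email rule."""
    line = IP_PATTERN.sub("IP_sanitized_by_script", line)
    return EMAIL_PATTERN.sub("Email_sanitized_by_script", line)


# Sample log line invented for this example.
sample = "Login failed for jane.doe@example.com from 10.20.30.40"
print(anonymize_line(sample))
# Login failed for Email_sanitized_by_script from IP_sanitized_by_script
```

Because the IPv4 pattern matches any four dot-separated groups of one to three digits, a four-part version string such as 9.12.0.1 is replaced as well. This is one reason to review the anonymized output before sharing it.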
Solution
Here are the steps to use the script:
Copy the example code below and save it to a file as SupportZipAnonymizer.py.
Create or retrieve an existing support zip.
From the command line, execute the script SupportZipAnonymizer.py. For example:
python3 SupportZipAnonymizer.py -p Jira_jsm_node2_1234_support_2025-04-29-23-05-18.zip
You can reference either a support zip or a directory. When you reference a directory, the script assumes all files are already extracted, and it preserves the directory structure in the anonymized copy.
Verify the anonymized files in the Jira_jsm_node2_1234_support_2025-04-29-23-05-18_anonymized.zip that was created.
If there are no changes needed, upload the Jira_jsm_node2_1234_support_2025-04-29-23-05-18_anonymized.zip file to your Atlassian support ticket.
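The verification step can be partly automated. The sketch below is not part of the Atlassian script: it is a hypothetical helper, `find_leftovers`, that scans every file in an anonymized zip for anything that still looks like an email or IPv4 address, plus any extra terms you pass in (for example, known usernames or hostnames). The patterns are generic assumptions, so treat the result as a starting point for manual review rather than proof that the archive is clean.

```python
import re
import zipfile

# Generic patterns for a re-check; assumptions for this sketch, not the script's own rules.
LEFTOVER_PATTERNS = {
    "email": re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'),
    "ipv4": re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),
}


def find_leftovers(zip_path, extra_terms=()):
    """Return (member name, line number, label) for each suspicious line in the zip."""
    hits = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently so a stray non-UTF-8 byte does not abort the scan.
            text = archive.read(info).decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                for label, pattern in LEFTOVER_PATTERNS.items():
                    if pattern.search(line):
                        hits.append((info.filename, number, label))
                for term in extra_terms:
                    # Case-insensitive substring check for caller-supplied terms.
                    if term.lower() in line.lower():
                        hits.append((info.filename, number, f"term:{term}"))
    return hits
```

For example, `find_leftovers("Jira_jsm_node2_1234_support_2025-04-29-23-05-18_anonymized.zip", extra_terms=["jsmith"])` would list each file and line number that still contains an email-like string, an IPv4-like string, or the term jsmith (a made-up username).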
The example script provided is intended for demonstration purposes only and is not officially supported by Atlassian. If you choose to use or modify this script, please do so at your own risk, as Atlassian Support can’t assist with troubleshooting, maintenance, or customization of example scripts. For official support, please refer to Atlassian’s supported products and services.
Script code
View script in public repository.
SupportZipAnonymizer.py
import os
import re
import argparse
import shutil
import zipfile
parser = argparse.ArgumentParser(
    prog='SupportZipAnonymizer.py',
    description='A script made to sanitize sensitive information from support zip logs',
    epilog='Atlassian cannot guarantee complete accuracy in the log sanitization as there can be new patterns in different use cases.\nThis script is not officially supported and is provided "AS IS".')
# BooleanOptionalAction already yields True/False, so no type= argument is needed
parser.add_argument('-v', '--verbose', help='Include flag for verbose output', required=False, action=argparse.BooleanOptionalAction, default=False)
parser.add_argument('-p', '--path', type=str, help='Extracted support zip directory or zip file path', required=True)
args = parser.parse_args()
########### Replacement rule class ###########
class Rule:
    def __init__(self, rule_name, filename_pattern, find_pattern, replace_pattern):
        self.rule_name = rule_name
        self.filename_pattern = re.compile(filename_pattern)
        self.find_pattern = re.compile(find_pattern)
        self.replace_pattern = replace_pattern
########### Replacement rule definitions ###########
replacement_rules = []
# IP address
replacement_rules.append(Rule(rule_name="IP address",
filename_pattern=r'.*',
find_pattern=r'\b(?:\d{1,3}\.){3}\d{1,3}\b|(?:[A-Fa-f0-9]{1,4}:){1,6}:[A-Fa-f0-9]{1,4}|(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}\b',
replace_pattern="IP_sanitized_by_script"))
# Email
replacement_rules.append(Rule(rule_name="Email address",
filename_pattern=r'.*',
find_pattern=r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}',
replace_pattern="Email_sanitized_by_script"))
# DB config URL
replacement_rules.append(Rule(rule_name="dbconfig.xml URL",
filename_pattern='dbconfig.xml',
find_pattern=r'(<url>).*?(</url>)',
replace_pattern=r'\1DB_URL_Sanitized_by_script\2'))
# Security log
replacement_rules.append(Rule(rule_name="Security log",
filename_pattern=r'atlassian-jira-security.log.*',
find_pattern=r"(user|for) '\S+'",
replace_pattern=r"\1 'Username_sanitized_by_script'"))
# Application logs
replacement_rules.append(Rule(rule_name="Application log 1",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\buser[:=])\s*[A-Za-z0-9@._-]*',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 2",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\busername[:=])\s*[A-Za-z0-9@._-]*',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 3",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\bFATAL)\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 4",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\bERROR)\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 5",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\bWARN)\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 6",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\bINFO)\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 7",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\bDEBUG)\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log 8",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-greenhopper.log.*)|(atlassian-jira-slow-queries.log.*)|(atlassian-remoteapps-security.log.*)|(jira-diagnostics.log.*)|(atlassian-jira-stats.log.*)|(atlassian-servicedesk.log.*)|(insight_import.log.*)|(atlassian-jira-migration.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'(\bTRACE)\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script '))
# Application logs with no level or thread dumps
replacement_rules.append(Rule(rule_name="Application log with no level 1",
filename_pattern=r'(atlassian-jira-xsrf.log.*)|(atlassian-jira-querydsl-sql.log.*)|(atlassian-jira-sql.log.*)|(\d{4}(_\d{2}){5}\.txt)',
find_pattern=r'(\buser[:=])\s+\S+\s+\S+\s',
replace_pattern=r'\1 Username_sanitized_by_script Username_sanitized_by_script '))
replacement_rules.append(Rule(rule_name="Application log with no level 2",
filename_pattern=r'(atlassian-jira-xsrf.log.*)|(atlassian-jira-querydsl-sql.log.*)|(atlassian-jira-sql.log.*)|(\d{4}(_\d{2}){5}\.txt)',
find_pattern=r'\s+\S+\s+(\d+x\d+x\d+\s+)',
replace_pattern=r' Username_sanitized_by_script \1'))
# LOGIN-STATS topUsers
replacement_rules.append(Rule(rule_name="LOGIN-STATS topUsers",
filename_pattern=r'(atlassian-jira.log.*)|(atlassian-jira-stats.log.*)',
find_pattern=r'"topUsers":[{][^}]+[}]',
replace_pattern='"topUsers":{"topUsers_Sanitized_by_script":1}')) #trying not to break the expected JSON syntax
# Access logs and dump
replacement_rules.append(Rule(rule_name="http Access logs and dump username",
filename_pattern=r'(atlassian-jira-http-access.log.*)|(atlassian-jira-http-dump.log.*)|(access_log\..*)',
find_pattern=r'(^|\n)(\S+\s\S+\s)\S*\s',
replace_pattern=r'\1\2Username_sanitized_by_script '))
# HTTP dump token
replacement_rules.append(Rule(rule_name="http dump token",
filename_pattern=r'atlassian-jira-http-dump.log.*',
find_pattern=r'token=[^;]+;',
replace_pattern=r'token=Token_sanitized_by_script '))
########### Functions ###########
# Maps a path under the source directory to the same relative path under the _anonymized directory
def dest_path(fullpath):
    # Paths come from os.walk(source), so they always start with source.
    # Slicing avoids treating the path as a regular expression.
    return destination_path + fullpath[len(source):]
# Base sanitize file function
def sanitize_file(source_path, destination_path, sub_list):
    if verbose:
        print(f"Sanitizing {source_path} file and copying to {destination_path}")
        rules=[]
        for sub in sub_list:
            rules.append(sub.rule_name)
        print(f'Applying rules {rules}')
    with open(source_path, 'r') as source_file:
        content = source_file.read()
    # Apply every matching rule in order to the whole file content
    for sub in sub_list:
        content = re.sub(sub.find_pattern, sub.replace_pattern, content)
    with open(destination_path, 'w') as dest_file:
        dest_file.write(content)
# Adds every file under path to the open zip, keeping paths relative to the parent of path
def zipdir(path, ziph):
    print(f"Creating zip file {ziph.filename}")
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file),
                       os.path.relpath(os.path.join(root, file),
                                       os.path.join(path, '..')))
###########################################
########### Actual script start ###########
###########################################
# Remove trailing slash if present
source = re.sub(r'[/]$',r'',args.path)
verbose = args.verbose
# Extract if zip
if re.match(r'.*\.zip$',source):
    print("Identified path as zip file. Extracting...")
    originally_zip_file=True
    extraction_directory = re.sub(r'\.zip$',r'',source)
    os.makedirs(extraction_directory, exist_ok=True)
    # Open the ZIP file and extract its contents
    with zipfile.ZipFile(source, 'r') as zip_ref:
        zip_ref.extractall(extraction_directory)
    print(f'Extracted {source} into {extraction_directory} directory\n')
    # Update the directory we'll use as reference to get the files
    source = extraction_directory
else:
    originally_zip_file=False
# Get destination folder name
destination_path = source + "_anonymized"
print(f'Creating sanitized copy in {destination_path}')
print('=============================================================')
# Create destination directory
os.makedirs(destination_path, exist_ok=True)
########### Main loop ###########
for root, dirs, files in os.walk(source):
    # Create directories
    for dir in dirs:
        os.makedirs(dest_path(os.path.join(root,dir)), exist_ok=True)
    for file in files:
        file_path = os.path.join(root,file)
        destination_file = dest_path(os.path.join(root,file))
        # Skip files that we don't want
        if re.match(r'.*\.((jfr)|(DS_Store)|(zip)|(png))',file):
            if verbose:
                print(f"Skipping {file} due to extension")
        # Handle the files to be cleaned
        else:
            rules_applied=[]
            # Check which rules apply and call the sanitization function
            for rule in replacement_rules:
                if re.match(rule.filename_pattern,file):
                    rules_applied.append(rule)
            sanitize_file(file_path,destination_file,rules_applied)
# Create zip file
with zipfile.ZipFile((destination_path+'.zip'), 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipdir(destination_path, zipf)
# If we extracted the zip, clean the original extracted directories
if originally_zip_file:
    print("Cleaning the extracted log files")
    shutil.rmtree(source)
Script documentation
Here is the script usage syntax and explanation of the parameters:
usage: SupportZipAnonymizer.py [-h] [-v | --verbose | --no-verbose] -p PATH
A script made to sanitize sensitive information from support zip logs
options:
  -h, --help            show this help message and exit
  -v, --verbose, --no-verbose
                        Include flag for verbose output
  -p, --path PATH       Extracted support zip directory or zip file path
Customizing the script
You can extend the script by adding your own replacement rules.
To include additional replacement patterns:
Locate the ########### Replacement rule definitions ########### section in the script.
Add your own rule, following the pattern:
replacement_rules.append(Rule(rule_name="My rule name", filename_pattern=r'my-desired-log.log.*', find_pattern=r'string_to_be_searched', replace_pattern=r'string_to_be_replaced'))
rule_name - a label used for organization and in verbose output.
filename_pattern - a regular expression that matches the desired log file name. The script checks it with re.match, so it is matched from the start of the file name.
find_pattern - a regular expression that matches the data to be replaced.
replace_pattern - the replacement text. It can be a plain string or a replacement pattern that uses backreferences such as \1.
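As an illustration of how the four parameters work together, here is a small self-contained sketch. The `Rule` class is repeated from the script so the example runs on its own. The rule itself (masking an `apiKey=` value in a file named my-custom-app.log) and the helpers `rule_applies` and `apply_rule` are hypothetical and exist only for this example.

```python
import re


class Rule:
    """Same shape as the Rule class in the script, repeated so this sketch is standalone."""
    def __init__(self, rule_name, filename_pattern, find_pattern, replace_pattern):
        self.rule_name = rule_name
        self.filename_pattern = re.compile(filename_pattern)
        self.find_pattern = re.compile(find_pattern)
        self.replace_pattern = replace_pattern


# Hypothetical rule: keep the "apiKey=" prefix (group 1) and mask the value after it.
api_key_rule = Rule(rule_name="API key",
                    filename_pattern=r'my-custom-app\.log.*',
                    find_pattern=r'(apiKey=)[^&\s]+',
                    replace_pattern=r'\1ApiKey_sanitized_by_script')


def rule_applies(rule, filename):
    # Mirrors the script's main loop: re.match anchors at the start of the file name.
    return re.match(rule.filename_pattern, filename) is not None


def apply_rule(rule, text):
    # Mirrors the substitution call made in the script's sanitize_file function.
    return re.sub(rule.find_pattern, rule.replace_pattern, text)


# Sample line invented for this example.
line = "GET /rest/hook?apiKey=abc123&project=DEMO"
print(rule_applies(api_key_rule, "my-custom-app.log.1"))
print(apply_rule(api_key_rule, line))
```

Testing a new rule this way on a few representative lines, before adding it to the script, helps confirm that the pattern matches what you intend and leaves the rest of the line intact.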
Reference: re — Regular expression operations