Troubleshooting Confluence performance issues with thread dumps
Platform Notice: Data Center Only - This article only applies to Atlassian apps on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Problem
Confluence is behaving slowly, and you need more information about which part of it is slow. You may also have noticed that CPU usage on your application server is very high.
This page provides a way of collecting thread dumps (a breakdown of what every thread in a Java process is doing) and the output of top (which shows how many resources each native OS thread is consuming). This information could also be collected with a profiler such as jProfiler or other options, as discussed here. In this example we're using native (free) tools to collect the information.
This will only work on *nix systems and requires jstack to be installed (it should be by default). For Windows, please see Generating thread dumps on Windows or Generating a Thread Dump.
Solution
Capturing CPU Diagnostics
Run the following command to find your Confluence instance. If this returns multiple results, then you have more than one Tomcat instance running on your machine. You'll need to identify your Confluence instance from these results manually:
ps aux | grep -i catalina.startup.Bootstrap
Set the variable CONF_PID to the process ID returned. This will be the second field of the output from Step 1:
CONF_PID=<your_process_id>
Alternative methods for obtaining the PID
You may be able to automatically identify and set the Confluence PID variable by using a command such as:
CONF_PID=`ps aux | grep -i confluence | grep -i java | awk -F '[ ]*' '{print $2}'`
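Another approach that may work, assuming Confluence is the only process on the host whose command line contains catalina.startup.Bootstrap, is to use pgrep (which matches against the full command line with -f):
CONF_PID=$(pgrep -f catalina.startup.Bootstrap)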
After the CONF_PID variable is set, execute the following script as the user running the Confluence application. The script will generate 6 sets of CPU usage information and thread dumps at 10-second intervals, running for a total of 60 seconds:
Option 1 - JSTACK
for i in $(seq 6); do top -b -H -p $CONF_PID -n 1 > conf_cpu_usage.`date +%s`.txt; jstack -l $CONF_PID > conf_threads.`date +%s`.txt; sleep 10; done
Option 2 - KILL -3
for i in $(seq 6); do top -b -H -p $CONF_PID -n 1 > conf_cpu_usage.`date +%s`.txt; kill -3 $CONF_PID; sleep 10; done
If this gives you the error "Unable to open socket file: target process not responding or HotSpot VM not loaded", or if any of the generated files are completely empty, make sure you're executing the script as the same user that started the Confluence process.
Alternative methods for capturing thread dumps
There are a few scripts which will automatically grab the PID and then generate the CPU usage info and thread dumps. However, these scripts assume that Confluence is the only Java application on the host. If you have multiple Java processes running, the automatic method will not work, and you must manually identify the correct Java process for the Confluence application:
Alternative 1:
for i in $(seq 6); do top -b -H -p `ps -ef | grep java | awk 'FNR == 1 {print $2}'` -n 1 > conf_cpu_usage.`date +%s`.txt; jstack -l `ps -ef | grep java | awk 'FNR == 1 {print $2}'` > conf_threads.`date +%s`.txt; sleep 10; done
Alternative 2:
for i in $(seq 6); do top -b -H -p `pgrep -f java` -n 1 > conf_cpu_usage.`date +%s`.txt; jstack -l `pgrep -f java` > conf_threads.`date +%s`.txt; sleep 10; done
Alternative 3:
A set of scripts have been designed to make the generation of thread dumps a little easier and can be found at https://bitbucket.org/atlassianlabs/atlassian-support/.
Reading CPU Diagnostics
Two types of files will be generated by the script:
CPU usage info - this shows how much CPU each Confluence thread is consuming at that snapshot in time
Thread dump - this shows what each thread was doing at that snapshot in time
Looking at both sets of data together can help locate a problematic process:
Look in the resulting CPU usage files to identify which threads are consistently using a lot of CPU time.
Take the PIDs of the top CPU-consuming threads and convert them to hexadecimal. For example, 11159 becomes 0x2b97.
Search for those hex values in the thread dumps to figure out what the high-CPU threads are doing (see the example below).
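As a rough illustration (the thread ID and file names below are just examples matching the script above), the conversion and lookup can be done with:
TID=11159                          # thread ID reported by top -H
HEX=$(printf '0x%x' "$TID")        # 11159 -> 0x2b97
grep -A 15 "nid=$HEX" conf_threads.*.txt   # show the matching thread's stack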
Performance Data Collector
The Performance Data Collector is a server-side, standalone application that exposes a number of REST APIs for collecting performance data. It can be used to collect data, such as thread dumps, disk speed and CPU usage information, to troubleshoot performance problems.
See How to use the Performance Data Collector for more information.
Steps for Atlassian Docker containers
/opt/atlassian/support/thread-dumps.sh can be run via docker exec to easily trigger the collection of thread dumps from the containerized application. For example:
docker exec my_container /opt/atlassian/support/thread-dumps.sh
By default this script will collect 10 thread dumps at 5-second intervals. This can be overridden by passing custom values for the count and interval, using -c / --count and -i / --interval respectively. For example, to collect 20 thread dumps at 3-second intervals:
docker exec my_container /opt/atlassian/support/thread-dumps.sh --count 20 --interval 3
If you're running the Docker container in a Kubernetes environment, you can execute the command as below:
kubectl exec -it confluence-1 -n confluence -- bash -c "/opt/atlassian/support/thread-dumps.sh --count 20 --interval 3"
Replace confluence-1 with the name of the pod, and confluence (after -n) with the namespace where the Confluence pods are running.
Thread dumps will be written to $APP_HOME/thread_dumps/<date>.
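To pull the generated dumps out of the container for analysis, something like the following should work (the home directory path is an assumption based on the default of the official image, and the container, pod, and namespace names reuse the examples above; adjust for your deployment):
docker cp my_container:/var/atlassian/application-data/confluence/thread_dumps ./thread_dumps
kubectl cp confluence/confluence-1:/var/atlassian/application-data/confluence/thread_dumps ./thread_dumps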
Note: By default this script will also capture output from top run in thread mode. This can be disabled by passing -n / --no-top.
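For example, to skip the top capture (reusing the example container name from above):
docker exec my_container /opt/atlassian/support/thread-dumps.sh --no-top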
The Troubleshooting section on https://hub.docker.com/r/atlassian/confluence has additional information.