How to read the Tomcat Stuck Threads log or StuckThreadDetectionValve in Jira

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

As a best practice, Atlassian advises configuring the Tomcat bundled with Jira to log Stuck Threads, that is, Threads that have been running for longer than a certain period of time.

This article offers one way to read these logs from a Linux-like terminal (using grep, awk, sort and column).

On Stuck Thread terminology

In the Tomcat/Catalina context, a "Stuck Thread" is simply a Thread that hasn't completed within the configured duration threshold. It doesn't mean the Thread is actually "stuck" in the "blocked" or "waiting" states we see when analyzing Thread dumps (see Troubleshooting Jira performance with Thread dumps); the Thread can simply be running unblocked for a long time.

This log may also be helpful for a memory shortage analysis (Full GC cycles or OutOfMemoryError) as an additional tool to the Heap dump analysis (more below).

Solution

1. Configuring the Valve

First, make sure you have the StuckThreadDetectionValve configured in your Jira's server.xml:

<Server ...
    <Service name="Catalina">
        ...
        <Connector ... />
        <Connector ... />
        ...
        <Engine name="Catalina" ...>
            <Host ...>
                ...
            </Host>
            ...
            <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="120" />
        </Engine>
    </Service>
</Server>

Place the "StuckThreadDetectionValve" line anywhere inside the Engine tag. Once Jira is restarted with this configuration, Tomcat will log to catalina.out and catalina.yyyy-mm-dd.log whenever a Thread has been running for longer than the configured threshold (120 seconds in the example).

We advise keeping the threshold at 120 seconds (2 minutes).
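A quick way to confirm the Valve is in place is to grep server.xml for it. This is a minimal check assuming the default layout, where server.xml sits under the installation folder's conf directory:

# Confirm the StuckThreadDetectionValve is configured (path assumes the default install layout)
grep -n 'StuckThreadDetectionValve' /<Jira-install-path>/conf/server.xml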

2. Parsing the logs

The Stuck Thread logs will show in the catalina logs like this:

13-Feb-2024 14:03:16.914 WARNING [Catalina-utility-1] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8080-exec-98 url: /rest/jpo/1.0/labels; user: USER001] (id=[81984]) has been active for [128,447] milliseconds (since [2/13/24 2:01 PM]) to serve the same request for [http://Jira-Base-URL/rest/jpo/1.0/labels?xoauth_requestor_id=USER001] and may be stuck (configured threshold for this StuckThreadDetectionValve is [120] seconds). There is/are [1] thread(s) in total that are monitored by this Valve and may be stuck.
13-Feb-2024 14:03:26.936 WARNING [Catalina-utility-2] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadCompleted Thread [http-nio-8080-exec-98] (id=[81984]) was previously reported to be stuck but has completed. It was active for approximately [134,551] milliseconds.

For each "notifyStuckThreadDetected" entry, we'd expect a corresponding "notifyStuckThreadCompleted" entry once the Thread completes.
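Before running the full parser, a quick sanity check is to count both notification types in the log; a mismatch between the two counts means some Threads had not yet completed when the log was captured (or the log spans a restart). This is a minimal sketch using the same placeholder file name as the command below:

# Count how many Detected vs Completed notifications the log contains
grep -a -c 'notifyStuckThreadDetected' catalina.yyyy-mm-dd.log
grep -a -c 'notifyStuckThreadCompleted' catalina.yyyy-mm-dd.log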

This command will parse the catalina.yyyy-mm-dd.log file and print a table of each notified Stuck Thread:

grep -a -h -E 'StuckThreadDetectionValve\.notifyStuckThread(Detected|Completed)' tomcat-logs/catalina.yyyy-mm-dd.log \
  | awk 'BEGIN {print "Notify_time", "Completed_time", "Duration", "Id", "URL"}; ($5 ~ /notifyStuckThreadDetected/) {match($0, / \(id=\[[0-9]+\]\) /); thread_id=substr($0, RSTART+6, RLENGTH-9); match($0, /request for \[http.*\] and may be stuck/); thread_name[thread_id]=substr($0, RSTART+13, RLENGTH-31); start_time[thread_id] = $2; end_time[thread_id] = "-"; thread_active[thread_id] = "-"}; ($5 ~ /notifyStuckThreadCompleted/) {match($0, / \(id=\[[0-9]+\]\) /); thread_id=substr($0, RSTART+6, RLENGTH-9); end_time[thread_id] = $2; match($0, /approximately \[[0-9,\.]+\] milliseconds/); thread_active[thread_id] = substr($0, RSTART+15, RLENGTH-29); output[output_id++]=start_time[thread_id]" "end_time[thread_id]" "thread_active[thread_id]" "thread_id" "substr(thread_name[thread_id], 1, 1000);}; END {for (t in output) {print output[t]}; for (t in thread_name) {if (end_time[t] == "-") {print start_time[t], end_time[t], thread_active[t], t, substr(thread_name[t], 1, 1000)}}}' \
  | sort -n | column -t

Replace catalina.yyyy-mm-dd.log with the respective file (or catalina.out) and adjust the path if needed. These logs are located by default in Jira's installation folder: /<Jira-install-path>/logs/

Sample output:

Notify_time   Completed_time  Duration  Id     URL
11:48:50.962  11:49:11.022    141,390   4341   http://Jira-Base-URL/rest/jpo/1.0/labels
11:50:41.269  11:52:51.722    253,640   13378  http://Jira-Base-URL/rest/rest/ical/1.0/ical/config/query/options?_=1787955312699&xoauth_req
11:53:31.821  11:54:11.966    160,931   47210  http://Jira-Base-URL/rest/rest/api/1.0/labels/suggest?customFieldId=12800&query=&_=170792111
11:54:22.000  -               -         13352  http://Jira-Base-URL/rest/rest/issueNav/1/issueTable
11:54:32.030  11:55:02.139    142,232   1702   http://Jira-Base-URL/rest/rest/jpo/1.0/labels
11:57:02.576  11:57:32.668    158,078   40407  http://Jira-Base-URL/rest/rest/jpo/1.0/labels

Notice how the "issueTable" request was reported stuck at 11:54 but hadn't completed by the time the log was parsed (a quick filter for such entries is sketched after the list below).

  • The "Notify_time" is the time the Thread was reported to be stuck, meaning it must've started at least 2 minutes before that time (120s).

  • The "Duration" is in milliseconds.

  • The "Id" is reused within the Thread pool, so we'll see the same Ids appearing multiple times, though not at the same time. The same request URL may also repeat for a given Id, though that's purely a coincidence.
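To list only the entries that never completed, like the issueTable one above, the table can be filtered on its second column. This is a minimal sketch assuming the parser's output was saved to a (hypothetical) file named stuck-threads.txt:

# Keep the header plus every Thread whose Completed_time is still "-"
awk 'NR == 1 || $2 == "-"' stuck-threads.txt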

3. Caveats

This log parsing may not work well on logs that contain Jira restarts: since the Thread Ids are re-used, the command may not report the previous (incomplete) start time, replacing it instead with the latest start or start-complete times. For example:

14:03:16.914 http-nio-8080-exec-10 Reported stuck (notifyStuckThreadDetected)
...
JIRA RESTARTED
...
16:00:47.636 http-nio-8080-exec-10 Reported stuck (notifyStuckThreadDetected)
...
16:05:53.012 http-nio-8080-exec-10 Completed (notifyStuckThreadCompleted)

The parsing command may only report the last execution of http-nio-8080-exec-10 (from 16:00:47 to 16:05:53), so if you had a restart, it's best to split the log files so that no single file contains a restart.
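One way to do that split is with GNU csplit, cutting the file at each Tomcat startup message. This is a sketch assuming the log contains the usual "Server startup in [...] milliseconds" line at each restart (the exact wording may vary by Tomcat version):

# Split the log into catalina-part-00, catalina-part-01, ... at every restart marker
csplit -z -f catalina-part- catalina.yyyy-mm-dd.log '/Server startup in/' '{*}'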

4. Memory shortage analysis

The Heap dump is the most conclusive evidence we have for troubleshooting Heap shortages (Full GC cycles or OutOfMemoryError, OOME), but this log parsing may be helpful if we suspect the shortage is triggered by user requests through the Browser (Tomcat Threads): in that case, the potential culprit Threads will show up as stuck before the Full GC cycles or OOME.

Be mindful that some browser requests may spin off background (task) Threads in Jira, like Bulk Issue edit, Reindex, Clone Issue, Integrity Checker (on recent versions), etc. These, although triggered by the user through the Browser, won't show up in these logs and would render this analysis useless without a Heap dump.

These logs are only useful if we couldn't get a Heap dump at the time of the event (or it's corrupted or we can't parse it) and we suspect the shortage is caused by Tomcat Threads.

If we're not sure it's Tomcat Threads, analyzing the logs from several outages may reveal a pattern, like the same types of requests going stuck first, right before the outage events.

It's important to truncate the catalina.yyyy-mm-dd.log file so it ends with the restart of Jira. If the log has a restart, the command may not report the times from Threads before the restart.
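If splitting the files isn't practical, the same effect can be achieved by truncating the log just before the restart. This is a sketch under the same assumption as above about the startup message printed at restart:

# Keep only the lines before the first restart marker (adjust the pattern to your catalina log)
awk '/Server startup in/ {exit} {print}' catalina.yyyy-mm-dd.log > catalina.before-restart.log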

In the case of Full GC cycles, the output would look like this:

Notify_time   Completed_time  Duration  Id      URL
05:22:01.966  05:23:02.114    171,443   214859  https://Jira-Base-URL/rest/api/1.0/labels/suggest?customFieldId=12804&query=&_=9999999999999
07:28:48.621  07:29:18.690    141,694   208671  https://Jira-Base-URL/rest/jpo/1.0/labels
07:56:42.610  07:57:22.747    152,850   1895    https://Jira-Base-URL/rest/jpo/1.0/labels
08:12:44.802  08:13:44.936    177,529   211032  https://Jira-Base-URL/rest/jpo/1.0/labels
08:38:28.963  08:38:38.983    128,473   4870    https://Jira-Base-URL/rest/jpo/1.0/labels
08:38:59.032  08:39:29.113    156,727   4955    https://Jira-Base-URL/rest/jpo/1.0/labels
08:50:01.357  -               -         1894    https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:50:11.674  08:51:20.830    193,692   208666  https://Jira-Base-URL/rest/api/2/jql/autocompletedata/suggestions?fieldName=labels&fieldValue=&_=9999999999999
08:50:11.680  08:51:20.833    191,515   1905    https://Jira-Base-URL/rest/api/1.0/labels/suggest?query=&_=9999999999999
08:50:21.910  -               -         214989  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:51:10.788  -               -         198615  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:51:49.298  -               -         1899    https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:51:49.659  -               -         208581  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:51:59.765  -               -         212270  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:51:59.776  -               -         214855  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:53:13.826  -               -         214864  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:53:13.834  -               -         214981  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:53:35.989  -               -         214858  https://Jira-Base-URL/issues/?filter=323451
08:53:36.013  -               -         4961    https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:53:56.680  -               -         214969  https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count
08:54:15.073  -               -         1893    https://Jira-Base-URL/rest/jpo/1.0/labels

Notice how /jpo/1.0/plan/setup/issues/count is the first request to go stuck and never complete, with other similar ones following.

We can then grep the full log for that URL and find the users that triggered it:

grep "https://Jira-Base-URL/rest/jpo/1.0/plan/setup/issues/count" catalina.yyyy-mm-dd.log
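To go one step further and rank the users behind those requests, the same log can be filtered on the "user: ..." field that the Detected lines carry (as in the sample in section 2). This is a hypothetical refinement; adjust the URL and file name to your case:

# Rank the users that triggered the suspect URL, based on the "user:" field of the Detected lines
grep -a 'notifyStuckThreadDetected' catalina.yyyy-mm-dd.log \
  | grep '/rest/jpo/1.0/plan/setup/issues/count' \
  | grep -o -E 'user: [^]]+' \
  | sort | uniq -c | sort -nr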

If this pattern is persistent over the Heap shortage events, those user requests are probably contributing to the cause of the shortage rather than just facing the consequences, and the Heap dump can tell us that more conclusively.

Note that if the Heap shortage isn't caused by Tomcat Threads directly, we'd still see similar log output, though it's unlikely the pattern would be persistent across events. In that case, the Threads reported as stuck would just be the first victims of the Heap shortage.

5. Stuck request grouping

We can also group the requests that get stuck (long-running) the most by grepping the same log:

grep -a -h -E 'StuckThreadDetectionValve.notifyStuckThreadDetected' catalina.out \
  | grep -E -o 'for \[http.*\] and' \
  | sed -E 's/(for \[|\] and)//g' \
  | sed -E 's/[&?]_=[0-9]+$//g' \
  | sed -E 's/&_=[0-9]+\]//g' \
  | sort | uniq -c | sort -nr

Sample output:

    170 https://jira-base-url.com/rest/jpo/1.0/labels
     25 https://jira-base-url.com/rest/qoti-rich-filters/latest/gadgets/smart-gauges/load-smart-gauges-60000-205555-1900
     20 https://jira-base-url.com/rest/qoti-rich-filters/latest/gadgets/pie-chart/load-pie-chart-60000-294444-1900
     18 https://jira-base-url.com/rest/issueNav/1/issueTable
     16 https://jira-base-url.com/rest/qoti-rich-filters/latest/gadgets/two-d-stats/load-2d-statistics-60000-310000-1900
     13 https://jira-base-url.com/rest/qoti-rich-filters/latest/gadgets/two-d-stats/load-2d-statistics-60000-300000-1900
     13 https://jira-base-url.com/rest/qoti-rich-filters/latest/gadgets/table/load-table-60000-207000-1900
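A variant of the same grouping that drops the whole query string can make the endpoints themselves stand out. This is a sketch under the assumption that the query parameters aren't relevant to the grouping:

# Group stuck requests by endpoint only, ignoring all query parameters
grep -a -h 'StuckThreadDetectionValve.notifyStuckThreadDetected' catalina.out \
  | grep -E -o 'for \[http[^]]*\]' \
  | sed -E 's/(for \[|\])//g' \
  | sed -E 's/\?.*$//' \
  | sort | uniq -c | sort -nr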

Updated on March 19, 2025
