How to Increase the Number of Ehcache Stripes to Prevent Lock Contention in Jira Data Center
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Ehcache is a widely used open-source, Java-based caching library. It is designed to improve performance by reducing the need to repeatedly fetch data from slower sources (like databases) by storing frequently accessed data in memory.
Jira Data Center uses Ehcache as its cache implementation, wrapped by a local LoadingCache class (com.atlassian.cache.ehcache.LoadingCache). This wrapper is configured with 2048 lock stripes by default. Before Jira 8.9.0, the default was 64 stripes.
In specific environments with many nodes (3, 4, or more) and high workloads, lock contention may still occur, causing threads to pile up while waiting for lock release. Increasing the number of stripes can potentially alleviate this issue.
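To make the idea of stripes concrete, here is a minimal sketch of lock striping in general. It is not the Ehcache or Atlassian implementation: the CRC32 hash, the function names, and the sample keys are all assumptions chosen for illustration. Each key maps to one stripe, and threads working on different keys only compete when those keys land on the same stripe.

```python
import zlib
from collections import Counter


def stripe_index(key: str, num_stripes: int) -> int:
    """Map a cache key to one of num_stripes locks."""
    if num_stripes <= 0 or num_stripes & (num_stripes - 1):
        raise ValueError("num_stripes must be a power of two")
    # Assumption: CRC32 stands in for whatever hash the real library uses.
    # With a power-of-two count, masking the low bits selects the stripe.
    return zlib.crc32(key.encode("utf-8")) & (num_stripes - 1)


def busiest_stripe(keys, num_stripes: int) -> int:
    """Return how many of the given keys share the most crowded stripe."""
    counts = Counter(stripe_index(key, num_stripes) for key in keys)
    return max(counts.values())


if __name__ == "__main__":
    keys = [f"issue-{n}" for n in range(20000)]  # hypothetical cache keys
    for stripes in (64, 2048, 4096):
        print(stripes, busiest_stripe(keys, stripes))
```

In this sketch, doubling the stripe count splits the keys of each stripe across two stripes, so the most crowded stripe can only stay the same size or shrink. Contention on a single very hot key is not reduced by adding stripes, which is one reason the change may not help in every environment.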
This article explains how to diagnose the lock contention scenario and how to increase the number of Ehcache stripes. It does not cover every situation, and a more specialized approach might be needed.
Environment
Jira Software 9 and later
Jira Service Management 5 and later
Diagnosis
Lock contention can be identified through stack traces in catalina.out that resemble the following:
13-Sep-2024 13:23:28.229 WARNING [Catalina-utility-1] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8080-exec-108 url: /rest/api/2/issue/2950623/remotelink; user: <removed>] (id=[1411]) has been active for [135,159] milliseconds (since [9/13/24 1:21 PM]) to serve the same request for [https://<removed>/rest/api/2/issue/2950623/remotelink] and may be stuck (configured threshold for this StuckThreadDetectionValve is [120] seconds). There is/are [265] thread(s) in total that are monitored by this Valve and may be stuck.
java.lang.Throwable
at java.base@11.0.24/jdk.internal.misc.Unsafe.park(Native Method)
at java.base@11.0.24/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base@11.0.24/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java.base@11.0.24/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:1009)
at java.base@11.0.24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1324)
at java.base@11.0.24/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:738)
at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:120)
at com.atlassian.cache.ehcache.DelegatingCachedReference.get(DelegatingCachedReference.java:71)
at com.atlassian.cache.impl.metrics.InstrumentedCachedReference.get(InstrumentedCachedReference.java:58)
at com.atlassian.jira.cache.stats.CachedReferenceWithStats.get(CachedReferenceWithStats.java:24)
at com.atlassian.jira.cache.ZduCacheMigrationHelper$1.get(ZduCacheMigrationHelper.java:35)
....
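Look for net.sf.ehcache frames in the stack trace and note whether the affected threads have been waiting for more than two minutes (the 120-second stuck-thread threshold shown in the example above). Hundreds or thousands of stuck requests may be distributed across cluster nodes.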
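The check described above can be partly automated. The following is a rough sketch, assuming the log follows the same layout as the example (a StuckThreadDetectionValve warning line followed by a stack trace); the function name and the parsing rules are hypothetical and may need adjusting for your log format.

```python
import re

STUCK_MARKER = "StuckThreadDetectionValve.notifyStuckThreadDetected"
ACTIVE_MS = re.compile(r"has been active for \[([\d,]+)\] milliseconds")


def find_ehcache_stuck_threads(log_text: str) -> list:
    """Return the active time in milliseconds of each stuck-thread report
    whose stack trace contains a net.sf.ehcache frame."""
    reports = []
    current = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if STUCK_MARKER in line:
            # A new stuck-thread warning starts a new report.
            match = ACTIVE_MS.search(line)
            active_ms = int(match.group(1).replace(",", "")) if match else None
            current = {"active_ms": active_ms, "ehcache": False}
            reports.append(current)
        elif current is not None:
            if stripped.startswith("at "):
                if "net.sf.ehcache" in stripped:
                    current["ehcache"] = True
            elif stripped != "java.lang.Throwable":
                # Any other line ends the stack trace of the current report.
                current = None
    return [r["active_ms"] for r in reports if r["ehcache"]]


if __name__ == "__main__":
    # Assumption: catalina.out is in the current working directory.
    with open("catalina.out", encoding="utf-8", errors="replace") as handle:
        waits = find_ehcache_stuck_threads(handle.read())
    print(f"{len(waits)} stuck-thread reports involve net.sf.ehcache")
```

Running this on each node gives a quick count of stuck requests per node. It does not replace reading the stack traces themselves.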
Solution
Increase the Number of Stripes:
The number of stripes should remain a power of 2. Starting from the default of 2048, the next value is 4096.
Note: Increasing the number of stripes may slightly increase memory usage.
Edit JVM Parameters:
Jira admins can modify the number of stripes by setting the JVM parameter com.atlassian.cache.ehcache.LoadingCache.DEFAULT_NUMBER_OF_MUTEXES. Follow these steps to set the parameter:
Edit the JVM arguments in your setenv.sh (Linux) or setenv.bat (Windows) file:
JVM_SUPPORT_RECOMMENDED_ARGS="$JVM_SUPPORT_RECOMMENDED_ARGS -Dcom.atlassian.cache.ehcache.LoadingCache.DEFAULT_NUMBER_OF_MUTEXES=4096"
Ensure this change is applied to all nodes in the cluster.
Restart the nodes to apply the changes.
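After the restart, it can be useful to confirm that each node picked up the intended value. The sketch below is one possible check, assuming you have the JVM argument string of a node available as text (for example, copied from a process listing). The helper name is hypothetical, and the fallback of 2048 is the default stated in this article.

```python
import re

PROPERTY = "com.atlassian.cache.ehcache.LoadingCache.DEFAULT_NUMBER_OF_MUTEXES"
DEFAULT_STRIPES = 2048  # default described in this article


def configured_stripes(jvm_args: str) -> int:
    """Return the stripe count set in a JVM argument string,
    or the default when the property is absent."""
    values = set(re.findall(r"-D" + re.escape(PROPERTY) + r"=(\d+)", jvm_args))
    if not values:
        return DEFAULT_STRIPES
    if len(values) > 1:
        # The property appears more than once with different values.
        raise ValueError(f"conflicting values for {PROPERTY}: {sorted(values)}")
    value = int(values.pop())
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a power of two")
    return value


if __name__ == "__main__":
    args = "-Xms8g -Xmx8g -D" + PROPERTY + "=4096"  # hypothetical arguments
    print(configured_stripes(args))
```

Comparing the result across all nodes helps catch a node where the setenv change was missed.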
If Issues Persist:
If lock contention continues, open a case with Atlassian Support by creating a ticket at support.atlassian.com.
Collect and attach the following information to the ticket:
A Support ZIP from each of the Data Center nodes.
Thread dumps from the node where you see the lock contention.