Jira node goes down when many HTTP requests updating, commenting on, and deleting issues hit a particular node

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

A large number of REST calls that update issues, add or update comments, and delete issues hit a single node, and the database connection pool on that node reaches its limit. The HTTP thread pool then fills up as well, making that particular node unresponsive.

Environment

Jira Data Center 8.20.x

Diagnosis

  • Capture thread dumps from the time of the issue, before restarting the impacted node (see the capture sketch after the Caesium stack trace below).

  • Capture the support.zip file.

  • Check all the thread dumps for a Caesium thread with a stack trace like the following:

    Caesium-thread

    "Caesium-1-3" #639 daemon prio=5 os_prio=0 tid=0x00007fca29771800 nid=0x4674 runnable [0x00007fca22063000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2058)
        at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6617)
        - locked <0x00000004d34026c0> (a com.microsoft.sqlserver.jdbc.TDSReader)
        at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7803)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:600)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:524)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272)
        - locked <0x00000004d3402848> (a java.lang.Object)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:247)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:222)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:473)
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
        at com.atlassian.jira.ofbiz.sql.PreparedStatementWrapper.executeUpdate(PreparedStatementWrapper.java:47)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.lambda$executeUpdate$7(DiagnosticPreparedStatement.java:69)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement$$Lambda$3785/582362128.execute(Unknown Source)
        at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector.recordExecutionTime(DefaultDatabaseDiagnosticsCollector.java:70)
        at com.atlassian.jira.diagnostic.connection.DatabaseDiagnosticsCollectorDelegate.recordExecutionTime(DatabaseDiagnosticsCollectorDelegate.java:55)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.executeUpdate(DiagnosticPreparedStatement.java:69)
        at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:562)
        at org.ofbiz.core.entity.GenericDAO.deleteByCondition(GenericDAO.java:1242)
        at org.ofbiz.core.entity.GenericHelperDAO.removeByCondition(GenericHelperDAO.java:244)
        at org.ofbiz.core.entity.GenericDelegator.removeByCondition(GenericDelegator.java:1374)
        at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.removeByCondition(DefaultOfBizDelegator.java:142)
        at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.removeByCondition(WrappingOfBizDelegator.java:136)
        at com.atlassian.jira.index.ha.OfBizReplicatedIndexOperationStore.purgeOldOperations(OfBizReplicatedIndexOperationStore.java:143)
        at com.atlassian.jira.service.services.index.ReplicatedIndexCleaningService.run(ReplicatedIndexCleaningService.java:62)
        at com.atlassian.jira.service.JiraServiceContainerImpl.run(JiraServiceContainerImpl.java:68)
        at com.atlassian.jira.service.ServiceRunner.runService(ServiceRunner.java:62)
        at com.atlassian.jira.service.ServiceRunner.runServiceId(ServiceRunner.java:44)
        at com.atlassian.jira.service.ServiceRunner.runJob(ServiceRunner.java:32)
        at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134)
        at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106)
        at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90)
        at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435)
        at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:430)
        at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
        at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
        at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$5735/292939294.accept(Unknown Source)
        at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
        at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
        at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
        at java.lang.Thread.run(Thread.java:748)
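
    To capture a series of thread dumps and check whether this clean-up thread shows up in all of them, a shell sketch along the following lines can be used. The process lookup, dump directory, and dump count are assumptions to adapt to your environment.

    # Run as the user that owns the Jira JVM. JIRA_PID and DUMP_DIR are placeholders.
    JIRA_PID=$(pgrep -f jira | head -n 1)
    DUMP_DIR=/tmp/jira-thread-dumps
    mkdir -p "$DUMP_DIR"

    # Take 6 dumps, 10 seconds apart.
    for i in $(seq 1 6); do
        jstack -l "$JIRA_PID" > "$DUMP_DIR/threaddump-$(date +%Y%m%d-%H%M%S).txt"
        sleep 10
    done

    # Which dumps contain the Caesium thread running the replicated index clean-up?
    grep -l "ReplicatedIndexCleaningService" "$DUMP_DIR"/threaddump-*.txt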
  • Most of the HTTP threads would be long-running on database operations, and catalina.out would report them as stuck (a sketch for summarising these warnings follows the excerpt):

    HTTP threads in Catalina.out

    WARNING [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8080-exec-36 url: /rest/api/2/issue/XXX-xxxx/comment; user: xxxxxx] (id=[418645]) has been active for [121,346] milliseconds (since [3/14/23 10:02 PM]) to serve the same request for [https://xyz.com/rest/api/2/issue/XXX-xxxx/comment] and may be stuck (configured threshold for this StuckThreadDetectionValve is [120] seconds). There is/are [5] thread(s) in total that are monitored by this Valve and may be stuck.
     java.lang.Throwable
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2058)
        at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6617)
        at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7803)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:600)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:524)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:247)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:222)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:473)
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
        at com.atlassian.jira.ofbiz.sql.PreparedStatementWrapper.executeUpdate(PreparedStatementWrapper.java:47)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.lambda$executeUpdate$7(DiagnosticPreparedStatement.java:69)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement$$Lambda$3785/582362128.execute(Unknown Source)
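
    To see how many requests are affected and which endpoints they hit, the stuck-thread warnings can be summarised directly from the log. A minimal grep sketch; the catalina.out path is an assumption to adjust to your installation.

    # Count stuck-thread warnings per request URL; the log path is a placeholder.
    grep "StuckThreadDetectionValve.notifyStuckThreadDetected" /opt/atlassian/jira/logs/catalina.out \
        | grep -Eo "url: [^;]*" | sort | uniq -c | sort -rn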
  • The same thread would appear as long-running in multiple consecutive thread dumps (a sketch for tracking the thread across dumps follows the trace):

    The same thread in thread dumps

    "http-nio-8080-exec-36 url: /rest/api/2/issue/XXX-xxxx/comment; user: xxxxxx" #2626164 daemon prio=5 os_prio=0 tid=0x00007fcbbc8af800 nid=0xc9cb runnable [0x00007fca30ffc000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2058)
        at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6617)
        - locked <0x00000004bdc00010> (a com.microsoft.sqlserver.jdbc.TDSReader)
        at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7803)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:600)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:524)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272)
        - locked <0x00000006a3000098> (a java.lang.Object)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:247)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:222)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:473)
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
        at com.atlassian.jira.ofbiz.sql.PreparedStatementWrapper.executeUpdate(PreparedStatementWrapper.java:47)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.lambda$executeUpdate$7(DiagnosticPreparedStatement.java:69)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement$$Lambda$3785/582362128.execute(Unknown Source)
        at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector.recordExecutionTime(DefaultDatabaseDiagnosticsCollector.java:70)
        at com.atlassian.jira.diagnostic.connection.DatabaseDiagnosticsCollectorDelegate.recordExecutionTime(DatabaseDiagnosticsCollectorDelegate.java:55)
        at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.executeUpdate(DiagnosticPreparedStatement.java:69)
        at com.querydsl.sql.dml.SQLInsertClause.execute(SQLInsertClause.java:423)
        at com.atlassian.jira.database.IdGeneratingSQLInsertClause.executeWithId(IdGeneratingSQLInsertClause.java:71)
        at com.atlassian.jira.index.ha.OfBizReplicatedIndexOperationStore.lambda$insertIndexOperation$0(OfBizReplicatedIndexOperationStore.java:128)
        at com.atlassian.jira.index.ha.OfBizReplicatedIndexOperationStore$$Lambda$8223/1345587209.runQuery(Unknown Source)
        at com.atlassian.jira.database.DefaultQueryDslAccessor.lambda$executeQuery$0(DefaultQueryDslAccessor.java:68)
        at com.atlassian.jira.database.DefaultQueryDslAccessor$$Lambda$428/644342402.apply(Unknown Source)
        at com.atlassian.jira.database.DatabaseAccessorImpl.lambda$runInTransaction$0(DatabaseAccessorImpl.java:105)
        at com.atlassian.jira.database.DatabaseAccessorImpl$$Lambda$429/1345460721.run(Unknown Source)
        at com.atlassian.jira.database.DatabaseAccessorImpl.executeQuery(DatabaseAccessorImpl.java:74)
        at com.atlassian.jira.database.DatabaseAccessorImpl.runInTransaction(DatabaseAccessorImpl.java:100)
        at com.atlassian.jira.database.DefaultQueryDslAccessor.executeQuery(DefaultQueryDslAccessor.java:67)
        at com.atlassian.jira.index.ha.OfBizReplicatedIndexOperationStore.insertIndexOperation(OfBizReplicatedIndexOperationStore.java:124)
        at com.atlassian.jira.index.ha.OfBizReplicatedIndexOperationStore.createIndexOperation(OfBizReplicatedIndexOperationStore.java:79)
        at com.atlassian.jira.index.ha.DefaultReplicatedIndexManager.updateReplicatedIndex(DefaultReplicatedIndexManager.java:251)
        at com.atlassian.jira.index.ha.DefaultReplicatedIndexManager.reindexEntityWithVersion(DefaultReplicatedIndexManager.java:100)
        at com.atlassian.jira.index.ha.DefaultReplicatedIndexManager.reindexComments(DefaultReplicatedIndexManager.java:79)
        at com.atlassian.jira.issue.index.DefaultIssueIndexer$CommentOperation.close(DefaultIssueIndexer.java:1136)
        at com.atlassian.jira.issue.index.DefaultIssueIndexer.perform(DefaultIssueIndexer.java:531)
        at com.atlassian.jira.issue.index.DefaultIssueIndexer.perform(DefaultIssueIndexer.java:549)
        at com.atlassian.jira.issue.index.DefaultIssueIndexer.reindexComments(DefaultIssueIndexer.java:354)
        at com.atlassian.jira.issue.index.DefaultIndexManager.lambda$reIndexComments$6(DefaultIndexManager.java:675)
        at com.atlassian.jira.issue.index.DefaultIndexManager$$Lambda$9725/600345995.get(Unknown Source)
        at com.atlassian.jira.issue.index.DefaultIndexManager.executeWithIndexLock(DefaultIndexManager.java:863)
        at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexRelatedEntity(DefaultIndexManager.java:720)
        at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexComments(DefaultIndexManager.java:676)
        at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexComments(DefaultIndexManager.java:669)
        at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexComments(DefaultIndexManager.java:664)
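
    Whether a given HTTP thread stays pinned on the same JDBC call can be confirmed by counting the dumps in which it appears. A small sketch, reusing the placeholder dump directory from the capture sketch above and the example thread's nid:

    # A thread that shows up RUNNABLE in socketRead0 in every dump is effectively stuck on the database.
    # The nid value comes from the thread header above; the dump directory is a placeholder.
    for f in /tmp/jira-thread-dumps/threaddump-*.txt; do
        echo -n "$f: "
        grep -A 3 "nid=0xc9cb" "$f" | grep -c "socketRead0"
    done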
  • Most of the threads would be long-running on database operations, as shown in the screenshot below:

(Screenshot unavailable: thread overview showing most threads long-running on database operations)
  • Additionally, check in the startup logs when the following job is triggered in the environment (notes on the schedule and a log search sketch follow the excerpt):

    Replicated index flush service

    Replicated index flush service : com.atlassian.jira.service.services.index.ReplicatedIndexCleaningService
    Service Schedule : 0 0 10/12 * * ?
    Last Run : 2/18/23 10:00 AM
    RETENTION_PERIOD : 2880m
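
    In this example, the Quartz schedule 0 0 10/12 * * ? fires at 10:00 and 22:00 each day, and RETENTION_PERIOD 2880m corresponds to 48 hours of retained rows. The actual run times can be pulled from the application logs and compared with the first stuck-thread warnings; a minimal grep sketch, assuming default log locations:

    # Log paths are assumptions - adjust to your Jira home and installation directories.
    # When did the replicated index clean-up service run?
    grep "ReplicatedIndexCleaningService" /var/atlassian/application-data/jira/log/atlassian-jira.log*

    # When did requests first start being flagged as stuck?
    grep "StuckThreadDetectionValve" /opt/atlassian/jira/logs/catalina.out | head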
  • Check whether the long-running threads started taking time around the most recent execution of the Replicated index flush service, as in the catalina.out example above.

  • Check with the DB team for too many long-running queries, and confirm whether INSERT operations on the replicatedindexoperation table are blocked by a DELETE operation on the same table (a query sketch for Microsoft SQL Server follows).
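
    On Microsoft SQL Server (the JDBC driver shown in the stack traces above), blocking on this table can be confirmed from the dynamic management views. A minimal sketch using sqlcmd; the connection details are placeholders, and the query requires the VIEW SERVER STATE permission:

    # Placeholder connection details - fill in for your environment.
    DB_HOST=sqlserver.example.com
    JIRA_DB=jiradb
    DB_USER=jira_dba
    sqlcmd -S "$DB_HOST" -d "$JIRA_DB" -U "$DB_USER" -Q "
      SELECT r.session_id,
             r.blocking_session_id,
             r.wait_type,
             r.wait_time,
             s.host_name,
             t.text AS running_sql
      FROM sys.dm_exec_requests r
      JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
      CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
      WHERE t.text LIKE '%replicatedindexoperation%';"

    A blocked INSERT on replicatedindexoperation will report the session running the DELETE in blocking_session_id.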

Cause

  • From the thread dumps, we can clearly see that all of these threads are waiting on the database while performing index replication (not to be confused with foreground re-indexing). These index replication writes are triggered as part of any update to a Jira issue and are later picked up by the node re-index service on the other nodes.

  • The DELETE operation held a lock on the replicatedindexoperation table for a long time, so every HTTP request that needed to insert a row into the same table kept waiting for that lock.

  • There is no known root cause for this behaviour on the database side; however, we expect DBAs to be involved in further troubleshooting.

Solution

Restarting the node that triggered the DELETE operation helps in this condition: the restart releases the lock on the database table and allows pending operations on replicatedindexoperation to proceed (a sketch for identifying that node is shown below).
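
To work out which node to restart, the host name of the database session running the DELETE can be read from SQL Server's session metadata. A minimal sketch, reusing the same placeholder connection details as in the Diagnosis section; host_name generally reports the hostname of the Jira node that opened the connection:

    # Placeholder connection details - same as in the Diagnosis section.
    DB_HOST=sqlserver.example.com
    JIRA_DB=jiradb
    DB_USER=jira_dba

    # Show the client host and statement for sessions currently deleting from replicatedindexoperation.
    sqlcmd -S "$DB_HOST" -d "$JIRA_DB" -U "$DB_USER" -Q "
      SELECT r.session_id,
             s.host_name,
             r.start_time,
             t.text AS running_sql
      FROM sys.dm_exec_requests r
      JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
      CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
      WHERE t.text LIKE '%DELETE%replicatedindexoperation%';"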

Updated on March 19, 2025
