JIRA Services stop working due to a database network failure
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Problem
All JIRA services (e.g. the Mail Queue, mail handlers, and directory synchronization) stop working shortly after a connectivity problem with the database occurs, and remain stuck even after the database connection has been restored and JIRA otherwise appears operational.
Thread dumps will show long-running stack traces such as the following, with Caesium threads stuck reading from the database connection:
"Caesium-1-2" #130 daemon prio=5 os_prio=0 tid=0x0000000003536800 nid=0xc54 runnable [0x00007f90c2b8e000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
- locked <0x0000000720eb03a0> (a java.util.concurrent.ConcurrentHashMap)
at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:99)
at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127)
at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086)
- locked <0x0000000720eb2a00> (a net.sourceforge.jtds.jdbc.TdsCore)
at net.sourceforge.jtds.jdbc.TdsCore.microsoftPrepare(TdsCore.java:1219)
at net.sourceforge.jtds.jdbc.JtdsConnection.prepareSQL(JtdsConnection.java:708)
- locked <0x0000000720eaff38> (a net.sourceforge.jtds.jdbc.JtdsConnection)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1028)
- locked <0x0000000720eaff38> (a net.sourceforge.jtds.jdbc.JtdsConnection)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:633)
at org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:967)
at org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:883)
at org.ofbiz.core.entity.GenericHelperDAO.findListIteratorByCondition(GenericHelperDAO.java:194)
at org.ofbiz.core.entity.GenericDelegator.findListIteratorByCondition(GenericDelegator.java:1237)
at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:398)
at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findListIteratorByCondition(WrappingOfBizDelegator.java:278)
at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.forEach(SelectQueryImpl.java:227)
at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.consumeWith(SelectQueryImpl.java:214)
at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.singleValue(SelectQueryImpl.java:191)
at com.atlassian.jira.scheduler.OfBizClusteredJobDao.find(OfBizClusteredJobDao.java:88)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:417)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:462)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:390)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:285)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:282)
at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:65)
at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:59)
at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:34)
at java.lang.Thread.run(Thread.java:745)
Diagnosis
All of the database configuration recommended in Surviving connection closures has been implemented, but it does not resolve the issue.
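For reference, the settings from that article are the connection pool validation options in dbconfig.xml. The excerpt below is only a representative sketch with placeholder values; these options keep the pool healthy across brief closures, but in this scenario they do not free the stuck Caesium threads.
<!-- Hypothetical dbconfig.xml excerpt (inside the datasource section): pool
     validation settings of the kind recommended in "Surviving connection
     closures". All values are placeholders. -->
<validation-query>select 1</validation-query>
<pool-test-on-borrow>false</pool-test-on-borrow>
<pool-test-while-idle>true</pool-test-while-idle>
<time-between-eviction-runs-millis>30000</time-between-eviction-runs-millis>
<min-evictable-idle-time-millis>5000</min-evictable-idle-time-millis>
<pool-remove-abandoned>true</pool-remove-abandoned>
<pool-remove-abandoned-timeout>300</pool-remove-abandoned-timeout>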
Environment
Observed with a jTDS JDBC connection to a Microsoft SQL Server database (it could potentially occur in other environments)
Cause
The database operations performed by JIRA services are unable to recover from the connection loss caused by the network issue.
Solution
As an immediate workaround, restart JIRA to refresh the services after a database network error has occurred.
Resolution for MS SQL
If your JIRA Core version is older than 7.2.0, refer to JRASERVER-62072 - Database connectivity issue causes scheduled jobs to break and upgrade first to pick up the fix for a known issue with similar symptoms.
If you're seeing this in JIRA 7.2.0 or later, apply a socket timeout parameter to the database connection configuration. This forces the long-running processes to hit a socketTimeout instead of being stuck indefinitely, and allows the services to resume operation with a new, functional database connection.
1. Edit dbconfig.xml and find the database <url> tag, to which the socketTimeout parameter will be added, for example:
<url>jdbc:jtds:sqlserver://SQL:1433/jira</url>
2. Append the socketTimeout= setting to the database URL as shown below. For instance, the setting below causes the stuck database processes to hit a timeout after 10 minutes, which destroys the dead connection so that the service threads can obtain a new, working connection to the database and resume operation (a fuller dbconfig.xml excerpt follows these steps):
<url>jdbc:jtds:sqlserver://SQL:1433/jira;socketTimeout=600000</url>
3. Restart JIRA on each node for the changes to take effect.
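For orientation, the <url> element sits inside the datasource section of dbconfig.xml. The fragment below is a minimal, hypothetical sketch: the host, database name, credentials, and pool values are placeholders, and the only change relevant to this article is the socketTimeout suffix on the URL.
<!-- Hypothetical dbconfig.xml fragment for MS SQL Server via jTDS.
     Everything except the socketTimeout suffix is a placeholder. -->
<jdbc-datasource>
  <url>jdbc:jtds:sqlserver://SQL:1433/jira;socketTimeout=600000</url>
  <driver-class>net.sourceforge.jtds.jdbc.Driver</driver-class>
  <username>jiradbuser</username>
  <password>secret</password>
  <pool-max-size>20</pool-max-size>
  <validation-query>select 1</validation-query>
</jdbc-datasource>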
Resolution for PostgreSQL
From: Connection problems to PostgreSQL result in stuck threads in Jira
To solve this problem:
Upgrade the JDBC driver for PostgreSQL to 42.2.18 or later. This driver better handles the properties you’ll add in the next step.
Edit the dbconfig.xml file and add the following properties into <jdbc-resource> (see the excerpt after these steps):
<connection-properties>tcpKeepAlive=true;socketTimeout=240</connection-properties>
tcpKeepAlive: checks whether the connection is still running.
socketTimeout: terminates the connection after the specified time (in seconds). We’ve chosen a conservative 4 minutes (240 seconds), but if you tend to run SQL queries that take a long time, you can increase this value.
Restart JIRA on each node for the changes to take effect.
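The excerpt below shows where the new element typically ends up relative to the existing datasource entries in dbconfig.xml. It is a sketch only: the host, database name, and credentials are placeholders, and only the <connection-properties> line is new.
<!-- Hypothetical dbconfig.xml excerpt for PostgreSQL: only the last line is
     added; the other entries stand in for what the file already contains. -->
<url>jdbc:postgresql://db.example.com:5432/jiradb</url>
<driver-class>org.postgresql.Driver</driver-class>
<username>jiradbuser</username>
<password>secret</password>
<connection-properties>tcpKeepAlive=true;socketTimeout=240</connection-properties>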
PostgreSQL JDBC driver version
The socketTimeout connection property was not enforced properly due to a bug in the driver. Version 42.2.15 (2020-08-14) includes the fix "Make sure socketTimeout is enforced PR 1831, 210b27a6" listed in the PostgreSQL JDBC Driver changelog.
Please use version 42.2.15 or later so that the socketTimeout connection property works as expected.
Resolution for Oracle
To solve this problem:
1. Add the following system property to the setenv.sh file on each node, ideally trying the change in a test environment first (an example setenv.sh line follows these steps). See Setting properties and options on startup for more information.
-Doracle.jdbc.ReadTimeout=60000
2. Restart JIRA on each node for the changes to take effect.
Jira will recover after 1 minute, in line with the configured ReadTimeout of 60000 (the timeout is in milliseconds).
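As an illustration only (the existing contents of your file will differ), the property is typically appended to the JVM_SUPPORT_RECOMMENDED_ARGS variable in setenv.sh rather than added as a standalone line:
# Hypothetical snippet from <jira-install>/bin/setenv.sh - the Oracle read
# timeout is appended to whatever JVM arguments the variable already holds.
JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} -Doracle.jdbc.ReadTimeout=60000"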