Health checks are failing on Jira nodes
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
All health checks are failing with the message:
The health check was unable to complete within the timeout of 20000ms.
2020-12-01 03:28:43,587 HealthCheckWatchdog:thread-6 WARN ServiceRunner [c.a.t.healthcheck.concurrent.SupportHealthCheckTask] Health check Cluster Cache Replication was unable to complete within the timeout of 60000.
2020-12-01 03:28:43,588 HealthCheckWatchdog:thread-2 WARN ServiceRunner [c.a.t.healthcheck.concurrent.SupportHealthCheckTask] Health check Explicit GC was unable to complete within the timeout of 20000.
2020-12-01 03:28:43,588 HealthCheckWatchdog:thread-4 WARN ServiceRunner [c.a.t.healthcheck.concurrent.SupportHealthCheckTask] Health check Fonts was unable to complete within the timeout of 20000.
2020-12-01 03:28:43,588 HealthCheckWatchdog:thread-3 WARN ServiceRunner [c.a.t.healthcheck.concurrent.SupportHealthCheckTask] Health check Application links was unable to complete within the timeout of 20000.
2020-12-01 03:28:43,588 HealthCheckWatchdog:thread-7 WARN ServiceRunner [c.a.t.healthcheck.concurrent.SupportHealthCheckTask] Health check Lucene index files location was unable to complete within the timeout of 20000.
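To see when these timeouts began on a node, you can count the timeout warnings per day in the application logs. A minimal sketch, assuming the default Jira home log location (`LOG_DIR` is an assumption; point it at your node's log directory):

```shell
# Count health check timeout warnings per day in the Jira application logs.
# The log directory below is an assumption -- adjust it to your installation.
LOG_DIR="${JIRA_LOG_DIR:-/var/atlassian/application-data/jira/log}"
grep -h "unable to complete within the timeout" "$LOG_DIR"/atlassian-jira.log* 2>/dev/null \
  | awk '{print $1}' | sort | uniq -c
```

Running this on each node helps correlate the start of the failures with the outage window discussed below.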
Environment
Jira Data Center 8.x and later.
Diagnosis
A node outage occurred before all the health checks started to fail.
Support logs may contain health check errors within the scope of the outage (for example, a database outage):
2020-11-29 14:48:29,116 Caesium-1-3 ERROR ServiceRunner [c.a.t.jira.healthcheck.JiraPlatformAccessor] Error determining db vendor
com.atlassian.jira.exception.DataAccessException: com.microsoft.sqlserver.jdbc.SQLServerException: Unable to access availability database 'jiradb' because the database replica is not in the PRIMARY or SECONDARY role. Connections to an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later. ClientConnectionId:7cd87c39-9116-47e2-91c6-e45d85d32e11
	at com.atlassian.jira.database.DatabaseAccessorImpl.borrowConnection(DatabaseAccessorImpl.java:167)
	at com.atlassian.jira.database.DatabaseAccessorImpl.executeQuery(DatabaseAccessorImpl.java:72)
	at sun.reflect.GeneratedMethodAccessor591.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.atlassian.plugin.util.ContextClassLoaderSettingInvocationHandler.invoke(ContextClassLoaderSettingInvocationHandler.java:26)
	at com.sun.proxy.$Proxy436.executeQuery(Unknown Source)
	... 2 filtered
	at java.lang.reflect.Method.invoke(Method.java:498)
	...
	at com.atlassian.troubleshooting.healthcheck.DefaultSupportHealthCheckSupplier.shouldDisplay(DefaultSupportHealthCheckSupplier.java:50)
	at com.atlassian.troubleshooting.healthcheck.DefaultSupportHealthCheckSupplier.asPluginSuppliedSupportHealthCheck(DefaultSupportHealthCheckSupplier.java:126)
	at com.atlassian.troubleshooting.healthcheck.DefaultSupportHealthCheckSupplier.lambda$healthChecksFrom$2(DefaultSupportHealthCheckSupplier.java:120)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.atlassian.troubleshooting.healthcheck.DefaultSupportHealthCheckSupplier.healthChecksFrom(DefaultSupportHealthCheckSupplier.java:122)
	at com.atlassian.troubleshooting.healthcheck.DefaultSupportHealthCheckSupplier.getHealthChecks(DefaultSupportHealthCheckSupplier.java:56)
	at com.atlassian.troubleshooting.healthcheck.impl.DefaultSupportHealthCheckManager.getHealthChecks(DefaultSupportHealthCheckManager.java:51)
	at com.atlassian.troubleshooting.healthcheck.impl.DefaultSupportHealthCheckManager.getAllHealthChecks(DefaultSupportHealthCheckManager.java:57)
	at com.atlassian.troubleshooting.healthcheck.impl.DefaultSupportHealthCheckManager.runAllHealthChecks(DefaultSupportHealthCheckManager.java:70)
Looking further back in the logs, we can see an error confirming the outage:
2020-11-29 14:48:31,478 PlatformInstrumentsLoggingService RUNNING ERROR [NoModule] There was an error getting a DBCP datasource.
java.lang.RuntimeException: Unable to obtain a connection from the underlying connection pool
	at org.ofbiz.core.entity.jdbc.interceptors.connection.ConnectionTracker.trackConnection(ConnectionTracker.java:52)
	at org.ofbiz.core.entity.transaction.DBCPConnectionFactory.trackConnection(DBCPConnectionFactory.java:283)
	at org.ofbiz.core.entity.transaction.DBCPConnectionFactory.getConnection(DBCPConnectionFactory.java:80)
	at org.ofbiz.core.entity.ConnectionFactory.tryGenericConnectionSources(ConnectionFactory.java:69)
	at org.ofbiz.core.entity.transaction.JNDIFactory.getConnection(JNDIFactory.java:173)
	at com.atlassian.jira.ofbiz.sql.TransactionFactoryInterfaceWrapper.getConnection(TransactionFactoryInterfaceWrapper.java:41)
Check whether any locking errors match the problem described in Jira Data Center Functionalities Loss Due to Cluster Wide Lock.
Check the atlassian-jira.log for errors indicating the database server was shutting down:
2024-06-12 20:13:48,036-0700 SdSerialisedOffThreadProcessor:thread-1 ERROR anonymous [NoModule] There was an error getting a DBCP datasource.
java.lang.RuntimeException: Unable to obtain a connection from the underlying connection pool
...
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
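The connection-pool and database shutdown errors above can be located the same way as the timeout warnings. A hedged sketch, again assuming the default log location (`LOG_DIR` is an assumption):

```shell
# Pull connection-pool failures and database shutdown messages from the logs
# to bracket the outage window. The log directory is an assumption.
LOG_DIR="${JIRA_LOG_DIR:-/var/atlassian/application-data/jira/log}"
grep -h -E "error getting a DBCP datasource|database system is shutting down" \
  "$LOG_DIR"/atlassian-jira.log* 2>/dev/null | head -n 20
```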
Cause
The failing health checks are typically caused by a critical outage (for example, a database or storage outage) that had been, or was still, affecting Jira when the checks ran. Any health checks executed during the outage produce this warning in Jira.
Solution
If the health check warnings haven't cleared some time after the outage is resolved, follow the steps below to perform a full cluster restart:
Validate the database server is up and running.
Shut down all the nodes.
Start one node first and wait for it to start up completely.
Start the remaining nodes.
If none of the steps above helps, reboot the server hosting the node.
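The wait in step 3 can be automated by polling the node's /status endpoint, which Jira Data Center exposes for load balancer checks and which returns {"state":"RUNNING"} once the node is fully up. A sketch, where the node's base URL is an assumption:

```shell
# Poll the first node's /status endpoint until it reports RUNNING; only then
# start the remaining nodes. BASE_URL is an assumption -- use your node's URL.
BASE_URL="${BASE_URL:-http://jira-node1:8080}"
until curl -sf "$BASE_URL/status" | grep -q '"state":"RUNNING"'; do
  echo "waiting for the first node to finish starting..."
  sleep 10
done
echo "first node is RUNNING; start the remaining nodes"
```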