Service Management customer notifications are sent with a long delay due to complex SLA configurations
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Symptoms
All the customer notifications within a specific Service Management (former "Service Desk" project) are sent with random delays (sometimes several hours).
Diagnosis
The Service Management project is configured with at least 1 SLA which verifies either or both of the following conditions:
The SLA contains a high number of goals (at least hundreds of goals)
The SLA contains goals which are associated to JQL queries using complex functions, for example:
IQL functions from the Insight Asset Management add-on
Functions/scripts from the ScriptRunner add-on
The Customer Notifications are not the only functionality that experiences delays in the Service Management project. Other functionalities such as the SLA or the Service Management automations are triggered/calculated with a long delay
When collecting thread dumps while the delays are happening, we might see a very long running thread (SdOffThreadEventJobRunner:thread-X) performing JQL operation for the SLA functionality:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
"SdOffThreadEventJobRunner:thread-4" #692 prio=5 os_prio=0 tid=0x00007f291c168800 nid=0xf98 runnable [0x00007f2884f9f000] java.lang.Thread.State: RUNNABLE at java.lang.Class.getEnclosingMethod0(Native Method) at java.lang.Class.getEnclosingMethodInfo(Class.java:1072) at java.lang.Class.getEnclosingClass(Class.java:1272) at java.lang.Class.getSimpleBinaryName(Class.java:1443) at java.lang.Class.getSimpleName(Class.java:1309) at com.riadalabs.jira.plugins.insight.services.cache.InsightCacheManagerImpl.getCache(InsightCacheManagerImpl.java:33) at com.riadalabs.jira.plugins.insight.services.cache.InsightCacheManagerImpl.getFromCache(InsightCacheManagerImpl.java:75) at com.riadalabs.jira.plugins.insight.services.core.dal.impl.ConfigurationDalImpl.loadStatusType(ConfigurationDalImpl.java:327) at com.riadalabs.jira.plugins.insight.services.predicate.attribute.AttributeValuePredicateBase.getStatusValue(AttributeValuePredicateBase.java:293) ... at com.riadalabs.jira.plugins.insight.services.predicate.MultiplePredicates.testAllOf(MultiplePredicates.java:72) at com.riadalabs.jira.plugins.insight.services.predicate.MultiplePredicates.test(MultiplePredicates.java:57) at com.riadalabs.jira.plugins.insight.services.predicate.MultiplePredicates.test(MultiplePredicates.java:12) at java.util.function.Predicate.lambda$or$2(Predicate.java:101) at java.util.function.Predicate$$Lambda$1320/767395427.test(Unknown Source) at java.util.function.Predicate.lambda$and$0(Predicate.java:69) at java.util.function.Predicate$$Lambda$1771/1190882033.test(Unknown Source) at com.riadalabs.jira.plugins.insight.services.predicate.ObjectIdBucketPredicate.test(ObjectIdBucketPredicate.java:25) at com.riadalabs.jira.plugins.insight.services.predicate.ObjectIdBucketPredicate.test(ObjectIdBucketPredicate.java:9) at com.riadalabs.jira.plugins.insight.services.core.ObjectFilterManagerImpl.filterObjectBeansInThread(ObjectFilterManagerImpl.kt:125) ... at com.riadalabs.jira.plugins.insight.services.jira.customfield.searcher.function.IQLFunction.getFunctionValues(IQLFunction.java:125) at com.riadalabs.jira.plugins.insight.services.jira.customfield.searcher.function.InsightAbstractJqlFunction.getValues(InsightAbstractJqlFunction.java:91) at com.atlassian.jira.jql.operand.FunctionOperandHandler$3.call(FunctionOperandHandler.java:61) at com.atlassian.jira.jql.operand.FunctionOperandHandler$3.call(FunctionOperandHandler.java:58) at com.atlassian.ozymandias.SafePluginPointAccess.call(SafePluginPointAccess.java:390) at com.atlassian.jira.jql.operand.FunctionOperandHandler.getValues(FunctionOperandHandler.java:58) at com.atlassian.jira.jql.operand.FunctionOperandHandler.getValues(FunctionOperandHandler.java:26) at com.atlassian.jira.jql.operand.DefaultJqlOperandResolver.getValues(DefaultJqlOperandResolver.java:88) at sun.reflect.GeneratedMethodAccessor784.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.atlassian.plugin.util.ContextClassLoaderSettingInvocationHandler.invoke(ContextClassLoaderSettingInvocationHandler.java:26) at com.sun.proxy.$Proxy313.getValues(Unknown Source) at sun.reflect.GeneratedMethodAccessor784.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.atlassian.plugin.osgi.bridge.external.HostComponentFactoryBean$DynamicServiceInvocationHandler.invoke(HostComponentFactoryBean.java:131) at com.sun.proxy.$Proxy313.getValues(Unknown Source) at com.atlassian.jira.jql.query.GenericClauseQueryFactory.getRawValues(GenericClauseQueryFactory.java:75) ... at com.atlassian.jira.jql.query.QueryVisitor.createQuery(QueryVisitor.java:51) at com.atlassian.jira.jql.query.DefaultLuceneQueryBuilder.createLuceneQuery(DefaultLuceneQueryBuilder.java:29) at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.createLuceneQuery(LuceneSearchProvider.java:343) at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getHitCount(LuceneSearchProvider.java:183) at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getHitCount(LuceneSearchProvider.java:147) at com.atlassian.servicedesk.internal.sla.goal.GoalMatcherImpl.isMatching(GoalMatcherImpl.java:103) at com.atlassian.servicedesk.internal.sla.goal.GoalMatcherImpl.lambda$null$0(GoalMatcherImpl.java:58) at com.atlassian.servicedesk.internal.sla.goal.GoalMatcherImpl$$Lambda$3298/2052566009.test(Unknown Source)
Cause
The Service Management SLAs, Customer Notifications and JSM Automations all share the same threads (SdOffThreadEventJobRunner:thread
or SdSerialisedOffThreadProcessor:thread
). Whenever an update is made to a ticket, the Service Management project processes this update as follows:
First, the ticket where the update occurred is tested again each single JQL query configured in each SLA within the project
It's only after this ticket has been processed by the SLA functionality that this ticket will be processed by the Service Management Automations and Customer Notifications
If the project contains an SLA configured with a lot of complex JQL queries, it will take a lot of time for any ticket to be processed by this SLA before being process by the Automations and Customer Notifications.
Because of that, we might experience:
delays and inconsistencies with the SLA displayed in the tickets
delays with the triggering of Automations
delays with the Customer Notifications
Solution
Workaround
Identify the complex JQL queries used in the SLA configurations. Some ways to identify them include:
Checking the atlassian-jira-slow-queries.log files, since these log files will report any JQL queries that is slow to run
Here's a command that shows the 20 slowest queries:
1
cat atlassian-jira-slow-queries.log*| grep OffThread | grep -oE "query.*queryTermMetrics" | sed "s/ ), queryTermMetrics//" | sed "s/[{]//g"| sed "s/[}]//g" | sed "s/query=key = [0-9]* AND //g" | sort | uniq -c | sort -nr | head -n 20
Reviewing each JQL query in each SLA and check if any of them is using functions coming from add-ons (such as ScriptRunner, or Insight Asset Management...)
Copying each JQL query in each SLA, and testing them one by one in the page Issues > search for issues to see if they take a long time to execute
Once these slow JQL queries are identified, you will need to find a way to optimize them, for example by removing any kind of add-on script or functions from these queries.
Was this helpful?