Jira Batched Notifications stop being sent from any project after adding a large number of watchers to a ticket
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
After a very large number of users (for example, hundreds of thousands) is accidentally added to the watcher list of a ticket, Jira Batched Notifications stop being sent to users, or are sent with delays of several hours.
⚠️ Note that if you are using Jira Service Management and the problem affects customer notifications, this KB article does not apply. This KB article only covers Jira batched notifications.
Environment
Jira 8.0.0 and any later version.
Diagnosis
All types of Jira notifications (issue created, issue updated, and so on) are affected, for every user and every Jira ticket
Jira Notifications are sent successfully only when batching is disabled in ⚙ > System > Batching email notifications
The problem started after a huge number of watchers (for example, hundreds of thousands) were added to at least one Jira ticket
Restarting the Jira application does not resolve the issue
Running the following SQL query returns a high count (for example, a few million), showing that a large backlog of batched notification events is still waiting to be processed:
select count(*) from "AO_733371_EVENT_RECIPIENT" WHERE "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
Thread dumps collected while the issue is happening show a long-running thread that is busy processing batched notification events:
Long-running thread (from the 1st dump)
"Caesium-1-1" daemon prio=5 tid=0x0000000000000ba9 nid=0 runnable
   java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.hashCode(Arrays.java:4146)
    at java.util.Objects.hash(Objects.java:128)
    at com.atlassian.jira.plugins.inform.api.events.dto.RecipientDTO.hashCode(RecipientDTO.java:103)
    at java.util.AbstractList.hashCode(AbstractList.java:541)
    at java.util.Arrays.hashCode(Arrays.java:4146)
    at java.util.Objects.hash(Objects.java:128)
    at com.atlassian.jira.plugins.inform.api.events.dto.EventDTO.hashCode(EventDTO.java:127)
    at java.util.HashMap.hash(HashMap.java:339)
    at java.util.HashMap$HashIterator.remove(HashMap.java:1462)
    at java.util.AbstractSet.removeAll(AbstractSet.java:178)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$removeProcessedEvents$6(BatchNotificationJob.java:229)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5739/870912907.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.removeProcessedEvents(BatchNotificationJob.java:228)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:149)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:114)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5688/903887977.apply(Unknown Source)
    at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.notifyUsers(BatchNotificationJob.java:109)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$runJob$0(BatchNotificationJob.java:86)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5684/1174080777.apply(Unknown Source)
    at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.runJob(BatchNotificationJob.java:84)
    at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33)
    at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33)
    at com.atlassian.jira.plugins.inform.batching.cron.OncePerClusterJobRunner.runJob(OncePerClusterJobRunner.java:46)
    at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134)
    at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106)
    at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:430)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$2370/1703800892.accept(Unknown Source)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
Same long-running thread (from the 2nd dump)
"Caesium-1-1" daemon prio=5 tid=0x0000000000000ba9 nid=0 runnable
   java.lang.Thread.State: RUNNABLE
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.getRecipientIds(BatchNotificationJob.java:211)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processBatches(BatchNotificationJob.java:165)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:150)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:114)
    ...
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
Same long-running thread (from the 3rd dump)
"Caesium-1-1" daemon prio=5 tid=0x0000000000000ba9 nid=0 runnable
   java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.hashCode(Arrays.java:4146)
    at java.util.Objects.hash(Objects.java:128)
    at com.atlassian.jira.plugins.inform.api.events.dto.RecipientDTO.hashCode(RecipientDTO.java:103)
    at java.util.AbstractList.hashCode(AbstractList.java:541)
    at java.util.Arrays.hashCode(Arrays.java:4146)
    at java.util.Objects.hash(Objects.java:128)
    at com.atlassian.jira.plugins.inform.api.events.dto.EventDTO.hashCode(EventDTO.java:127)
    at java.util.HashMap.hash(HashMap.java:339)
    at java.util.HashMap.remove(HashMap.java:799)
    at java.util.HashSet.remove(HashSet.java:236)
    at java.util.AbstractSet.removeAll(AbstractSet.java:174)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$removeProcessedEvents$6(BatchNotificationJob.java:229)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5739/870912907.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.removeProcessedEvents(BatchNotificationJob.java:228)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:149)
    at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:114)
    ...
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
Cause
When Jira Batched Notifications are enabled, every action on a Jira ticket adds new events to the AO_733371_EVENT_RECIPIENT table with the status "NEW". The table holds one row per recipient: if an event must notify several users, as many rows are added for that event as there are recipients.
For example, assume that 100k users were accidentally added to the watcher list of a Jira ticket. If 10 actions then happen on that ticket and all 100k users need to be notified about each of them, 100k * 10 = 1 million rows are added to the AO_733371_EVENT_RECIPIENT table with the status "NEW". As a result, the scheduled job responsible for processing the events stored in AO_733371_EVENT_RECIPIENT might take a very long time to work through them all, and could potentially get stuck.
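The row count in this example is a plain multiplication. The short Python sketch below makes it explicit and adds a hypothetical ordinary ticket with 5 watchers for comparison. It assumes one event per action and that every watcher receives every notification; recipient_rows is an illustrative helper, not part of Jira.

```python
def recipient_rows(actions: int, recipients: int) -> int:
    """NEW rows added to AO_733371_EVENT_RECIPIENT, assuming one event per
    action and one row per (event, recipient) pair."""
    return actions * recipients


# Hypothetical backlog: the accident described above plus one ordinary ticket.
backlog = {
    "ticket with 100k watchers": recipient_rows(actions=10, recipients=100_000),
    "ordinary ticket with 5 watchers": recipient_rows(actions=10, recipients=5),
}
total = sum(backlog.values())
share = backlog["ticket with 100k watchers"] / total
print(f"{total:,} NEW rows, {share:.3%} of them from the problematic ticket")
# prints: 1,000,050 NEW rows, 99.995% of them from the problematic ticket
```

Almost the entire backlog comes from the single ticket, which is why marking that ticket's events as processed, rather than purging the whole table, is enough to let the other notifications through.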
Solution
One way to fix this issue is to force all the batched notification events from the problematic ticket(s) to be marked as "PROCESSED" in the AO_733371_EVENT_RECIPIENT table. The batched notification job then stops trying to process them and moves on to the events from other Jira tickets. The resolution steps are listed below:
Identify the problematic ticket(s). One way to do this is to:
Go to the issue search page
Add "watchers" to the list of columns
Run a search across the whole Jira instance without any search criteria, so that all Jira issues are returned
Sort the search results by "watchers" in descending order to identify the Jira issue(s) with a huge number of watchers (hundreds of thousands)
Take note of the issue key(s)
Stop the Jira application
⚠️ Back up the database using your database's native backup tool. Do not skip this step: you need this backup to revert the changes if anything goes wrong
Run the following SQL query and make sure that it returns a high count (expect a few million). Replace 'ABC-123', 'ABC-456', 'ABC-789' in the query with the actual issue key(s) identified in step 1.
select count(*) from "AO_733371_EVENT_RECIPIENT"
where "EVENT_ID" in (select "EVENT_ID" from "AO_733371_EVENT_PARAMETER" where "NAME" = 'object#issue#key#0' AND "VALUE" in ('ABC-123', 'ABC-456', 'ABC-789'))
AND "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
Once you have confirmed that the query above returns a high count, run the UPDATE query below. It marks all the pending events from the problematic ticket(s) as processed, so that the batched notification job skips them:
update "AO_733371_EVENT_RECIPIENT" set "STATUS" = 'PROCESSED'
where "EVENT_ID" in (select "EVENT_ID" from "AO_733371_EVENT_PARAMETER" where "NAME" = 'object#issue#key#0' AND "VALUE" in ('ABC-123', 'ABC-456', 'ABC-789'))
AND "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
Start the Jira application
Try to trigger some notifications and verify that Jira Batched Notifications are sent.
⚠️ You might have to wait before notifications arrive, depending on the frequency configured for batched notifications (10 minutes by default)
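Before running the UPDATE on the production database, the count, update, and re-count sequence from the steps above can be rehearsed against a throwaway database. The Python sketch below does that with an in-memory SQLite database and the same WHERE clause as the queries above, passing the issue keys as bound parameters instead of editing them into the SQL. It is an illustration under stated assumptions: both tables are reduced to the columns the queries reference (the real AO tables have more columns, and your database is probably not SQLite), and the helper names and demo data (XYZ-1 in particular) are hypothetical.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns used by the KB queries.
SCHEMA = """
CREATE TABLE "AO_733371_EVENT_PARAMETER" ("EVENT_ID" INTEGER, "NAME" TEXT, "VALUE" TEXT);
CREATE TABLE "AO_733371_EVENT_RECIPIENT" ("EVENT_ID" INTEGER, "STATUS" TEXT, "CONSUMER_NAME" TEXT);
"""

# Same filter as the KB's SELECT and UPDATE, with the issue keys as placeholders.
FILTER = """
"EVENT_ID" IN (SELECT "EVENT_ID" FROM "AO_733371_EVENT_PARAMETER"
               WHERE "NAME" = 'object#issue#key#0' AND "VALUE" IN ({keys}))
AND "STATUS" = 'NEW' AND "CONSUMER_NAME" = 'mailEventConsumer'
"""


def where_clause(issue_keys):
    if not issue_keys:
        raise ValueError("at least one issue key is required")
    return FILTER.format(keys=", ".join("?" for _ in issue_keys))


def count_new(conn, issue_keys):
    """Diagnostic SELECT: pending mail events for the given issue keys."""
    sql = 'SELECT COUNT(*) FROM "AO_733371_EVENT_RECIPIENT" WHERE' + where_clause(issue_keys)
    return conn.execute(sql, list(issue_keys)).fetchone()[0]


def mark_processed(conn, issue_keys):
    """Fix UPDATE: mark those events PROCESSED; returns the number of rows changed."""
    sql = """UPDATE "AO_733371_EVENT_RECIPIENT" SET "STATUS" = 'PROCESSED' WHERE""" + where_clause(issue_keys)
    with conn:  # commit on success, roll back on error
        cur = conn.execute(sql, list(issue_keys))
    return cur.rowcount


def build_demo():
    """Hypothetical data: ABC-123 has 10 events with 1,000 recipients each,
    XYZ-1 has 3 events with 2 recipients each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    event_id = 0
    for key, events, recipients in [("ABC-123", 10, 1000), ("XYZ-1", 3, 2)]:
        for _ in range(events):
            event_id += 1
            conn.execute('INSERT INTO "AO_733371_EVENT_PARAMETER" VALUES (?, ?, ?)',
                         (event_id, "object#issue#key#0", key))
            conn.executemany('INSERT INTO "AO_733371_EVENT_RECIPIENT" VALUES (?, ?, ?)',
                             [(event_id, "NEW", "mailEventConsumer")] * recipients)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(count_new(conn, ["ABC-123"]))       # 10000 pending rows before the fix
    print(mark_processed(conn, ["ABC-123"]))  # 10000 rows updated
    print(count_new(conn, ["ABC-123"]))       # 0 left for the problematic ticket
    print(count_new(conn, ["XYZ-1"]))         # 6, other tickets are untouched
```

Because the filter keeps the "STATUS" = 'NEW' condition, running the UPDATE a second time changes nothing for the same keys, and events from other tickets stay queued for the batched notification job.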