Jira node holds mail handler cluster lock for a long time
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Problem
Jira admins keep getting cluster lock health check failures for one of the mail handlers.
The health check failure reports:
Node '<node_id> ' has been holding cluster lock, 'com.atlassian.jira.service.services.mail.MailFetcherService.<mail_handler_id>', for <#_of_seconds> seconds.
Example:
Node 'node2 ' has been holding cluster lock, 'com.atlassian.jira.service.services.mail.MailFetcherService.11400', for 947 seconds.
Diagnosis
The cluster lock failure message reports the mail handler ID. The investigation should focus on this mail handler.
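When many such failures are reported, pulling the handler ID out of each message by hand becomes tedious. The following is a minimal sketch that parses the message format quoted above. The function name `parse_lock_message` and the returned dictionary keys are illustrative assumptions, not part of Jira.

```python
import re

# Pattern mirrors the health check message format quoted in this article.
LOCK_MESSAGE = re.compile(
    r"Node '(?P<node>[^']*)' has been holding cluster lock, "
    r"'(?P<lock>[^']+)', for (?P<seconds>\d+) seconds\."
)
MAIL_LOCK_PREFIX = "com.atlassian.jira.service.services.mail.MailFetcherService."


def parse_lock_message(message):
    """Return node, mail handler ID and lock duration, or None if no match.

    Hypothetical helper for illustration only.
    """
    match = LOCK_MESSAGE.search(message)
    if match is None:
        return None
    lock_name = match.group("lock")
    if not lock_name.startswith(MAIL_LOCK_PREFIX):
        return None  # a cluster lock, but not a mail handler lock
    handler_id = lock_name[len(MAIL_LOCK_PREFIX):]
    if not handler_id.isdigit():
        return None
    return {
        "node": match.group("node").strip(),  # drop the trailing space seen in the message
        "handler_id": int(handler_id),
        "seconds": int(match.group("seconds")),
    }
```

Applied to the example message above, the `handler_id` field is the value to use when looking up the handler in the database.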
To find out which mail handler is holding the lock, query the database as follows:
SELECT * FROM serviceconfig WHERE id = <id_from_the_cluster_lock_failure_message>;
Enable incoming mail debug to understand why the mail handler is spending too much time processing incoming mail.
Test the mail handler via Edit > Next > Test to find out how many emails are in the mailbox waiting to be processed.
Cause
A mail handler holding a cluster lock for a long time is not a problem per se; it may simply indicate that the mailbox has too much email to process and this naturally takes time.
Solution
Resolution
Inspect the mailbox and clean up any mass incoming email, such as:
Spam
Delivery notifications or failures
Bulk email that isn't properly marked as bulk for Jira to ignore
Old email lingering around
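For a large mailbox, a rough breakdown of its contents can help decide what to clean up first. The following is a minimal sketch, assuming the raw messages have already been exported from the mailbox as strings. It relies only on standard mail headers (`Precedence`, `Auto-Submitted`, and the `multipart/report` content type used by delivery status notifications). The function names `classify` and `summarize` are hypothetical, and the checks do not necessarily match the rules Jira itself applies. It does not detect spam or old email.

```python
from collections import Counter
from email import message_from_string


def classify(raw_message):
    """Assign one raw RFC 5322 message to a coarse category (illustrative only)."""
    msg = message_from_string(raw_message)
    # Delivery status notifications are normally sent as multipart/report.
    if msg.get_content_type() == "multipart/report":
        return "delivery-report"
    # Mailing-list and bulk senders commonly set the Precedence header.
    precedence = str(msg.get("Precedence") or "").strip().lower()
    if precedence in {"bulk", "junk", "list"}:
        return "bulk"
    # Any Auto-Submitted value other than "no" marks an automatic message.
    auto_submitted = str(msg.get("Auto-Submitted") or "no").strip().lower()
    if auto_submitted != "no":
        return "auto-submitted"
    return "other"


def summarize(raw_messages):
    """Count messages per category across an iterable of raw message strings."""
    return Counter(classify(raw) for raw in raw_messages)
```

A large "other" count suggests that the remaining mail is either legitimate or unmarked bulk email, which then needs manual inspection.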
If there are no obvious problems with the mailbox, contact Atlassian Support. Be sure to include:
a support zip
a set of thread dumps captured from the node holding the cluster lock (while the lock is being held)