Cluster Cache replication failure due to Unresolved Host
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
The Jira Data Center instance health check reports a "Cluster Cache Replication" failure, which suggests that the nodes in the cluster cannot communicate with each other.
Name: Cluster Cache Replication
NodeId: null
Is healthy: false
Failure reason: The node XXXXX is not replicating
Severity: CRITICAL
Additional links: []
Environment
Data Center instances having more than one node in the cluster.
Diagnosis
Review atlassian-jira.log on the node that reports the “Cluster Cache Replication” health check failure. The log contains traces like the following:
2023-06-01 11:58:15,055+0200 main WARN [c.a.jira.util.JiraUtils] IP/Hostname address cannot be calculated for this host. Please fix this.
.
.
2023-06-01 11:58:15,180+0200 main ERROR [n.sf.ehcache.Cache] Unable to set localhost. This prevents creation of a GUID. Cause was: XXXXX: XXXXX: Name or service not known
java.net.UnknownHostException: XXXXX: XXXXX: Name or service not known
.
.
.
2023-06-01 11:58:15,971+0200 main WARN [n.sf.ehcache.CacheManager] Cache com.atlassian.jira.task.TaskManagerImpl.taskMaprequested bootstrap but a CacheException occured. Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
net.sf.ehcache.distribution.RemoteCacheException: Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:176)
.
.
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is:
java.net.ConnectException: Connection refused (Connection refused)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
at java.rmi/sun.rmi.server.UnicastRef.invoke(Unknown Source)
at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(Unknown Source)
at com.sun.proxy.$Proxy40.getKeys(Unknown Source)
... 64 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
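To make the log review above repeatable, the following is a minimal Python sketch that scans log lines for the two signatures shown in the excerpt: the `java.net.UnknownHostException` line and the `Connection refused to host:` line. The function name `scan_log` and the regular expressions are illustrative assumptions based only on the excerpt in this article, not an Atlassian tool.

```python
import re

# Patterns derived from the log excerpt in this article (assumptions, not an official format).
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: (\S+?):")
REFUSED_HOST = re.compile(r"Connection refused to host: ([0-9.]+)")


def scan_log(lines):
    """Return (unresolved_hostnames, refused_addresses) found in the log lines."""
    unresolved, refused = set(), set()
    for line in lines:
        match = UNKNOWN_HOST.search(line)
        if match:
            unresolved.add(match.group(1))
        match = REFUSED_HOST.search(line)
        if match:
            refused.add(match.group(1))
    return unresolved, refused
```

Any iterable of lines works, so an open file handle for atlassian-jira.log can be passed directly. A refused address in the 127.x.x.x range is the hint that points toward the /etc/hosts check.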
Check /etc/hosts on the affected node for an entry like the one below:
127.0.1.1 XXXXX
In addition, /etc/hosts has no entry that maps the node's IP address to the jira.node.id value configured in the node’s cluster.properties file.
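The /etc/hosts check can be sketched in Python as below. This is a minimal illustration, assuming the standard hosts-file layout (an address followed by one or more names, with `#` starting a comment). The function name `loopback_entries` is hypothetical.

```python
import ipaddress


def loopback_entries(hosts_text, node_name):
    """Return the loopback addresses that the hosts-file text maps to node_name."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        try:
            parsed = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip lines that do not start with a valid IP address
        if parsed.is_loopback and node_name in names:
            found.append(address)
    return found
```

Reading the file with `open("/etc/hosts").read()` and passing the node's name returns a non-empty list when an entry such as `127.0.1.1 XXXXX` is active.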
If the logs contain traces like the one below, refer to the article "Cluster Cache replication health check fails with error SocketException: Broken pipe exception".
2021-06-04 18:26:25,830+0000 localq-reader-12 ERROR [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is:
java.net.SocketException: Broken pipe (Write failed)
Cause
These errors suggest a misconfiguration in the /etc/hosts entries. The node name XXXXX is mapped to 127.0.1.1, a loopback address that other nodes cannot use to reach this node, so replication connections fail with a Connection refused error.
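As a rough way to observe this condition on a running node, the Python sketch below resolves the machine's own hostname and reports whether the result is a loopback address. The helper names are hypothetical. Python's resolver is not guaranteed to behave identically to the JVM's, so treat the output as a hint rather than proof of what Jira sees.

```python
import ipaddress
import socket


def is_loopback(ip):
    """True if ip lies in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(ip).is_loopback


def resolve_local_hostname():
    """Return (hostname, ip); ip is None when the hostname does not resolve."""
    name = socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:  # covers socket.gaierror, e.g. "Name or service not known"
        return name, None


def describe(name, ip):
    """Summarise the resolution result in one line."""
    if ip is None:
        return f"{name} does not resolve"
    if is_loopback(ip):
        return f"{name} resolves to loopback address {ip}"
    return f"{name} resolves to {ip}"
```

Running `print(describe(*resolve_local_hostname()))` on the affected node shows which of the three cases applies.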
Solution
Comment out the entry below in the /etc/hosts file by adding a '#' at the start of the line:
#127.0.1.1 XXXXX
Then update /etc/hosts to map the node's IP address to the jira.node.id value configured in cluster.properties.
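The two edits can be sketched as a pure text transformation in Python. This is an illustrative sketch only: `fix_hosts` is a hypothetical helper, `node_name` stands for the name the node should be mapped under, and the address `10.0.0.5` in the usage note is a placeholder rather than a value from this article. The function returns the new file content and does not write to /etc/hosts itself.

```python
def fix_hosts(hosts_text, node_name, node_ip, loopback_ip="127.0.1.1"):
    """Comment out the loopback entry for node_name and map node_ip to it."""
    out = []
    already_mapped = False
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # active part of the line
        if len(fields) >= 2 and node_name in fields[1:]:
            if fields[0] == loopback_ip:
                out.append("#" + raw)  # disable the loopback mapping
                continue
            if fields[0] == node_ip:
                already_mapped = True  # desired mapping already present
        out.append(raw)
    if not already_mapped:
        out.append(f"{node_ip} {node_name}")
    return "\n".join(out) + "\n"
```

For example, `fix_hosts("127.0.1.1 XXXXX\n", "XXXXX", "10.0.0.5")` returns the commented-out loopback line followed by `10.0.0.5 XXXXX`. Review the result and back up the original file before replacing it.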
Once these changes are made, run the Cluster Cache Replication health check again.
⚠️ Please note that the above changes require a complete restart of the application node to take effect.