Cluster Cache replication failure due to Unresolved Host

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

The Jira Data Center health check reports a "Cluster Cache Replication" failure, which indicates that nodes in the cluster are unable to communicate with each other.

    Name: Cluster Cache Replication
    NodeId: null
    Is healthy: false
    Failure reason: The node XXXXX is not replicating
    Severity: CRITICAL
    Additional links: []

Environment

Jira Data Center instances with more than one node in the cluster.

Diagnosis

  • Review atlassian-jira.log on the affected node, that is, the node named in the “Cluster Cache Replication” health check failure. Traces like the following appear in the log:

    2023-06-01 11:58:15,055+0200 main WARN [c.a.jira.util.JiraUtils] IP/Hostname address cannot be calculated for this host. Please fix this.
    ...
    2023-06-01 11:58:15,180+0200 main ERROR [n.sf.ehcache.Cache] Unable to set localhost. This prevents creation of a GUID. Cause was: XXXXX: XXXXX: Name or service not known
    java.net.UnknownHostException: XXXXX: XXXXX: Name or service not known
    ...
    2023-06-01 11:58:15,971+0200 main WARN [n.sf.ehcache.CacheManager] Cache com.atlassian.jira.task.TaskManagerImpl.taskMap requested bootstrap but a CacheException occured. Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
    net.sf.ehcache.distribution.RemoteCacheException: Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
        at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:176)
    ...
    Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is:
        java.net.ConnectException: Connection refused (Connection refused)
        at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
        at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
        at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
        at java.rmi/sun.rmi.server.UnicastRef.invoke(Unknown Source)
        at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(Unknown Source)
        at com.sun.proxy.$Proxy40.getKeys(Unknown Source)
        ... 64 more
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
  • Check /etc/hosts on the affected node for an entry like the following:

    127.0.1.1 XXXXX
  • Check whether /etc/hosts is missing an entry that maps the node's IP address to the jira.node.id value configured in the node’s cluster.properties file.
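Both /etc/hosts checks above can be scripted. The following Python sketch is a minimal, hypothetical helper: parse_hosts, read_node_id, and diagnose are illustrative names, not part of Jira. It works on the text of /etc/hosts and cluster.properties, and it assumes cluster.properties uses plain key=value lines, which is a simplification of the full Java properties format. The host name jira-node1, the node ID node1, and the sample entries are placeholders.

```python
import ipaddress
from typing import Optional


def parse_hosts(text: str) -> list[tuple[str, list[str]]]:
    """Return (address, [names...]) for every non-comment /etc/hosts entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        entries.append((address, names))
    return entries


def read_node_id(properties_text: str) -> Optional[str]:
    """Return jira.node.id from key=value lines (simplified properties parsing)."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", "!")) or "=" not in line:
            continue  # skip comments and lines without a key=value pair
        key, value = line.split("=", 1)
        if key.strip() == "jira.node.id":
            return value.strip()
    return None


def _is_routable(address: str) -> bool:
    """True for a valid IP literal outside the loopback range."""
    try:
        return not ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False  # not a valid IP literal, so it cannot identify the node


def diagnose(hosts_text: str, hostname: str, node_id: str) -> list[str]:
    """Return the two misconfigurations described in this article, if present."""
    entries = parse_hosts(hosts_text)
    problems = [
        f"{hostname} is mapped to 127.0.1.1"
        for address, names in entries
        if address == "127.0.1.1" and hostname in names
    ]
    if not any(node_id in names and _is_routable(address) for address, names in entries):
        problems.append(f"no entry maps a node IP address to jira.node.id {node_id!r}")
    return problems


if __name__ == "__main__":
    # Placeholder inputs; in practice read /etc/hosts, use socket.gethostname(),
    # and read the node's own cluster.properties.
    sample_hosts = "127.0.0.1 localhost\n127.0.1.1 jira-node1\n"
    node_id = read_node_id("jira.node.id = node1\n")
    for problem in diagnose(sample_hosts, "jira-node1", node_id or ""):
        print(problem)
```

An empty result means neither condition from the Diagnosis list was found in the text that was checked.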

If the logs instead contain traces like the following, refer to “Cluster Cache replication health check fails with error SocketException: Broken pipe exception”.

    2021-06-04 18:26:25,830+0000 localq-reader-12 ERROR [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is:
        java.net.SocketException: Broken pipe (Write failed)

Cause

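These errors point to a misconfiguration in /etc/hosts. The node's host name XXXXX is mapped to 127.0.1.1, which is a loopback address: a connection to it never leaves the machine that opens it, so it cannot be used to reach node XXXXX from elsewhere in the cluster. Cache replication connections to that address therefore fail with the Connection refused error.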
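To observe the resolution problem directly, you can check what the node's host name resolves to. The Python sketch below is a hypothetical diagnostic, not an Atlassian tool: it looks up the system host name with socket.gethostname() and socket.getaddrinfo(), which only approximates the lookup the JVM performs, and reports whether every resulting address is a loopback address such as 127.0.1.1. The function names are illustrative.

```python
import ipaddress
import socket


def resolved_addresses(name: str) -> list[str]:
    """Return the distinct IP addresses the OS resolver returns for name."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    # Drop any IPv6 zone suffix such as "%eth0" so ipaddress can parse it.
    return sorted({info[4][0].split("%", 1)[0] for info in infos})


def only_loopback(addresses: list[str]) -> bool:
    """True if there is at least one address and all are loopback (127.0.0.0/8, ::1)."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)


if __name__ == "__main__":
    host = socket.gethostname()
    try:
        addrs = resolved_addresses(host)
    except socket.gaierror as err:
        # Resolution failed outright, comparable to java.net.UnknownHostException in the Jira log.
        print(f"{host}: cannot be resolved ({err})")
    else:
        if only_loopback(addrs):
            print(f"{host}: {addrs} (loopback only; other nodes cannot reach this address)")
        else:
            print(f"{host}: {addrs}")
```

A "loopback only" result on the affected node is consistent with the 127.0.1.1 entry described above; a resolution failure likely corresponds to the UnknownHostException in the log.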

Solution

  • Comment out the following entry in /etc/hosts by adding a # at the start of the line:

    #127.0.1.1 XXXXX
  • Update /etc/hosts to map the node's IP address to the jira.node.id value configured in cluster.properties.

  • After making these changes, run the Cluster Cache Replication health check again to confirm that it passes.
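The /etc/hosts edit in these steps can be sketched as a text transformation. This Python function is hypothetical: fix_hosts, the node IP 10.0.0.11, and the node ID node1 are placeholders; substitute the node's real IP address and the jira.node.id from its cluster.properties. It returns the new file contents instead of writing /etc/hosts, so you can review the result and keep a backup of the original before replacing the file as root.

```python
import ipaddress


def fix_hosts(hosts_text: str, node_ip: str, node_id: str) -> str:
    """Comment out 127.0.1.1 entries and make sure node_ip maps to node_id."""
    ipaddress.ip_address(node_ip)  # raises ValueError if node_ip is not a valid IP literal
    out = []
    mapped = False
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore anything after a comment marker
        if fields and fields[0] == "127.0.1.1":
            out.append("#" + line)  # disable the loopback mapping but keep it visible
            continue
        if fields and fields[0] == node_ip and node_id in fields[1:]:
            mapped = True  # the required mapping already exists
        out.append(line)
    if not mapped:
        out.append(f"{node_ip} {node_id}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "127.0.0.1 localhost\n127.0.1.1 node1\n"
    print(fix_hosts(before, "10.0.0.11", "node1"), end="")
```

Running the function again on its own output leaves the text unchanged, so re-applying it is harmless. The node must still be restarted for the change to take effect.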

⚠️ Note: these changes require a full restart of the application on the node before they take effect.

Updated on April 8, 2025
