Cannot discover nodes, returning empty list AWS Hazelcast Discovery

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Confluence Data Center fails to discover nodes and logs a "Cannot discover nodes, returning empty list" warning followed by a "connect timed out" stack trace:

2020-03-19 16:11:23,627 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [1] retrying in 1 seconds...
2020-03-19 16:11:35,134 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [2] retrying in 2 seconds...
2020-03-19 16:11:47,396 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [3] retrying in 3 seconds...
2020-03-19 16:12:00,784 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [4] retrying in 5 seconds...
2020-03-19 16:12:15,857 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [5] retrying in 7 seconds...
2020-03-19 16:12:33,458 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [6] retrying in 11 seconds...
2020-03-19 16:12:54,857 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [7] retrying in 17 seconds...
2020-03-19 16:13:21,953 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [8] retrying in 25 seconds...
2020-03-19 16:13:57,589 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [9] retrying in 38 seconds...
2020-03-19 16:14:46,030 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [10] retrying in 57 seconds...
2020-03-19 16:15:53,699 WARN [Catalina-utility-1] [com.hazelcast.aws.AwsDiscoveryStrategy] log Cannot discover nodes, returning empty list
com.hazelcast.core.HazelcastException: java.net.SocketTimeoutException: connect timed out
    at com.hazelcast.util.ExceptionUtil$1.create(ExceptionUtil.java:40)
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:124)
    at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:69)
    at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:129)
    at com.hazelcast.aws.utility.RetryUtils.retry(RetryUtils.java:56)
    at com.hazelcast.aws.impl.DescribeInstances.callServiceWithRetries(DescribeInstances.java:272)
    at com.hazelcast.aws.impl.DescribeInstances.execute(DescribeInstances.java:262)
    at com.hazelcast.aws.AWSClient.getAddresses(AWSClient.java:57)
    at com.hazelcast.aws.AwsDiscoveryStrategy.discoverNodes(AwsDiscoveryStrategy.java:146)
    at com.hazelcast.spi.discovery.impl.DefaultDiscoveryService.discoverNodes(DefaultDiscoveryService.java:71)
    at com.hazelcast.internal.cluster.impl.DiscoveryJoiner.getPossibleAddresses(DiscoveryJoiner.java:70)
    at com.hazelcast.internal.cluster.impl.DiscoveryJoiner.getPossibleAddressesForInitialJoin(DiscoveryJoiner.java:59)
    at com.hazelcast.cluster.impl.TcpIpJoiner.joinViaPossibleMembers(TcpIpJoiner.java:131)
    at com.hazelcast.cluster.impl.TcpIpJoiner.doJoin(TcpIpJoiner.java:90)
    at com.hazelcast.internal.cluster.impl.AbstractJoiner.join(AbstractJoiner.java:135)
    at com.hazelcast.instance.Node.join(Node.java:767)
    at com.hazelcast.instance.Node.start(Node.java:411)
    at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:131)
    at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:202)
    at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:181)
    at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:131)
    at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:57)
    at com.atlassian.confluence.cluster.hazelcast.HazelcastClusterManager.startCluster(HazelcastClusterManager.java:344)
    at com.atlassian.confluence.cluster.hazelcast.HazelcastClusterManager.reconfigure(HazelcastClusterManager.java:316)
    at com.atlassian.confluence.cluster.DefaultClusterConfigurationHelper.bootstrapCluster(DefaultClusterConfigurationHelper.java:407)
    at com.atlassian.confluence.setup.DefaultBootstrapManager.afterConfigurationLoaded(DefaultBootstrapManager.java:831)
    at com.atlassian.config.bootstrap.DefaultAtlassianBootstrapManager.init(DefaultAtlassianBootstrapManager.java:75)
    at com.atlassian.confluence.setup.DefaultBootstrapManager.init(DefaultBootstrapManager.java:188)
    at com.atlassian.config.util.BootstrapUtils.init(BootstrapUtils.java:36)
    at com.atlassian.confluence.setup.ConfluenceConfigurationListener.initialiseBootstrapContext(ConfluenceConfigurationListener.java:133)
    at com.atlassian.confluence.setup.ConfluenceConfigurationListener.contextInitialized(ConfluenceConfigurationListener.java:64)
    at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4682)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5143)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1384)
    at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1374)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(Unknown Source)
    at com.hazelcast.aws.impl.DescribeInstances.callService(DescribeInstances.java:291)
    at com.hazelcast.aws.impl.DescribeInstances$1.call(DescribeInstances.java:276)
    at com.hazelcast.aws.impl.DescribeInstances$1.call(DescribeInstances.java:272)
    at com.hazelcast.aws.utility.RetryUtils.retry(RetryUtils.java:52)
    ... 38 more

Environment

  • Confluence Data Center

  • AWS node discovery

  • New node is added to the cluster

Diagnosis

Add the following to <Confluence-Install>\conf\logging.properties, then restart Confluence to see which request is timing out:

sun.net.www.protocol.http.HttpURLConnection.level = FINEST
sun.net.www.protocol.http.HttpURLConnection.handlers = java.util.logging.ConsoleHandler

In the same file, change the logging level of the following entry from FINE to FINEST:

java.util.logging.ConsoleHandler.level = FINEST

Cause 1

Causes will vary, but in one case an HTTP NULL response was returned from GET /latest/meta-data/iam/security-credentials/:

19-Mar-2020 17:13:10.135 FINE [Catalina-utility-1] sun.net.www.protocol.http.HttpURLConnection.writeRequests sun.net.www.MessageHeader@123456 pairs: {GET /latest/meta-data/iam/security-credentials/test-iam HTTP/1.1: null}{User-Agent: Java/1.8.0_171}{Host: 123.456.789.123}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Connection: keep-alive}
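
To check from the node itself whether the metadata endpoint answers when no proxy is involved, a direct request with a short timeout can help. The following is a minimal sketch, not part of the original article: the helper names imds_url and imds_get are hypothetical, and the address 169.254.169.254 is taken from the stack traces in this article. It assumes curl is installed and that the instance accepts plain GET requests; an instance that enforces session tokens (IMDSv2) would reject them.

```shell
#!/bin/sh
# Sketch: query the EC2 instance metadata service directly, ignoring any
# proxy settings in the environment. Helper names are hypothetical.
IMDS_BASE="http://169.254.169.254/latest/meta-data"

# Build the full metadata URL for a relative path.
imds_url() {
  printf '%s/%s\n' "$IMDS_BASE" "$1"
}

# Fetch a metadata path with a 5 second timeout and no proxy.
# --noproxy '*' makes curl bypass http_proxy/https_proxy for every host.
imds_get() {
  curl --silent --show-error --fail --noproxy '*' --max-time 5 "$(imds_url "$1")"
}
```

Running imds_get iam/security-credentials/ on the affected node should print the name of the attached IAM role if the endpoint is reachable. Note that curl does not read the JVM's -Dhttp.proxyHost settings, so a successful direct request only suggests, rather than proves, that proxy routing inside Confluence is the problem.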

Cause 2

If you see messages like the following, check that the IAM role in use and its permissions are correct, and update <Confluence-local-home>/confluence.cfg.xml if needed:

com.hazelcast.config.InvalidConfigurationException: Unable to retrieve credentials from IAM Role: <IAM-Role-Name>
    at com.hazelcast.aws.impl.DescribeInstances.fillKeysFromIamRole(DescribeInstances.java:134)
    ...
Caused by: com.hazelcast.config.InvalidConfigurationException: Unable to lookup role in URI: http://169.254.169.254/latest/meta-data/iam/security-credentials/<IAM-Role-Name>
    at com.hazelcast.aws.utility.MetadataUtil.retrieveMetadataFromURI(MetadataUtil.java:78)
    ...
Caused by: java.io.FileNotFoundException: http://169.254.169.254/latest/meta-data/iam/security-credentials/<IAM-Role-Name>
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1893)

Resolution

For Cause 1, add proxy connection information to the setenv file on all nodes so that the connection can complete:

CATALINA_OPTS="-Dhttp.nonProxyHosts=localhost\|169.254.170.2\|169.254.169.254\|127.0.0.1 ${CATALINA_OPTS}"
CATALINA_OPTS="-Dhttps.nonProxyHosts=localhost\|169.254.170.2\|169.254.169.254\|127.0.0.1 ${CATALINA_OPTS}"
CATALINA_OPTS="-Dhttp.proxyHost=<the proxy url> -Dhttp.proxyPort=<the proxy port> ${CATALINA_OPTS}"
CATALINA_OPTS="-Dhttps.proxyHost=<the proxy url> -Dhttps.proxyPort=<the proxy port> ${CATALINA_OPTS}"

Restart Confluence after this has been applied in order to resolve the issue.
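
Before restarting, it can be worth confirming that the metadata address really appears in the nonProxyHosts value, since a typo there would leave the request routed through the proxy. The following is a minimal sketch with a hypothetical helper name, is_non_proxy_host. It assumes the value is passed as the JVM would see it, with plain | separators and without the backslashes used for shell escaping in the setenv lines. It only does exact matching and does not evaluate wildcard entries.

```shell
#!/bin/sh
# Sketch: return 0 if a host is listed verbatim in a '|'-separated
# nonProxyHosts value, 1 otherwise. Wildcard entries such as
# "*.example.com" are NOT evaluated here.
is_non_proxy_host() {
  # $1 = host to look for, $2 = nonProxyHosts value
  # Wrapping both sides in '|' ensures only whole entries match.
  case "|$2|" in
    *"|$1|"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with the value used in this article:
hosts='localhost|169.254.170.2|169.254.169.254|127.0.0.1'
if is_non_proxy_host 169.254.169.254 "$hosts"; then
  echo "169.254.169.254 bypasses the proxy"
fi
```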

For Cause 2, check if the IAM role name is correct in <Confluence-local-home>/confluence.cfg.xml.
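
To read the configured role name without opening the file in an editor, a small helper can print a single property value. The following is a minimal sketch under stated assumptions: the helper name cfg_property is hypothetical, each property is assumed to sit on its own line in the form <property name="...">value</property>, and the property name confluence.cluster.aws.iam.role used in the example is an assumption that should be checked against your own file.

```shell
#!/bin/sh
# Sketch: print the value of one <property name="...">value</property>
# entry from confluence.cfg.xml. Assumes one property per line.
cfg_property() {
  # $1 = path to confluence.cfg.xml, $2 = property name
  # Dots in the name are treated as regex wildcards, which is harmless
  # for a quick check but not a strict match.
  sed -n "s|.*<property name=\"$2\">\(.*\)</property>.*|\1|p" "$1"
}

# Example (path placeholder and property name are assumptions):
# cfg_property "<Confluence-local-home>/confluence.cfg.xml" confluence.cluster.aws.iam.role
```

The printed value can then be compared with the role name that the node reports at http://169.254.169.254/latest/meta-data/iam/security-credentials/, which lists the role attached to the instance.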

Updated on April 8, 2025
