This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED

Environment

-> Database server JBSAG2 and JBSAG1 are on primary datacentre located in East US.

-> Database server JBSAG3 is on secondary datacentre in Korea Central.

-> Database servers JBSAG1, JBSAG2 and JBSAG3 are part of failover cluster without shared storage.

-> Database in Asynchronous secondary replica JBSAG3 exhibits a status of “Not Synchronizing”. Below is the snapshot of Always on Dashboard,

-> Below are the messages that can be found,

This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.

At least one availability database on this availability replica has an unhealthy data synchronization state. If this is an asynchronous-commit availability replica, all availability databases should be in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should be in the SYNCHRONIZED state.

The data synchronization state of this availability database is unhealthy. On an asynchronous-commit availability replica, every availability database should be in the SYNCHRONIZING state. On a synchronous-commit replica, every availability database should be in the SYNCHRONIZED state.

-> Details from SQL Server Error log in Primary JBSAG2.

2021-02-09 15:46:54.270 spid34s Automatic seeding of availability database ‘JBDB’ in availability group ‘JBSwiki’ failed with a transient error. The operation will be retried.
2021-02-09 15:46:54.270 spid34s Always On: DebugTraceVarArgs AR ‘[HADR] [Primary] operation on replicas [698E9144-D01D-4D34-B898-7FBC32873ABD]->[9B4A72A5-618B-4A9B-81D0-A54826AAEE36], database [JBDB], remote endpoint [TCP://11.0.2.41:5022], source operation [0EFB986D-7156-4BEE-82DC-162265BC3CD8]: Seeding task failed with result 0x80000000.’
2021-02-09 15:46:54.270 spid34s Always On: DebugTraceVarArgs AR ‘Seeding is canceled with cancelReason = 108
2021-02-09 15:46:54.270 spid34s Always On: DebugTraceVarArgs AR ‘[HADR] [Primary] operation on replicas [698E9144-D01D-4D34-B898-7FBC32873ABD]->[9B4A72A5-618B-4A9B-81D0-A54826AAEE36], database [JBDB], remote endpoint [TCP://11.0.2.41:5022], source operation [0EFB986D-7156-4BEE-82DC-162265BC3CD8]: Transitioning from [CHECK_IF_SEEDING_NEEDED] to [FAILED].’
2021-02-09 15:46:54.280 spid34s Always On: DebugTraceVarArgs AR ‘[HADR] [Primary] operation on replicas [698E9144-D01D-4D34-B898-7FBC32873ABD]->[9B4A72A5-618B-4A9B-81D0-A54826AAEE36], database [JBDB], remote endpoint [TCP://11.0.2.41:5022], source operation [0EFB986D-7156-4BEE-82DC-162265BC3CD8]: Seeding encountered a transient failure, state ‘108’, maximum retries exceeded’
2021-02-09 15:47:23.960 spid14s Always On: DebugTraceVarArgs AR ‘[HADR] [Primary] operation on replicas [698E9144-D01D-4D34-B898-7FBC32873ABD]->[9B4A72A5-618B-4A9B-81D0-A54826AAEE36], database [JBDB], remote endpoint [TCP://11.0.2.41:5022], source operation [D9301C18-F2D7-4CBE-A2AD-D688D02EA614]: Transitioning from [PENDING] to [CHECK_IF_SEEDING_NEEDED].’

-> Executing below query and checking the status of all replicas,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id

-> Last_Connect_error_description is as below,

Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

-> Last_Connect_error_number is 10060 which refers to “connection timed out”.

-> It seems like connection to secondary replica cannot be made by availability group.

-> Messages in SQL server error log shows that there is an issue with endpoint URL [TCP://11.0.2.41:5022] which is for secondary replica JBSAG3.

-> I checked the hosts file in location C:\Windows\System32\drivers\etc to make sure there are no invalid entries for JBSAG3 replica. As expected, there were no bad entries for JBSAG3 in hosts file.

-> I tried pinging JBSAG3 from command prompt to see if it is resolving to correctly.

-> The IP address for replica JBSAG3 is 11.0.2.4. But it seems like the endpoint URL is TCP://11.0.2.41:5022. It seems like the endpoint URL is wrong.

-> I changed it to reflect the correct IP using below query on Primary replica JBSAG2,

USE MASTER
GO
ALTER AVAILABILITY GROUP [JBSwiki] 
MODIFY REPLICA ON 'JBSAG3' WITH (ENDPOINT_URL = 'TCP://11.0.2.4:5022');

-> Once after the above changes. Always On availability group turned healthy.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Leave a Reply