Always ON – Failed to join database to Availability group

Environment

-> Setting up Always ON,

-> It completed fine. Lets look at the Always ON dashboard,

-> Even though the wizard did not give any errors. The secondaries are not sync and databases are still in restoring mode.

-> Checking the status using below query,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description, 
rs.last_connect_error_number, rs.last_connect_error_timestamp 
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1
GO

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

-> Checking SQL Server errorlog,

Synchronous Secondary – JBSAG2

2020-07-12 16:16:30.730 spid31s Starting up database ‘JBDB’.
2020-07-12 16:16:30.730 spid31s The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.
2020-07-12 16:16:30.750 spid48s The state of the local availability replica in availability group ‘JBSAG’ has changed from ‘RESOLVING_NORMAL’ to ‘NOT_AVAILABLE’. The state changed because either the associated availability group has been deleted, or the local availability replica has been removed from another SQL Server instance. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2020-07-12 16:25:35.710 spid52 The state of the local availability replica in availability group ‘JBSAG’ has changed from ‘NOT_AVAILABLE’ to ‘RESOLVING_NORMAL’. The state changed because the local availability replica is joining the availability group. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2020-07-12 16:25:35.920 spid35s The state of the local availability replica in availability group ‘JBSAG’ has changed from ‘RESOLVING_NORMAL’ to ‘SECONDARY_NORMAL’. The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2020-07-12 16:25:45.930 spid36s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG1’ with id [850DE16E-2C2B-43B1-BF52-6D8C5E2046FF]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
2020-07-12 16:26:12.480 spid52 Starting up database ‘JBDB’.
2020-07-12 16:26:12.490 spid52 The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.

Asynchronous Secondary – JBSAG3

2020-07-12 16:25:47.000 spid57s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG1’ with id [850DE16E-2C2B-43B1-BF52-6D8C5E2046FF]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
2020-07-12 16:26:47.060 spid53 Starting up database ‘JBDB’.
2020-07-12 16:26:47.070 spid53 The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.

-> The error message refers to a connection issue between the primary and secondaries.

-> Endpoint uses port 5022. Lets check if port 5022 is not blocked using telnet.

‘telnet’ is not recognized as an internal or external command,
operable program or batch file.

-> Execute below command in powershell to enable telnet.

Install-WindowsFeature -name Telnet-Client

-> Telnet is not working for port 5022. It proves that port 5022 is not open.

-> Let us create a firewall rule to open port 5022.

-> Open “Windows Firewall with Advanced Security” and open “Inbound Rules” on JBSAG1, JBSAG2 and JBSAG3.

-> Once the port 5022 is opened. Lets try joining the secondary database as below,

-> Checking the Status now,

-> Executing below query,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description, 
rs.last_connect_error_number, rs.last_connect_error_timestamp 
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1
GO

-> It works fine now.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Always ON – Availability group database is “Not Synchronizing”

Environment

-> Database in Alwayson availability group changed to “Not Synchronizing”.

-> Checking SQL Server errorlog,

Primary – JBSAG1

2020-07-12 14:44:18.530 spid60s A connection timeout has occurred on a previously established connection to availability replica ‘JBSAG2’ with id [24CD3A43-55C6-4352-AA13-0B8B1283BA4D]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
2020-07-12 14:44:18.530 spid60s Always On Availability Groups connection with secondary database terminated for primary database ‘JBDB’ on the availability replica ‘JBSAG2’ with Replica ID: {24cd3a43-55c6-4352-aa13-0b8b1283ba4d}. This is an informational message only. No user action is required.
2020-07-12 14:44:18.530 spid67s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 24CD3A43-55C6-4352-AA13-0B8B1283BA4D:4
2020-07-12 14:44:18.720 spid67s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 24CD3A43-55C6-4352-AA13-0B8B1283BA4D:1
2020-07-12 14:44:18.720 spid67s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 24CD3A43-55C6-4352-AA13-0B8B1283BA4D:1
2020-07-12 14:44:18.720 spid67s DbMgrPartnerCommitPolicy::SetSyncState: 24CD3A43-55C6-4352-AA13-0B8B1283BA4D:1
2020-07-12 14:44:19.240 spid43s A connection timeout has occurred on a previously established connection to availability replica ‘JBSAG3’ with id [79FF3186-DF81-45FB-A24B-966BC6EE56EA]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
2020-07-12 14:44:19.240 spid43s Always On Availability Groups connection with secondary database terminated for primary database ‘JBDB’ on the availability replica ‘JBSAG3’ with Replica ID: {79ff3186-df81-45fb-a24b-966bc6ee56ea}. This is an informational message only. No user action is required.

Synchronous Secondary – JBSAG2

2020-07-12 14:44:18.530 spid37s Always On Availability Groups connection with primary database terminated for secondary database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID: {208c4c88-7468-4656-97df-4087c5443a74}. This is an informational message only. No user action is required.

Asynchronous Secondary – JBSAG3

2020-07-12 14:44:19.330 spid79s Always On Availability Groups connection with primary database terminated for secondary database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID: {208c4c88-7468-4656-97df-4087c5443a74}. This is an informational message only. No user action is required.
2020-07-12 14:44:29.350 spid28s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG1’ with id [208C4C88-7468-4656-97DF-4087C5443A74]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.

-> Executing below query,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

Primary – JBSAG1

Synchronous Secondary – JBSAG2

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

Asynchronous Secondary – JBSAG3

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

-> The error message refers to a connection issue between the primary and secondaries.

-> Endpoint uses port 5022. Lets check if port 5022 is not blocked using telnet.

‘telnet’ is not recognized as an internal or external command,
operable program or batch file.

-> Execute below command in powershell to enable telnet.

Install-WindowsFeature -name Telnet-Client

-> Telnet is not working for port 5022. It proves that port 5022 is not open.

-> Let us create a firewall rule to open port 5022.

-> Open “Windows Firewall with Advanced Security” and open “Inbound Rules” on JBSAG1, JBSAG2 and JBSAG3.

-> Wait for 5 mins and check the status now by executing below query,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

-> After opening port 5022, everything seems to be working fine.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Database Mirroring login attempt failed with error: ‘Connection handshake failed. An OS call failed: (8009030c) 0x8009030c.

Environment

-> Server JBAG1 is the primary replica and Server JBAG2 is the secondary replica. Server JBAG1 and JBAG2 are part of a failover cluster without shared storage. Server JBAG1 and JBAG2 participate in Always on availability group using synchronous commit mode with automatic failover.

-> Database JBDB is part of a Availability group JBAG. Database in Primary replica seems synchronized, but Replica JBAG2\IN01 is down.

-> Database in secondary replica is not-synchronizing.

-> lets look at the Always on dashboard,

-> Checking the SQL Server errorlog,

JBAG1\IN01

2020-07-03 07:34:13.345 Logon Database Mirroring login attempt failed with error: 'Connection handshake failed. An OS call failed: (8009030c) 0x8009030c(The logon attempt failed). State 67.'.  

2020-07-03 07:35:11.500 spid42s A connection timeout has occurred on a previously established connection to availability replica 'JBAG2\IN01' with id [FDBE71F5-C8F7-4A98-AE01-3B4E23451908]. Either a networking or a firewall issue exists or the availability replica has transitioned 

JBAG2\IN01

2020-07-03 07:35:12.380 spid77s A connection timeout has occurred while attempting to establish a connection to availability replica 'JBAG1\IN01' with id [FA3748D6-57FE-4247-8F92-245D2C40418A]. Either a networking or firewall issue exists, or the endpoint address provided for the replica

-> The error message in primary replica JBAG1\IN01 ” 2020-07-03 07:34:13.345 Logon Database Mirroring login attempt failed with error: ‘Connection handshake failed. An OS call failed: (8009030c) 0x8009030c(The logon attempt failed). State 67.’.  [CLIENT: 192.168.0.33]” refers that the primary and secondary replica is not able to talk to each other.

-> Logged into JBAG1 and pinged JBAG2,

-> It was pinging fine. Logged into JBAG2 and tried IPCONFIG,

-> IPAddress for JBAG2 as per ipconfig is 192.168.0.33. There seems to be some issue in JBAG1 where it is resolving to a wrong IPaddress for JBAG2.

-> I checked the hosts file in location C:\windows\System32\drivers\etc on both servers JBAG1 and JBAG2. I was able to see a wrong entry for JBAG2 in the hostfile.

-> I removed the wrong entry and saved the hosts file. It all started working fine,

-> There are scenarios where the same error happens, but pinging to server resolves the correct IPAddress. In those cases verify if SQL Service account password was changed recently and not updated in SQL Server configuration manager. In scenarios like this you will have to try restarting SQL Services in secondary server and check if SQL comes online. If it fails with an credential issue, then update the password in SQL Server configuration manager and that solves the issue.

-> The same error can happen if SQL Service account doesn’t have proper access to the other instance. Verify if access is provisioned properly on all servers in the availability group and provide access if required.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.