Always On – One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss).

Environment

-> Database server JBSAG1 and JBSAG2 are located in East US.

-> Database servers JBSAG1 and JBSAG2 are part of failover cluster without shared storage.

-> Database Server JBSAG1 which is the primary replica goes down around 28 Feb 2021 14:27:13.060. Application team advised that their application had issue connecting to the database JBDB after JBSAG1 went down.

-> Upon checking, Availability group status is “Resolving” and below is the status of replica JBSAG2,

-> In the meantime, Database server JBSAG1 comes online and Availability group comes online on JBSAG1 again. Application team were able to access the database without any issues. So it is clear that database JBDB as part of availability group JBSWIKI was not accessible for 24 minutes between 28 Feb 2021 14:27:13.060 to 28 Feb 2021 14:52:41.920

-> We need to understand the reason why Availability group did not failover automatically to Synchronous secondary replica JBSAG2 when database server JBSAG1 went down.

-> Checking SQL Server error log,

JBSAG1JBSAG2
2021-02-28 14:26:59.400 spid9s Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.
2021-02-28 14:26:59.400 Server The availability group 'JBSWIKI' is being asked to stop the lease renewal because the availability group is going offline. This is an informational message only. No user action is required.
2021-02-28 14:26:59.400 spid9s Always On: The local replica of availability group 'JBSWIKI' is stopping. This is an informational message only. No user action is required.
2021-02-28 14:26:59.400 spid9s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.
2021-02-28 14:26:59.520 spid35s The Database Mirroring endpoint has stopped listening for connections.
2021-02-28 14:26:59.520 spid35s Service Broker manager has shut down.
2021-02-28 14:27:09.510 spid9s .NET Framework runtime has been stopped.
2021-02-28 14:27:13.060 spid9s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.


-> As per above logs, it is clear that Availability Group goes offline due to database server/SQL server instance going down around 28 Feb 2021 14:26:59.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

2021-02-28 14:51:18.480 spid25s Starting up database 'JBDB'.
2021-02-28 14:51:18.480 spid26s Starting up database 'msdb'.
2021-02-28 14:51:18.480 spid27s Always On: The availability replica manager is starting. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid11s Starting up database 'mssqlsystemresource'.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid25s Skipping the default startup of database 'JBDB' because the database belongs to an availability group (Group ID: 65536). The database will be started by the availability group. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On: The local replica of availability group 'JBSWIKI' is starting. This is an informational message only. No user action is required.
2021-02-28 14:51:18.490 spid27s The state of the local availability replica in availability group 'JBSWIKI' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The state changed because the local instance of SQL Server is starting up. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:51:18.500 spid11s The resource database build version is 15.00.4073. This is an informational message only. No user action is required.
2021-02-28 14:51:18.540 spid11s Starting up database 'model'.
2021-02-28 14:51:18.560 spid51 Error: 46906, Severity: 16, State: 1.
2021-02-28 14:51:18.560 spid51 Unable to retrieve registry value 'NodeRole' from Windows registry key 'Software\Microsoft\Microsoft SQL Server\MSSQL15.MSSQLSERVER\Polybase\Configuration': (null).
2021-02-28 14:51:18.700 spid52 Always On: The local replica of availability group 'JBSWIKI' is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 14:51:18.720 spid52 The state of the local availability replica in availability group 'JBSWIKI' has changed from 'RESOLVING_NORMAL' to 'PRIMARY_PENDING'. The state changed because the availability group is coming online. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.

.

.
2021-02-28 14:51:18.750 Server The state of the local availability replica in availability group 'JBSWIKI' has changed from 'PRIMARY_PENDING' to 'PRIMARY_NORMAL'. The state changed because the local replica has completed processing Online command from Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:51:18.760 spid25s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 531FB0D2-4930-4E75-B1FE-2021-02-28 14:51:18.790 spid25s Starting up database 'JBDB'.
2021-02-28 14:51:19.470 spid25s Parallel redo is started for database 'JBDB' with worker pool size [4].
2021-02-28 14:51:21.470 spid25s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 531FB0D2-4930-4E75-B1FE-2875DCCB9FFB:1
2021-02-28 14:51:21.480 spid25s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 531FB0D2-4930-4E75-B1FE-2875DCCB9FFB:1
2021-02-28 14:51:21.480 spid25s Recovery of database 'JBDB' (5) is 0% complete (approximately 3276 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
.

2021-02-28 14:51:31.560 spid9s Recovery is complete. This is an informational message only. No user action is required.
2021-02-28 14:51:41.490 spid25s Recovery of database 'JBDB' (5) is 10% complete (approximately 191 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:51:41.840 spid68s A connection for availability group 'JBSWIKI' from availability replica 'JBSAG1' with id [1B5043DB-0107-4299-A2BA-9D156E056542] to 'JBSAG2' with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB] has been successfully established. This is an informational message only. No user action is required.
2021-02-28 14:52:01.500 spid25s Recovery of database 'JBDB' (5) is 19% complete (approximately 171 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:21.510 spid25s Recovery of database 'JBDB' (5) is 29% complete (approximately 152 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:41.100 spid25s Recovery of database 'JBDB' (5) is 38% complete (approximately 131 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:41.140 spid25s Recovery of database 'JBDB' (5) is 38% complete (approximately 131 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:41.910 spid66s 12 transactions rolled back in database 'JBDB' (5:0). This is an informational message only. No user action is required.
2021-02-28 14:52:41.910 spid66s Recovery is writing a checkpoint in database 'JBDB' (5). This is an informational message only. No user action is required.
2021-02-28 14:52:41.920 spid66s Recovery completed for database JBDB (database ID 5) in 83 second(s) (analysis 1975 ms, redo 79613 ms, undo 764 ms [system undo 140 ms, regular undo 604 ms].) This is an informational message only. No user action is required.


-> Availability group JBSWIKI and database JBDB is online as per above logs.
2021-02-28 14:26:59.520 spid29s Always On: The local replica of availability group 'JBSWIKI' is going offline because the corresponding resource in the Windows Server Failover Clustering (WSFC) cluster is no longer online. This is an informational message only. No user action is required.
2021-02-28 14:26:59.520 spid29s The state of the local availability replica in availability group 'JBSWIKI' has changed from 'SECONDARY_NORMAL' to 'RESOLVING_NORMAL'. The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:26:59.520 spid65s The availability group database "JBDB" is changing roles from "SECONDARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
2021-02-28 14:26:59.520 spid65s State information for database 'JBDB' - Hardened Lsn: '(1558:871882:1)' Commit LSN: '(1558:175123:24)' Commit Time: 'Feb 28 2021 2:24PM'
2021-02-28 14:26:59.530 spid62s Always On Availability Groups connection with primary database terminated for secondary database 'JBDB' on the availability replica 'JBSAG1' with Replica ID: {1b5043db-0107-4299-a2ba-9d156e056542}. This is an informational message only. No user action is required.
2021-02-28 14:27:16.380 spid67 Always On: The local replica of availability group 'JBSWIKI' is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 14:27:16.380 spid67 The availability replica for availability group 'JBSWIKI' on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
 


-> Above logs clearly mentions that a failover attempt was performed. But JBSAG2 cannot become primary because one or more databases were not Synchronized.


2021-02-28 14:27:17.290 spid58 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the resolving role. This is an informational message only. No user action is required.
2021-02-28 14:27:17.570 spid58 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 14:27:17.580 spid58 The availability replica for availability group ‘JBSWIKI’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
2021-02-28 14:27:18.500 spid58 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the resolving role. This is an informational message only. No user action is required.
. . . 2021-02-28 14:27:18.500 spid77       Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the primary role. This is an informational message only. No user action is required.

.
.
.

.
.
.
.
.
.

.
.

.
.

.
.

.
.
.

.
.

.
.

.
.
.
.
.
.
.
.
.
.
.

2021-02-28 14:50:08.230 spid77       The availability replica for availability group ‘JBSWIKI’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
2021-02-28 14:50:09.150 spid77       Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the resolving role. This is an informational message only. No user action is required.  

-> Message “The availability replica for availability group ‘JBSWIKI’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group.” is present from 28 Feb 2021 14:27:16 to 14:50:09.                  

2021-02-28 14:51:18.730 spid29s      The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘RESOLVING_NORMAL’ to ‘SECONDARY_NORMAL’.  The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC).  For more information, see the SQL Server error log or cluster log.  If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:51:18.730 spid71s      The availability group database “JBDB” is changing roles from “RESOLVING” to “SECONDARY” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.  

-> JBSAG2 becomes secondary as per above logs on 28 Feb 2021 14:51:18.



















































































-> From above log analysis, It is pretty clear that the automatic failover of Availability group from JBSAG1 to JBSAG2 did not complete due to below message,

The availability replica for availability group 'JBSWIKI' on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.

-> Automatic failover of Availability group JBSWIKI did not complete as availability database JBDB was not in SYNCHRONIZED state between JBSAG1 and JBSAG2. The Availability database should be synchronized to avoid any potential data loss during failovers.

-> The database JBDB may not be synchronized due to a large transaction that was happening on the Primary replica when the failover was performed or the secondary synchronous replica was not available due to a network issue or some other reason. In our case the huge transaction was an Index rebuild operation on a very huge Index when failover was attempted when Primary replica JBSAG1 went down.

-> There is nothing much we can do in these scenarios other than waiting for the primary replica to be online or perform a forced failover with data loss.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Always On Secondary Database in “Reverting \ In Recovery” state

Environment

-> Database server JBSAG1 and JBSAG2 are located in East US.

-> Database servers JBSAG1 and JBSAG2are part of failover cluster without shared storage.

-> Below is the status of Always On availability group currently,

-> Below performance monitor logs are currently executing on JBSAG1 and JBSAG2,

-> Rebuild index is executing on JBSAG1.

-> JBSAG1 goes down. An automatic failover of Availability group JBSWIKI happens from JBSAG1 to JBSAG2.

-> Database Server JBSAG1 comes online. JBSAG2 becomes current primary and JBSAG1 becomes secondary.

-> Checking the status of Always On availability group,

-> On Secondary JBSAG1, database JBDB is in “Reverting / In Recovery” State.

-> Executing below query on secondary replica JBSAG1,

select @@servername as SQLName,db_name(database_id) as DBName,synchronization_state_desc,database_state_desc
from sys.dm_hadr_database_replica_states where is_local=1

-> Executing same query on primary replica JBSAG2,

-> Below are the messages found in Always On Dashboard,

The availability group is not ready for automatic failover. The primary replica and a secondary replica are configured for automatic failover, however, the secondary replica is not ready for an automatic failover. Possibly the secondary replica is unavailable, or its data synchronization state is currently not in the SYNCHRONIZED synchronization state.

In this availability group, at least one secondary replica has a NOT SYNCHRONIZING synchronization state and is not receiving data from the primary replica.

In this availability group, at least one synchronous replica is not currently synchronized. The replica synchronization state could be either SYNCHONIZING or NOT SYNCHRONIZING.

-> Checking SQL Server errorlog,

JBSAG1

2021-02-28 06:21:25.340 spid9s Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.

-> Availability group is going offline as per above message.
2021-02-28 06:21:25.340 spid9s Always On: The local replica of availability group ‘JBSWIKI’ is stopping. This is an informational message only. No user action is required.
2021-02-28 06:21:25.340 Server The availability group ‘JBSWIKI’ is being asked to stop the lease renewal because the availability group is going offline. This is an informational message only. No user action is required.
2021-02-28 06:21:25.340 spid9s SQL Server is terminating in response to a ‘stop’ request from Service Control Manager. This is an informational message only. No user action is required.
2021-02-28 06:21:25.450 spid25s The Database Mirroring endpoint has stopped listening for connections.
2021-02-28 06:21:25.450 spid25s Service Broker manager has shut down.
2021-02-28 06:21:35.450 spid9s .NET Framework runtime has been stopped.
2021-02-28 06:21:38.110 spid9s SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.
2021-02-28 06:21:38.710 spid19s The SQL Server Network Interface library successfully deregistered the Service Principal Name (SPN) [ MSSQLSvc/JBSAG1.JBS.COM:1433 ] for the SQL Server service.

-> SQL Services comes online on JBSAG1 and below are the logs,

2021-02-28 06:22:11.890 spid25s Always On Availability Groups: Waiting for local Windows Server Failover Clustering service to start. This is an informational message only. No user action is required.
2021-02-28 06:22:11.890 spid29s Starting up database ‘JBDB’.
2021-02-28 06:22:11.890 spid11s Starting up database ‘mssqlsystemresource’.
2021-02-28 06:22:11.890 spid25s Always On Availability Groups: Local Windows Server Failover Clustering service started. This is an informational message only. No user action is required.

2021-02-28 06:22:11.890 spid25s Always On Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.
2021-02-28 06:22:11.890 spid25s Always On Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.


-> Windows cluster service is online.

2021-02-28 06:22:11.900 spid29s Skipping the default startup of database ‘JBDB’ because the database belongs to an availability group (Group ID: 65536). The database will be started by the availability group. This is an informational message only. No user action is required.
2021-02-28 06:22:11.900 spid25s Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.
2021-02-28 06:22:11.900 spid25s Always On: The local replica of availability group ‘JBSWIKI’ is starting. This is an informational message only. No user action is required.
2021-02-28 06:22:11.910 spid25s The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘NOT_AVAILABLE’ to ‘RESOLVING_NORMAL’. The state changed because the local instance of SQL Server is starting up. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 06:22:11.940 spid11s The resource database build version is 15.00.4073. This is an informational message only. No user action is required.
2021-02-28 06:22:11.940 spid31s The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘RESOLVING_NORMAL’ to ‘SECONDARY_NORMAL’. The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.


-> Replica JBSAG1 becomes secondary.

2021-02-28 06:22:12.530 spid32s Parallel redo is started for database ‘JBDB’ with worker pool size [4].
2021-02-28 06:22:12.530 spid32s Recovery of database ‘JBDB’ (5) is 0% complete (approximately 3464 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.


-> Recovery starts as per above message.

2021-02-28 06:22:21.950 spid33s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG2’ with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
2021-02-28 06:22:22.020 spid25s A connection for availability group ‘JBSWIKI’ from availability replica ‘JBSAG1’ with id [1B5043DB-0107-4299-A2BA-9D156E056542] to ‘JBSAG2’ with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB] has been successfully established. This is an informational message only. No user action is required.
2021-02-28 06:22:32.540 spid32s Recovery of database ‘JBDB’ (5) is 10% complete (approximately 180 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 06:22:52.540 spid32s Recovery of database ‘JBDB’ (5) is 20% complete (approximately 159 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 06:23:12.540 spid32s Recovery of database ‘JBDB’ (5) is 30% complete (approximately 139 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 06:23:13.770 spid32s Recovery of database ‘JBDB’ (5) is 30% complete (approximately 138 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 06:23:13.780 spid32s Always On Availability Groups connection with primary database established for secondary database ‘JBDB’ on the availability replica ‘JBSAG2’ with Replica ID: {531fb0d2-4930-4e75-b1fe-2875dccb9ffb}. This is an informational message only. No user action is required.

-> Database JBDB is restarted on JBSAG1 to synchronize with primary.

2021-02-28 06:39:31.040 spid32s Error: 35278, Severity: 17, State: 1.
2021-02-28 06:39:31.040 spid32s Availability database ‘JBDB’, which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.
2021-02-28 06:39:31.040 spid32s Parallel redo is shutdown for database ‘JBDB’ with worker pool size [4].

2021-02-28 06:39:31.650 spid32s Starting up database ‘JBDB’.
2021-02-28 06:39:35.240 spid32s Parallel redo is started for database ‘JBDB’ with worker pool size [4].
2021-02-28 06:39:35.270 spid32s Recovery of database ‘JBDB’ (5) is 0% complete (approximately 608 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 06:39:43.800 spid32s Recovery of database ‘JBDB’ (5) is 25% complete (approximately 26 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 06:39:44.200 spid32s Always On Availability Groups connection with primary database established for secondary database ‘JBDB’ on the availability replica ‘JBSAG2’ with Replica ID: {531fb0d2-4930-4e75-b1fe-2875dccb9ffb}. This is an informational message only. No user action is required.
2021-02-28 06:39:44.200 spid32s The recovery LSN (1558:4889:2) was identified for the database with ID 5. This is an informational message only. No user action is required.
2021-02-28 06:39:44.210 spid32s Recovery completed for database JBDB (database ID 5) in 13 second(s) (analysis 25 ms, redo 8533 ms, undo 0 ms [system undo 222122029433 ms, regular undo 217368571615 ms].) This is an informational message only. No user action is required.
2021-02-28 06:39:44.220 spid46s Always On Availability Groups connection with primary database established for secondary database ‘JBDB’ on the availability replica ‘JBSAG2’ with Replica ID: {531fb0d2-4930-4e75-b1fe-2875dccb9ffb}. This is an informational message only. No user action is required.
2021-02-28 06:39:44.220 spid46s The recovery LSN (1558:4889:2) was identified for the database with ID 5. This is an informational message only. No user action is required.

From above logs, below details are clear.

-> Failover was initiated at 28 Feb 2021 06:21:25.

-> Secondary database on JBSAG1 is available and synchronized with primary around 28 Feb 2021 06:39:44.

-> Secondary database JBDB on JBSAG1 synchronized with Primary after 18 + minutes. It seems like most of the time was spent in “Reverting \ In Recovery”.

-> The access to readable secondary database JBDB will have issues till the state is “Reverting \ In Recovery”.

-> The rollback on secondary replica should complete before the synchronization process from Primary can start. Until then the secondary database can not be accessible.

-> Lets check the error logs in JBSAG2.

JBSAG2

2021-02-28 06:21:25.410 spid47s Always On: The local replica of availability group ‘JBSWIKI’ is going offline because the corresponding resource in the Windows Server Failover Clustering (WSFC) cluster is no longer online. This is an informational message only. No user action is required.
2021-02-28 06:21:25.410 spid47s The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘SECONDARY_NORMAL’ to ‘RESOLVING_NORMAL’. The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 06:21:25.410 spid20s The availability group database “JBDB” is changing roles from “SECONDARY” to “RESOLVING” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
2021-02-28 06:21:25.410 spid20s State information for database ‘JBDB’ – Hardened Lsn: ‘(1557:127872:1)’ Commit LSN: ‘(1557:126710:63)’ Commit Time: ‘Feb 28 2021 6:20AM’
2021-02-28 06:21:25.710 spid82s Always On Availability Groups connection with primary database terminated for secondary database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID: {1b5043db-0107-4299-a2ba-9d156e056542}. This is an informational message only. No user action is required.
2021-02-28 06:21:26.730 spid131 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 06:21:27.350 spid131 The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘RESOLVING_NORMAL’ to ‘PRIMARY_PENDING’. The state changed because the availability group is coming online. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 06:21:27.350 Server The lease worker of availability group ‘JBSWIKI’ is now sleeping the excess lease time (164250 ms) supplied during online. This is an informational message only. No user action is required.
2021-02-28 06:21:28.500 Server The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘PRIMARY_PENDING’ to ‘PRIMARY_NORMAL’. The state changed because the local replica has completed processing Online command from Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 06:21:28.500 spid20s The availability group database “JBDB” is changing roles from “RESOLVING” to “PRIMARY” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

-> Replica JBSAG2 becomes primary as shown in above logs.

2021-02-28 06:21:28.940 spid73s Recovery of database ‘JBDB’ (5) is 69% complete (approximately 26 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 06:22:08.480 spid73s Recovery of database ‘JBDB’ (5) is 17% complete (approximately 473 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 06:22:22.020 spid21s A connection for availability group ‘JBSWIKI’ from availability replica ‘JBSAG2’ with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB] to ‘JBSAG1’ with id [1B5043DB-0107-4299-A2BA-9D156E056542] has been successfully established. This is an informational message only. No user action is required.
2021-02-28 06:22:28.480 spid73s Recovery of database ‘JBDB’ (5) is 36% complete (approximately 207 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 06:22:48.480 spid73s Recovery of database ‘JBDB’ (5) is 72% complete (approximately 51 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 06:23:08.490 spid73s Recovery of database ‘JBDB’ (5) is 99% complete (approximately 2 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 06:23:08.660 spid73s 12 transactions rolled back in database ‘JBDB’ (5:0). This is an informational message only. No user action is required.
2021-02-28 06:23:08.660 spid73s Recovery completed for database JBDB (database ID 5) in 100 second(s) (analysis 0 ms, redo 0 ms, undo 99714 ms [system undo 53953 ms, regular undo 45747 ms].) This is an informational message only. No user action is required.

-> Database JBDB becomes active as per above messages around 28 Feb 2021 06:23:08.

-> Lets check the performance monitor logs on Primary replica JBSAG2 and Secondary replica JBSAG1.

-> Recovery Queue – Total number of hardened log in kilobytes that is waiting to be redone on the secondary.

-> Redone Bytes/Sec – Amount of log records redone in the last second to catch up the database replica.

-> Log Remaining for undo – The amount of log in kilobytes remaining to finish the undo phase.

-> From the performance monitor logs, it is pretty clear that we have spent most of our time during undo in secondary database in replica JBSAG1. Log Remaining for undo decreases as shown in the screenshot above and reaches to 0 around 6:39. This is the time when the secondary database comes online as per details in SQL Server error log. Hence it is pretty clear that a large transaction (Rebuild Index in our case) that is terminated due to failover has caused this issue.

-> Log Remaining for undo can be used to understand how much log in KB is remaining before the database can come to synchronized state from “Reverting \ In Recovery” state.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

DNS lookup failed with error: ‘11001(No such host is known.)’.

Environment

-> Database server JBSAG1 and JBSAG2 are on primary datacentre located in East US.

-> Database server JBSAG3 is on secondary datacentre in Korea Central.

-> Database servers JBSAG1, JBSAG2 and JBSAG3 are part of failover cluster without shared storage.

-> Availability group status on Synchronous secondary JBSAG2 shows as “Not Synchronizing”,

-> I checked the SQL Server error log on JBSAG1 and JBSAG2,

JBSAG1

2021-02-25 15:05:28.760 spid45s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG2’ with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
2021-02-25 15:06:07.730 spid24s A connection for availability group ‘JBSWIKI’ from availability replica ‘JBSAG1’ with id [1B5043DB-0107-4299-A2BA-9D156E056542] to ‘JBSAG2’ with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB] has been successfully established. This is an informational message only. No user action is required.
2021-02-25 15:06:07.730 spid45s Always On Availability Groups connection with secondary database established for primary database ‘JBDB’ on the availability replica ‘JBSAG2’ with Replica ID: {531fb0d2-4930-4e75-b1fe-2875dccb9ffb}. This is an informational message only. No user action is required.
2021-02-25 15:06:31.850 spid45s Always On Availability Groups connection with secondary database terminated for primary database ‘JBDB’ on the availability replica ‘JBSAG2’ with Replica ID: {531fb0d2-4930-4e75-b1fe-2875dccb9ffb}. This is an informational message only. No user action is required.
2021-02-25 15:06:33.350 spid73s DbMgrPartnerCommitPolicy::SetSyncState: 531FB0D2-4930-4E75-B1FE-2875DCCB9FFB:1
2021-02-25 15:06:41.870 spid24s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG2’ with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.

JBSAG2

2021-02-25 15:06:31.860 spid26s A connection timeout has occurred on a previously established connection to availability replica ‘JBSAG1’ with id [1B5043DB-0107-4299-A2BA-9D156E056542]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
2021-02-25 15:06:31.860 spid26s Always On Availability Groups connection with primary database terminated for secondary database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID: {1b5043db-0107-4299-a2ba-9d156e056542}. This is an informational message only. No user action is required.
2021-02-25 15:06:41.860 spid26s A connection timeout has occurred while attempting to establish a connection to availability replica ‘JBSAG1’ with id [1B5043DB-0107-4299-A2BA-9D156E056542]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.

-> Lets execute below query on Primary replica JBSAG1 and check if we are able to get some details,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id

-> Below error can be found for replica JBSAG2,

DNS lookup failed with error: ‘11001(No such host is known.)’.

-> If we look at the endpoint_url in above output, we are able to see a “space” for JBSAG2 endpoint_url “TCP://JBSAG2.JBS.COM :5022”,

-> Use below query to correct the endpoint URL,

USE MASTER
GO
ALTER AVAILABILITY GROUP [JBSwiki] 
MODIFY REPLICA ON 'JBSAG2' WITH (ENDPOINT_URL = 'TCP://JBSAG2.JBS.COM:5022');

-> Everything seems fine after above change,

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.