Always On – One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss).

Environment

-> Database server JBSAG1 and JBSAG2 are located in East US.

-> Database servers JBSAG1 and JBSAG2 are part of failover cluster without shared storage.

-> Database Server JBSAG1 which is the primary replica goes down around 28 Feb 2021 14:27:13.060. Application team advised that their application had issue connecting to the database JBDB after JBSAG1 went down.

-> Upon checking, Availability group status is “Resolving” and below is the status of replica JBSAG2,

-> In the meantime, Database server JBSAG1 comes online and Availability group comes online on JBSAG1 again. Application team were able to access the database without any issues. So it is clear that database JBDB as part of availability group JBSWIKI was not accessible for 24 minutes between 28 Feb 2021 14:27:13.060 to 28 Feb 2021 14:52:41.920

-> We need to understand the reason why Availability group did not failover automatically to Synchronous secondary replica JBSAG2 when database server JBSAG1 went down.

-> Checking SQL Server error log,

JBSAG1JBSAG2
2021-02-28 14:26:59.400 spid9s Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.
2021-02-28 14:26:59.400 Server The availability group 'JBSWIKI' is being asked to stop the lease renewal because the availability group is going offline. This is an informational message only. No user action is required.
2021-02-28 14:26:59.400 spid9s Always On: The local replica of availability group 'JBSWIKI' is stopping. This is an informational message only. No user action is required.
2021-02-28 14:26:59.400 spid9s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.
2021-02-28 14:26:59.520 spid35s The Database Mirroring endpoint has stopped listening for connections.
2021-02-28 14:26:59.520 spid35s Service Broker manager has shut down.
2021-02-28 14:27:09.510 spid9s .NET Framework runtime has been stopped.
2021-02-28 14:27:13.060 spid9s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.


-> As per above logs, it is clear that Availability Group goes offline due to database server/SQL server instance going down around 28 Feb 2021 14:26:59.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

2021-02-28 14:51:18.480 spid25s Starting up database 'JBDB'.
2021-02-28 14:51:18.480 spid26s Starting up database 'msdb'.
2021-02-28 14:51:18.480 spid27s Always On: The availability replica manager is starting. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid11s Starting up database 'mssqlsystemresource'.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid25s Skipping the default startup of database 'JBDB' because the database belongs to an availability group (Group ID: 65536). The database will be started by the availability group. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.
2021-02-28 14:51:18.480 spid27s Always On: The local replica of availability group 'JBSWIKI' is starting. This is an informational message only. No user action is required.
2021-02-28 14:51:18.490 spid27s The state of the local availability replica in availability group 'JBSWIKI' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The state changed because the local instance of SQL Server is starting up. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:51:18.500 spid11s The resource database build version is 15.00.4073. This is an informational message only. No user action is required.
2021-02-28 14:51:18.540 spid11s Starting up database 'model'.
2021-02-28 14:51:18.560 spid51 Error: 46906, Severity: 16, State: 1.
2021-02-28 14:51:18.560 spid51 Unable to retrieve registry value 'NodeRole' from Windows registry key 'Software\Microsoft\Microsoft SQL Server\MSSQL15.MSSQLSERVER\Polybase\Configuration': (null).
2021-02-28 14:51:18.700 spid52 Always On: The local replica of availability group 'JBSWIKI' is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 14:51:18.720 spid52 The state of the local availability replica in availability group 'JBSWIKI' has changed from 'RESOLVING_NORMAL' to 'PRIMARY_PENDING'. The state changed because the availability group is coming online. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.

.

.
2021-02-28 14:51:18.750 Server The state of the local availability replica in availability group 'JBSWIKI' has changed from 'PRIMARY_PENDING' to 'PRIMARY_NORMAL'. The state changed because the local replica has completed processing Online command from Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:51:18.760 spid25s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 531FB0D2-4930-4E75-B1FE-2021-02-28 14:51:18.790 spid25s Starting up database 'JBDB'.
2021-02-28 14:51:19.470 spid25s Parallel redo is started for database 'JBDB' with worker pool size [4].
2021-02-28 14:51:21.470 spid25s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 531FB0D2-4930-4E75-B1FE-2875DCCB9FFB:1
2021-02-28 14:51:21.480 spid25s DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint: 531FB0D2-4930-4E75-B1FE-2875DCCB9FFB:1
2021-02-28 14:51:21.480 spid25s Recovery of database 'JBDB' (5) is 0% complete (approximately 3276 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
.

2021-02-28 14:51:31.560 spid9s Recovery is complete. This is an informational message only. No user action is required.
2021-02-28 14:51:41.490 spid25s Recovery of database 'JBDB' (5) is 10% complete (approximately 191 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:51:41.840 spid68s A connection for availability group 'JBSWIKI' from availability replica 'JBSAG1' with id [1B5043DB-0107-4299-A2BA-9D156E056542] to 'JBSAG2' with id [531FB0D2-4930-4E75-B1FE-2875DCCB9FFB] has been successfully established. This is an informational message only. No user action is required.
2021-02-28 14:52:01.500 spid25s Recovery of database 'JBDB' (5) is 19% complete (approximately 171 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:21.510 spid25s Recovery of database 'JBDB' (5) is 29% complete (approximately 152 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:41.100 spid25s Recovery of database 'JBDB' (5) is 38% complete (approximately 131 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:41.140 spid25s Recovery of database 'JBDB' (5) is 38% complete (approximately 131 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2021-02-28 14:52:41.910 spid66s 12 transactions rolled back in database 'JBDB' (5:0). This is an informational message only. No user action is required.
2021-02-28 14:52:41.910 spid66s Recovery is writing a checkpoint in database 'JBDB' (5). This is an informational message only. No user action is required.
2021-02-28 14:52:41.920 spid66s Recovery completed for database JBDB (database ID 5) in 83 second(s) (analysis 1975 ms, redo 79613 ms, undo 764 ms [system undo 140 ms, regular undo 604 ms].) This is an informational message only. No user action is required.


-> Availability group JBSWIKI and database JBDB is online as per above logs.
2021-02-28 14:26:59.520 spid29s Always On: The local replica of availability group 'JBSWIKI' is going offline because the corresponding resource in the Windows Server Failover Clustering (WSFC) cluster is no longer online. This is an informational message only. No user action is required.
2021-02-28 14:26:59.520 spid29s The state of the local availability replica in availability group 'JBSWIKI' has changed from 'SECONDARY_NORMAL' to 'RESOLVING_NORMAL'. The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:26:59.520 spid65s The availability group database "JBDB" is changing roles from "SECONDARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
2021-02-28 14:26:59.520 spid65s State information for database 'JBDB' - Hardened Lsn: '(1558:871882:1)' Commit LSN: '(1558:175123:24)' Commit Time: 'Feb 28 2021 2:24PM'
2021-02-28 14:26:59.530 spid62s Always On Availability Groups connection with primary database terminated for secondary database 'JBDB' on the availability replica 'JBSAG1' with Replica ID: {1b5043db-0107-4299-a2ba-9d156e056542}. This is an informational message only. No user action is required.
2021-02-28 14:27:16.380 spid67 Always On: The local replica of availability group 'JBSWIKI' is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 14:27:16.380 spid67 The availability replica for availability group 'JBSWIKI' on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
 


-> Above logs clearly mentions that a failover attempt was performed. But JBSAG2 cannot become primary because one or more databases were not Synchronized.


2021-02-28 14:27:17.290 spid58 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the resolving role. This is an informational message only. No user action is required.
2021-02-28 14:27:17.570 spid58 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the primary role. This is an informational message only. No user action is required.
2021-02-28 14:27:17.580 spid58 The availability replica for availability group ‘JBSWIKI’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
2021-02-28 14:27:18.500 spid58 Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the resolving role. This is an informational message only. No user action is required.
. . . 2021-02-28 14:27:18.500 spid77       Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the primary role. This is an informational message only. No user action is required.

.
.
.

.
.
.
.
.
.

.
.

.
.

.
.

.
.
.

.
.

.
.

.
.
.
.
.
.
.
.
.
.
.

2021-02-28 14:50:08.230 spid77       The availability replica for availability group ‘JBSWIKI’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.
2021-02-28 14:50:09.150 spid77       Always On: The local replica of availability group ‘JBSWIKI’ is preparing to transition to the resolving role. This is an informational message only. No user action is required.  

-> Message “The availability replica for availability group ‘JBSWIKI’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group.” is present from 28 Feb 2021 14:27:16 to 14:50:09.                  

2021-02-28 14:51:18.730 spid29s      The state of the local availability replica in availability group ‘JBSWIKI’ has changed from ‘RESOLVING_NORMAL’ to ‘SECONDARY_NORMAL’.  The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC).  For more information, see the SQL Server error log or cluster log.  If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2021-02-28 14:51:18.730 spid71s      The availability group database “JBDB” is changing roles from “RESOLVING” to “SECONDARY” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.  

-> JBSAG2 becomes secondary as per above logs on 28 Feb 2021 14:51:18.



















































































-> From above log analysis, It is pretty clear that the automatic failover of Availability group from JBSAG1 to JBSAG2 did not complete due to below message,

The availability replica for availability group 'JBSWIKI' on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.

-> Automatic failover of Availability group JBSWIKI did not complete as availability database JBDB was not in SYNCHRONIZED state between JBSAG1 and JBSAG2. The Availability database should be synchronized to avoid any potential data loss during failovers.

-> The database JBDB may not be synchronized due to a large transaction that was happening on the Primary replica when the failover was performed or the secondary synchronous replica was not available due to a network issue or some other reason. In our case the huge transaction was an Index rebuild operation on a very huge Index when failover was attempted when Primary replica JBSAG1 went down.

-> There is nothing much we can do in these scenarios other than waiting for the primary replica to be online or perform a forced failover with data loss.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Leave a Reply