Always ON Availability group in RESOLVING state

Environment

-> I tried failing over Availability group from JBSAG1 to JBSAG2.

-> The failover wizard started executing and was taking much time,

-> Below was the Availability group state,

-> It failed after some time with below message,

Manual Failover failed (Microsoft.SqlServer.Management.HadrTasks)

TITLE: Microsoft SQL Server Management Studio
ADDITIONAL INFORMATION:

Failed to perform a manual failover of the availability group ‘JBSAG’ to server instance ‘JBSAG2’. (Microsoft.SqlServer.Management.HadrModel)
For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.37971.0&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

Failed to bring availability group ‘JBSAG’ online. The operation timed out. If this is a Windows Server Failover Clustering (WSFC) availability group, verify that the local WSFC node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again. (Microsoft SQL Server, Error: 41131)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=14.00.3294&EvtSrc=MSSQLServer&EvtID=41131&LinkId=20476

BUTTONS:
OK

-> Checking SQL Server Errorlog,

JBSAG2

2020-07-25 16:02:03.210 spid81 The state of the local availability replica in availability group ‘JBSAG’ has changed from ‘SECONDARY_NORMAL’ to ‘RESOLVING_PENDING_FAILOVER’. The state changed because of a user initiated failover. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2020-07-25 16:02:03.790 Logon Error: 18456, Severity: 14, State: 5.
2020-07-25 16:02:03.790 Logon Login failed for user ‘JBS\JBSAG2$’. Reason: Could not find a login matching the name provided. [CLIENT: ]
2020-07-25 16:02:13.270 spid82s A connection timeout has occurred on a previously established connection to availability replica ‘JBSAG1’ with id [850DE16E-2C2B-43B1-BF52-6D8C5E2046FF]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
2020-07-25 16:08:58.730 spid81 The state of the local availability replica in availability group ‘JBSAG’ has changed from ‘RESOLVING_PENDING_FAILOVER’ to ‘RESOLVING_NORMAL’. The state changed because of a user initiated failover. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.
2020-07-25 16:08:58.730 spid83s Always On Availability Groups connection with primary database terminated for secondary database ‘JBSDBDW’ on the availability replica ‘JBSAG1’ with Replica ID: {850de16e-2c2b-43b1-bf52-6d8c5e2046ff}. This is an informational message only. No user action is required.
2020-07-25 16:08:58.730 spid25s Always On Availability Groups connection with primary database terminated for secondary database ‘StackOverflow2010’ on the availability replica ‘JBSAG1’ with Replica ID: {850de16e-2c2b-43b1-bf52-6d8c5e2046ff}. This is an informational message only. No user action is required.
2020-07-25 16:08:58.730 spid82s The availability group database “JBSDBDW” is changing roles from “SECONDARY” to “RESOLVING” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
2020-07-25 16:08:58.730 spid25s The availability group database “StackOverflow2010” is changing roles from “SECONDARY” to “RESOLVING” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
2020-07-25 16:08:58.740 spid25s The availability group database “JBSDBDW” is changing roles from “RESOLVING” to “SECONDARY” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
2020-07-25 16:08:58.740 spid82s The availability group database “StackOverflow2010” is changing roles from “RESOLVING” to “SECONDARY” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

-> This will affect automatic failover also incase primary replica goes down.

-> Checked if NT Authority\System is created and has appropriate permission.

-> It seems like NT Authority\System was removed from SQL Server Instance.

-> Recreated NT Authority\System on SQL Server instance and provide required permission.

USE [master]
GO
CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
GO 

GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM]
GO
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM]
GO
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM]
GO 

-> I tried the failover again after creating NT Authority\System and it worked fine,

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Leave a Reply