Always On: WSFC AG integrity check failed for AG ‘JBAG’ with error 41044, severity 16, state 1

-> I was trying to configure an Always On Availability Group and got the error below:

TITLE: Microsoft SQL Server Management Studio

Create failed for Availability Group ‘JBAG’. (Microsoft.SqlServer.Management.HadrModel)

For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.46041.41+(SMO-master-A)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Create+AvailabilityGroup&LinkId=20476

ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

Failed to bring availability group ‘JBAG’ online. The operation timed out. Verify that the local Windows Server Failover Clustering (WSFC) node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again.
Failed to create availability group ‘JBAG’. The operation encountered SQL Server error 41131 and has been rolled back. Check the SQL Server error log for more details. When the cause of the error has been resolved, retry CREATE AVAILABILITY GROUP command. (Microsoft SQL Server, Error: 41131)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=13.00.5865&EvtSrc=MSSQLServer&EvtID=41131&LinkId=20476

-> SQL Server Error Log on JBSAG1,

2021-04-12 10:26:06.190 spid55 The Database Mirroring endpoint is now listening for connections.
2021-04-12 10:26:06.190 spid55 Database mirroring has been enabled on this instance of SQL Server.
2021-04-12 10:26:17.270 spid55 The state of the local availability replica in availability group ‘JBAG’ has changed from ‘NOT_AVAILABLE’ to ‘RESOLVING_NORMAL’. The state changed because the local availability replica is joining the availability group. For more information, see the SQL
2021-04-12 10:27:17.340 spid55 The state of the local availability replica in availability group ‘JBAG’ has changed from ‘RESOLVING_NORMAL’ to ‘NOT_AVAILABLE’. The state changed because either the associated availability group has been deleted, or the local availability replica has bee
2021-04-12 10:27:25.530 spid55 Error: 19435, Severity: 16, State: 1.
2021-04-12 10:27:25.530 spid55 Always On: WSFC AG integrity check failed for AG ‘JBAG’ with error 41044, severity 16, state 1.

-> The SQL Server Error Logs on JBSAG2 and JBSAG3 did not have any relevant details.

-> The messages in the SQL Server error log on JBSAG1 did not give much detail about the cause either.
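
-> When the error log is large, a search like the one below can pull out only the entries for the availability group. This is a minimal sketch, assuming the undocumented but widely used sp_readerrorlog procedure is available and that the current log file still holds the failure. The search string 'JBAG' is just the availability group name from the error above.

```sql
-- Search the current SQL Server error log for entries mentioning the AG.
-- 1st argument: 0 = current log file
-- 2nd argument: 1 = SQL Server error log (2 would be the SQL Agent log)
-- 3rd argument: text to search for (AG name from the error message)
EXEC sp_readerrorlog 0, 1, N'JBAG';
GO
```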

-> Next, I checked Cluster.log on JBSAG1. Refer to this article to learn how to generate cluster.log.

[Verbose] 000007e8.00002324::2021/04/12-10:26:17.799 INFO [RES] SQL Server Availability Group: [hadrag] Extended Event logging is started
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.799 INFO [RES] SQL Server Availability Group: [hadrag] Health worker started for instance JBSAG1
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.799 INFO [RES] SQL Server Availability Group: [hadrag] Connect to SQL Server …
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.803 INFO [RES] SQL Server Availability Group: [hadrag] The connection was established successfully
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 INFO [RES] SQL Server Availability Group: [hadrag] Run ‘EXEC sp_server_diagnostics 10’ returns following information
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 ERR [RES] SQL Server Availability Group: [hadrag] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]The user does not have permission to perform this action. (297)
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 ERR [RES] SQL Server Availability Group: [hadrag] Failed to run diagnostics command. See previous log for error message
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 INFO [RES] SQL Server Availability Group: [hadrag] Disconnect from SQL Server
[Verbose] 000018a8.00000a0c::2021/04/12-10:26:18.213 DBG [GEM] Node 2: Sending GemMaxAckControl message with gid 171. Last acknowledged gid was 157

-> The message “The user does not have permission to perform this action” clearly points to a permission problem. A quick search landed me on this article, which provides the details below:

The [NT AUTHORITY\SYSTEM] account is used by SQL Server AlwaysOn health detection to connect to the SQL Server computer and to monitor health. When you create an availability group, health detection is initiated when the primary replica in the availability group comes online. If the [NT AUTHORITY\SYSTEM] account does not exist or does not have sufficient permissions, health detection cannot be initiated, and the availability group cannot come online during the creation process.

-> The Cluster.log entries clearly show that the AG creation fails while executing “EXEC sp_server_diagnostics 10”, and this matches the point above.
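
-> Before changing anything, it can help to confirm what the account currently has. The query below is a sketch, not part of the original troubleshooting, that lists the [NT AUTHORITY\SYSTEM] login and its explicitly granted server-level permissions on the instance where it is run. No rows means the login does not exist. A row with NULL permission columns means the login exists but has no explicit server permissions.

```sql
-- Show the NT AUTHORITY\SYSTEM login and any server-level permissions
-- granted or denied to it directly (role memberships are not expanded here).
SELECT sp.name,
       sp.type_desc,
       sp.is_disabled,
       perm.permission_name,
       perm.state_desc
FROM sys.server_principals AS sp
LEFT JOIN sys.server_permissions AS perm
       ON perm.grantee_principal_id = sp.principal_id
WHERE sp.name = N'NT AUTHORITY\SYSTEM';
GO
```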

-> Let's execute the query below on JBSAG1, JBSAG2 and JBSAG3:

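-- Allow the health detection account to manage availability groups
GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM]
GO
-- Allow the account to connect to the instance
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM]
GO
-- Required to run sp_server_diagnostics
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM]
GO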
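
-> The article quoted earlier also mentions the case where the [NT AUTHORITY\SYSTEM] account does not exist at all. In that case the GRANT statements would fail, so the login has to be created first. The guard below is a minimal sketch for that situation. In my case the login may well have existed already, so treat this as an optional step to run before the GRANT statements.

```sql
-- Create the login only if it is missing; run this before the GRANTs.
IF NOT EXISTS (SELECT 1
               FROM sys.server_principals
               WHERE name = N'NT AUTHORITY\SYSTEM')
BEGIN
    CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS;
END
GO
```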

-> Once the permissions were granted to NT AUTHORITY\SYSTEM, the availability group creation completed just fine.
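
-> To double-check that the availability group really is online after the retry, the replica states can be queried on the primary. This is a sketch using the standard Always On catalog views and DMVs, with 'JBAG' being the availability group name used in this post. Note that some columns, such as operational_state_desc, are only populated for the local replica.

```sql
-- List each replica of the AG with its role, connection and health state.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
     ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
     ON ars.replica_id = ar.replica_id
WHERE ag.name = N'JBAG';
GO
```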

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.