The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, Error: 35250)

-> I was configuring Always On Availability group on Database Servers JBSAG1, JBSAG2 and JBSAG3 using the wizard. It all went fine and below is the screenshot,

-> Checked the Always On Availability Group dashboard and below was the status,

-> Checking from SQL Server Management Studio,

-> There is a warning sign under “Availability Databases” and the database is in restoring state on both the secondary replicas.

-> Below messages are found,

This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.

At least one availability database on this availability replica has an unhealthy data synchronization state. If this is an asynchronous-commit availability replica, all availability databases should be in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should be in the SYNCHRONIZED state.

This secondary replica is not joined to the availability group. For an availability replica to be successfully joined to the availability group, the join state must be Joined Standalone Instance (1) or Joined Failover Cluster (2). For information about how-to join a secondary replica to an availability group, see SQL Server Books Online.

-> I tried to join the database JBDB to Availability group and got below message,

TITLE: Microsoft SQL Server Management Studio

Failed to join the database ‘JBDB’ to the availability group ‘JBAG’ on the availability replica ‘JBSAG2’. (Microsoft.SqlServer.Smo)

For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.46041.41+(SMO-master-A)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, Error: 35250)

-> Checking SQL Server Error log in JBSAG2,

2021-04-12 16:01:56.190 spid54 ALTER DB JBDB with AGNAME:JBAG
2021-04-12 16:01:56.190 spid54 ALTER DB param option: SET
2021-04-12 16:02:44.280 spid48s Always On Availability Groups connection with primary database terminated for secondary database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID: {81ddde84-4546-47f7-b78b-6de341c46f34}. This is an informational message only. No user action is r
2021-04-12 16:02:44.280 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.290 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.290 spid54 Starting up database ‘JBDB’.
2021-04-12 16:02:44.300 spid54 The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.
2021-04-12 16:02:44.310 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.310 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.310 spid54 Starting up database ‘JBDB’.
2021-04-12 16:02:44.320 spid54 The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.

-> Checking SQL Server Error log in JBSAG3,

2021-04-12 15:46:53.000 Backup Database was restored: Database: JBDB, creation date(time): 2021/04/12(09:06:02), first LSN: 130:102896:1, last LSN: 130:102920:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘\JBS-SQL-DC\Backup\JBDB.bak’}). Informational message.
2021-04-12 15:46:53.020 Backup RESTORE DATABASE successfully processed 461154 pages in 167.666 seconds (21.487 MB/sec).
2021-04-12 15:46:53.650 Backup Log was restored. Database: JBDB, creation date(time): 2021/04/12(09:06:02), first LSN: 130:102816:1, last LSN: 130:102920:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘\JBS-SQL-DC\Backup\JBDB.trn’}). This is an informational mes

-> Messages/errors in SQL Server error log did not provide any clue.

-> Executing below query on each of the instances,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

Primary – JBSAG1

Synchronous Secondary – JBSAG2

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

Asynchronous Secondary – JBSAG3

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

-> The error message refers to a connection issue between the primary and secondaries.

-> Endpoint uses port 5022. Lets check if port 5022 is not blocked using telnet.

‘telnet’ is not recognized as an internal or external command,
operable program or batch file.

-> Execute below command in powershell to enable telnet.

Install-WindowsFeature -name Telnet-Client

-> Telnet is not working for port 5022. It proves that port 5022 is not open.

-> Let us create a firewall rule to open port 5022.

-> Open “Windows Firewall with Advanced Security” and open “Inbound Rules” on JBSAG1, JBSAG2 and JBSAG3.

-> Wait for 5 mins and check the status now by executing below query,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

-> After opening port 5022, We couldn’t see that error anymore.

-> Let’s try joining the database again and it works fine this time,

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Always On: WSFC AG integrity check failed for AG ‘JBAG’ with error 41044, severity 16, state 1

-> I was trying to configure Always On Availability Group and got below error,

TITLE: Microsoft SQL Server Management Studio

Create failed for Availability Group ‘JBAG’. (Microsoft.SqlServer.Management.HadrModel)

For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.46041.41+(SMO-master-A)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Create+AvailabilityGroup&LinkId=20476

ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

Failed to bring availability group ‘JBAG’ online. The operation timed out. Verify that the local Windows Server Failover Clustering (WSFC) node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again.
Failed to create availability group ‘JBAG’. The operation encountered SQL Server error 41131 and has been rolled back. Check the SQL Server error log for more details. When the cause of the error has been resolved, retry CREATE AVAILABILITY GROUP command. (Microsoft SQL Server, Error: 41131)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=13.00.5865&EvtSrc=MSSQLServer&EvtID=41131&LinkId=20476

-> SQL Server Error Log on JBSAG1,

2021-04-12 10:26:06.190 spid55 The Database Mirroring endpoint is now listening for connections.
2021-04-12 10:26:06.190 spid55 Database mirroring has been enabled on this instance of SQL Server.
2021-04-12 10:26:17.270 spid55 The state of the local availability replica in availability group ‘JBAG’ has changed from ‘NOT_AVAILABLE’ to ‘RESOLVING_NORMAL’. The state changed because the local availability replica is joining the availability group. For more information, see the SQL
2021-04-12 10:27:17.340 spid55 The state of the local availability replica in availability group ‘JBAG’ has changed from ‘RESOLVING_NORMAL’ to ‘NOT_AVAILABLE’. The state changed because either the associated availability group has been deleted, or the local availability replica has bee
2021-04-12 10:27:25.530 spid55 Error: 19435, Severity: 16, State: 1.
2021-04-12 10:27:25.530 spid55 Always On: WSFC AG integrity check failed for AG ‘JBAG’ with error 41044, severity 16, state 1.

-> SQL Server Error Log on JBSAG2 and JBSAG3 did not have any relevant details.

-> Message in SQL server error log did not give much details.

-> Checking Cluster.log on JBSAG1. Refer article to learn how to generate cluster.log.

[Verbose] 000007e8.00002324::2021/04/12-10:26:17.799 INFO [RES] SQL Server Availability Group: [hadrag] Extended Event logging is started
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.799 INFO [RES] SQL Server Availability Group: [hadrag] Health worker started for instance JBSAG1
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.799 INFO [RES] SQL Server Availability Group: [hadrag] Connect to SQL Server …
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.803 INFO [RES] SQL Server Availability Group: [hadrag] The connection was established successfully
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 INFO [RES] SQL Server Availability Group: [hadrag] Run ‘EXEC sp_server_diagnostics 10’ returns following information
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 ERR [RES] SQL Server Availability Group: [hadrag] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]The user does not have permission to perform this action. (297)
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 ERR [RES] SQL Server Availability Group: [hadrag] Failed to run diagnostics command. See previous log for error message
[Verbose] 000007e8.00002324::2021/04/12-10:26:17.807 INFO [RES] SQL Server Availability Group: [hadrag] Disconnect from SQL Server
[Verbose] 000018a8.00000a0c::2021/04/12-10:26:18.213 DBG [GEM] Node 2: Sending GemMaxAckControl message with gid 171. Last acknowledged gid was 157

-> Message “The user does not have permission to perform this action” clearly states that the issue is due to permission issue. A quick search landed me on this article which provides below details,

The [NT AUTHORITY\SYSTEM] account is used by SQL Server AlwaysOn health detection to connect to the SQL Server computer and to monitor health. When you create an availability group, health detection is initiated when the primary replica in the availability group comes online. If the [NT AUTHORITY\SYSTEM] account does not exist or does not have sufficient permissions, health detection cannot be initiated, and the availability group cannot come online during the creation process.

-> Cluster.log entries clearly states that the AG creation fails while executing “EXEC sp_server_diagnostics 10” and this matches with above point.

-> Lets execute below query on JBSAG1, JBSAG2 and JBSAG3,

GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM]
GO
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM]
GO
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM]
GO

-> Once permission is granted to NT AUTHORITY\SYSTEM. Availability group creation completed just fine.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

SQL Server Database Corruption – File Header Page Corruption

-> I tried attaching a MDF file and got below error,

Msg 5172, Level 16, State 15, Line 1
The header for file ‘F:\data\JBDB_Data.mdf’ is not a valid database file header. The PageAudit property is incorrect.
Msg 1802, Level 16, State 7, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

-> It is clear from the message that the file header page of this database is corrupted.

-> File Header Page is the first page on a file that is used by SQL Server to store information about the attributes of the file. Refer article for more details.

-> If File Header Page is corrupted, the best option will be to restore the database with latest available backup.

-> Lets create a new database with same name as the corrupted MDF file. Stop SQL Services and then replace the MDF file of this newly created database with the corrupted MDF file that we were trying to attach. Start SQL Services and check the status of the database.

-> The database is in “Recovery Pending” State,

-> Lets check the SQL Server Error Log,

2021-04-10 17:16:03.250 spid23s Starting up database ‘JBDB’.
2021-04-10 17:16:03.250 spid11s Starting up database ‘mssqlsystemresource’.
2021-04-10 17:16:03.250 spid23s Error: 5172, Severity: 16, State: 15.
2021-04-10 17:16:03.250 spid23s The header for file ‘F:\data\JBDB_Data.mdf’ is not a valid database file header. The PageAudit property is incorrect.

-> Tried executing DBCC checkdb on the database and got below as expected,

dbcc checkdb ([JBDB])
GO

Msg 945, Level 14, State 2, Line 1
Database ‘JBDB’ cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

-> Lets try putting the database in emergency mode,

ALTER DATABASE [JBDB] SET EMERGENCY;
GO

-> Getting below error which same as previous occurrences,

Msg 5172, Level 16, State 15, Line 1
The header for file ‘F:\data\JBDB_Data.mdf’ is not a valid database file header. The PageAudit property is incorrect.

-> The only option left is to restore the database with a valid backup. But the bad news is that there are no backups.

-> SQL Server Database recovery tools is something we can check as an option. In my case, I tried “Stellar Repair for MS SQL“.

-> Before using Stellar Repair for MS SQL tool, it is better to backup the MDF and LDF file for safety.

-> Once done make another copy of the files to use it with Stellar Repair for MS SQL tool. In my case I have placed the file in location F:\Debug,

-> Open “Stellar Repair for MS SQL” and click on “Select database”,

-> Select the corrupted MDF file and click on Repair,

-> Repair completed,

-> Once repaired, Click on Save,

-> Processing this new database is going to take time depending on the database size. The time spent is definitely worth it, as I was able to recover the whole database.

-> Once the database is recovered. Do thorough testing and make sure the database looks alright.

-> I executed DBCC CHECKDB on the new database recovered by Stellar Repair for MS SQL and it came out clean.

-> There you go. Database is available now.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.