Always ON – Restoring Transaction Log backups to a lagging Always ON Secondary Replica

Environment

-> Customer wants to perform a Disaster Recovery drill on an environment that is similar to an environment as shown above. While preparing for a failover to secondary replica JBSAG2, we have identified that the Log send queue was large. Transaction log is also large in Primary replica JBSAG1. It seems like the log send queue has increased to an larger extent due to network issue and a failover at this stage will lead to huge data loss. We had to help them failover Always ON availability group to secondary replica JBSAG2 from Primary replica JBSAG1.

-> Checking Always ON dashboard,

-> Executing below query on Secondary replica JBSAG2 to get the details of log send queue and last_hardened_lsn,

SELECT AR.replica_server_name, ADC.database_name, AG.name AS AG_name,
DHDRS.is_primary_replica, DHDRS.log_send_queue_size, DHDRS.last_hardened_lsn
FROM sys.dm_hadr_database_replica_states AS DHDRS
INNER JOIN sys.availability_databases_cluster AS ADC 
ON DHDRS.group_id = ADC.group_id AND DHDRS.group_database_id = ADC.group_database_id
INNER JOIN sys.availability_groups AS AG
ON AG.group_id = DHDRS.group_id INNER JOIN sys.availability_replicas AS AR 
ON DHDRS.group_id = AR.group_id AND 
DHDRS.replica_id = AR.replica_id
order by AR.replica_server_name
	

-> The last_Hardened_lsn in Secondary replica is 197000005084000001 as per above output.

-> Use below query in MSDB to view the log backup within the range of LSN 197000005084000001,

select b.database_name, b.type, bf.physical_device_name, b.backup_start_date, 
b.backup_finish_date
from msdb..backupset b
left outer join msdb..backupmediafamily bf on bf.media_set_id = b.media_set_id
where b.type in('L') and b.database_name = 'jbdb'
and b.first_lsn<'197000005084000001' and b.last_lsn>'197000005084000001' 
order by b.backup_finish_date desc

-> We should restore all transaction log backup on Secondary replica database starting with backup specified on above screenshot.

-> In my case, I see below backups marked to be restored,

-> Before starting the restore on secondary replica, database JBDB should be taken out of availability group in secondary replica using below query,

alter database [JBDB] set hadr off 

-> Once the above query is restored, database JBDB in secondary replica JBSAG2 will be in restoring mode,

-> Create restore query using below tsql. Make sure you check the backup files and confirm that the restore command is correct,

select 'restore log [JBDB] from disk='''
+bf.physical_device_name + ''' with norecovery '
from msdb..backupset b
left outer join msdb..backupmediafamily bf on bf.media_set_id = b.media_set_id
where b.type in('L') and b.database_name = 'jbdb'
and b.last_lsn>'197000005084000001' 
order by b.backup_finish_date asc

-> Now we perform the log restore using below query,

restore log [JBDB] from disk='c:\temp\JBSCLUSTER$JBSAG\JBDB\LOG\JBSCLUSTER$JBSAG_JBDB_LOG_20200927_135500.trn' with norecovery 
.
.
restore log [JBDB] from disk='c:\temp\JBSCLUSTER$JBSAG\JBDB\LOG\JBSCLUSTER$JBSAG_JBDB_LOG_20200927_154500.trn' with norecovery 

-> Once the restore is completed. Execute below query to add the database back to availability group on secondary replica and resume data movement,

alter database [JBDB] set hadr availability group = [JBSAG] 
go 
alter database [JBDB] set hadr resume 
go 

-> Checking Always ON Availability group dashboard and everything seems normal,

-> Failover was initiated after above actions from Primary replica JBSAG1 to secondary replica JBSAG2.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Replicated transactions are waiting for next Log backup or for mirroring partner to catch up

Environment

-> We had an environment which is same as above environment. Remote distributor, Publisher and subscriber servers were configured in Always ON availability group.

-> I worked on a disaster recovery drill. As part of the DR, All database server on primary datacentre were brought down. Below is the representation after the primary datacentre was brought down.

-> I failed over all database servers from Primary Datacentre to Secondary Datacentre.

-> Failover completed fine. Below is the view in SQL Server management studio,

-> I checked the Replication monitor and found below message in Log reader agent,

Replicated transactions are waiting for next Log backup or for mirroring partner to catch up.
Initializing
Validating publisher
Starting agent.

-> Message “Replicated transactions are waiting for next Log backup or for mirroring partner to catch up.” is as a result of Always ON availability secondary replica being down.

-> With the current setup transactional replication log reader agent will not be able to to move forward as the asynchronous secondary replicas have not acknowledged the reception of a change.

-> One option to solve this issue is to wait for the Always ON secondary replica to come online and that should fix the issue.

-> But in my case Always ON secondary replica will be online after 12 hours. So waiting for the Always ON secondary replica to come online is not an option.

-> Second option will be enable trace flag 1448. Please check article1 and article2 for more details on this trace flag.

-> Trace flag 1448 enables the replication log reader to move forward even if the asynchronous secondary replicas have not acknowledged the reception of a change. Even with this trace flag enabled, the log reader always waits for the synchronous secondary replicas. The log reader will not go beyond the min ack of the synchronous secondary replicas. This trace flag applies to the instance of SQL Server, not just to an availability group, an availability database, or a log reader instance. This trace flag takes effect immediately without a restart. It can be activated ahead of time or when an asynchronous secondary replica fails.

-> I enabled trace flag 1448 on the Publisher server JBPRIMARY-DR using below query,

dbcc traceon(1448,-1)

-> Once the trace flag 1448 is enabled. Log reader agent completed fine.

-> Please note that this trace flag can be enabled without a SQL Server restart.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Cannot open database ‘JBDB’ version 869. Upgrade the database to the latest version. (Microsoft SQL Server, Error: 946)

Environment

-> Database Server JBSAG1 and JBSAG2 hosts SQL Server 2017. The requirement was to perform an In-Place upgrade to SQL Server 2019 on JBSAG1 and JBSAG2.

-> Database Server JBSAG1 is secondary replica and JBSAG2 is primary replica.

-> In-place upgrade from SQL Server 2017 to SQL Server 2019 was started on JBSAG1 as it is the secondary replica. It completed fine without any issues.

-> After the upgrade, connected to JBSAG1 using SSMS and performed a health check. The database JBDB on SQL Server instance JBSAG1 showed as Synchronized / In Recovery as show in below screenshot,

-> Checked SQL Server error log and found below message,

2020-09-10 15:05:27.020 spid53 Error: 946, Severity: 14, State: 1.
2020-09-10 15:05:27.020 spid53 Cannot open database ‘JBDB’ version 869. Upgrade the database to the latest version.

Cannot open database ‘JBDB’ version 869. Upgrade the database to the latest version. (Microsoft SQL Server, Error: 946)

-> The message is very clear. After the In-place upgrade, the database should be upgraded to latest version which is SQLS erver 2019, since the database is in Always on secondary, setup process was not able to upgrade it during the upgrade process.

-> Checking the Always On availability group dashboard from primary replica JBSAG2,

-> There are no errors and Always On availability group is in healthy state. So it is safe to failover Availability group from JBSAG2 to JBSAG1. Once after the failover, the database will get upgraded.

-> After the failover database JBDB in JBSAG1 is online and synchronized,

-> Checking the Always On dashboard,

-> JBSAG2 which is the secondary replica now and shows database JBDB as “Not Synchronizing / In Recovery”. On JBSAG2, check the database JBDB status under Availability databases and you will see that the database is paused.

-> We are seeing this in paused state because after failover the current Primary JBSAG1 is in SQL Server 2019 and the secondary JBSAG2 is in SQL Server 2017. Now it is time to perform an In-Place upgrade to SQL Server 2019 on JBSAG2 and the database JBDB in “Not Synchronizing / In Recovery” state in JBSAG2 is expected.

<TESTING ONLY – DONOT PERFORM ON LIVE SERVERS>

-> I will try resuming the database JBDB on JBSAG2 without
performing an In-Place upgrade from SQL Server 2017 to SQL
Server 2019.

-> Checking SQL Server error log. Below message can be found,


2020-09-10 15:20:21.490 spid71 ALTER DB param option: RESUME
2020-09-10 15:20:21.500 spid71 Always On Availability Groups data
movement for database ‘JBDB’ has been resumed. This is an
informational message only. No user action is required.
2020-09-10 15:20:21.500 spid77s Always On Availability Groups
connection with primary database established for secondary
database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID:
{850de16e-2c2b-43b1-bf52-6d8c5e2046ff}. This is an informational
message only. No user action is required.


2020-09-10 15:20:21.500 spid77s Error: 948, Severity: 20, State:
102.
2020-09-10 15:20:21.500 spid77s The database ‘JBDB’ cannot be
opened because it is version 904. This server supports version
869 and earlier. A downgrade path is not supported.


2020-09-10 15:20:21.500 spid77s Always On Availability Groups
data movement for database ‘JBDB’ has been suspended for
the following reason: “system” (Source ID 7; Source string:
‘SUSPEND_FROM_REVALIDATION’). To resume data
movement on the database, you will need to resume the
database manually. For information about how to resume an
availability database, see SQL Server Books Online.

-> The message clearly tells that the database should be upgraded
before resuming the data movement. Hence In-Place upgrade of
JBSAG2 should be performed.

<TESTING ONLY – DONOT PERFORM ON LIVE SERVERS>

-> Performed In-Place upgrade from SQL Server 2017 to SQL Server 2019 on database server JBSAG2.

-> I checked the Always On dashboard. The database JBDB was still “Not Synchronizing / In Recovery” and was suspended.

-> I resumed the data movement for database JBDB on database server JBSAG2 and it worked fine this time. Below is the Always On dashobaord,

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.