-> Customer wants to perform a Disaster Recovery drill on an environment that is similar to an environment as shown above. While preparing for a failover to secondary replica JBSAG2, we have identified that the Log send queue was large. Transaction log is also large in Primary replica JBSAG1. It seems like the log send queue has increased to an larger extent due to network issue and a failover at this stage will lead to huge data loss. We had to help them failover Always ON availability group to secondary replica JBSAG2 from Primary replica JBSAG1.
-> Checking Always ON dashboard,
-> Executing below query on Secondary replica JBSAG2 to get the details of log send queue and last_hardened_lsn,
SELECT AR.replica_server_name, ADC.database_name, AG.name AS AG_name, DHDRS.is_primary_replica, DHDRS.log_send_queue_size, DHDRS.last_hardened_lsn FROM sys.dm_hadr_database_replica_states AS DHDRS INNER JOIN sys.availability_databases_cluster AS ADC ON DHDRS.group_id = ADC.group_id AND DHDRS.group_database_id = ADC.group_database_id INNER JOIN sys.availability_groups AS AG ON AG.group_id = DHDRS.group_id INNER JOIN sys.availability_replicas AS AR ON DHDRS.group_id = AR.group_id AND DHDRS.replica_id = AR.replica_id order by AR.replica_server_name
-> The last_Hardened_lsn in Secondary replica is 197000005084000001 as per above output.
-> Use below query in MSDB to view the log backup within the range of LSN 197000005084000001,
select b.database_name, b.type, bf.physical_device_name, b.backup_start_date, b.backup_finish_date from msdb..backupset b left outer join msdb..backupmediafamily bf on bf.media_set_id = b.media_set_id where b.type in('L') and b.database_name = 'jbdb' and b.first_lsn<'197000005084000001' and b.last_lsn>'197000005084000001' order by b.backup_finish_date desc
-> We should restore all transaction log backup on Secondary replica database starting with backup specified on above screenshot.
-> In my case, I see below backups marked to be restored,
-> Before starting the restore on secondary replica, database JBDB should be taken out of availability group in secondary replica using below query,
alter database [JBDB] set hadr off
-> Once the above query is restored, database JBDB in secondary replica JBSAG2 will be in restoring mode,
-> Create restore query using below tsql. Make sure you check the backup files and confirm that the restore command is correct,
select 'restore log [JBDB] from disk=''' +bf.physical_device_name + ''' with norecovery ' from msdb..backupset b left outer join msdb..backupmediafamily bf on bf.media_set_id = b.media_set_id where b.type in('L') and b.database_name = 'jbdb' and b.last_lsn>'197000005084000001' order by b.backup_finish_date asc
-> Now we perform the log restore using below query,
restore log [JBDB] from disk='c:\temp\JBSCLUSTER$JBSAG\JBDB\LOG\JBSCLUSTER$JBSAG_JBDB_LOG_20200927_135500.trn' with norecovery . . restore log [JBDB] from disk='c:\temp\JBSCLUSTER$JBSAG\JBDB\LOG\JBSCLUSTER$JBSAG_JBDB_LOG_20200927_154500.trn' with norecovery
-> Once the restore is completed. Execute below query to add the database back to availability group on secondary replica and resume data movement,
alter database [JBDB] set hadr availability group = [JBSAG]
alter database [JBDB] set hadr resume
-> Checking Always ON Availability group dashboard and everything seems normal,
-> Failover was initiated after above actions from Primary replica JBSAG1 to secondary replica JBSAG2.
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.