Zero RPO or 0 RPO with Microsoft SQL Server

Environment

-> The environment has two (2) standalone database servers, JBSAG1 and JBSAG2, each with a SQL Server instance installed. Both servers are part of a failover cluster without shared storage and are configured as replicas in an asynchronous-commit Always On availability group. JBSAG1 acts as the primary replica and JBSAG2 as the secondary replica.

-> JBSAG1 is in the primary datacenter and JBSAG2 is in the secondary datacenter. The distance between the two servers is 1,500 kilometers.

-> Zero RPO (0 RPO) may be achievable using the procedures below, but I am not sure whether this approach is supported.

-> The transaction log backups (*.trn) and the LDF file of each database on JBSAG1 are replicated to storage in a third datacenter using synchronous storage replication. With synchronous replication, each change to the LDF file is written to the secondary storage first, and only after the secondary storage confirms the write is the change hardened in the LDF file on JBSAG1. This happens outside SQL Server and is handled at the storage level. The distance between the “Primary Datacenter” and the “Tertiary Datacenter” is 5 kilometers.

-> Let’s assume there is only one (1) database, JBFinance, on JBSAG1, and that it is part of the availability group with JBSAG2. Storage replication ensures that the log backups and LDF file of JBFinance are replicated to the storage in the “Tertiary Datacenter”.

Scenario 1

-> Suppose the primary replica JBSAG1 goes down and is expected to come back online after 12 hours.

-> At this stage, a failover to JBSAG2 can result in data loss, because the asynchronous secondary may not have received the most recent transactions. To avoid any data loss, we can use the LDF file that was replicated to the Tertiary datacenter, following the steps below.

-> The storage team should stop the storage replication between JBSAG1 and the storage in the “Tertiary Datacenter” and make that storage available. The platforms team will then provision it as a drive on a temporary database server, JBSAG4, in the “Tertiary Datacenter”. JBSAG4 already runs the same SQL Server version as JBSAG1.

-> On the temporary database server JBSAG4, create a dummy database called JBFinance.
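
As a minimal sketch of this step, the dummy database can be created with default settings and its log file location read from sys.master_files, so that you know which LDF file to swap out later. The default file locations are an assumption; in practice you may want to place the files explicitly.

```sql
-- Run on JBSAG4. Creates the placeholder database with default file locations
-- (an assumption; add ON / LOG ON clauses if specific paths are required).
CREATE DATABASE [JBFinance];
GO

-- List the physical files of the dummy database. The row with
-- type_desc = 'LOG' is the LDF file that will be replaced.
SELECT name, type_desc, physical_name
FROM sys.master_files
WHERE database_id = DB_ID(N'JBFinance');
GO
```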

-> Once the dummy database is created, stop the SQL Server service on JBSAG4 and replace the dummy JBFinance database’s LDF file with the LDF file that was replicated from the primary database server JBSAG1 to the storage in the Tertiary datacenter.

-> Start the SQL Server service on JBSAG4. The temporary database JBFinance on JBSAG4 will be in the “recovery pending” state.
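
The database state can be confirmed from sys.databases before taking the tail-log backup. This is only a quick check, not part of the original procedure:

```sql
-- Run on JBSAG4. state_desc is expected to show RECOVERY_PENDING
-- once the dummy log file has been swapped for the replicated LDF.
SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name = N'JBFinance';
GO
```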

-> Execute the query below on JBSAG4 to take a tail-log backup:

BACKUP LOG [JBFinance] TO DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' WITH NO_TRUNCATE

-> Remove the database JBFinance from the availability group on JBSAG2, leaving it in the restoring state.
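
On a secondary replica, a database is normally removed from an availability group with ALTER DATABASE ... SET HADR OFF, which leaves it in the RESTORING state. The sketch below assumes the local replica on JBSAG2 is in a role that accepts the command; it may be rejected while the replica is in a RESOLVING role.

```sql
-- Run on JBSAG2 (secondary replica).
ALTER DATABASE [JBFinance] SET HADR OFF;
GO

-- Confirm that the database is now in the RESTORING state.
SELECT name, state_desc
FROM sys.databases
WHERE name = N'JBFinance';
GO
```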

-> Restore the tail-log backup to the JBFinance database on JBSAG2.

RESTORE LOG [JBFinance] FROM  
DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' 
WITH  FILE = 1,  NOUNLOAD,  STATS = 10
GO

Scenario 2

-> Status of availability group JBSwiki: [dashboard screenshot not available]

-> Suppose the secondary replica JBSAG2 goes down.

-> In the meantime, the primary replica continues to receive write workload from the application while JBSAG2 is down.

-> The primary replica JBSAG1 then goes down and is expected to be back online after 12 hours.

-> The secondary replica JBSAG2 now comes back online.
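
The replica and database status can also be read from the Always On catalog views and DMVs instead of the dashboard. The queries below are a sketch to run on JBSAG2. On a secondary replica the state DMVs only report the local replica, hence the LEFT JOIN.

```sql
-- Replica-level status: role, connection state, and health.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id;
GO

-- Database-level status on the local replica: how far the log was hardened
-- and the time of the last commit received.
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.last_hardened_lsn,
       drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1;
GO
```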

-> At this stage, a failover to JBSAG2 will result in data loss, because JBSAG2 missed the writes made while it was down. To avoid any data loss, we can use the LDF file that was replicated to the Tertiary datacenter, following the steps below.

-> The storage team should stop the storage replication between JBSAG1 and the storage in the “Tertiary Datacenter” and make that storage available. The platforms team will then provision it as a drive on a temporary database server, JBSAG4, in the “Tertiary Datacenter”. JBSAG4 already runs the same SQL Server version as JBSAG1.

-> On the temporary database server JBSAG4, create a dummy database called JBFinance.

-> Once the dummy database is created, stop the SQL Server service on JBSAG4 and replace the dummy JBFinance database’s LDF file with the LDF file that was replicated from the primary database server JBSAG1 to the storage in the Tertiary datacenter.

-> Start the SQL Server service on JBSAG4. The temporary database JBFinance on JBSAG4 will be in the “recovery pending” state.

-> Execute the query below on JBSAG4 to take a tail-log backup:

BACKUP LOG [JBFinance] TO DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' WITH NO_TRUNCATE

-> Remove the database JBFinance from the availability group on JBSAG2, leaving it in the restoring state.

-> Restore the tail-log backup to the JBFinance database on JBSAG2.

RESTORE LOG [JBFinance] FROM  
DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' 
WITH  FILE = 1,  NOUNLOAD,  STATS = 10
GO

-> The changes that were made on the primary replica JBSAG1 are now available on JBSAG2. The above approach can therefore be used to achieve 0 RPO.

-> The same approach can also be used for log shipping. The details are below.

Environment

-> The environment hosts two (2) standalone database servers, JBSAG1 and JBSAG2, each with a SQL Server instance installed. JBSAG1 and JBSAG2 use log shipping, with JBSAG1 configured as the primary. Log backups for log shipping run every 5 minutes.

-> JBSAG1 is in the primary datacenter and JBSAG2 is in the secondary datacenter. The distance between the two servers is 1,500 kilometers.

-> The transaction log backups (*.trn) and the LDF file of each database on JBSAG1 are replicated to storage in a third datacenter using synchronous storage replication. With synchronous replication, each change to the LDF file is written to the secondary storage first, and only after the secondary storage confirms the write is the change hardened in the LDF file on JBSAG1. This happens outside SQL Server and is handled at the storage level. The distance between the “Primary Datacenter” and the “Tertiary Datacenter” is 5 kilometers.

-> Let’s assume there is only one (1) database, JBFinance, on JBSAG1, and that it is log shipped to JBSAG2. Storage replication ensures that the log backups and LDF file of JBFinance are replicated to the storage in the “Tertiary Datacenter”.

-> If database server JBSAG1 goes down, the storage team should stop the storage replication between JBSAG1 and the storage in the “Tertiary Datacenter” and make that storage available. The platforms team will then provision it as a separate drive on a temporary database server, JBSAG4, in the “Tertiary Datacenter”. JBSAG4 already runs the same SQL Server version as JBSAG1.

-> On the temporary database server JBSAG4, create a dummy database called JBFinance. Once the dummy database is created, stop the SQL Server service on JBSAG4 and replace the dummy JBFinance database’s LDF file with the LDF file that was replicated from the primary database server JBSAG1 to the storage in the Tertiary datacenter.

-> Start the SQL Server service on JBSAG4. The temporary database JBFinance on JBSAG4 will be in the “recovery pending” state.

-> Execute the query below on JBSAG4 to take a tail-log backup:

BACKUP LOG [JBFinance] TO DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' WITH NO_TRUNCATE

-> Check the last log backup that was restored on the secondary database server JBSAG2. Based on that, move all required log backups and the tail-log backup from JBSAG4 to JBSAG2 and restore them in sequence. With this approach, there will be no data loss.
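
A sketch of this last step is shown below. The log shipping monitor table in msdb records the last file restored on the secondary. The intermediate backup file name is a hypothetical placeholder, and the sketch assumes every log backup taken after the last restored one is available on the share.

```sql
-- Run on JBSAG2: last log backup copied and restored by log shipping.
SELECT secondary_server, secondary_database,
       last_copied_file, last_restored_file, last_restored_date
FROM msdb.dbo.log_shipping_monitor_secondary
WHERE secondary_database = N'JBFinance';
GO

-- Restore each remaining log backup in sequence, keeping the database
-- in the restoring state. The file name is a hypothetical placeholder.
RESTORE LOG [JBFinance]
FROM DISK = N'\\JBSDC\Backup\JBFinance_<next_log_backup>.trn'
WITH NORECOVERY;
GO

-- Finally, restore the tail-log backup and bring the database online.
RESTORE LOG [JBFinance]
FROM DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn'
WITH RECOVERY;
GO
```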

Below are the pros and cons of this approach: [table not available]

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

The local availability replica of availability group ‘JBAG’ cannot accept signal ‘UNJOIN_DB’ in its current replica role

Environment

-> The environment has two (2) standalone database servers, JBSAG1 and JBSAG2, with Always On configured.

-> The primary replica JBSAG1 goes down and is expected to come back online after several hours.

-> Automatic failover is not available because the availability group is configured for asynchronous commit. The availability group is in the Resolving state.
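
The role of the local replica can be confirmed with a DMV query in place of the dashboard. A minimal sketch, to run on JBSAG2:

```sql
-- role_desc is expected to show RESOLVING while the primary is unavailable.
SELECT ag.name AS availability_group,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
WHERE ars.is_local = 1;
GO
```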

-> The error below occurs when we try to remove the database JBFinance from the availability group JBAG on the secondary replica:

The database ‘JBFinance’ failed to leave the availability group ‘JBAG’ on the availability replica ‘JBSAG2’. (Microsoft.SqlServer.Management.SDK.TaskForms)

The local availability replica of availability group ‘JBAG’ cannot accept signal ‘UNJOIN_DB’ in its current replica role, ‘RESOLVING_NORMAL’, and state (configuration is in Windows Server Failover Clustering store, local availability replica has joined). The availability replica signal is invalid given the current replica role. When the signal is permitted based on the current role of the local availability replica, retry the operation. (Microsoft SQL Server, Error: 41121)

Msg 41190, Level 16, State 8, Line 6
Availability group ‘JBAG’ failed to process remove-database command. The local availability replica is not in a state that could process the command. Verify that the availability group is online and that the local availability replica is the primary replica, then retry the command.

-> We cannot remove the database from the availability group while the replica is in the Resolving state.
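
For reference, these are the two standard T-SQL commands for removing a database from an availability group. The post does not show which statements were actually executed, so treating them as the source of the errors above is an assumption.

```sql
-- On a secondary replica: removes the local secondary database from the group.
ALTER DATABASE [JBFinance] SET HADR OFF;
GO

-- On the primary replica: removes the database from the group entirely.
-- This requires the local replica to be the primary.
ALTER AVAILABILITY GROUP [JBAG] REMOVE DATABASE [JBFinance];
GO
```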

-> The available options are as follows:

[$] Perform a “Failover with potential data loss”.

ALTER AVAILABILITY GROUP [JBAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

[$] Drop availability group JBAG and then recover the database.

USE [master]
GO
DROP AVAILABILITY GROUP [JBAG];
GO

RESTORE DATABASE [JBFinance] WITH RECOVERY
GO

-> Of the two options above, I would always choose the first, “Failover with potential data loss”, in scenarios like this.


The database ‘jbswiki’ has reached its size quota. Partition or delete data, drop indexes, or consult the documentation for possible resolutions.

-> The application fails with the error below when trying to connect to the Azure SQL database:

Msg 40544, Level 17, State 2, Line 10
The database ‘jbswiki’ has reached its size quota. Partition or delete data, drop indexes, or consult the documentation for possible resolutions.

-> Execute the query below on the “jbswiki” database to check its used and free space:

-- File sizes are stored in 8-KB pages, so dividing by 128.0 converts them to MB.
-- SQL Server 2000 (version 8.x) exposes file metadata through sysfiles.
IF CONVERT(varchar(20), SERVERPROPERTY('productversion')) LIKE '8%'
SELECT [name], fileid, filename,
[size]/128.0 AS 'Total Size in MB',
[size]/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS 'Available Space In MB',
CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS 'Used Space In MB',
(100-((([size]/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0)/([size]/128.0))*100.0)) AS 'percentage Used'
FROM sysfiles
ELSE
-- Later versions, including Azure SQL Database, use sys.database_files.
SELECT @@SERVERNAME AS 'ServerName', DB_NAME() AS DBName, [name], file_id, physical_name,
[size]/128.0 AS 'Total Size in MB',
[size]/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS 'Available Space In MB',
CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS 'Used Space In MB',
(100-((([size]/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0)/([size]/128.0))*100.0)) AS 'percentage Used'
FROM sys.database_files
GO

-> The query output shows that the database “jbswiki” is full.
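
The configured size quota can also be compared directly with the space in use. The following is a sketch, assuming it is run while connected to the jbswiki database:

```sql
-- Configured maximum data size of the current database, in MB.
SELECT CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024 / 1024 AS max_size_mb;
GO

-- Data space used and allocated, in MB (pages are 8 KB each).
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint)) * 8 / 1024 AS data_space_used_mb,
       SUM(CAST(size AS bigint)) * 8 / 1024 AS data_space_allocated_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
GO
```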

-> A simple insert statement against the database fails with the same error.

-> Log in to the Azure portal and select the database jbswiki. The “Overview” tab shows that the database is full.

-> Click “Compute + storage” under Settings and change “Data Max Size” to an appropriate value. In my case, I changed it from 100 MB to 500 MB. Click Apply.
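
The same change can be made with T-SQL instead of the portal. This is a sketch only: the MAXSIZE values that are allowed depend on the service tier of the database, and 500 MB is simply the value used in this example.

```sql
-- Raise the size quota of the database to 500 MB.
ALTER DATABASE [jbswiki] MODIFY (MAXSIZE = 500 MB);
GO

-- Verify the new quota.
SELECT DATABASEPROPERTYEX(N'jbswiki', 'MaxSizeInBytes') AS max_size_bytes;
GO
```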

-> Application connections to the database jbswiki started working after the above change.

-> The “Overview” tab reflects the new size after the change.

-> Let’s try an insert again and check whether it works.

-> It worked fine this time.
