Zero RPO or 0 RPO with Microsoft SQL Server

Environment

-> Environment has two (2) standalone database servers JBSAG1 and JBSAG2 with SQL server instances installed on it. Database Servers JBSAG1 and JBSAG2 are part of a failover cluster without shared storage and are configured as replicas on an Asynchronous Always On Availability group. Database Server JBSAG1 acts as primary replica and JBSAG2 as secondary replica.

-> Database Server JBSAG1 is on primary datacenter and JBSAG2 is in Secondary datacenter. Distance between database servers JBSAG1 and JBSAG2 is 1500 Kilometers.

-> Zero RPO or 0 RPO may be achieved using below procedures. But not sure if this is supported.

-> Transaction log backups (*.trn) and LDF file for each database on database server JBSAG1 is replicated to a storage that is available on a third datacenter using Synchronous storage replication. Synchronous storage replication will move the changes of LDF file to secondary storage first and upon confirmation from secondary, primary server JBSAG1 LDF will have that change hardened. This is outside of SQL Server and is taken care at the storage level. Distance between “Primary Datacenter” to “Tertiary Datacenter” is 5 kilometers.

-> Lets assume we have only one (1) database called JBFinance on Database server JBSAG1 and that is part of Availability group to database server JBSAG2. Storage replication will ensure database JBFinance database’s log backups and LDF files are replicated to the storage in “Tertiary Datacentre” using Storage replication.

Scenario 1

-> Consider primary replica JBSAG1 goes down and is expected to come online after 12 hours.

-> At this stage a failover to JBSAG2 will result in data loss. In order to avoid any data loss we can utilize the LDF file that are backed up on the Tertiary datacenter using below steps.

-> Storage replication between JBSAG1 to storage in “Tertiary Datacenter” should be stopped by storage team. Storage team will make this storage available. Platforms team will provision this as a drive onto a temporary database server JBSAG4 that already has the same SQL Server version as JBSAG1 on “Tertiary Datacenter”.

-> On temporary database server JBSAG4 we will create a dummy database called JBFinance.

-> Once the dummy database is created, stop the SQL Server on JBSAG4 and replace the dummy database JBFinance’s LDF file with the LDF file that is replicated from Primary database Server JBSAG1 to storage in Tertiary datacenter.

-> We will start the SQL Services on JBSAG4 and the temporary database JBFinance on SQL Server instance JBSAG4 will be in “recovery pending” state.

-> Execute below query on JBSAG4 to take a tail log backup,

Backup log JBFinance to disk = '\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' with NO_TRUNCATE

-> Lets remove the database JBFinance from Availability group on JBSAG2 and leave it in restoring state,

-> Perform a restore on JBFinance database using the tail log backup.

RESTORE LOG [JBFinance] FROM  
DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' 
WITH  FILE = 1,  NOUNLOAD,  STATS = 10
GO

Scenario 2

-> Status of Availability group JBSwiki,

-> Consider Secondary Replica JBSAG2 goes down.

-> In the meantime, Primary replica receives write workload from application while Secondary Replica JBSAG2 is down.

-> Primary replica JBSAG1 goes down and is expected to be online after 12 hours.

-> Secondary replica JBSAG2 comes online now and below is the status,

-> At this stage a failover to JBSAG2 will result in data loss. In order to avoid any data loss we can utilize the LDF file that are backed up on the Tertiary datacenter using below steps.

-> Storage replication between JBSAG1 to storage in “Tertiary Datacenter” should be stopped by storage team. Storage team will make this storage available. Platforms team will provision this as a drive onto a temporary database server JBSAG4 that already has the same SQL Server version as JBSAG1 on “Tertiary Datacenter”.

-> On temporary database server JBSAG4 we will create a dummy database called JBFinance.

-> Once the dummy database is created, stop the SQL Server on JBSAG4 and replace the dummy database JBFinance’s LDF file with the LDF file that is replicated from Primary database Server JBSAG1 to storage in Tertiary datacenter.

-> We will start the SQL Services on JBSAG4 and the temporary database JBFinance on SQL Server instance JBSAG4 will be in “recovery pending” state.

-> Execute below query on JBSAG4 to take a tail log backup,

Backup log JBFinance to disk = '\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' with NO_TRUNCATE

-> Lets remove the database JBFinance from Availability group on JBSAG2 and leave it in restoring state,

-> Perform a restore on JBFinance database using the tail log backup.

RESTORE LOG [JBFinance] FROM  
DISK = N'\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' 
WITH  FILE = 1,  NOUNLOAD,  STATS = 10
GO

-> We could see the changes that were made on primary replica JBSAG1 available on JBSAG2. We can use above approach to achieve 0 RPO.

-> Same approach can be used for Logshipping also. Please find the details below,

Environment

-> Environment hosts two (2) standalone database server JBSAG1 and JBSAG2 with SQL server instances installed on it. JBSAG1 and JBSAG2 utilizes Logshipping with JBSAG1 configured as primary. Log backups as part of Logshipping happens every 5 minutes.

-> Database Server JBSAG1 is on primary datacentre and JBSAG2 is in Secondary datacentre. Distance between database servers JBSAG1 and JBSAG2 is 1500 Kilometers.

-> Transaction log backups (*.trn) and LDF file for each database on database server JBSAG1 is replicated to a storage that is available on a third datacenter using Synchronous storage replication. Synchronous storage replication will move the changes of LDF file to secondary storage first and upon confirmation from secondary, primary server JBSAG1 LDF will have that change hardened. This is outside of SQL Server and is taken care at the storage level. Distance between “Primary Datacenter” to “Tertiary Datacenter” is 5 kilometers.

-> Lets assume we have only one (1) database called JBFinance on Database server JBSAG1 and that is part of logshipping to database server JBSAG2. Storage replication will ensure database JBFinance database’s log backups and LDF files are replicated to the storage in “Tertiary Datacentre” using Storage replication.

-> In an event where database server JBSAG1 goes down, storage replication between JBSAG1 to storage in “Tertiary Datacentre” should be stopped by storage team. Storage team will make this storage available. Platforms team will provision this drive onto a temporary database server JBSAG4 that already has the same SQL Server version as JBSAG1 on “Tertiary Datacente” as a separate drive.

-> On temporary database server JBSAG4 we will create a dummy database called JBFinance. Once the dummy database is created, stop the SQL Server on JBSAG4 and replace the dummy database JBFinance’s LDF file with the LDF file that is replicated from Primary database Server JBSAG1 to storage in Tertiary datacenter.

-> Start the SQL Services on JBSAG4 and the temporary database JBFinance on SQL Server instance JBSAG4 will be in “recovery pending” state.

-> Execute below query on JBSAG4 to take a tail log backup.

Backup log JBFinance to disk = '\\JBSDC\Backup\JBFinance_taillog_13May2021.trn' with NO_TRUNCATE

-> We will then check the last log backup that is restored on secondary database server JBSAG2. With that info we will move all required log backup and tail log backup from JBSAG4 to JBSAG2 and perform the restore. With this approach we will ensure that there will be no data loss.

Below are the PRO’s and CON’s for this approach,

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

The local availability replica of availability group ‘JBAG’ cannot accept signal ‘UNJOIN_DB’ in its current replica role

Environment

-> Environment has two (2) standalone database servers JBSAG1 and JBSAG2 with Always On configured. Below is the Always On dashboard.

-> Primary Replica JBSAG1 goes down and is expected to come online after several hours. Below is the Always ON dashboard,

-> Automatic failover is not enabled as Availability group is configured for Asynchronous commit. Availability group is in Resolving state.

-> Below error occurs when we try to remove the database JBFinance from Availability group JBAG from secondary replica,

The database ‘JBFinance’ failed to leave the availability group ‘JBAG’ on the availability replica ‘JBSAG2’. (Microsoft.SqlServer.Management.SDK.TaskForms)

The local availability replica of availability group ‘JBAG’ cannot accept signal ‘UNJOIN_DB’ in its current replica role, ‘RESOLVING_NORMAL’, and state (configuration is in Windows Server Failover Clustering store, local availability replica has joined). The availability replica signal is invalid given the current replica role. When the signal is permitted based on the current role of the local availability replica, retry the operation. (Microsoft SQL Server, Error: 41121)

Msg 41190, Level 16, State 8, Line 6
Availability group ‘JBAG’ failed to process remove-database command. The local availability replica is not in a state that could process the command. Verify that the availability group is online and that the local availability replica is the primary replica, then retry the command.

-> We cannot drop the database from availability group when it is in resolving state.

-> The option available are as below,

[$] Perform a “Failover with potential data loss”.

ALTER AVAILABILITY GROUP [JBAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

[$] Drop availability group JBAG and then recover the database.

USE [master]
GO
DROP AVAILABILITY GROUP [JBAG];
GO

restore database JBFinance with recovery
GO

-> We have 2 options as advised above. But I would always select the first option ” Failover with potential data loss” during scenarios like this.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, Error: 35250)

-> I was configuring Always On Availability group on Database Servers JBSAG1, JBSAG2 and JBSAG3 using the wizard. It all went fine and below is the screenshot,

-> Checked the Always On Availability Group dashboard and below was the status,

-> Checking from SQL Server Management Studio,

-> There is a warning sign under “Availability Databases” and the database is in restoring state on both the secondary replicas.

-> Below messages are found,

This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.

At least one availability database on this availability replica has an unhealthy data synchronization state. If this is an asynchronous-commit availability replica, all availability databases should be in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should be in the SYNCHRONIZED state.

This secondary replica is not joined to the availability group. For an availability replica to be successfully joined to the availability group, the join state must be Joined Standalone Instance (1) or Joined Failover Cluster (2). For information about how-to join a secondary replica to an availability group, see SQL Server Books Online.

-> I tried to join the database JBDB to Availability group and got below message,

TITLE: Microsoft SQL Server Management Studio

Failed to join the database ‘JBDB’ to the availability group ‘JBAG’ on the availability replica ‘JBSAG2’. (Microsoft.SqlServer.Smo)

For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.46041.41+(SMO-master-A)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, Error: 35250)

-> Checking SQL Server Error log in JBSAG2,

2021-04-12 16:01:56.190 spid54 ALTER DB JBDB with AGNAME:JBAG
2021-04-12 16:01:56.190 spid54 ALTER DB param option: SET
2021-04-12 16:02:44.280 spid48s Always On Availability Groups connection with primary database terminated for secondary database ‘JBDB’ on the availability replica ‘JBSAG1’ with Replica ID: {81ddde84-4546-47f7-b78b-6de341c46f34}. This is an informational message only. No user action is r
2021-04-12 16:02:44.280 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.290 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.290 spid54 Starting up database ‘JBDB’.
2021-04-12 16:02:44.300 spid54 The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.
2021-04-12 16:02:44.310 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.310 spid54 State information for database ‘JBDB’ – Hardened Lsn: ‘(130:102920:1)’ Commit LSN: ‘(0:0:0)’ Commit Time: ‘Apr 12 2021 9:46AM’
2021-04-12 16:02:44.310 spid54 Starting up database ‘JBDB’.
2021-04-12 16:02:44.320 spid54 The database ‘JBDB’ is marked RESTORING and is in a state that does not allow recovery to be run.

-> Checking SQL Server Error log in JBSAG3,

2021-04-12 15:46:53.000 Backup Database was restored: Database: JBDB, creation date(time): 2021/04/12(09:06:02), first LSN: 130:102896:1, last LSN: 130:102920:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘\JBS-SQL-DC\Backup\JBDB.bak’}). Informational message.
2021-04-12 15:46:53.020 Backup RESTORE DATABASE successfully processed 461154 pages in 167.666 seconds (21.487 MB/sec).
2021-04-12 15:46:53.650 Backup Log was restored. Database: JBDB, creation date(time): 2021/04/12(09:06:02), first LSN: 130:102816:1, last LSN: 130:102920:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘\JBS-SQL-DC\Backup\JBDB.trn’}). This is an informational mes

-> Messages/errors in SQL Server error log did not provide any clue.

-> Executing below query on each of the instances,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

Primary – JBSAG1

Synchronous Secondary – JBSAG2

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

Asynchronous Secondary – JBSAG3

Error Message,
Connection attempt failed with error: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’.

-> The error message refers to a connection issue between the primary and secondaries.

-> Endpoint uses port 5022. Lets check if port 5022 is not blocked using telnet.

‘telnet’ is not recognized as an internal or external command,
operable program or batch file.

-> Execute below command in powershell to enable telnet.

Install-WindowsFeature -name Telnet-Client

-> Telnet is not working for port 5022. It proves that port 5022 is not open.

-> Let us create a firewall rule to open port 5022.

-> Open “Windows Firewall with Advanced Security” and open “Inbound Rules” on JBSAG1, JBSAG2 and JBSAG3.

-> Wait for 5 mins and check the status now by executing below query,

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

-> After opening port 5022, We couldn’t see that error anymore.

-> Let’s try joining the database again and it works fine this time,

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.