Application Timeout with Multi-subnet Always On Availability Group

-> I recently worked on an application timeout issue involving a multi-subnet Always On availability group.

Environment

AG_Multisubnet.PNG

-> The environment consists of two (2) Windows Server 2016 virtual machines. One server is located in the Primary Datacentre and the second in the Secondary Datacentre.

-> These servers were configured as nodes of a two (2) node Windows Server Failover Cluster (without shared storage). An instance of SQL Server 2016 Enterprise Edition is installed on each server; each instance acts as an Always On availability group replica.

-> The replica in the Primary Datacentre is a member of an automatic failover set with synchronous commit to the replica in the Secondary Datacentre. Latency between the Primary and Secondary Datacentres is less than 2 ms.

-> This environment is configured as a multi-subnet environment.

-> The application team reported intermittent connectivity issues.

Issue

-> The availability group listener is configured with an IP address from each defined subnet. This means that the listener has an IP address of 192.150.10.15 when it resides in the Primary Datacentre and 192.150.0.15 when it resides in the Secondary Datacentre.

-> The client operating system queries the DNS server to resolve the listener name to an IP address. In this environment DNS returns both IP addresses: the one in the subnet currently hosting the AG primary replica is online, and the one in the subnet hosting the secondary replica is offline. As a result, client applications connecting through the Always On listener can run into connectivity issues.

-> By default, the application tries the IP addresses one by one and connects to the one that is online. Because the attempts are made serially, an attempt against the offline IP address has to time out first, so there is a high chance that the application reaches its connection timeout and the connection fails with a timeout error.
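
-> To make the serial versus parallel difference concrete, here is a rough Python model of the connection time. It is only a sketch: the 21 second per-IP TCP timeout, the 15 second application timeout, and the 0.05 second success time are illustrative assumptions, not values measured in this environment.

```python
def serial_connect_time(ip_online, per_ip_timeout_s, success_time_s=0.05):
    """Time to connect when IPs are tried one after another, in list order."""
    elapsed = 0.0
    for online in ip_online:
        if online:
            return elapsed + success_time_s
        # An offline IP costs the full TCP timeout before the next one is tried.
        elapsed += per_ip_timeout_s
    return None  # no IP was reachable


def parallel_connect_time(ip_online, success_time_s=0.05):
    """Time to connect when all IPs are tried at once; the first success wins."""
    return success_time_s if any(ip_online) else None


# Listener IPs in the order the client tries them: offline one first (worst case).
ips = [False, True]
TCP_TIMEOUT_S = 21.0    # assumed per-IP TCP connect timeout
LOGIN_TIMEOUT_S = 15.0  # assumed application connection timeout

serial = serial_connect_time(ips, TCP_TIMEOUT_S)
parallel = parallel_connect_time(ips)
print(f"serial:   {serial:.2f}s, exceeds app timeout: {serial > LOGIN_TIMEOUT_S}")
print(f"parallel: {parallel:.2f}s, exceeds app timeout: {parallel > LOGIN_TIMEOUT_S}")
```

-> Under these assumptions the serial attempt only reaches the online IP address after the application has already given up, while the parallel attempt connects almost immediately.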

Workaround\Fix

-> Set the MultiSubnetFailover parameter to True in the application connection string. When it is True, the client tries all AG listener IP addresses in parallel, which avoids the connection timeout caused by trying them serially.
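
-> As an illustration, the sketch below builds an ODBC-style connection string with the option enabled. The listener name AGLISTENER, the database name MyDatabase, and the driver name are placeholders I made up for the example. Note that the Microsoft ODBC driver spells the value MultiSubnetFailover=Yes, while an ADO.NET (SqlClient) connection string uses MultiSubnetFailover=True.

```python
def build_odbc_connection_string(listener, database,
                                 driver="ODBC Driver 17 for SQL Server",
                                 port=1433):
    """Build an ODBC connection string that enables parallel listener IP attempts."""
    parts = {
        "Driver": "{" + driver + "}",
        "Server": f"tcp:{listener},{port}",  # connect to the listener, not a node name
        "Database": database,
        "Trusted_Connection": "Yes",         # Windows authentication (assumed)
        "MultiSubnetFailover": "Yes",        # try all listener IPs in parallel
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# AGLISTENER and MyDatabase are hypothetical names.
print(build_odbc_connection_string("AGLISTENER", "MyDatabase"))
```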

-> If adding the MultiSubnetFailover parameter to the application connection string is not possible, perform the steps below.

-> Set the RegisterAllProvidersIP parameter to 0. When RegisterAllProvidersIP is set to 0, only the active listener IP address is registered in DNS. When it is set to 1 (the default), all of the IP addresses the listener depends on are registered in DNS.


# Note down the listener name from the output of the command below
Get-ClusterResource

# Replace LISTENERNAME_FROM_OUTPUT with the name noted above. RegisterAllProvidersIP should currently be 1
Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter

# Set RegisterAllProvidersIP to 0
Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Set-ClusterParameter RegisterAllProvidersIP 0

# Confirm that RegisterAllProvidersIP is now set to 0
Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter

-> Set the HostRecordTTL parameter to a value between 60 and 300 instead of the default of 1200 (20 minutes). HostRecordTTL determines how long, in seconds, a client caches the listener's DNS record before it queries DNS again for the current IP address. Reducing this value can increase the load on your DNS server if many servers resolve the listener name through it, which is why a value of 60 to 300 is advised. Personally, I have set this value to 60 and have never seen any issue.


# Note down the listener name from the output of the command below
Get-ClusterResource

# Replace LISTENERNAME_FROM_OUTPUT with the name noted above. HostRecordTTL should currently be 1200
Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter

# Set HostRecordTTL to 60
Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Set-ClusterParameter HostRecordTTL 60

# Confirm that HostRecordTTL is now set to 60
Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter
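
-> To put the DNS load trade-off of HostRecordTTL into numbers, here is a small back-of-the-envelope calculation in Python. It assumes a simple model in which every client re-resolves the listener name once each time its cached record expires; the client count of 50 is a made-up figure, and real resolver behaviour may differ.

```python
def max_dns_queries_per_hour(client_count, host_record_ttl_s):
    """Upper bound on lookups per hour if each client re-resolves once per TTL expiry."""
    if host_record_ttl_s <= 0:
        raise ValueError("TTL must be positive")
    return client_count * 3600 / host_record_ttl_s


CLIENTS = 50  # hypothetical number of application servers using the listener

for ttl in (1200, 300, 60):
    queries = max_dns_queries_per_hour(CLIENTS, ttl)
    # The TTL is also roughly the longest a client can keep using a stale IP.
    print(f"TTL {ttl:>4}s: up to {queries:.0f} lookups/hour, stale for up to {ttl}s")
```

-> In this model, going from 1200 to 60 multiplies the lookups by 20, but it also shortens the time a client can keep using the old IP address after a failover from 20 minutes to 1 minute.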

-> Fail over the availability group for the above changes to take effect.

-> Speak to your DNS team, check the listener's Host (A) records on the DNS server, and confirm that these records are not static.

Testing

-> Once the above tasks are performed, run "nslookup <Listener_Name>" on each replica and make sure that every replica returns the IP address from the subnet currently hosting the AG primary replica. It may take some time for the correct IP address to appear on the replica that is not in that subnet; this depends on the DNS replication schedule. Note the delay and discuss it with your DNS and application teams. If they are fine with this delay, everything is set. If they are not, your DNS team will have to adjust the DNS replication schedule accordingly.
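
-> If you want to script this check instead of reading nslookup output by eye, something along the lines of the Python sketch below could work. It is not a replacement for nslookup: it goes through the operating system resolver, so it may return a locally cached answer. The listener name AGLISTENER and the fake resolver are assumptions used only so that the example runs without a real DNS server.

```python
import socket


def listener_ipv4_addresses(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses the name resolves to."""
    infos = resolver(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_listener(name, expected_ip, resolver=socket.getaddrinfo):
    """True only when the name resolves to exactly the expected primary-subnet IP."""
    addresses = listener_ipv4_addresses(name, resolver)
    return addresses == [expected_ip], addresses


def fake_resolver(name, port, family, socktype):
    """Stand-in for DNS so the example is self-contained (hypothetical answer)."""
    return [(family, socktype, 6, "", ("192.150.10.15", 0))]


ok, found = check_listener("AGLISTENER", "192.150.10.15", resolver=fake_resolver)
print("listener resolves as expected:", ok, found)
```

-> Drop the resolver argument to use the real resolver on the machine where the script runs.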

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.


Restore PRIMARY Filegroup only on a database using full backup

-> I had to restore a 26 TB database on my development server. This database has 2 filegroups, PRIMARY and BLOB. The PRIMARY filegroup holds tables totalling 3 TB and the BLOB filegroup holds tables totalling 23 TB.

-> My requirement for this restore was to restore only the PRIMARY filegroup and not the BLOB filegroup.

-> The T-SQL below can be used to restore only the PRIMARY filegroup from the full database backup.


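USE [master]
-- PARTIAL starts a piecemeal restore: only the PRIMARY filegroup is restored.
-- The BLOB filegroup is not restored and stays offline until it is restored.
-- The full backup is striped across six files, so all six must be listed.
RESTORE DATABASE [JB_DB] FILEGROUP = 'PRIMARY'
FROM DISK = N'C:\DB\JB_DB_01of06.bak',
DISK = N'C:\DB\JB_DB_02of06.bak',
DISK = N'C:\DB\JB_DB_03of06.bak',
DISK = N'C:\DB\JB_DB_04of06.bak',
DISK = N'C:\DB\JB_DB_05of06.bak',
DISK = N'C:\DB\JB_DB_06of06.bak'
WITH PARTIAL, RECOVERY, FILE = 1, NOUNLOAD, STATS = 1, REPLACE
GO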

Thank You,
Vivek Janakiraman


Execute job step if a job is not running

-> There are two SQL Server Agent jobs scheduled. Job1 has 6 steps and Job2 has 1 step.

-> Job1 details are below.

Job Step    Step Details
Step1       Changes MAXDOP to 0.
Step2       Starts Job2.
Step3       Runs reindexing for 3 indexes.
Step4       Checks whether Job2 has completed and waits for Job2 to complete before proceeding to the next step.
Step5       Changes MAXDOP to 1.
Step6       Executes T-SQL to create views.

-> Job2 details are below.

Job Step    Step Details
Step1       Runs reindexing for 10 indexes.

-> Step4 of Job1 executes the T-SQL below, which checks whether Job2 has completed and, if not, waits for it to complete.


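-- Names of the jobs to wait for; replace JOBNAME with the actual job name(s).
DECLARE @JOB_NAME SYSNAME = N'JOBNAME';
DECLARE @JOB_NAME1 SYSNAME = N'JOBNAME';

Job_Status:
-- A job counts as running when it has been requested but has no stop time yet.
IF NOT EXISTS (
        SELECT 1
        FROM msdb.dbo.sysjobs_view job
        INNER JOIN msdb.dbo.sysjobactivity activity ON job.job_id = activity.job_id
        WHERE activity.run_requested_date IS NOT NULL
          AND activity.stop_execution_date IS NULL
          -- Look at the current Agent session only; older sessions can hold stale rows.
          AND activity.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
          AND job.name IN (@JOB_NAME, @JOB_NAME1)
        )
BEGIN
    PRINT 'Job not running';
END
ELSE
BEGIN
    PRINT 'Job is running';
    -- Check again in one minute.
    WAITFOR DELAY '00:01:00';
    GOTO Job_Status;
END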
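
-> The check-wait-recheck logic above is a plain polling loop. Purely as an illustration of the pattern, here is the same idea sketched in Python with an optional maximum wait, which the T-SQL version does not have. The is_running callable is hypothetical; in practice it would wrap the msdb query shown above.

```python
import time


def wait_until_idle(is_running, poll_seconds=60, max_wait_seconds=None,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll is_running() until it returns False.

    Returns True when the job has finished, or False when max_wait_seconds
    elapsed first. With max_wait_seconds=None it waits indefinitely, like
    the T-SQL GOTO loop.
    """
    start = clock()
    while is_running():
        if max_wait_seconds is not None and clock() - start >= max_wait_seconds:
            return False  # gave up waiting
        sleep(poll_seconds)
    return True


# Demo with a fake job that reports "running" twice and then finishes.
states = iter([True, True, False])
finished = wait_until_idle(lambda: next(states), poll_seconds=0)
print("job finished:", finished)
```

-> A maximum wait can be useful so that a hung Job2 does not hold up the remaining steps of Job1 indefinitely; whether that is desirable depends on your maintenance window.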

Thank You,
Vivek Janakiraman
