-> I recently worked on an application timeout issue involving a multi-subnet Always On availability group.
Environment

-> The environment consists of two (2) Windows Server 2016 virtual machines. One server is located in the Primary Datacentre and the second in the Secondary Datacentre.
-> These servers are configured as nodes of a two (2) node Windows Server Failover Cluster (without shared storage). An instance of SQL Server 2016 Enterprise Edition is installed on each server; each instance acts as an Always On availability group replica.
-> The replica in the Primary Datacentre is a member of an automatic failover set with synchronous commit to the replica in the Secondary Datacentre. Latency between the Primary and Secondary Datacentres is less than 2 ms.
-> This environment is configured as a multi-subnet environment.
-> The application team reported intermittent connectivity issues.
Issue
-> The availability group listener is configured with an IP address from each defined subnet. This means that the listener has an IP address of 192.150.10.15 when it resides in the Primary Datacentre and 192.150.0.15 when it resides in the Secondary Datacentre.
-> The client operating system queries the DNS server to resolve the listener name to an IP address. In this environment DNS returns two IP addresses: the address in the subnet currently hosting the AG primary replica is online, and the address in the subnet hosting the secondary replica is offline. As a result, client applications that connect through the Always On listener can run into connectivity issues.
-> The application tries the IP addresses one by one and connects to the one that is online. Because the attempts are made serially, there is a high chance that the application reaches its connection timeout before it gets to the online address, and the connection fails with a timeout error.
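-> A rough back-of-the-envelope sketch of the serial behaviour described above. It assumes one offline listener IP, a TCP connect attempt that hangs for about 21 seconds on an unreachable address (a commonly cited Windows default, treated here as an assumption and not measured in this environment), and the SqlClient default login timeout of 15 seconds. The function name Get-SerialConnectDelay is made up for illustration.

```powershell
# Worst-case time spent on offline listener IPs before the online one is tried.
# All numbers are illustrative assumptions, not measurements from this environment.
function Get-SerialConnectDelay {
    param(
        [int]$OfflineIpCount,        # listener IPs that are offline and tried first
        [int]$TcpAttemptTimeoutSec   # how long one attempt to an offline IP hangs
    )
    return $OfflineIpCount * $TcpAttemptTimeoutSec
}

$loginTimeoutSec = 15   # SqlClient default Connect Timeout, in seconds
$delaySec = Get-SerialConnectDelay -OfflineIpCount 1 -TcpAttemptTimeoutSec 21

if ($delaySec -ge $loginTimeoutSec) {
    "Offline IP tried first: $delaySec s spent waiting, login timeout of $loginTimeoutSec s expires"
} else {
    "Offline IP tried first: $delaySec s spent waiting, connection can still succeed"
}
```

When DNS happens to hand back the online address first, the connection succeeds straight away, which would be consistent with the timeouts showing up only intermittently.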
Workaround\Fix
-> Set the MultiSubnetFailover parameter to True in the application connection string. When it is True, the client tries all of the AG listener IP addresses in parallel, which avoids the connection timeout caused by trying them serially.
-> If adding the MultiSubnetFailover parameter to the application connection string is not possible, perform the steps below instead.
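-> A minimal sketch of a connection string with the parameter set, opened from PowerShell with System.Data.SqlClient. The listener name AGListener, port 1433 and database AppDb are hypothetical placeholders, and the client library must support the keyword (for example .NET Framework 4.5 or later).

```powershell
# Hypothetical listener name, port and database; replace with your own values.
$connectionString = 'Server=tcp:AGListener,1433;Database=AppDb;' +
                    'Integrated Security=SSPI;MultiSubnetFailover=True'

$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
try {
    # With MultiSubnetFailover=True the client tries the listener IPs in parallel.
    $connection.Open()
    "Connected to $($connection.DataSource)"
}
catch {
    Write-Warning "Connection failed: $($_.Exception.Message)"
}
finally {
    $connection.Dispose()
}
```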
-> The RegisterAllProvidersIP parameter should be set to 0. When RegisterAllProvidersIP is set to 0, only the active listener IP address is registered in DNS. When it is set to 1 (the default for a listener created through SQL Server), all of the IP addresses the listener depends on are registered in DNS.
# Note down the listener name from the output of the command below
PS C:\Windows\system32> Get-ClusterResource
# Replace LISTENERNAME_FROM_OUTPUT with the name noted above. Check the RegisterAllProvidersIP value; it should be 1
PS C:\Windows\system32> Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter
# Set RegisterAllProvidersIP to 0
PS C:\Windows\system32> Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Set-ClusterParameter RegisterAllProvidersIP 0
# Check the RegisterAllProvidersIP value again and confirm it is now 0
PS C:\Windows\system32> Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter
-> The HostRecordTTL parameter should be lowered from its default of 1200 seconds (20 minutes) to a value between 60 and 300. HostRecordTTL sets how long, in seconds, a client caches the listener's DNS record before it queries DNS again for the current IP address. A lower value means more frequent lookups, which can add load on your DNS server if many clients resolve the listener name, hence the advised range of 60 to 300. Personally, I have set this value to 60 and have never seen any issue.
# Note down the listener name from the output of the command below
PS C:\Windows\system32> Get-ClusterResource
# Replace LISTENERNAME_FROM_OUTPUT with the name noted above. Check the HostRecordTTL value; it should be 1200
PS C:\Windows\system32> Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter
# Set HostRecordTTL to 60
PS C:\Windows\system32> Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Set-ClusterParameter HostRecordTTL 60
# Check the HostRecordTTL value again and confirm it is now 60
PS C:\Windows\system32> Get-ClusterResource -Name LISTENERNAME_FROM_OUTPUT | Get-ClusterParameter
-> Fail over the availability group so that the above changes take effect.
-> Speak to your DNS team and check the listener's Host (A) records on the DNS server to confirm that these records are not static.
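-> One possible way to script the failover and then re-check both listener parameters. This is only a sketch: the replica name SQLNODE2 and the availability group name MyAg are hypothetical, Invoke-Sqlcmd requires the SQL Server PowerShell module, and the failover statement has to run on a synchronized secondary replica that is to become the new primary. Plan for the brief interruption a failover causes.

```powershell
# Hypothetical names; replace with your own secondary replica, AG and listener resource.
$targetReplica   = 'SQLNODE2'
$availabilityGrp = 'MyAg'
$listenerName    = 'LISTENERNAME_FROM_OUTPUT'

# Run the failover on the synchronized secondary that should become primary.
Invoke-Sqlcmd -ServerInstance $targetReplica `
              -Query "ALTER AVAILABILITY GROUP [$availabilityGrp] FAILOVER;"

# Confirm both listener parameters after the failover.
Get-ClusterResource -Name $listenerName |
    Get-ClusterParameter |
    Where-Object { $_.Name -in 'RegisterAllProvidersIP', 'HostRecordTTL' } |
    Select-Object Name, Value
```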
Testing
-> Once the above tasks are performed, run “nslookup <Listener_Name>” on each replica and make sure that every replica returns the listener IP address for the subnet currently hosting the AG primary replica. It may take some time for the correct IP address to show up on the replica that is not in the subnet currently hosting the AG primary replica; this depends on the DNS replication schedule. Note down the delay and speak to your DNS and application teams. If they are fine with this delay, everything is set. If they are not, your DNS team will have to adjust the DNS replication schedule as appropriate.
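-> The same check can be sketched in PowerShell with Resolve-DnsName, which also shows the TTL returned for the record. The listener name AGListener is a hypothetical placeholder, and the expected IP assumes the AG primary replica is currently in the Primary Datacentre.

```powershell
# Hypothetical listener name; replace with your own value.
$listenerName = 'AGListener'
$expectedIp   = '192.150.10.15'   # listener IP in the subnet hosting the primary replica

Clear-DnsClientCache   # optional: drop any locally cached record first

$records = Resolve-DnsName -Name $listenerName -Type A -DnsOnly |
    Where-Object { $_.Type -eq 'A' }   # ignore any non-A records in the response
$records | Select-Object Name, IPAddress, TTL

$ips = @($records | ForEach-Object { $_.IPAddress })
if ($ips.Count -eq 1 -and $ips[0] -eq $expectedIp) {
    'OK: DNS returns only the active listener IP'
} else {
    Write-Warning "Unexpected result: $($ips -join ', ')"
}
```

Running this on each replica at intervals after a failover gives a rough measure of the DNS replication delay to report back to the DNS and application teams.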
Thank You,
Vivek Janakiraman
Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confer no rights.