Always On – Growing Log Send Queue on an Asynchronous Replica

Environment

-> Database servers JBSAG1 and JBSAG2 are in the primary datacentre, located in East US.

-> Database server JBSAG3 is in the secondary datacentre, located in Korea Central.

-> Database servers JBSAG1, JBSAG2 and JBSAG3 are part of a failover cluster without shared storage.

-> The log send queue on the asynchronous replica JBSAG3 was growing continuously, as shown in the screenshot below.

-> It was clear that there was no issue on the synchronous replica JBSAG1, as its log send queue was well under control.

-> On the other hand, the log send queue was huge on the asynchronous replica JBSAG3.

-> I captured perfmon data on database servers JBSAG1, JBSAG2 and JBSAG3. The analysis is below.

Primary Replica -> JBSAG2

1) Primary JBSAG2 to its Log file

(SQLServer:Databases – Log Bytes Flushed/sec) measures how much log the primary flushes to the log file per second. In this scenario, for the databases JBDB, JBSDBDW and StackOverflow2010, the primary flushes a maximum of 5+ MB/sec and an average of about 1 MB/sec to the log file.
We have to sum Log Bytes Flushed/sec across the databases to arrive at a final value.
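To illustrate the sum, here is a minimal Python sketch. The per-database figures below are made-up placeholders, not the actual perfmon readings from this incident:

```python
# Illustrative only: per-database "Log Bytes Flushed/sec" samples (bytes/sec).
# The values are placeholders standing in for one perfmon sampling interval.
samples = {
    "JBDB": 2_500_000,
    "JBSDBDW": 1_800_000,
    "StackOverflow2010": 900_000,
}

# The per-counter values are summed to get the total log generation rate
total_bytes_per_sec = sum(samples.values())
print(f"Total log flush rate: {total_bytes_per_sec / 1_000_000:.1f} MB/sec")
```

The same summing applies to Log Bytes Received/sec and Redone Bytes/sec later in this post, since those counters are also reported per database.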

2) Primary JBSAG2 to JBSAG1 via network

(SQLServer:Availability Replica -> Bytes Sent to Replica/sec) measures how much is sent across the network to a secondary replica. In our case, a maximum of 5+ MB/sec and an average of 1+ MB/sec is sent from primary JBSAG2 to secondary JBSAG1. This matches SQLServer:Databases – Log Bytes Flushed/sec, which is good.

3) Primary JBSAG2 to JBSAG3 via network

(SQLServer:Availability Replica -> Bytes Sent to Replica/sec) measures how much is sent across the network to a secondary replica. In our case, a maximum of 600+ KB/sec and an average of 400+ KB/sec is sent from primary JBSAG2 to secondary JBSAG3.

It is pretty clear that secondary JBSAG1 receives data at around 5+ MB/sec, close to what the primary flushes, whereas secondary JBSAG3 receives only 600+ KB/sec, which is comparatively very low.

Secondary Replica -> JBSAG1

1) Secondary JBSAG1 Received from Primary JBSAG2

(SQLServer:Database Replica -> Log Bytes Received/sec) measures how much data the secondary receives from the primary replica. In our case, a maximum of 5+ MB/sec and an average of 500+ KB/sec is received by secondary JBSAG1 from primary JBSAG2. We have to sum Log Bytes Received/sec across the databases to arrive at a final value.

This matches SQLServer:Databases – Log Bytes Flushed/sec on the primary, which is good.

2) Redoing received data on Secondary

(SQLServer:Database Replica -> Redone Bytes/sec) measures how much data is redone on the secondary. In our case, a maximum of 5+ MB/sec and an average of 500+ KB/sec was redone on secondary JBSAG1. We have to sum Redone Bytes/sec across the databases to arrive at a final value.

This matches Log Bytes Received/sec, which is good.

Secondary Replica -> JBSAG3

1) Secondary JBSAG3 Received from Primary JBSAG2

(SQLServer:Database Replica -> Log Bytes Received/sec) measures how much data the secondary receives from the primary replica. In our case, a maximum of 1+ MB/sec and an average of 200+ KB/sec is received by secondary JBSAG3 from primary JBSAG2. We have to sum Log Bytes Received/sec across the databases to arrive at a final value.

This is much lower than what the primary sent, as measured by SQLServer:Databases – Log Bytes Flushed/sec on the primary. We sent 5 MB/sec but received just 1 MB/sec, so there is a lag of 4+ MB/sec, which is not good.

2) Redoing received data on Secondary

(SQLServer:Database Replica -> Redone Bytes/sec) measures how much data is redone on the secondary. In our case, a maximum of 1+ MB/sec and an average of 200+ KB/sec was redone on secondary JBSAG3. We have to sum Redone Bytes/sec across the databases to arrive at a final value. We are redoing what we receive from the primary: Redone Bytes/sec matches Log Bytes Received/sec.

From the above analysis we understand the following:

JBSAG2 -> Primary flushes 5 MB/sec to log file -> Primary sends 5 MB/sec to secondary via network.

JBSAG1 -> Secondary receives 5 MB/sec via network -> Secondary redo is also around 5 MB/sec -> Log send queue is around 120+ KB.

JBSAG3 -> Secondary receives 1 MB/sec via network -> Secondary redo is also around 1 MB/sec -> Log send queue is around 750+ MB.

From the above analysis it is clear that primary JBSAG2 is able to send 5 MB/sec to JBSAG1, but only 1 MB/sec to JBSAG3, and this results in log send queue buildup on JBSAG3.
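The imbalance can be put into numbers with a small sketch, using the rounded rates from the perfmon analysis above:

```python
# Rates rounded from the perfmon figures in this post (MB/sec)
flush_rate_mb = 5.0  # primary log generation on JBSAG2
send_rate_mb = 1.0   # rate actually reaching JBSAG3

# Whatever cannot be sent accumulates in the log send queue
lag_mb_per_sec = flush_rate_mb - send_rate_mb

# Time for the queue to reach the observed ~750 MB under sustained load
seconds_to_750mb = 750 / lag_mb_per_sec
print(f"Queue grows {lag_mb_per_sec:.0f} MB/sec; ~750 MB in {seconds_to_750mb:.0f} sec")
```

In other words, a sustained 4 MB/sec shortfall builds the observed 750+ MB queue in roughly three minutes, which is why the queue kept climbing during load.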

-> This pointed to a network issue between the primary and secondary datacentres.

-> We tried a robocopy from JBSAG2 to JBSAG1 and JBSAG3 using a single 10 GB file.

robocopy c:\temp\ \\jbsag1\temp file.zip
robocopy c:\temp\ \\jbsag3\temp file.zip

-> Robocopy reported a speed of 23 MB/sec for the copy from JBSAG2 to JBSAG1.

-> Robocopy reported a speed of 1 MB/sec for the copy from JBSAG2 to JBSAG3.

-> The robocopy tests were performed purely to rule out any SQL Server issue and to show the network team that a plain file copy also performs badly to JBSAG3.
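For a sense of scale, the reported speeds translate into very different copy times for the 10 GB file. This is simple arithmetic, with Python used only as a calculator:

```python
# A single 10 GB test file, as used in the robocopy test
file_mb = 10 * 1024

# Reported robocopy throughput per route (MB/sec)
minutes = {}
for route, mb_per_sec in [("JBSAG2 -> JBSAG1", 23), ("JBSAG2 -> JBSAG3", 1)]:
    minutes[route] = file_mb / mb_per_sec / 60
    print(f"{route}: ~{minutes[route]:.0f} minutes at {mb_per_sec} MB/sec")
```

A copy that finishes in around seven minutes to JBSAG1 takes nearly three hours to JBSAG3, which made the problem easy to demonstrate to the network team.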

-> I involved the network and platforms teams to check this further.

-> The network team advised that the network link seemed normal and there was no bandwidth issue, and asked the platforms team to investigate further, since it looked like a guest machine issue.

-> In the meantime, we found another 3-node Always On setup identical to this one, with the same primary and secondary datacentres, which did not have this issue. Its asynchronous secondary was running on a different host machine in the secondary datacentre than the problematic asynchronous secondary.

-> The platforms team migrated the problematic asynchronous secondary to the host machine where the working setup's asynchronous secondary was running.

-> This did not resolve the issue. The platforms team then took a netsh trace from JBSAG2 to JBSAG1 and JBSAG3.

-> The traces showed that during the three-way handshake, no TCP window scaling factor was negotiated. This led to the suspicion that the TCP global parameter for receive window auto-tuning could be the cause of this issue. By default, this setting is enabled.
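One way to see why a missing scaling factor matters: without window scaling, the TCP receive window is capped at 65,535 bytes, and a single connection cannot exceed window size divided by round-trip time. The RTT below is an assumption for illustration only (the post does not measure it), but an RTT in this range lines up with the ~1 MB/sec ceiling we observed:

```python
# Without TCP window scaling (RFC 1323/7323), the receive window
# advertised in the TCP header is capped at 64 KB minus one byte.
max_window_bytes = 65_535

# Assumed East US <-> Korea Central round-trip time; NOT measured in
# this incident, chosen only to illustrate the throughput ceiling.
rtt_seconds = 0.065

# A single TCP connection cannot move more than one window per RTT
max_throughput = max_window_bytes / rtt_seconds
print(f"Max per-connection throughput: {max_throughput / 1_000_000:.2f} MB/sec")
```

With a higher real RTT the ceiling is even lower, so regardless of the exact latency, an unscaled 64 KB window on an intercontinental link cannot carry anywhere near the 5 MB/sec the primary was flushing.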

-> We checked this parameter using the below command on servers JBSAG1, JBSAG2 and JBSAG3.

netsh interface tcp show global

-> The TCP global parameter for auto-tuning was disabled on JBSAG1 and JBSAG3.

-> Auto-tuning was enabled, along with RSS, using the below command on JBSAG1 and JBSAG3:

netsh interface tcp set global rss=enabled autotuninglevel=Normal

-> Once these were enabled, the transfer speeds increased and a TCP window scaling factor was negotiated, allowing the window to grow as needed.

-> This resolved our issue, and the log send queue on JBSAG3 came under control.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.
