Proactively Managing Transactional Replication Latency with SQL Server

Transactional replication is a critical component of many SQL Server environments, providing high availability, load balancing, and other essential benefits. However, managing replication latency, the delay between an action occurring on the publisher and it being reflected on the subscriber, is vital for ensuring system performance and data integrity. In this blog post, we’ll explore a proactive approach to monitor and alert on replication latency, helping database administrators (DBAs) maintain optimal system health.

The Issue:

Replication latency can sometimes go unnoticed until it impacts the system performance or data accuracy, leading to potential data loss or business disruptions. Traditional monitoring techniques may not provide real-time alerts or may require significant manual intervention, making them less effective for immediate latency identification and resolution.

The Script:

To address this challenge, we introduce a SQL script designed by Vivek Janakiraman from JBSWiki, specifically crafted to monitor transactional replication latency in SQL Server environments. This script efficiently posts tracer tokens to specified publications and measures the time taken for these tokens to move through the replication components, providing a clear picture of any latency present in the system.

/*
Author: Vivek Janakiraman
Company: JBSWiki
Description: This script is used to alert in case there is Transactional replication Log reader or distribution agent latency.
It posts tracer tokens to specified publications and measures the latency to the distributor and subscriber.
*/

-- Switch to the publisher database to insert tracer tokens.
USE [Publisher_Database_Here] -- Use your publisher database name here.
-- Insert tracer tokens into the specified publications.
EXEC sys.sp_posttracertoken @publication = 'Publication_Name' -- Change appropriate Publication that should be monitored.
EXEC sys.sp_posttracertoken @publication = 'Publication_Name1' -- Change appropriate Publication that should be monitored.
-- Wait for 5 minutes to allow the tokens to propagate.
WAITFOR DELAY '00:05:00'

-- Switch to the distribution database to query latency information.
USE distribution
;WITH LatestEntries AS (
-- Select the latest entries for each publication and agent.
SELECT publication_id, agent_id, MAX(publisher_commit) AS MaxDate
FROM MStracer_tokens t
JOIN MStracer_history h ON t.tracer_id = h.parent_tracer_id
GROUP BY publication_id, agent_id
)
-- Select latency information for the latest tokens.
SELECT c.name, t.publication_id, h.agent_id, t.publisher_commit,
ISNULL(DATEDIFF(s,t.publisher_commit,t.distributor_commit), 299) as 'Time To Dist (sec)',
ISNULL(DATEDIFF(s,t.distributor_commit,h.subscriber_commit), 299) as 'Time To Sub (sec)'
INTO #REPL_LATENCY
FROM MStracer_tokens t
JOIN MStracer_history h ON t.tracer_id = h.parent_tracer_id
JOIN distribution.dbo.MSdistribution_agents c ON h.agent_id = c.id
JOIN LatestEntries le ON t.publication_id = le.publication_id AND h.agent_id = le.agent_id AND t.publisher_commit = le.MaxDate
ORDER BY t.publisher_commit DESC

-- Check if there is any latency beyond acceptable limits and select those records.
IF EXISTS (SELECT 1 FROM #REPL_LATENCY WHERE ([Time To Dist (sec)] > 30 OR [Time To Sub (sec)] > 30))
BEGIN
SELECT name, publication_id, agent_id, publisher_commit, [Time To Dist (sec)], [Time To Sub (sec)]
INTO #REPL_LATENCY_Email
FROM #REPL_LATENCY
WHERE ([Time To Dist (sec)] > 30 OR [Time To Sub (sec)] > 30)
END

-- Prepare the HTML body content for the email alert.
DECLARE @body_content NVARCHAR(MAX);
SET @body_content = N'
<style>
table.GeneratedTable {
width: 100%;
background-color: #D3D3D3;
border-collapse: collapse;
border-width: 2px;
border-color: #A9A9A9;
border-style: solid;
color: #000000;
}
table.GeneratedTable td, table.GeneratedTable th {
border-width: 2px;
border-color: #A9A9A9;
border-style: solid;
padding: 3px;
}
table.GeneratedTable thead {
background-color: #A9A9A9;
}
</style>
<table class="GeneratedTable">
<thead>
<tr>
<th>name</th>
<th>publication_id</th>
<th>agent_id</th>
<th>publisher_commit</th>
<th>[Time To Dist (sec)]</th>
<th>[Time To Sub (sec)]</th>
</tr>
</thead>
<tbody>' +
CAST(
(SELECT td = name, '',
td = publication_id, '',
td = agent_id, '',
td = publisher_commit, '',
td = [Time To Dist (sec)], '',
td = [Time To Sub (sec)], ''
FROM [dbo].#REPL_LATENCY_Email
FOR XML PATH('tr'), TYPE
) AS NVARCHAR(MAX)
) +
N'</tbody>
</table>';

-- Send an email alert if there is any latency issue found.
IF EXISTS (SELECT TOP 1 * FROM [dbo].#REPL_LATENCY_Email)
BEGIN
EXEC msdb.dbo.sp_send_dbmail @profile_name = 'JBSWIKI',
@body = @body_content,
@body_format = 'HTML',
@recipients = 'jvivek2k1@yahoo.com',
@subject = 'ALERT: Transactional Replication Latency Alert';
END

-- Cleanup temporary tables.
DROP TABLE #REPL_LATENCY
DROP TABLE #REPL_LATENCY_Email

The Solution:

The script works by first posting tracer tokens to the specified publications within the publisher database. It then waits for a predetermined amount of time (defaulted to 5 minutes in the script) to allow the tokens to propagate through the system. Following this, the script measures the latency to the distributor and subscriber, providing a detailed report of the time taken in each stage of the replication process.

This information is then used to generate an HTML-formatted email alert if the latency exceeds predefined thresholds (30 seconds in the provided script), allowing for immediate action to be taken. The use of HTML formatting in the email ensures that the information is presented in an easily digestible format, facilitating quick understanding and response by the DBA.

Conclusion:

Proactive monitoring and management of transactional replication latency are paramount for maintaining the health and performance of SQL Server environments. The script provided offers a straightforward and effective solution for DBAs to stay ahead of potential replication issues. By automating the process of latency detection and alerting, this approach not only saves valuable time but also helps in preventing the negative impact of replication latency on business operations.

Remember, while this script serves as a valuable tool in your monitoring arsenal, it’s also important to tailor the solution to your specific environment and requirements. Regularly reviewing and adjusting the latency thresholds and monitoring frequency will ensure you continue to get the most out of your replication setup.

For more tutorials and tips on SQL Server, including performance tuning and database management, be sure to check out our JBSWiki YouTube channel.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

MODIFY FILE failed for database ‘tempdb’, file id 1. Size of file (41435136 KB) is greater than MAXSIZE (26214400 KB). (Microsoft SQL Server, Error: 5040)


Introduction:

In the dynamic realm of Azure SQL Managed Instances, configuring database file sizes is a routine task. However, challenges can arise, as I recently encountered while attempting to modify the Tempdb database file size. This blog post details the steps taken to address a perplexing error message that surfaced during this process.

Problem Statement:
The objective was straightforward: adjusting the maximum file size of the Tempdb database on an Azure SQL Managed Instance to 25 GB.

However, both through the graphical user interface (GUI) and Transact-SQL (T-SQL) commands, an error persisted:

TITLE: Microsoft SQL Server Management Studio
——————————
Alter failed for Database ‘tempdb’.  (Microsoft.SqlServer.Smo)
For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.47008.0&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Alter+Database&LinkId=20476
——————————
ADDITIONAL INFORMATION:
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
——————————
MODIFY FILE failed for database ‘tempdb’, file id 1. Size of file (41435136 KB) is greater than MAXSIZE (26214400 KB). (Microsoft SQL Server, Error: 5040)
For help, click: https://docs.microsoft.com/sql/relational-databases/errors-events/mssqlserver-5040-database-engine-error
——————————

The error surfaced when attempting the operation through the graphical user interface (GUI), and repeating the process using Transact-SQL (T-SQL) resulted in the identical error.

The error notification suggests that the existing size of the Tempdb database file (tempdev) exceeds the size I intended to establish. Contrary to this indication, the actual size of tempdev is only 16 MB, and my intention was to set the Maximum File size to 25 GB.

The error message indicates that the file size is reported as 41,435,136 KB, equivalent to 39.5 GB. Attempting to establish the Maximum file size of Tempdev below 39.5 GB resulted in an error. However, setting it to 40 GB proved successful.

This particular Azure SQL Managed Instance is of the Business Critical tier with 80 Vcores and includes 4 replicas. To delve deeper into the issue, I connected to the read replica of this managed instance through a query window, utilizing the ApplicationIntent=ReadOnly parameter.


I’ve established a connection to the user database JBSWiki to confirm that I am indeed connected to the Read Replica, as verified by executing the following query:

SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability');

Upon execution, you will observe the result as READ_ONLY, affirming your connection to the Read Replica. It’s crucial to note that running the same query under the context of Tempdb will display READ_WRITE. To avoid confusion, ensure the query is executed on a user database to validate your connection to a Read Replica.

-> I executed below query to check the tempdb size on Read replica,

use tempdb
select file_id, type, type_desc, name, physical_name, size, max_size, growth, is_percent_growth from sys.database_files

-> Below is the output,

The size of tempdev on the read replica was configured at 39.5 GB, creating complications when attempting to adjust the Maximum file size on the Azure SQL Managed Instance. To address this, I reduced the size of tempdb on the read replica to 15 GB using DBCC SHRINKFILE. Following this adjustment, I successfully set the Maximum File size of the Tempdb data file to 25 GB.

Summary:
Navigating the nuances of database file sizes in an Azure SQL Managed Instance, particularly when dealing with read replicas, demands a comprehensive approach. Verifying the reported sizes, understanding the intricacies of database contexts, and proactive measures such as shrinking the Tempdb on read replicas are pivotal for resolving challenges like the ‘MODIFY FILE failed’ error. This journey underscores the significance of a meticulous troubleshooting process, ensuring a smooth configuration experience within the Azure SQL environment.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Unveiling Query Performance Insights: Leveraging T-SQL with SQL Server Query Store

Introduction

In the dynamic realm of database management, monitoring and optimizing query performance is paramount. SQL Server offers the Query Store feature, a powerful tool that captures and stores detailed information about query execution plans and runtime statistics. In this blog post, we’ll delve into leveraging T-SQL to extract valuable insights from the Query Store, focusing on queries executed within the last 7 days.

Query Store Essentials

Before we dive into T-SQL scripts, let’s briefly revisit the Query Store. It acts as a repository for query-related data, allowing database administrators to analyze historical performance metrics and troubleshoot issues efficiently.

Comprehensive Query Analysis in the Last 7 Days

To get a comprehensive view of queries executed in the last 7 days, we can use the following T-SQL script:

-- T-SQL script to analyze queries executed in the last 7 days

SELECT
p.query_id query_id,
q.object_id object_id,
ISNULL(OBJECT_NAME(q.object_id),'') object_name,
qt.query_sql_text query_sql_text,
ROUND(CONVERT(float, SUM(rs.avg_duration*rs.count_executions))*0.001,2) total_duration,
ROUND(CONVERT(float, SUM(rs.avg_cpu_time*rs.count_executions))*0.001,2) total_cpu_time,
ROUND(CONVERT(float, SUM(rs.avg_logical_io_reads*rs.count_executions))*8,2) total_logical_io_reads,
ROUND(CONVERT(float, SUM(rs.avg_physical_io_reads*rs.count_executions))*8,2) total_physical_io_reads,
ROUND(CONVERT(float, SUM(rs.avg_logical_io_writes*rs.count_executions))*8,2) total_logical_io_writes,
SUM(rs.count_executions) count_executions,
COUNT(distinct p.plan_id) num_plans
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
JOIN sys.query_store_query q ON q.query_id = p.query_id
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE rs.first_execution_time BETWEEN (getdate()-7) AND (getdate())
GROUP BY p.query_id, qt.query_sql_text, q.object_id
HAVING COUNT(distinct p.plan_id) >= 1
ORDER BY count_executions DESC

This script provides a detailed breakdown of query performance, including execution counts, total duration, CPU time, and more, for queries executed in the last 7 days.

Specific Query Search Example

Suppose you want to focus on queries containing a specific keyword within the last 7 days. The following T-SQL script demonstrates how to achieve this:

-- T-SQL script to search for a specific query in the last 7 days

SELECT
p.query_id query_id,
q.object_id object_id,
ISNULL(OBJECT_NAME(q.object_id),'') object_name,
qt.query_sql_text query_sql_text,
ROUND(CONVERT(float, SUM(rs.avg_duration*rs.count_executions))*0.001,2) total_duration,
ROUND(CONVERT(float, SUM(rs.avg_cpu_time*rs.count_executions))*0.001,2) total_cpu_time,
ROUND(CONVERT(float, SUM(rs.avg_logical_io_reads*rs.count_executions))*8,2) total_logical_io_reads,
ROUND(CONVERT(float, SUM(rs.avg_physical_io_reads*rs.count_executions))*8,2) total_physical_io_reads,
ROUND(CONVERT(float, SUM(rs.avg_logical_io_writes*rs.count_executions))*8,2) total_logical_io_writes,
SUM(rs.count_executions) count_executions,
COUNT(distinct p.plan_id) num_plans
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
JOIN sys.query_store_query q ON q.query_id = p.query_id
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE rs.first_execution_time BETWEEN (getdate()-7) AND (getdate())
and qt.query_sql_text LIKE 'DELETE FROM%'
GROUP BY p.query_id, qt.query_sql_text, q.object_id
HAVING COUNT(distinct p.plan_id) >= 1
ORDER BY count_executions DESC

This example narrows down the search to queries with the keyword “DELETE FROM” executed in the last 7 days.

Conclusion

In conclusion, harnessing the power of T-SQL with the SQL Server Query Store provides a robust way to analyze and optimize query performance. The provided scripts offer a starting point for gaining valuable insights into the queries executed in the last 7 days, enabling proactive database management and performance tuning.

Stay tuned for more in-depth explorations of SQL Server features and best practices!

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.