Understanding and Resolving Replication Error 20598 in SQL Server

Introduction

In the realm of SQL Server replication, encountering errors can be a daunting aspect of database management. One such error that frequently perplexes database administrators is the replication error 20598. This error typically arises during transactional replication when an UPDATE command is applied to a subscriber that cannot find the corresponding row. This blog post aims to demystify this error, exploring its causes and providing a step-by-step resolution to help you maintain the integrity and performance of your SQL Server replication environment.

Issue

The error message in question is as follows:

Command attempted:

if @@trancount > 0 rollback tran
(Transaction sequence number: 0x00TAd26800029780003600000000, Command ID: 2)
Error messages:
The row was not found at the Subscriber when applying the replicated UPDATE command for Table ‘[dbo].[Table1]’ with Primary Key(s): [Col1] = 45852653, [Col2] = FT_GT_DF (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598
The row was not found at the Subscriber when applying the replicated UPDATE command for Table ‘[dbo].[Table1]’ with Primary Key(s): [Col1] = 45852653, [Col2] = FT_GT_DF (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598

This error surfaces during the replication process, specifically when an UPDATE operation is being replicated from the publisher to the subscriber, and the system cannot locate the target row on the subscriber side to apply the update.

Understanding the Error

Error 20598 is a symptom of a larger issue where data synchronization between the publisher and subscriber has been compromised. This discrepancy can occur for several reasons:

  1. Manual Changes: Direct modifications made to the subscriber that bypass replication can lead to inconsistencies.
  2. Failed Replication Commands: Previous commands that failed to replicate successfully can result in missing data on the subscriber.
  3. Initial Snapshot Issues: Problems with the initial snapshot or schema changes that were not correctly replicated can cause missing rows.
  4. Data Cleanup Scripts: Scripts that unintentionally remove data from the subscriber can also contribute to this error.

Identifying the root cause is crucial for applying the correct resolution strategy.

Resolution

Resolving error 20598 involves ensuring that the data between the publisher and subscriber is synchronized. Here are the steps to troubleshoot and fix the issue:

  1. Validate Subscriptions: Use the Replication Monitor or execute the sp_validatemergepublication stored procedure to check for data inconsistencies between the publisher and the subscriber.
  2. Identify the Missing Row: Confirm that the row indicated in the error message is indeed missing from the subscriber. This verification helps in understanding whether it’s a singular issue or part of a larger synchronization problem.
  3. Reinitialize the Subscription: If the data discrepancy is extensive or the missing row is part of a larger issue, consider reinitializing the subscription. This process involves generating a new snapshot of the publication and reapplying it to the subscriber. Be cautious, as this can be resource-intensive for large datasets.
  4. Manually Apply Missing Data: For isolated incidents where only a few rows are affected, manually inserting or updating the missing data on the subscriber can be a quick fix. Ensure the replication agent jobs are stopped while making manual changes to avoid conflicts.
  5. Investigate and Address the Root Cause: After resolving the immediate error, investigate the root cause of the data inconsistency. This may involve reviewing data modification practices, ensuring all changes go through replication, and setting up alerts for failed replication commands.

To address this challenge comprehensively, including identifying the cause of missing rows on the subscriber, implementing a monitoring strategy is paramount. We will employ a trigger on the publisher’s table to monitor write operations. This approach aids in determining whether rows are being deleted or otherwise modified in a manner that could lead to replication errors. Here’s how to set up the monitoring:

Monitoring Setup

First, create a tracking table to log changes:

CREATE TABLE table1_Tracking
(
TrackingID INT IDENTITY(1,1) PRIMARY KEY,
OperationType VARCHAR(10),
ChangedData NVARCHAR(MAX),
PreviousData NVARCHAR(MAX),
ExecutedBy SYSNAME,
ApplicationName NVARCHAR(128),
HostName NVARCHAR(128),
ChangeTime DATETIME DEFAULT GETDATE()
);

Next, establish a trigger to capture insert, update, and delete operations:
CREATE TRIGGER trg_table1_WriteOperations
ON table1
AFTER INSERT, UPDATE, DELETE
NOT FOR REPLICATION
AS
BEGIN
SET NOCOUNT ON;

DECLARE @ExecutedBy SYSNAME = SUSER_SNAME();
DECLARE @ApplicationName NVARCHAR(128) = APP_NAME();
DECLARE @HostName NVARCHAR(128) = HOST_NAME();

IF EXISTS (SELECT * FROM inserted)
BEGIN
IF EXISTS (SELECT * FROM deleted)
BEGIN
-- Handle UPDATE
INSERT INTO table1_Tracking(OperationType, ChangedData, PreviousData, ExecutedBy, ApplicationName, HostName)
SELECT 'UPDATE',
(SELECT * FROM inserted FOR JSON AUTO),
(SELECT * FROM deleted FOR JSON AUTO),
@ExecutedBy, @ApplicationName, @HostName;
END
ELSE
BEGIN
-- Handle INSERT
INSERT INTO table1_Tracking(OperationType, ChangedData, ExecutedBy, ApplicationName, HostName)
SELECT 'INSERT',
(SELECT * FROM inserted FOR JSON AUTO),
@ExecutedBy, @ApplicationName, @HostName;
END
END
ELSE
BEGIN
-- Handle DELETE
INSERT INTO table1_Tracking(OperationType, PreviousData, ExecutedBy, ApplicationName, HostName)
SELECT 'DELETE',
(SELECT * FROM deleted FOR JSON AUTO),
@ExecutedBy, @ApplicationName, @HostName;
END
END;

This trigger captures the nature of the operation (insert, update, delete), the data before and after the change, who executed the operation, from which application, and from which host. This comprehensive tracking enables administrators to analyze patterns that could lead to replication errors, such as unauthorized deletions or updates that bypass replication mechanisms.

Conclusion

Error 20598 in SQL Server replication highlights the challenges of maintaining data consistency across distributed databases. By understanding the underlying causes and meticulously applying the appropriate resolution steps, database administrators can ensure the integrity and reliability of their replication setup. Remember, the key to effectively managing replication errors lies in proactive monitoring, thorough investigation, and understanding the intricacies of SQL Server’s replication mechanisms.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Configure Always On Availability Group with Listener – Part6

Prepare Database Server JBSAG1, JBSAG2 and JBSAG3

-> Make sure below ports are opened for JBSAG1, JBSAG2 and JBSAG3.

5022 (Always On Endpoint)

1433 (SQL Server PortListener Port)

58888 (Backend Pool Port)

Configure Cluster

-> Login to Database Server JBSAG1, JBSAG2 and JBSAG3 and execute below Powershell command. Refer this article for more details on Powershell command related to cluster configuration,

Install-WindowsFeature -Name "Failover-Clustering" -IncludeManagementTools

-> Once installed on both JBSAG1, JBSAG2 and JBSAG3. Restart the database servers.

-> Execute below powershell command that runs all cluster validation tests on Database Server JBASAG1, JBSAG2 and JBSAG3.

Test-Cluster –Node JBSAG1,JBSAG2,JBSAG3

-> View the report and make sure all looks fine.

-> Below hostname and IP Address will be utilized for Cluster and Always On Listener Configuration,

East US

West US

-> Execute below Powershell code to create a cluster,

New-Cluster -Name JBSWikiClust -Node JBSAG1,JBSAG2,JBSAG3 -StaticAddress 172.20.1.19,192.168.1.19 -NoStorage

Configure Always On Availability Group

-> Make sure SQL services are started with a proper Service account. In my case I have used account jbswiki\jbswiki as the SQL Service account.

-> Make sure [NT AUTHORITYSYSTEM] has appropriate access to SQL Server Instances.

-> Enable Always On feature for JBSAG1, JBSAG2 and JBSAG3 from Configuration Manager and restart SQL Server,

-> Configure Always On,

Configure Always On Availability Group Listener

-> Below hostname and IP Address will be utilized for Cluster and Always On Listener Configuration,

East US

West US

-> Open Cluster Administrator, Start -> Run -> cluadmin.msc on Primary Replica JBSAG1,

-> Right click “IP Address 192.168.1.0” and click Properties,

-> Once created, Stop the Role as below,

-> Right click AG resource and click properties,

-> Click dependencies and add the IP Address resource as a dependency,

-> Execute below powershell command on JBSAG1, JBSAG2 and JBSAG3 to set the cluster parameters,

Import-Module FailoverClusters
# Availability Group Name
$AGName = "JBSAG"
$ClusterNetworkName = (Get-ClusterNetwork).Name
$IPResourceName = (Get-ClusterResource | Where-Object { $_.ResourceType -eq "IP Address" -and $_.OwnerGroup -eq $AGName}).Name
$ListenerILBIP = (Get-ClusterResource $IPResourceName | Get-ClusterParameter | Where-Object {$_.Name -eq "Address"}).Value
#Load Balancer Listener endpoint port
$ListenerProbePort = 58888
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ListenerILBIP";"ProbePort"=$ListenerProbePort;"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"EnableDhcp"=0}
Get-ClusterResource $IPResourceName

-> Start the AG role,

-> Connect to the Primary replica using SQL Server Management Studio. Open the properties of Listener JBSAPP and enter the port number as appropriate,

-> Failover and failback Availability group from SQL Server Management Studio and check if Listener connection can be made.

More articles on this Series,

-> Always On Availability Group Configuration on Azure IaaS Virtual Machine utilizing Multi Subnet
-> Always On Availability Group Environment on Azure IaaS Virtual Machine Introduction – Part1
-> Configure appropriate Virtual Network and Subnet for Azure IaaS Virtual Machine – Part2
-> Deploy required Azure IaaS Virtual Machine – Part3
-> Configure Domain Controller and Windows Cluster on appropriate Azure IaaS Virtual Machine – Part4
-> Configure Azure Load Balancer – Part5

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Configure appropriate Virtual Network and Subnet for Azure IaaS Virtual Machine – Part2

Provision Appropriate Virtual Network and associated Subnet

Virtual Network and Subnet – East US

-> Search for “Virtual Network” in Azure portal and provision it as per below guidance,

-> Click on ” Add subnet” on “IP Address” Tab and fill it up with appropriate details as shown in the screenshot,

-> I will go with the default values for “Security” and “Tags” tab.

-> Click Create and wait for the Virtual Network creation.

Virtual Network and Subnet – West US

-> Search for “Virtual Network” in Azure portal and provision it as per below guidance,

-> I will go with the default values for “Security” and “Tags” tab.

Virtual Network jbswiki_VNET_eastus and jbswiki_VNET_westus Peering

-> Open Virtual Network jbswiki_VNET_eastus and click on “Peerings”.

More articles on this Series,

-> Always On Availability Group Configuration on Azure IaaS Virtual Machine utilizing Multi Subnet
-> Always On Availability Group Environment on Azure IaaS Virtual Machine Introduction – Part1
-> Deploy required Azure IaaS Virtual Machine – Part3
-> Configure Domain Controller and Windows Cluster on appropriate Azure IaaS Virtual Machine – Part4
-> Configure Azure Load Balancer – Part5
-> Configure Always On Availability Group with Listener – Part6

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.