Replicated transactions are waiting for next Log backup or for mirroring partner to catch up

Environment

-> We had an environment which is same as above environment. Remote distributor, Publisher and subscriber servers were configured in Always ON availability group.

-> I worked on a disaster recovery drill. As part of the DR, All database server on primary datacentre were brought down. Below is the representation after the primary datacentre was brought down.

-> I failed over all database servers from Primary Datacentre to Secondary Datacentre.

-> Failover completed fine. Below is the view in SQL Server management studio,

-> I checked the Replication monitor and found below message in Log reader agent,

Replicated transactions are waiting for next Log backup or for mirroring partner to catch up.
Initializing
Validating publisher
Starting agent.

-> Message “Replicated transactions are waiting for next Log backup or for mirroring partner to catch up.” is as a result of Always ON availability secondary replica being down.

-> With the current setup transactional replication log reader agent will not be able to to move forward as the asynchronous secondary replicas have not acknowledged the reception of a change.

-> One option to solve this issue is to wait for the Always ON secondary replica to come online and that should fix the issue.

-> But in my case Always ON secondary replica will be online after 12 hours. So waiting for the Always ON secondary replica to come online is not an option.

-> Second option will be enable trace flag 1448. Please check article1 and article2 for more details on this trace flag.

-> Trace flag 1448 enables the replication log reader to move forward even if the asynchronous secondary replicas have not acknowledged the reception of a change. Even with this trace flag enabled, the log reader always waits for the synchronous secondary replicas. The log reader will not go beyond the min ack of the synchronous secondary replicas. This trace flag applies to the instance of SQL Server, not just to an availability group, an availability database, or a log reader instance. This trace flag takes effect immediately without a restart. It can be activated ahead of time or when an asynchronous secondary replica fails.

-> I enabled trace flag 1448 on the Publisher server JBPRIMARY-DR using below query,

dbcc traceon(1448,-1)

-> Once the trace flag 1448 is enabled. Log reader agent completed fine.

-> Please note that this trace flag can be enabled without a SQL Server restart.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Agent message code 21021. The subscription does not exist.

Environment

-> I worked on a disaster recovery drill. As part of the DR, I failed over all database servers from Primary Datacentre to Secondary Datacentre.

-> We had an environment which is same as above environment. Remote distributor, Publisher and subscriber servers were configured in Always ON availability group.

-> After the failover, distribution agent failed with below error,

Error messages:
Agent 'JBPUB-PRIMARY-JBREPL-TranProducts-JBREPLSUB-1' is retrying after an error. 8 retries attempted. See agent job history in the Jobs folder for more details.

-> I then checked the distribution agent job in SQL Server Agent and found below error,

Date		9/4/2020 11:41:07 AM
Log		Job History (JBPUB-PRIMARY-JBREPL-TranProducts-JBREPLSUB-1)

Step ID		2
Server		JBDIST-DR
Job Name	JBPUB-PRIMARY-JBREPL-TranProducts-JBREPLSUB-1
Step Name		Run agent.
Duration		00:11:51
Sql Severity	0
Sql Message ID	0
Operator Emailed	
Operator Net sent	
Operator Paged	
Retries Attempted	0

Message
2020-09-04 11:52:54.399 Copyright (c) 2016 Microsoft Corporation
2020-09-04 11:52:54.399 Microsoft SQL Server Replication Agent: distrib
2020-09-04 11:52:54.399 
2020-09-04 11:52:54.399 The timestamps prepended to the output lines are expressed in terms of UTC time.
2020-09-04 11:52:54.399 User-specified agent parameter values:
			-Subscriber JBREPLSUB
			-SubscriberDB JBREPL_SUB
			-Publisher JBPUB-PRIMARY
			-Distributor JBDistributor
			-DistributorSecurityMode 1
			-Publication TranProducts
			-PublisherDB JBREPL
			-Continuous
			-XJOBID 0x3B3FD20E052A694491BD0660C526FA21
			-XJOBNAME JBPUB-PRIMARY-JBREPL-TranProducts-JBREPLSUB-1
			-XSTEPID 2
			-XSUBSYSTEM Distribution
			-XSERVER JBDIST-DR
			-XCMDLINE 0
			-XCancelEventHandle 0000000000001BA8
			-XParentProcessHandle 0000000000000358
2020-09-04 11:52:54.399 Startup Delay: 4172 (msecs)
2020-09-04 11:52:58.581 Connecting to Distributor 'JBDistributor'
2020-09-04 11:52:58.612 Parameter values obtained from agent profile:
			-bcpbatchsize 2147473647
			-commitbatchsize 100
			-commitbatchthreshold 1000
			-historyverboselevel 1
			-keepalivemessageinterval 300
			-logintimeout 15
			-maxbcpthreads 1
			-maxdeliveredtransactions 0
			-pollinginterval 5000
			-querytimeout 1800
			-skiperrors 
			-transactionsperhistory 100
2020-09-04 11:52:58.612 Connecting to Subscriber 'JBREPLSUB'
2020-09-04 11:52:58.659 Initializing
2020-09-04 11:52:58.706 Agent message code 21021. The subscription does not exist.

-> From the replication monitor, Right Click Distribution agent and click on “Agent Profiles”,

-> It seems like there is a custom distribution agent profile created and used in Database Server in Primary Datacentre (JBDIST-PRIMARY) that is not available in Database Server at Secondary Datacentre (JBDIST-DR).

-> Now that we failed over from Primary Datacentre to Secondary Datacentre, this missing “Agent Profile” is causing issue.

-> Execute below query on distribution database in Database Server JBDIST-PRIMARY which is currently a secondary replica after the failover and check what agent profile was used by the distribution agent previously,

select a.id, a.name, a.publisher_database_id,a.publisher_db, a.subscriber_db,
b.profile_id , b.profile_name from MSdistribution_agents a 
INNER JOIN msdb..MSagent_profiles b on a.profile_id = b.profile_id

-> In my case the the “Agent profile” missing is JBS-Load.

-> Now I will create this missing distribution agent profile in replication monitor. Right click Distribution agent and click on “Agent profile” and Click “New”.

-> I could find what were the parameters used as part of Agent profile “JBS-Load” and I created it as below,

-> Stop and start the distribution agent and it should work fine.

-> You can select “Default agent profile” if you are not aware of the custom profile created. But please understand it may not have the setting same as custom profile, which means there are possibilities of issues.

-> It is always a best practice to compare the agent profiles in replication monitor as part of DR planning if you dont want to run into issues like this.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

The process could not connect to Subscriber

> Transactional Replication is configured between 2 managed instances.

-> More details on how to configure transactional replication between 2 managed instance can be understood from this article.

-> Distribution agent fails with below error,

The process could not connect to Subscriber 'JBSWIKI-MI-DR.37B2CA9FEE0A.DATABASE.WINDOWS.NET'.

Disconnected from Azure Storage '\\jbmirepldata.file.core.windows.net\jbmirepldata' with OS result code: 2250.

Starting agent.
Error messages:

The process could not connect to Subscriber 'JBSWIKI-MI-DR.37B2CA9FEE0A.DATABASE.WINDOWS.NET'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20084)
Get help: http://help/MSSQL_REPL20084

Windows logins are not supported in this version of SQL Server. (Source: MSSQLServer, Error number: 40607)
Get help: http://help/40607
Date		8/13/2020 7:58:29 PM
Log		Job History (jbmi-publisher.37b2ca-JBREPL_Sub-TranProducts-JBSWIKI-MI-DR.37B2CA9-3)

Step ID		2
Server		JBMI-PUBLISHER.37B2CA9FEE0A.DATABASE.WINDOWS.NET
Job Name		jbmi-publisher.37b2ca-JBREPL_Sub-TranProducts-JBSWIKI-MI-DR.37B2CA9-3
Step Name		Run agent.
Duration		00:10:52
Sql Severity	0
Sql Message ID	0
Operator Emailed	
Operator Net sent	
Operator Paged	
Retries Attempted	0

Message
2020-08-13 14:39:13.505 Copyright (c) 2016 Microsoft Corporation
2020-08-13 14:39:13.505 Microsoft SQL Server Replication Agent: distrib
2020-08-13 14:39:13.505 
2020-08-13 14:39:13.505 The timestamps prepended to the output lines are expressed in terms of UTC time.
2020-08-13 14:39:13.505 User-specified agent parameter values:
			-Subscriber JBSWIKI-MI-DR.37B2CA9FEE0A.DATABASE.WINDOWS.NET
			-SubscriberDB JBREPL_Sub
			-Publisher jbmi-publisher.37b2ca9fee0a.database.windows.net
			-Distributor jbmi-publisher.37b2ca9fee0a.database.windows.net
			-DistributorSecurityMode 1
			-Publication TranProducts
			-PublisherDB JBREPL_Sub
			-Continuous
			-XJOBID 0x1F311561B8AC2A498D5487323D49323D
			-XJOBNAME jbmi-publisher.37b2ca-JBREPL_Sub-TranProducts-JBSWIKI-MI-DR.37B2CA9-3
			-XSTEPID 2
			-XSUBSYSTEM Distribution
			-XSERVER jbmi-publisher.37b2ca9fee0a.database.windows.net
			-XCMDLINE 1
			-XDB
			-XHOST np:\\.\pipe\DB4C.0-A9190D899828\sql\query
			-XCancelEventHandle 0000000000001B8C
			-XParentProcessHandle 0000000000001B24
2020-08-13 14:39:13.505 Startup Delay: 5810 (msecs)
2020-08-13 14:39:19.354 Connecting to Distributor 'jbmi-publisher.37b2ca9fee0a.database.windows.net'
2020-08-13 14:39:19.447 Obtained Azure Storage Connection String for jbmirepldata
2020-08-13 14:39:19.447 Connecting to Azure Files Storage '\\jbmirepldata.file.core.windows.net\jbmirepldata'
2020-08-13 14:39:21.872 Parameter values obtained from agent profile:
			-bcpbatchsize 0
			-commitbatchsize 100
			-commitbatchthreshold 1000
			-historyverboselevel 1
			-keepalivemessageinterval 300
			-logintimeout 15
			-maxbcpthreads 1
			-maxdeliveredtransactions 0
			-pollinginterval 5000
			-querytimeout 1800
			-skiperrors 
			-transactionsperhistory 100
2020-08-13 14:39:21.872 Connecting to Subscriber 'JBSWIKI-MI-DR.37B2CA9FEE0A.DATABASE.WINDOWS.NET'
2020-08-13 14:39:21.934 Disconnected from Azure Storage '\\jbmirepldata.file.core.windows.net\jbmirepldata' with OS result code: 0.
2020-08-13 14:39:21.950 Agent message code 20084. The process could not connect to Subscriber 'JBSWIKI-MI-DR.37B2CA9FEE0A.DATABASE.WINDOWS.NET'.
2020-08-13 14:39:21.981 Category:NULL
Source:  Microsoft OLE DB Driver for SQL Server
Number:  40607
Message: Windows logins are not supported in this version of SQL Server.

-> Azure SQL Managed Instance is connected using a SQL login. But the error states that windows logins are not supported. It seems like we have not keyed in the SQL logins properly while configuring the replication.

-> Open subscription properties by performing below actions,

-> Start the distribution agent and it will complete fine this time.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.