Environment

-> Below is the replication topology,

-> I have three (3) publications. Publication JBDBREPL has 3 articles, Publication Jimburepl has 1 article an Publication Jalorepl has 1 article.
-> Distribution database size is too huge for such a simple replication setup. Distribution database is close to 20 GB.
-> Checking replication related object details,

-> MSrepl_Commands and MSrepl_transactions table are too huge. They are close to 55+ Million rows.
-> First thing that comes to my mind is that Distribution cleanup job is either failing or getting blocked. Check Distribution cleanup job history,

-> The job is executing fine every 10 mins. Lets check if cleanup is happening from the job history,

Date 11/8/2020 12:05:00 AM
Log Job History (Distribution clean up: distribution)
Step ID 1
Server JBRESEARCH\IN2019
Job Name Distribution clean up: distribution
Step Name Run agent.
Duration 00:00:28
Sql Severity 0
Sql Message ID 21010
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0
Message
Executed as user: NT SERVICE\SQLAgent$IN2019. Deleted 0 row(s) per millisecond from MSrepl_commands [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_transactions [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_commands [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_transactions [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_commands [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_transactions [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_commands [SQLSTATE 01000] (Message 22121) Deleted 0 row(s) per millisecond from MSrepl_transactions [SQLSTATE 01000] (Message 22121) Removed 0 replicated transactions consisting of 0 statements in 19304 milliseconds (0 rows/millisec). [SQLSTATE 01000] (Message 21010). The step succeeded.
-> Lets execute below query manually,
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72

Deleted 0 row(s) per millisecond from MSrepl_commands
Deleted 0 row(s) per millisecond from MSrepl_transactions
Deleted 0 row(s) per millisecond from MSrepl_commands
Deleted 0 row(s) per millisecond from MSrepl_transactions
Deleted 0 row(s) per millisecond from MSrepl_commands
Deleted 0 row(s) per millisecond from MSrepl_transactions
Deleted 0 row(s) per millisecond from MSrepl_commands
Deleted 0 row(s) per millisecond from MSrepl_transactions
Removed 0 replicated transactions consisting of 0 statements in 17633 milliseconds (0 rows/millisec).
Completion time: 2020-11-08T00:19:44.9257397+05:30
-> It seems like the Distribution cleanup job is completing fine, but no rows are getting cleaned up.
-> Executing below query to check min distrib retention, max distrib retention and history retention,
sp_helpdistributor

-> The setting seems normal. The most common reason for distribution database growing is when immediate_sync is set to true. Lets check what immediate_sync is set to in our case,

-> It is set to 0 for all 3 publications. So it is not the cause.
-> Execute below query which will provide databases which have obsolete publications/subscriptions that are not being cleaned up, MinDate that reveals the date of oldest transaction in Msrepl_Transactions table, MaxDate that reveals the date of latest transaction in Msrepl_Transactions table and Commands refers to total data available.
USE distribution
GO
SELECT
t.publisher_database_id,
MinDate = MIN(entry_Time),
MaxDate = MAX(entry_Time),
Commands = COUNT_BIG(*)
FROM
dbo.msrepl_transactions t
join dbo.msrepl_commands s
ON s.publisher_database_id = t.publisher_database_id
AND s.xact_seqno = t.xact_seqno
WHERE
entry_time < GETDATE()
GROUP BY
t.publisher_database_id
GO
-> This query execution time depends on the number of records present in table Msrepl_Transaction and Msrepl_Commands.

-> From the above data it is clear that we have 55 Million rows and that the oldest transaction is on 05 November 2020 19:55. It seems like cleanup is not happening after 05 November 2020 19:55.
-> Execute below query which I took from this article.
set nocount on
declare @publisher_database_id int,
@min_cutoff_time datetime,
@max_cleanup_xact_seqno varbinary(16),
@retention int,
@rptdtl int
select @retention = 72 --If the retention period has been changed, then update this to that value
select @publisher_database_id = NULL --if you want to look for specific database, change this from NULL to value from MSPublisher_databases table.
select @rptdtl = 2 --if you want more information about agents keeping the xact_seqno to a minimum, change this to 2.
select @min_cutoff_time = dateadd(hour, -@retention, getdate())
declare @min_agent_sub_xact_seqno varbinary(16)
,@max_agent_hist_xact_seqno varbinary(16)
,@active int
,@initiated int
,@agent_id int
,@min_xact_seqno varbinary(16)
--Loop through each database
if @publisher_database_id is null
begin
declare pub_db scroll cursor for
select id from MSpublisher_databases
end
else
begin
declare pub_db scroll cursor for
select id from MSpublisher_databases
where id = @publisher_database_id
end
open pub_db
fetch first from pub_db into @publisher_database_id
while @@FETCH_STATUS = 0
begin
-- set @min_xact_seqno to NULL and reset it with the first prospect of min_seqno we found later
select @min_xact_seqno = NULL
select @active = 2
select @initiated = 3
-- -- cursor through each agent with it's smallest sub xact seqno --
declare #tmpAgentSubSeqno cursor local forward_only for
select a.id, min(s2.subscription_seqno) from MSsubscriptions s2 join MSdistribution_agents a
on (a.id = s2.agent_id)
where s2.status in( @active, @initiated ) and
/* Note must filter out virtual anonymous agents !!!
a.subscriber_id <> @virtual_anonymous and */
-- filter out subscriptions to immediate_sync publications
not exists (select * from MSpublications p
where s2.publication_id = p.publication_id
and p.immediate_sync = 1)
and a.publisher_database_id = @publisher_database_id
group by a.id
open #tmpAgentSubSeqno
fetch #tmpAgentSubSeqno into @agent_id, @min_agent_sub_xact_seqno
if (@@fetch_status = -1) -- rowcount = 0 (no subscriptions)
begin
-- If we have a publication which allows for init from backup with a min_autonosync_lsn set
-- we don't want this proc to signal cleanup of all commands
-- Note that if we filter out immediate_sync publications here as they will already have the
-- desired outcome. The difference is that those with min_autonosync_lsn set have a watermark
-- at which to begin blocking cleanup.
if not exists (select * from dbo.MSpublications msp join MSpublisher_databases mspd
ON mspd.publisher_id = msp.publisher_id
and mspd.publisher_db = msp.publisher_db
where mspd.id = @publisher_database_id
and msp.immediate_sync = 1)
begin
select top(1) @min_xact_seqno = msp.min_autonosync_lsn from dbo.MSpublications msp join MSpublisher_databases mspd
ON mspd.publisher_id = msp.publisher_id
and mspd.publisher_db = msp.publisher_db
where mspd.id = @publisher_database_id
and msp.allow_initialize_from_backup <> 0
and msp.min_autonosync_lsn is not null
and msp.immediate_sync = 0
order by msp.min_autonosync_lsn asc
end
end
while (@@fetch_status <> -1)
begin
--
--always clear the local variable, next query may not return any resultset
--
set @max_agent_hist_xact_seqno = NULL
--
--find last history entry for current agent, if no history then the query below should leave @max_agent_xact_seqno as NULL
--
select top 1 @max_agent_hist_xact_seqno = xact_seqno from MSdistribution_history
where agent_id = @agent_id
order by timestamp desc
if @rptdtl = 2 begin select @publisher_database_id as Pub_DB_ID,@agent_id as AgentID,@max_agent_hist_xact_seqno as Max_Agent_Hist_Xact_Seqno, @min_agent_sub_xact_seqno as Min_Agent_Sub_Xact_Seqno end -- --now find the last xact_seqno this agent has delivered: --if last history was written after initsync, use histry xact_seqno otherwise use initsync xact_seqno -- if isnull(@max_agent_hist_xact_seqno, @min_agent_sub_xact_seqno) <= @min_agent_sub_xact_seqno begin set @max_agent_hist_xact_seqno = @min_agent_sub_xact_seqno end --@min_xact_seqno was set to NULL to start with, the first time we get here, it'll gets set to a non-NULL value --then we graduately move to the smallest hist/sub seqno if ((@min_xact_seqno is null) or (@min_xact_seqno > @max_agent_hist_xact_seqno)) begin set @min_xact_seqno = @max_agent_hist_xact_seqno end fetch #tmpAgentSubSeqno into @agent_id, @min_agent_sub_xact_seqno
end
close #tmpAgentSubSeqno
deallocate #tmpAgentSubSeqno
/*
** Optimized query to get the maximum cleanup xact_seqno
/ /
** If the query below returns nothing, nothing can be deleted.
** Reset @max_cleanup_xact_seqno to 0.
*/
select @max_cleanup_xact_seqno = 0x00
-- Use top 1 to avoid warning message of "Null in aggregate…" which will make
-- sqlserver agent job having failing status
if @min_xact_seqno is NULL and @rptdtl = 2 and @agent_id is NULL
begin
select @publisher_database_id as Pub_DB_ID, 'No immediate sync publications, so is not dependent on Agent History'
end
select top 1 @max_cleanup_xact_seqno = xact_seqno from MSrepl_transactions with (nolock)
where publisher_database_id = @publisher_database_id
and (xact_seqno < @min_xact_seqno
or @min_xact_seqno IS NULL)
and entry_time <= @min_cutoff_time
order by xact_seqno desc
select @publisher_database_id as Pub_DB_ID, @max_cleanup_xact_seqno as Max_Cleanup_Xact_Seqno
fetch next from pub_db into @publisher_database_id
end
close pub_db
deallocate pub_db
-> Below is the output of the above query,

-> Lets use below query to check MSrepl_transactions to reveal the date when transactions were recorded in the Distribution database using Max_Agent_Hist_Xact_Seqno from above output,
SELECT [publisher_database_id], [xact_id],[xact_seqno],CONVERT(char(10), [entry_time],126) [entry_time]
FROM [distribution].[dbo].[MSrepl_transactions]
WHERE xact_seqno =0x000059B4000014480003000000000000 --Agent ID 11
SELECT [publisher_database_id],[xact_id],[xact_seqno],CONVERT(char(10), [entry_time],126) [entry_time]
FROM [distribution].[dbo].[MSrepl_transactions]
WHERE xact_seqno =0x000059B4000014480003000000000000 --Agent ID 14
SELECT [publisher_database_id],[xact_id],[xact_seqno],CONVERT(char(10), [entry_time],126) [entry_time]
FROM [distribution].[dbo].[MSrepl_transactions]
WHERE xact_seqno =0x00000053000071100009000000000000 --Agent ID 19

-> From the output above it is clear that Agent ID 19 which has the entry_time 05 November 2020 has issues.
-> Lets use below query to find the publication associated with Agent ID 19.
select id as Agent_ID, publication as Publication from MSdistribution_agents where id in (19)

-> In my case it is publication JaloRepl. Let check the replication monitor and see if there are any errors with this publication,

-> It seems like the distribution agent was stopped and the stop date is around 05 November 2020 19:55.
-> One way is to check if this publication is required and can be removed if it is not required. In my case, the publication was required.
-> I tried starting it and got below error,

Command attempted:
if @@trancount > 0 rollback tran
(Transaction sequence number: 0x0000005300007120000400000000, Command ID: 1)
Error messages:
The row was not found at the Subscriber when applying the replicated UPDATE command for Table ‘[dbo].[jalo]’ with Primary Key(s): [sno] = 2 (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598
The row was not found at the Subscriber when applying the replicated UPDATE command for Table ‘[dbo].[jalo]’ with Primary Key(s): [sno] = 2 (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598
-> It seems like the distribution agent was stopped after this error without troubleshooting further. We found the missing row and created it on the subscriber. Distribution agent started delivering data after this. Distribution cleanup started working. I checked the distribution cleanup job history and was able to confirm that.

-> I tried executing below query manually again,
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72

-> I let the distribution cleanup job run for sometime and below are the MSrepl_transactions and MSrepl_commands stats below,

Thank You,
Vivek Janakiraman
Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.