Sunday, March 11, 2012

Deadlock between Distribution Agent and Distribution Agent Cleanup

This is occurring regularly on SQL Server 2000 build 878.
The problem is a deadlock in the Distribution database. The Distribution
Agent spid is executing the SELECT statement below:
select @.max_xact_seqno = max(xact_seqno) from MSrepl_commands (READPAST)
where
publisher_database_id = @.publisher_database_id and
command_id = 1 and
type <> -2147483611
which is found in sp_MSget_repl_commands. It holds an Intent Shared page
lock on a data page in the MSrepl_commands table.
The Distribution Agent Cleanup spid is found to be running the command below:
DELETE MSrepl_commands WITH (PAGLOCK) where
publisher_database_id = @.publisher_database_id and
xact_seqno <= @.max_xact_seqno
located in the stored procedure sp_MSdelete_publisherdb_trans. This spid
holds an exclusive page lock on another data page in MSrepl_commands.
Both spids then attempt to obtain the same lock type on the page which is
locked by the other.
The Distribution Agent runs continuously and the Cleanup job is scheduled
for every 10 minutes. The Publication, Distribution and Subscription
databases are all on the same instance (3rd party vendor solution, not mine!)
in an active/active Win2003 cluster configuration. The articles are all
stored procedure executions.
Has anybody else seen this deadlock? Is it just a timing issue? Why is the
PAGLOCK hint used in sp_MSdelete_publisherdb_trans as above?
(I can't find any articles which correlate exactly to this problem)
Kind Regards
Andrew Pike
SQL Server DBA
Accenture UK
Do you have anonymous subscribers or named. With named subscribers the
distribution clean up agent cleans up more aggressively and you may see
problems like this when a subscriber has been offline for some time.
First off issue a select * from distribution.dbo.MSdistribution_status to
see how many undelivered vs delivered commands there are. If there are a
high number of delivered commands, I would stop the SQL Server Agent and run
the distribution clean up agent manually.
I can't comment on why the decision was made to implement the two types of
locks, but in general MS has done a lot of research to deliver optimal
performance. For example the 27 in sp_MSadd_repl_commands27 comes from
tests that they did to find the optimal number of commands to send to the
distribution database in a batch from the log reader agent. And yes, they
tested a range of commands to find which offered best performance.
It looks like the readpast is to prevent locking, and the page lock is to
prevent a table lock.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Andrew Pike" <AndrewPike@.discussions.microsoft.com> wrote in message
news:CFB322D7-1D17-4064-AB36-6338A49C90A4@.microsoft.com...
> This is occurring regularly on SQL Server 2000 build 878.
> The problem is a deadlock in the Distribution database. The Distribution
> Agent spid is executing the SELECT statement below:
> select @.max_xact_seqno = max(xact_seqno) from MSrepl_commands (READPAST)
> where
> publisher_database_id = @.publisher_database_id and
> command_id = 1 and
> type <> -2147483611
> which is found in sp_MSget_repl_commands. It holds an Intent Shared page
> lock on a data page in the MSrepl_commands table.
> The Distribution Agent Cleanup spid is found to be running the command
> below:
> DELETE MSrepl_commands WITH (PAGLOCK) where
> publisher_database_id = @.publisher_database_id and
> xact_seqno <= @.max_xact_seqno
> located in the stored procedure sp_MSdelete_publisherdb_trans. This spid
> holds an exclusive page lock on another data page in MSrepl_commands.
> Both spids then attempt to obtain the same lock type on the page which is
> locked by the other.
> The Distribution Agent runs continuously and the Cleanup job is scheduled
> for every 10 minutes. The Publication, Distribution and Subscription
> databases are all on the same instance (3rd party vendor solution, not
> mine!)
> in an active/active Win2003 cluster configuration. The articles are all
> stored procedure executions.
> Has anybody else seen this deadlock? Is it just a timing issue? Why is
> the
> PAGLOCK hint used in sp_MSdelete_publisherdb_trans as above?
> (I can't find any articles which correlate exactly to this problem)
> Kind Regards
> Andrew Pike
> --
> SQL Server DBA
> Accenture UK
>

No comments:

Post a Comment