A transport-level error has occurred when receiving results from the server

-> Application logs show the below error every day, intermittently between 12:00 AM and 12:10 AM:

Timestamp: 5/17/2020 10:05:58 PM
Message: HandlingInstanceID: c190c221-3e43-4702-83fa-533122cb3c81
An exception of type ‘System.Data.SqlClient.SqlException’ occurred and was caught.
———————————————————————————-
05/18/2020 00:05:58
Type : System.Data.SqlClient.SqlException, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Message : A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 – The semaphore timeout period has expired.)
Source : .Net SqlClient Data Provider
Help link :
Errors : System.Data.SqlClient.SqlErrorCollection
ClientConnectionId : c43b433e-4649-48f5-b0bf-05dca3d83432
Class : 20
LineNumber : 0
Number : 121
Procedure :
Server : JBDB01
State : 0
ErrorCode : -2146232060
Data : System.Collections.ListDictionaryInternal
TargetSite : Void OnError(System.Data.SqlClient.SqlException, Boolean, System.Action`1[System.Action])
HResult : -2146232060
Stack Trace : at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet)
at Microsoft.Practices.EnterpriseLibrary.Data.Database.DoLoadDataSet(IDbCommand command, DataSet dataSet, String[] tableNames)
at Microsoft.Practices.EnterpriseLibrary.Data.Database.LoadDataSet(DbCommand command, DataSet dataSet, String[] tableNames)
at Data.Export.GetNextCreatedExport()
at Business.Export.GetNextCreatedExport()
at Facade.Export.GetNextCreatedExport()
Additional Info:
MachineName : JBAPP01
TimeStamp : 5/17/2020 10:05:58 PM
FullName : Microsoft.Practices.EnterpriseLibrary.ExceptionHandling, Version=6.0.0.0, Culture=neutral, PublicKeyToken=daf167d5325e0d31
ThreadIdentity : System User
Inner Exception
—————
Type : System.ComponentModel.Win32Exception, System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Message : The semaphore timeout period has expired
Source :
Help link :
NativeErrorCode : 121
ErrorCode : -2147467259
Data : System.Collections.ListDictionaryInternal
TargetSite :
HResult : -2147467259
Stack Trace : The stack trace is unavailable.
Category: Error
Priority: 2
EventId: 100
Severity: Error
Title:Enterprise Library Exception Handling
Machine: JBAPP01
Process Id: 6868
Win32 Thread Id: 11312
Thread Name:
Extended Properties: HelpLink.ProdName – Microsoft SQL Server
HelpLink.EvtSrc – MSSQLServer
HelpLink.EvtID – 121
HelpLink.BaseHelpUrl – http://go.microsoft.com/fwlink
HelpLink.LinkId – 20476

-> At first glance, the error message points to either a network issue between the application and database servers, or an unresponsive database server due to high resource utilization.

-> I set up a perfmon collection on the database server and also started a continuous ping from application server JBAPP01 to database server JBDB01 using the below PowerShell command,

ping.exe -t JBDB01|Foreach{"{0} - {1}" -f (Get-Date),$_} > C:\temp\ping\JBDB01.txt

-> The same issue happened again. I checked the perfmon counters on the database server; resource utilization was normal.

-> SQL Server error log and Event Viewer logs on database server did not have any errors during the time the error happened on the application server.

-> I checked the output of the ping command on the application server and got the below details,

5/18/2020 12:05:45 AM – Reply from 192.36.67.12: bytes=32 time<1ms TTL=127
5/18/2020 12:05:46 AM – Reply from 192.36.67.12: bytes=32 time<1ms TTL=127
5/18/2020 12:05:51 AM – Request timed out.
5/18/2020 12:05:56 AM – Request timed out.
5/18/2020 12:06:01 AM – Request timed out.
5/18/2020 12:06:06 AM – Request timed out.
5/18/2020 12:06:11 AM – Request timed out.
5/18/2020 12:06:12 AM – Reply from 192.36.67.12: bytes=32 time<1ms TTL=127
5/18/2020 12:06:13 AM – Reply from 192.36.67.12: bytes=32 time<1ms TTL=127

-> From the above output it is clear that the application experiences connectivity errors during periods of packet loss, as indicated by the “Request timed out” entries.

-> This was taken up with the network team. They worked with the Windows team and advised that the packet loss was the result of a daily server snapshot performed at that time.

-> The application team updated us that the functionality failing every day between 12:00 AM and 12:10 AM is a health monitor that checks whether all components of the application, including the database server, are healthy. Since the error has no business impact, they changed the schedule to skip the 12:00 AM to 12:15 AM window, and this resolved the issue.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confer no rights.
Conversations with state DISCONNECTED_INBOUND piling up in sys.conversation_endpoints

-> I was working on a Service Broker issue where conversations with state DISCONNECTED_INBOUND were piling up in sys.conversation_endpoints.

-> Each new conversation that was initiated ended up in the DISCONNECTED_INBOUND state in sys.conversation_endpoints.

-> DI (Disconnected inbound): The remote side of the conversation has issued an END CONVERSATION. The conversation remains in this state until the local side of the conversation issues an END CONVERSATION. An application might still receive messages for the conversation. Because the remote side of the conversation has ended the conversation, an application cannot send messages on this conversation. When an application issues an END CONVERSATION, the conversation moves to the CD (Closed) state.
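
-> To see how conversations are distributed across states, a quick diagnostic query like the below can be run against the affected database (a minimal sketch using the sys.conversation_endpoints catalog view):

-- Count conversations per state; a large DISCONNECTED_INBOUND (DI) bucket
-- indicates the local side is not issuing END CONVERSATION.
SELECT state_desc, COUNT(*) AS conversation_count
FROM sys.conversation_endpoints
GROUP BY state_desc
ORDER BY conversation_count DESC;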

-> I confirmed with my Client that Service broker is indeed working and that the results are written to the required destination. It is just that these conversations are not getting closed properly.

-> I checked the stored procedures associated with Service Broker on that database and found the below code,


SELECT TOP 1
             @ConversationHandle = @ConversationHandle
FROM
             @Messages;

-> There was a mistake in the above code: instead of the column ConversationHandle, the variable @ConversationHandle was referenced, so the variable was simply assigned its own value.

-> I changed the above code to below,

SELECT TOP 1
             @ConversationHandle = ConversationHandle 
FROM
             @Messages;

-> Using @ConversationHandle instead of ConversationHandle leaves @ConversationHandle at its initial NULL value, which in turn means the conversation is never ended properly.
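
-> For context, below is a minimal sketch of the receive-and-close pattern such a procedure should follow. The queue name (dbo.JB_SB_target_queue) and the table variable layout are placeholders for illustration, not the client’s actual objects:

DECLARE @ConversationHandle UNIQUEIDENTIFIER;
DECLARE @Messages TABLE
(
    ConversationHandle UNIQUEIDENTIFIER,
    MessageTypeName    SYSNAME,
    MessageBody        VARBINARY(MAX)
);

-- Drain a batch of messages from the queue (queue name is a placeholder).
WAITFOR
(
    RECEIVE conversation_handle, message_type_name, message_body
    FROM dbo.JB_SB_target_queue
    INTO @Messages
), TIMEOUT 5000;

-- Assign from the column, not the variable, so the handle is populated.
SELECT TOP 1 @ConversationHandle = ConversationHandle
FROM @Messages;

-- Closing the conversation moves it from DI to CD (Closed).
IF @ConversationHandle IS NOT NULL
    END CONVERSATION @ConversationHandle;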

-> This resolved the issue. It was a minor mistake, but it had a huge impact on the overall process.

Thank You,
Vivek Janakiraman


DBCC Shrinkfile executes forever

-> The client attempted to reclaim what he believed to be significant unused space in a data file by running DBCC SHRINKFILE.

-> The shrink process had been running for over 39 hours. The client was not sure whether it was just a matter of waiting, or whether it was in a state where it would never complete.

-> I thought that DBCC SHRINKFILE can take a long time, and I was under the impression that it was just a matter of waiting for the operation to complete.

-> I checked the process to confirm whether it was progressing, so I could advise the client to either wait or cancel it, since a file shrink is generally not a recommended operation. The client advised that they had purged a huge table as a one-time operation, which released close to 92% free space, and that they wanted to shrink the file just this once, which makes sense.

-> I executed the below query to check the progress of the shrink operation. It showed 100% complete, but the SPID was still active,

SELECT start_time,
       percent_complete,
       DATEADD(ss, DATEDIFF(ss, start_time, GETDATE()) / percent_complete * 100, start_time) AS Expected_end_time,
       st.text AS SQL,
       command AS Current_Command,
       session_id
FROM sys.dm_exec_requests er
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) st
WHERE percent_complete != 0

-> Below was the screen grab of the shrink process,

Blog93_1

-> DBCC SHRINKFILE shows as 100% complete, but the process is blocked by SPID 48.

-> The SQL text for SPID 48 shows as NULL, and DBCC INPUTBUFFER also returns NULL. It seems this is a system process.

-> I stopped the shrink and restarted it. It starts up without any issue and progresses well, but as soon as it reaches 100% it is blocked, always by this same system process. This behavior is evident every time I start the shrink operation on this data file.
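
-> To confirm which session is doing the blocking each time, a simple check against sys.dm_exec_requests can be used (a sketch; it lists any request that is currently blocked and the session blocking it):

-- Show blocked requests together with the session blocking them.
SELECT session_id,
       blocking_session_id,
       command,
       wait_type,
       wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;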

-> More details on SPID 48 below,

Blog93_2

-> SPID 48 has a cmd type of “BRKR TASK”, and the wait type on that process is “BROKER_TRANSMISSION_TABLE”.

-> My guess was that this system process was related to Service Broker, and that some Service Broker operation was not letting the shrink operation complete.

-> I executed sp_lock and below were the details,

Blog93_3

-> From the above screenshot it is clear that these two operations require access to the object with object ID 73.

-> Object ID 73 is the system object sysdercv. This object exists in every database and contains a row for each receiving endpoint of a Service Broker conversation.

-> sys.conversation_endpoints had 197483543 rows, of which 169473219 were in the STARTED_OUTBOUND state.
Blog93_4

-> STARTED_OUTBOUND means SQL Server has processed a BEGIN DIALOG CONVERSATION, but no messages have yet been sent. I checked sys.conversation_endpoints for conversations in this state,
Blog93_5
-> Notice the lifetime for these entries; they can be reclaimed only after the lifetime expires.
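
-> The stuck entries can be listed with a query like the below (a sketch; lifetime and security_timestamp are columns of sys.conversation_endpoints):

-- Oldest STARTED_OUTBOUND conversations and when they are due to expire.
SELECT TOP (10)
       conversation_handle,
       state_desc,
       lifetime,
       security_timestamp
FROM sys.conversation_endpoints
WHERE state_desc = N'STARTED_OUTBOUND'
ORDER BY lifetime;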

-> I advised the client to add a lifetime to every conversation. The below code adds a 120-second lifetime to each conversation,


declare @h uniqueidentifier;

begin dialog conversation @h
	from service [JB_SB_source]
	to service N'JB_SB_target'
	with encryption = off, lifetime = 120;

-> I also advised the client to add an END CONVERSATION in the code that handles the message.

-> I advised the client to execute the below code to clean up entries in sys.conversation_endpoints with state STARTED_OUTBOUND whose lifetime is more than one month ahead of the current UTC date. The below code was copied from this article.


declare @h uniqueidentifier, @count int = 0;
begin transaction;
while (1=1)
begin
	set @h = null;
	select top(1) @h = conversation_handle
		from sys.conversation_endpoints
		where state_desc = N'STARTED_OUTBOUND'
		and security_timestamp = '1900-01-01 00:00:00.000'
		and lifetime > dateadd(month, 1, getutcdate())
	if (@h is null)
	begin
		break
	end
	end conversation @h with cleanup;
	set @count += 1;
	if (@count > 1000)
	begin
		commit;
		set @count = 0;
		begin transaction;
	end
end
commit

-> After these changes DBCC SHRINKFILE completed fine without any issues.

Thank You,
Vivek Janakiraman
