SQL services fails after applying service pack

-> I applied service pack on one of the SQL instance in a 2 node cluster. After applying the service pack, SQL services failed to come online.

-> I applied service pack on passive node first and then on the active node. The install went fine without any issues. But after the service pack upgrade SQL server did not start.

-> Checked the SQL server error log and found that the SQL server fails during script upgrade mode with below error.

2016-12-08 23:57:25.43 spid8s      DBCC execution completed. If DBCC printed error messages, contact your system administrator.
2016-12-08 23:57:25.43 spid8s      Error: 537, Severity: 16, State: 3.
2016-12-08 23:57:25.43 spid8s      Invalid length parameter passed to the LEFT or SUBSTRING function.
2016-12-08 23:57:25.43 spid8s      Error: 912, Severity: 21, State: 2.
2016-12-08 23:57:25.43 spid8s      Script level upgrade for database ‘master’ failed because upgrade step ‘sqlagent100_msdb_upgrade.sql’encountered error 537, state 3, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2016-12-08 23:57:25.43 spid8s      Error: 3417, Severity: 21, State: 3.
2016-12-08 23:57:25.43 spid8s      Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
2016-12-08 23:57:25.43 spid8s      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.
2016-12-08 23:57:25.46 Logon       Error: 18401, Severity: 14, State: 1.
2016-12-08 23:57:25.46 Logon       Login failed for user ‘iuser_SitefinityFrontier’. Reason: Server is in script upgrade mode. Only administrator can connect at this time. [CLIENT: 66.133.172.174]

-> Added -T902 to stop script upgrade mode. Started the SQL services, it started without issues. Checked the owner for MSDB and Master. Both were having SA as owner.

902 Bypasses execution of database upgrade script when installing a Cumulative Update or Service Pack.

-> Searched for sqlagent100_msdb_upgrade.sql file and opened it.

-> The script upgrade mode was failing on below set of statement in file sqlagent100_msdb_upgrade.sql,

SELECT @device_directory = SUBSTRING(filename, 1, CHARINDEX(N’master.mdf’, LOWER(filename)) – 1)  FROM master.dbo.sysaltfiles
WHERE (name = N’master’)

-> The SQL server instance did have more than one database with logical name “master”.

-> Query “”SELECT db_name(dbid) FROM master.dbo.sysaltfiles WHERE (name = N’master’)”” returned 3 databases including master database in my SQL server instance.

-> Variable @device_directory can contain only one value. But in our case 3 values were returned and thats the reason we got this error.

-> I detached the 2 databases.

-> Ran Query “”SELECT db_name(dbid) FROM master.dbo.sysaltfiles WHERE (name = N’master’)”” and it returned just master database.

-> Removed the startup parameter -T902 and started the SQL service from cluadmin.msc.

-> SQL server started this time without issues. Script upgrade mode completed fine without any issues and the version changed to appropriate value.

-> Failed over the SQL instance to other node and it came online without issues.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Advertisements

The database owner SID recorded in the master database differs from the database owner SID recorded in database ‘msdb’. You should correct this situation by resetting the owner of database ‘msdb’ using the ALTER AUTHORIZATION statement.

-> We were working on an upgrade project. We were performing an upgrade of a standalone SQL server 2005 to SQL server 2012. The upgrade failed with the error “The database owner SID recorded in the master database differs from the database owner SID recorded in database ‘msdb’. You should correct this situation by resetting the owner of database ‘msdb’ using the ALTER AUTHORIZATION statement.”.

-> The SQL server version changed to SQL server 2012. We tried a repair and it failed.

-> We tried starting the SQL services and got the below error in the errorlog.

.
.
2016-01-05 20:35:00.25 spid5s      Error: 33009, Severity: 16, State: 2.
2016-01-05 20:35:00.25 spid5s      The database owner SID recorded in the master database differs from the database owner SID recorded in database ‘msdb’. You should correct this situation by resetting the owner of database ‘msdb’ using the ALTER AUTHORIZATION statement.
2016-01-05 20:35:00.25 spid5s      The failed batch of t-sql statements :
2016-01-05 20:35:00.25 spid5s      Error: 912, Severity: 21, State: 2.
2016-01-05 20:35:00.25 spid5s      Script level upgrade for database ‘master’ failed because upgrade step ‘sqlagent100_msdb_upgrade.sql’ encountered error 33009, state 2, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2016-01-05 20:35:00.26 spid5s      Error: 3417, Severity: 21, State: 3.
2016-01-05 20:35:00.26 spid5s      Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
2016-01-05 20:35:00.26 spid5s      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

-> I added trace flag -T902 in the SQL server startup parameter and started the SQL services to see if it starts, it started without any issues.

902 Bypasses execution of database upgrade script when installing a Cumulative Update or Service Pack.

-> Went into SQL server management studio and ran the below query. “select * from sys.databases”.

-> The owner for master database was SA -> sid: 0x01. But MSDB was something else. Executed the below command and changed the owner of MSDB to SA.

use msdb
go
sp_changeobjectowner ‘sa’
go

-> Once this was done, I removed the trace flag -T902 from the SQL server startup parameter and started the SQL services and it started fine without any issues.

->  Please note that you cannot change the owner of system databases (Master, Model and tempdb) as per link http://technet.microsoft.com/en-us/library/ms190909(v=SQL.105).aspx.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

SQL resources not coming online after patching

-> I had a requirement to install service pack on a 2-node cluster server. It was an active-passive server.

-> I installed the service pack on the passive node successfully. I tried failing over the SQL services from Active node to Passive node and it failed.

-> Checked the cluster.log and found the below errors,

0001164.00001b70::2016/12/01-05:10:32.854 WARN [ClNet] Tcpip is not bound to adapter E8BF15A3-3617-4818-94E4-7BCC37A3B45D.
000014dc.00000868::2016/12/01-05:12:41.025 ERR  SQL Server <SQL Server>: [sqsrvres] OnlineThread: SQLClusterResourceWorkerWaitForCompletion failed (status 0)
0000101c.00001078::2016/12/01-05:12:41.025 WARN [FM] FmpHandleResourceTransition: Resource Name = 3c028454-7dc4-4943-8bb9-89712945d2f2 [SQL Server] old state=129 new state=4
0000101c.00001078::2016/12/01-05:12:41.025 WARN [FM] FmpHandleResourceTransition: Resource failed, post a work item
000014dc.00001b1c::2016/12/01-05:14:43.117 ERR  SQL Server <SQL Server>: [sqsrvres] OnlineThread: SQLClusterResourceWorkerWaitForCompletion failed (status 0)
0000101c.00001078::2016/12/01-05:14:43.117 WARN [FM] FmpHandleResourceTransition: Resource Name = 3c028454-7dc4-4943-8bb9-89712945d2f2 [SQL Server] old state=129 new state=4
0000101c.00001078::2016/12/01-05:14:43.117 WARN [FM] FmpHandleResourceTransition: Resource failed, post a work item
0000101c.00001078::2016/12/01-05:14:43.117 WARN [FM] Group failure for group <56e7dc42-6936-4227-9ed7-eea24233be74>. Create thread to take offline and move.
00001164.000013ac::2016/12/01-05:14:44.086 WARN Physical Disk <R: >: Offline, Locking volume failed, error 5.
00001164.000013ac::2016/12/01-05:14:45.164 WARN Physical Disk <R: >: Offline, Locking volume failed, error 5.
0000101c.00001028::2016/12/01-05:16:20.834 WARN [INIT] The cluster service is shutting down.
0000101c.00001028::2016/12/01-05:16:20.834 WARN [FM] Shutdown: Failover Manager requested to shutdown groups.

-> Eventviewer log contains the error as below,

 OnlineThread: SQLClusterResourceWorkerWaitForCompletion failed (status 0)

-> Tried starting the SQL service from services.msc. The SQL server came online after script upgrade mode. Connected the SQL server instance from SQL server management studio and the version changed as appropriate.

-> Enabled verbose logging for cluster by setting VerboseLogging to 1 in HKEY_LOCAL_MACHINE\Cluster\Groups\<GUID>\Parameters\VerboseLogging.

-> Started the SQL services again from cluadmin.msc. It failed and got the below message on Cluster.log.

00001184.000019d4::2016/12/04-07:20:30.771 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): Worker thread starting …
00001184.000019d4::2016/12/04-07:20:30.771 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): Entering SQLClusterSharedDataUpgradeWorker thread.
00001184.000019d4::2016/12/04-07:20:30.771 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): SqlDataRoot = ‘R:\MSSQL10.MSSQLSERVER\MSSQL’,         CurrentPatchLevel = ‘10.3.5500.0’,         SharedDataPatchLevel = ‘10.1.2531.0’
00001184.000019d4::2016/12/04-07:20:30.771 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): FTDataPath = ‘R:\MSSQL10.MSSQLSERVER\MSSQL\FTData’,         SQLGroup = ‘S-1-5-21-1897246555-1303978887-2451887125-1005’
00001184.000019d4::2016/12/04-07:20:30.771 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): Entering DoSQLDataRootApplyACL (R:\MSSQL10.MSSQLSERVER\MSSQL\Data).
00001184.000019d4::2016/12/04-07:20:30.771 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): ACL-ing (R:\MSSQL10.MSSQLSERVER\MSSQL\Data) as full-control to SQLGroup.
00001184.000019d4::2016/12/04-07:20:30.787 WARN SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): DoSQLDataRootApplyACL : Failed to apply security to SQLDataRoot (122).
00001184.000019d4::2016/12/04-07:20:30.787 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): Exiting DoSQLDataRootApplyACL (0).
00001184.000019d4::2016/12/04-07:20:30.787 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): Entering DoIFTSSharedDataUpgrade.
00001184.00001e54::2016/12/04-07:20:30.975 INFO SQL Server <SQL Server>: [sqsrvres] Worker Thread (F2DF70): Calling SQLClusterResourceWorker::WaitForCompletion (200)

-> The cause seems to be that the SQL resource DLL detects the installation of a patch during the first failover to the newly patched node. Thus it spawns a worker thread to update the databases accordingly (e.g. create updated system stored procedures and views, etc.). As a first step this worker thread will re-create the NTFS file permissions on the data folders of this SQL instance. As the resource DLL runs in the process RHS.EXE (Failover Cluster Resource Host Subsystem), it runs with the credentials of NT AUTHORITY\SYSTEM.

-> Provided full control to SYSTEM, cluster account, SQL service account to below location.

R:\MSSQL10.MSSQLSERVER\MSSQL\Data
R:\MSSQL10.MSSQLSERVER\MSSQL\
R:\

-> Tried running icacls R:\MSSQL10.MSSQLSERVER\MSSQL\Data, icacls R:\MSSQL10.MSSQLSERVER\MSSQL and icacls R:\ and was able to see that all the above account were having full permission.

-> Started a procmon and failed over the SQL services from active node to passive node. The SQL services failed in cluadmin.msc, but still clusterresourcemonitor was working with files on folder R:\MSSQL10.MSSQLSERVER\MSSQL\LOG.

-> It seems like the worker thread that recreates the NTFS file permissions on the log folder of this SQL instance is causing the issue.

-> The log folder was having 373275 files and was of size 23GB.

-> Tried renaming the folder R:\MSSQL10.MSSQLSERVER\MSSQL\LOG to R:\MSSQL10.MSSQLSERVER\MSSQL\LOG_old and created a dummy LOG folder.

-> Failed over the SQL services from active to passive node. SQL services came online without any issues.

-> It seems that during the first failover after a patch the worker thread while re-creating the NTFS file permissions on MSSQL folder had to move through huge lot of files and eventually timed out and that is the reason why this failed out earlier.

-> Patched the active node. Failover and failback worked fine.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Database ‘model’ cannot be opened. It is in the middle of a restore.

-> SQL server was not coming online in one of the database server.

-> I checked the SQL server log and found the below error,

.
.
2016-12-25 19:43:43.02 spid12s     Starting up database ‘msdb’.
2016-12-25 19:43:43.03 spid10s     Starting up database ‘model’.
2016-12-25 19:43:43.05 spid10s     The database ‘model’ is marked RESTORING and is in a state that does not allow recovery to be run.
2016-12-25 19:43:43.05 spid10s     Error: 927, Severity: 14, State: 2.
2016-12-25 19:43:43.05 spid10s     Database ‘model’ cannot be opened. It is in the middle of a restore.
2016-12-25 19:43:43.05 spid10s     Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
2016-12-25 19:43:43.05 spid10s     SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.
.
.

-> As per the SQL server errorlog, it seems like the model database is restoring. But in reality we see this error when model database is corrupted. The error related to tempdb is because tempdb is recreated everytime SQL server starts using model database.

-> Open services.msc and get to the SQL server services. Right click the service and click on properties. In General tab, look for “path to executable”. Copy the SQLServr.exe location from there.

-> Open a command prompt and navigate to the location. Now type the below command,

sqlservr.exe -s<InstanceName> -T3608 -T3609 -c -f
3608 Recovers master database only. Skips automatic recovery (at startup) for all databases except the master database.
3609 Skips the creation of the tempdb database at startup. Use this trace flag if the tempdb database is problematic or problems exist in the model database.
 -c Start as a console and not service.
 -f Minimal configuration mode.

-> Open another command prompt and use the below command,

SQLCMD -E -SADMIN:<Hostname\InstanceName>

-> Execute the below command,

use master
go
sp_detach_db ‘model
go

-> Execute the below command to attach the database,

CREATE DATABASE [model] ON (FILENAME = N'<Location>\model.mdf’ ),( FILENAME = N'<Location>\modellog.ldf’ )
FOR ATTACH

-> If you get an error that the files cannot be attached. You will have to copy the MDF and LDF file from a different instance of same build and issue the above command again.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.