Optimizing queries using XML in SQL Server

-> I recently worked on tuning a query that scans through XML data.
-> Below is the query,


set statistics io on
SELECT t.COL1 FROM Table1 t WHERE COL3=N'1' or COL3=N'10' or COL3=N'102'
or COL3=N'103' or COL3=N'104' or COL3=N'105' or COL3=N'106' or COL3=N'11'
or COL3=N'12' or COL3=N'120' or COL3=N'121' or COL3=N'122' or COL3=N'123'
or COL3=N'124' or COL3=N'125' or COL3=N'126' or COL3=N'127' or COL3=N'129'
or COL3=N'13' or COL3=N'130' or COL3=N'131' or COL3=N'132' or COL3=N'133'
or COL3=N'134' or COL3=N'135' or COL3=N'136' or COL3=N'137' or COL3=N'138'
or COL3=N'139' or COL3=N'14' or COL3=N'141' or COL3=N'143' or COL3=N'144'
or COL3=N'145' or COL3=N'146' or COL3=N'147' or COL3=N'148' or COL3=N'149'
or COL3=N'15' or COL3=N'150' or COL3=N'151' or COL3=N'152' or COL3=N'153'
or COL3=N'154' or COL3=N'155' or COL3=N'156' or COL3=N'157' or COL3=N'158'
or COL3=N'159' or COL3=N'16' or COL3=N'160' or COL3=N'161' or COL3=N'162'
or COL3=N'163' or COL3=N'164' or COL3=N'166' or COL3=N'167' or COL3=N'168'
or COL3=N'169' or COL3=N'17' or COL3=N'170' or COL3=N'171' or COL3=N'173'
or COL3=N'174' or COL3=N'175' or COL3=N'176' or COL3=N'177' or COL3=N'178'
or COL3=N'179' or COL3=N'18' or COL3=N'184' or COL3=N'185' or COL3=N'186'
or COL3=N'187' or COL3=N'188' or COL3=N'189' or COL3=N'19' or COL3=N'190'
or COL3=N'191' or COL3=N'192' or COL3=N'193' or COL3=N'194' or COL3=N'195'
or COL3=N'196' or COL3=N'197' or COL3=N'198' or COL3=N'199' or COL3=N'2'
or COL3=N'20' or COL3=N'200' or COL3=N'201' or COL3=N'202' or COL3=N'203'
or COL3=N'204' or COL3=N'205' or COL3=N'206' or COL3=N'207' or COL3=N'208'
or COL3=N'209' or COL3=N'21' or COL3=N'210' or COL3=N'211' or COL3=N'212'
or COL3=N'213' or COL3=N'214' or COL3=N'215' or COL3=N'216' or COL3=N'217'
or COL3=N'218' or COL3=N'22' or COL3=N'227' or COL3=N'228' or COL3=N'229'
or COL3=N'23' or COL3=N'230' or COL3=N'233' or COL3=N'234' or COL3=N'235'
or COL3=N'236' or COL3=N'237' or COL3=N'238' or COL3=N'239' or COL3=N'24'
or COL3=N'240' or COL3=N'241' or COL3=N'242' or COL3=N'25' or COL3=N'26'
or COL3=N'27' or COL3=N'28' or COL3=N'29' or COL3=N'3' or COL3=N'30'
or COL3=N'309' or COL3=N'31' or COL3=N'32' or COL3=N'33' or COL3=N'34'
or COL3=N'35' or COL3=N'36' or COL3=N'37' or COL3=N'38' or COL3=N'39'
or COL3=N'4' or COL3=N'40' or COL3=N'41' or COL3=N'42' or COL3=N'43'
or COL3=N'44' or COL3=N'45' or COL3=N'46' or COL3=N'47' or COL3=N'48'
or COL3=N'49' or COL3=N'5' or COL3=N'51' or COL3=N'52' or COL3=N'53'
or COL3=N'54' or COL3=N'55' or COL3=N'56' or COL3=N'57' or COL3=N'58'
or COL3=N'59' or COL3=N'6' or COL3=N'62' or COL3=N'64' or COL3=N'66'
or COL3=N'67' or COL3=N'68' or COL3=N'69' or COL3=N'7' or COL3=N'70'
or COL3=N'71' or COL3=N'72' or COL3=N'73' or COL3=N'74' or COL3=N'76'
or COL3=N'77' or COL3=N'78' or COL3=N'79' or COL3=N'8' or COL3=N'80'
or COL3=N'81' or COL3=N'82' or COL3=N'83' or COL3=N'84' or COL3=N'85'
or COL3=N'86' or COL3=N'87' or COL3=N'88' or COL3=N'89' or COL3=N'9'
or COL3=N'90' or COL3=N'91' or COL3=N'92' or COL3=N'93' or COL3=N'94'
or COL3=N'95' or COL3=N'96' or COL3=N'97' or COL3=N'98'
ORDER BY COL6 ,COL5
,ISNULL(COL2.value('(/row/c9)[1]',  'VARCHAR(256)'), CHAR(1)) ,COL1
set statistics io off

-> Object Table1 definition,


CREATE TABLE [dbo].[Table1](
            [COL1] [varchar](255) NOT NULL,
            [COL2] [xml] NULL,
            [COL3]  AS ([dbo].[udTable1_C4]([COL2])) PERSISTED,
            [COL4]  AS ([dbo].[udTable1_C2]([COL2])) PERSISTED,
            [COL5]  AS ([dbo].[udTable1_C8]([COL2])) PERSISTED,
            [COL6]  AS ([dbo].[udTable1_C1]([COL2])) PERSISTED,
            [COL7]  AS ([dbo].[udTable1_C3]([COL2])) PERSISTED,
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
            [COL1] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

-> Execution plan for the above query,
Blog46_1

-> Checking the “Table Valued Function” operator from the execution plan,
Blog46_2

-> We are spending most of our time reading the XML data from Table Table1.

(3227544 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Table1’. Scan count 1, logical reads 2097889, physical reads 0, read-ahead reads 0, lob logical reads 424529, lob physical reads 0, lob read-ahead reads 36.

-> Lets introduce a new computed column using below code to avoid the costly read on XML column,


CREATE FUNCTION [dbo].[udTable1_C9](@COL2 XML)
RETURNS nvarchar(20)
WITH SCHEMABINDING
BEGIN
RETURN ISNULL(@COL2.value('(/row/c9)[1]',  'VARCHAR(256)'), CHAR(1))
END
GO

ALTER TABLE dbo.Table1 ADD COL8  AS ([dbo].[udTable1_C9]([COL2])) PERSISTED
GO

CREATE INDEX IX_TABLE1_C3_C6_C5_C8_C1 ON Table1 (COL3) INCLUDE (COL6,COL5,COL8,COL1)
GO

-> Executing below query. The only change to this query is that we are using the computed column in order by.


set statistics io on
SELECT t.COL1 FROM Table1 t WHERE COL3=N'1' or COL3=N'10' or COL3=N'102'
or COL3=N'103' or COL3=N'104' or COL3=N'105' or COL3=N'106' or COL3=N'11'
or COL3=N'12' or COL3=N'120' or COL3=N'121' or COL3=N'122' or COL3=N'123'
or COL3=N'124' or COL3=N'125' or COL3=N'126' or COL3=N'127' or COL3=N'129'
or COL3=N'13' or COL3=N'130' or COL3=N'131' or COL3=N'132' or COL3=N'133'
or COL3=N'134' or COL3=N'135' or COL3=N'136' or COL3=N'137' or COL3=N'138'
or COL3=N'139' or COL3=N'14' or COL3=N'141' or COL3=N'143' or COL3=N'144'
or COL3=N'145' or COL3=N'146' or COL3=N'147' or COL3=N'148' or COL3=N'149'
or COL3=N'15' or COL3=N'150' or COL3=N'151' or COL3=N'152' or COL3=N'153'
or COL3=N'154' or COL3=N'155' or COL3=N'156' or COL3=N'157' or COL3=N'158'
or COL3=N'159' or COL3=N'16' or COL3=N'160' or COL3=N'161' or COL3=N'162'
or COL3=N'163' or COL3=N'164' or COL3=N'166' or COL3=N'167' or COL3=N'168'
or COL3=N'169' or COL3=N'17' or COL3=N'170' or COL3=N'171' or COL3=N'173'
or COL3=N'174' or COL3=N'175' or COL3=N'176' or COL3=N'177' or COL3=N'178'
or COL3=N'179' or COL3=N'18' or COL3=N'184' or COL3=N'185' or COL3=N'186'
or COL3=N'187' or COL3=N'188' or COL3=N'189' or COL3=N'19' or COL3=N'190'
or COL3=N'191' or COL3=N'192' or COL3=N'193' or COL3=N'194' or COL3=N'195'
or COL3=N'196' or COL3=N'197' or COL3=N'198' or COL3=N'199' or COL3=N'2'
or COL3=N'20' or COL3=N'200' or COL3=N'201' or COL3=N'202' or COL3=N'203'
or COL3=N'204' or COL3=N'205' or COL3=N'206' or COL3=N'207' or COL3=N'208'
or COL3=N'209' or COL3=N'21' or COL3=N'210' or COL3=N'211' or COL3=N'212'
or COL3=N'213' or COL3=N'214' or COL3=N'215' or COL3=N'216' or COL3=N'217'
or COL3=N'218' or COL3=N'22' or COL3=N'227' or COL3=N'228' or COL3=N'229'
or COL3=N'23' or COL3=N'230' or COL3=N'233' or COL3=N'234' or COL3=N'235'
or COL3=N'236' or COL3=N'237' or COL3=N'238' or COL3=N'239' or COL3=N'24'
or COL3=N'240' or COL3=N'241' or COL3=N'242' or COL3=N'25' or COL3=N'26'
or COL3=N'27' or COL3=N'28' or COL3=N'29' or COL3=N'3' or COL3=N'30'
or COL3=N'309' or COL3=N'31' or COL3=N'32' or COL3=N'33' or COL3=N'34'
or COL3=N'35' or COL3=N'36' or COL3=N'37' or COL3=N'38' or COL3=N'39'
or COL3=N'4' or COL3=N'40' or COL3=N'41' or COL3=N'42' or COL3=N'43'
or COL3=N'44' or COL3=N'45' or COL3=N'46' or COL3=N'47' or COL3=N'48'
or COL3=N'49' or COL3=N'5' or COL3=N'51' or COL3=N'52' or COL3=N'53'
or COL3=N'54' or COL3=N'55' or COL3=N'56' or COL3=N'57' or COL3=N'58'
or COL3=N'59' or COL3=N'6' or COL3=N'62' or COL3=N'64' or COL3=N'66'
or COL3=N'67' or COL3=N'68' or COL3=N'69' or COL3=N'7' or COL3=N'70'
or COL3=N'71' or COL3=N'72' or COL3=N'73' or COL3=N'74' or COL3=N'76'
or COL3=N'77' or COL3=N'78' or COL3=N'79' or COL3=N'8' or COL3=N'80'
or COL3=N'81' or COL3=N'82' or COL3=N'83' or COL3=N'84' or COL3=N'85'
or COL3=N'86' or COL3=N'87' or COL3=N'88' or COL3=N'89' or COL3=N'9'
or COL3=N'90' or COL3=N'91' or COL3=N'92' or COL3=N'93' or COL3=N'94'
or COL3=N'95' or COL3=N'96' or COL3=N'97' or COL3=N'98'
ORDER BY COL6 ,COL5 ,ISNULL(COL8, CHAR(1)) ,COL1
set statistics io off

-> Below execution plan uses the index created. It no longer performs a read on the XML column,
Blog46_3

-> Execution stats below for the query,

(3227544 row(s) affected)
Table ‘Table1’. Scan count 202, logical reads 20644, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-> Reads are greatly reduced and the query performs comparably good. All good till now as far as scalar promotion is concerned. Lets look at issues that this computed column can cause,

1) All queries using Table1 will be serial after computed column with scalar function is defined on Table1.
2) Parallel rebuild on this table will not be possible if SQL Server Enterprise edition is used.
3) More space used by the computed column and indexes.
4) Many more….

-> It is very important to test with important workloads before using computed columns in production.

-> In my case the script cannot be rewritten and this query is very important for our business. Further Table1 had computed columns with scalar function before I created this, so went with this choice.

-> I would have definitely thought twice if this was the first computed column for Table1.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Advertisements

Why Bookmark Lookup is BAD?

-> As per article “https://technet.microsoft.com/en-us/library/ms180920(v=sql.105).aspx“. The Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.

-> Create below database and objects,

create database JB_BookMark
go
use JB_BookMark
go
create table Table1(
Col1 int primary key identity(1,1),
Col2 int,
Col3 int,
Col4 varchar(100))
CREATE NONCLUSTERED INDEX IX_Table1 ON dbo.Table1(Col2)

-> Populate the table using below tsql query,

set nocount on
declare @i int=1
declare @j int=65
while (@i<=100000)
begin
if(@j=91)
begin
set @j=65
end
insert into Table1 values(@i,rand(@i*9)*1000,replicate(CHAR(@j),5))
set @i=@i+1
set @j=@j+1
end

-> Rebuild the indexes to remove any fragmentation,

USE [JB_BookMark]
GO
ALTER INDEX [PK__Table1__A259EE54D34C8103] ON [dbo].[Table1] REBUILD
GO
USE [JB_BookMark]
GO
ALTER INDEX [IX_Table1] ON [dbo].[Table1] REBUILD
GO

-> Execute the below query with actual execution plan enabled,

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=1
set statistics time off
set statistics IO off

Blog20_3.png

Blog20_4.PNG

-> The query cost is at 0.0065704. The operator “Index seek” returns 1 row(s).  The operator “Key Lookup” scans through the cluster index for each row that comes from the “Index seek” operator. In this case it is just 1 time. Logical reads done is around 4. The query overall took 49 MS to complete.

-> Execute the below query with actual execution plan enabled,

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=10
set statistics time off
set statistics IO off

Blog20_1.png

Blog20_2.PNG

-> The query cost is at 0.0327403. The operator “Index seek” returns 10 rows.  The operator “Key Lookup” scans through the cluster index for each row that comes from the “Index seek” operator. In this case it is 10 times. Logical reads done is around 22. The query overall took 62 MS to complete.

-> Execute the below query with actual execution plan enabled,

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=100
set statistics time off
set statistics IO off

Blog20_5.png

Blog20_6.PNG

-> The query cost is at 0.291569. The operator “Index seek” returns 100 rows.  The operator “Key Lookup” scans through the cluster index for each row that comes from the “Index seek” operator. In this case it is 100 times. Logical reads done is around 217. The query overall took 370 MS to complete.

-> It seems like whenever the execution plan for a query has key lookup, its performance degrades with the number of rows that it has to process. In our case, below is the final test result,

Blog20_7

-> We can create below index to avoid key lookup,

USE [JB_BookMark]
GO
CREATE NONCLUSTERED INDEX [IX_Table1_1]
ON [dbo].[Table1] ([Col2])
INCLUDE ([Col1],[Col3],[Col4])
GO

-> Execute the below query after the above index is created with actual execution plan enabled,

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=100
set statistics time off
set statistics IO off

Blog20_8.png

Blog20_9

-> Once we remove the key lookup, the query completes in 1 MS with fewer cost and logical reads.

-> So this means, we can create covering index for all queries that has key lookup? Creating covering index will remove the key lookup and the performance of the query will definitely increase, but it will also increase the space used by the index, database maintenance time, backup size. So it is wise to test things before taking a decision.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Column data may be deleted when you update another variable-length column in a table of a database upgraded from SQL Server 2005

-> <Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

When you update a value in a variable-length column in a table of a database that was upgraded from Microsoft SQL Server 2005, data in another variable-length column in the same row is deleted.

This issue occurs in Microsoft SQL Server 2014 and Microsoft SQL Server 2012. The following scenario may expose a database to this issue. This list represents a sequence of events:

-> A database is created in SQL Server 2005.
-> In SQL Server 2005, you alter a table, and then you drop one or more variable-length columns at the end of the table definition.
-> You add new variable-length columns that have a similar data type and length of the dropped columns.
-> The database is upgraded from SQL Server 2005 to SQL Server 2014 or SQL Server 2012.
-> You later update the value of another variable-length column, and the updated data has to overflow to a new page.

-> </Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

-> Lets follow the below demo to reproduce the issue. Create the below database and objects in SQL server 2005.

use master
go
create database test_dropCol
go

use test_dropCol
go

CREATE TABLE [dbo].test(
       [Id] [int] IDENTITY(1,1) NOT NULL,
       col1 [nvarchar](18) NOT NULL,
col2 [datetime] NOT NULL,
col3 [varchar](max) NULL,
col4 [varchar](max) NULL,
col5 [varchar](max) NULL,
CONSTRAINT[PK_ID]PRIMARY KEYCLUSTERED
(
[Id] ASC
)
) ON [PRIMARY]

— Drop the column
BEGIN TRANSACTION
GO
ALTER TABLE dbo.test
DROP COLUMN col5
GO
COMMIT

— Looking at the leaf_offset
SELECT DISTINCT OBJECT_NAME(sp.[object_id]) AS TableWithDroppedCols
FROM sys.system_internals_partition_columns sipc1 WITH (NOLOCK)
INNER JOIN sys.system_internals_partition_columns sipc2 WITH (NOLOCK)
ON sipc1.[partition_id] = sipc2.[partition_id] AND
sipc1.leaf_offset = sipc2.leaf_offset AND
sipc1.leaf_offset< 0 AND
sipc1.is_dropped < sipc2.is_dropped
INNER JOIN sys.partitions sp WITH (NOLOCK) ON sipc1.[partition_id] = sp.[partition_id];
go
select sipc.partition_column_id, c.name,c.max_length, c.system_type_id,sipc.is_dropped, sipc.leaf_offset
from sys.system_internals_partition_columns sipc left outer join sys.partitions p on sipc.partition_id=p.partition_id
left outer join sys.columns c on c.object_id = p.object_id and c.column_id = sipc.partition_column_id
where p.object_id=object_id(‘test’)
go

 Blog19_1
— Creating another column
BEGIN TRANSACTION
GO

ALTER TABLE dbo.test ADD
col6 varchar(MAX) NULL
GO
COMMIT

— Looking at the leaf_offset again
SELECT DISTINCT OBJECT_NAME(sp.[object_id]) AS TableWithDroppedCols
FROM sys.system_internals_partition_columns sipc1 WITH (NOLOCK)
INNER JOIN sys.system_internals_partition_columns sipc2 WITH (NOLOCK)
ON sipc1.[partition_id] = sipc2.[partition_id] AND
sipc1.leaf_offset = sipc2.leaf_offset AND
sipc1.leaf_offset< 0 AND
sipc1.is_dropped < sipc2.is_dropped
INNER JOIN sys.partitions sp WITH (NOLOCK) ON sipc1.[partition_id] = sp.[partition_id];
go
select sipc.partition_column_id, c.name,c.max_length, c.system_type_id,sipc.is_dropped, sipc.leaf_offset
from sys.system_internals_partition_columns sipc left outer join sys.partitions p on sipc.partition_id=p.partition_id
left outer join sys.columns c on c.object_id = p.object_id and c.column_id = sipc.partition_column_id
where p.object_id=object_id(‘test’)
go
— I could see -4 leaf offset for the newly created column and also the deleted column. The select query listed in “More Information” in KB3120595 returns that table test now.

Blog19_3

-> Execute the below insert (we are also populating the last column which is having same offset as dropped column)


insert into test values(replicate (‘a’, 18),getdate(),replicate (‘a’, 8000),replicate (‘a’, 8000), replicate (‘a’, 8000))
go
select * from test
go

Blog19_2

2) Take backup of this database and restore in SQL server 2012 or SQL server 2014  (on the build that doesn’t have the fix mentioned in 3120595)

3) Now execute the below query to reproduce data loss issue in SQL server 2012 or SQL server 2014– (Note – we are updating column before the new column that has same offset as dropped column)

begin tran
select * from test
go

update test
set col4 = replicate(‘a’, 8000)
go

select * from test
go

Blog19_4

-> <Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

Workaround
Important Apply this workaround when you upgrade the database from SQL Server 2005 to SQL Server 2014 or SQL Server 2012 and before you let UPDATE statements be run in the upgraded database.
To work around this issue if you drop columns in a table, make sure that you do either of the following before you update or delete rows:

-> Rebuild the clustered index by using ALTER INDEX (…) REBUILD
-> Rebuild the heap by using ALTER TABLE (…) REBUILD

Resolution
Note This fix only prevents future occurrences of the issue.

Important If you are upgrading the database from SQL Server 2005 to SQL Server 2014 or SQL Server 2012, make sure that you refer to the “Workaround” section.

This issue was first fixed in the following cumulative update for SQL Server:

-> Cumulative Update 13 for SQL Server 2014
-> Cumulative Update 6 for SQL Server 2014 Service Pack 1
-> Cumulative Update 10 for SQL Server 2012 Service Pack 2
-> Cumulative Update 2 for SQL Server 2012 Service Pack 3

-> </Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.