Why is Bookmark Lookup BAD?

-> As per the article “https://technet.microsoft.com/en-us/library/ms180920(v=sql.105).aspx”, the Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index, as well as the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.

-> Create the database and objects below:

create database JB_BookMark
go
use JB_BookMark
go
create table Table1(
Col1 int primary key identity(1,1),
Col2 int,
Col3 int,
Col4 varchar(100))
CREATE NONCLUSTERED INDEX IX_Table1 ON dbo.Table1(Col2)

-> Populate the table using the T-SQL query below:

set nocount on
declare @i int = 1
declare @j int = 65
while (@i <= 100000)
begin
    if (@j = 91)
    begin
        set @j = 65
    end
    insert into Table1 values (@i, rand(@i*9)*1000, replicate(CHAR(@j), 5))
    set @i = @i + 1
    set @j = @j + 1
end

-> Rebuild the indexes to remove any fragmentation (note that the primary key name below is system-generated, so it will be different on your system):

USE [JB_BookMark]
GO
ALTER INDEX [PK__Table1__A259EE54D34C8103] ON [dbo].[Table1] REBUILD
GO
USE [JB_BookMark]
GO
ALTER INDEX [IX_Table1] ON [dbo].[Table1] REBUILD
GO

-> Execute the query below with the actual execution plan enabled:

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=1
set statistics time off
set statistics IO off

Blog20_3.png

Blog20_4.PNG

-> The query cost is 0.0065704. The “Index Seek” operator returns 1 row. The “Key Lookup” operator traverses the clustered index for each row that comes from the “Index Seek” operator; in this case, just 1 time. Around 4 logical reads were performed, and the query overall took 49 ms to complete.

-> Execute the query below with the actual execution plan enabled:

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=10
set statistics time off
set statistics IO off

Blog20_1.png

Blog20_2.PNG

-> The query cost is 0.0327403. The “Index Seek” operator returns 10 rows. The “Key Lookup” operator traverses the clustered index for each row that comes from the “Index Seek” operator; in this case, 10 times. Around 22 logical reads were performed, and the query overall took 62 ms to complete.

-> Execute the query below with the actual execution plan enabled:

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=100
set statistics time off
set statistics IO off

Blog20_5.png

Blog20_6.PNG

-> The query cost is 0.291569. The “Index Seek” operator returns 100 rows. The “Key Lookup” operator traverses the clustered index for each row that comes from the “Index Seek” operator; in this case, 100 times. Around 217 logical reads were performed, and the query overall took 370 ms to complete.

-> It seems that whenever a query's execution plan contains a key lookup, performance degrades as the number of rows the query has to process grows. This makes sense: each key lookup is a separate traversal of the clustered index (typically 2 to 3 logical reads), so the I/O and cost grow roughly linearly with the number of rows returned by the index seek. In our case, below is the final test result.

Blog20_7
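-> If you want to check whether queries in your own workload hit the same pattern, a sketch along the lines below searches the plan cache for plans that contain key lookups. The DMVs used (sys.dm_exec_cached_plans, sys.dm_exec_query_plan, sys.dm_exec_sql_text) are standard, but shredding plan XML can be expensive, so treat this as a starting point and run it on a test server first.

-- Sketch: list cached plans that contain a key lookup (an IndexScan element flagged with the Lookup attribute).
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    DB_NAME(st.dbid) AS database_name,
    OBJECT_NAME(st.objectid, st.dbid) AS object_name,
    cp.usecounts,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
-- The Lookup flag may be serialized as "1" or "true" depending on version, so test for both.
WHERE qp.query_plan.exist('//RelOp/IndexScan[@Lookup="1" or @Lookup="true"]') = 1
ORDER BY cp.usecounts DESC;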

-> We can create the covering index below to avoid the key lookup:

USE [JB_BookMark]
GO
CREATE NONCLUSTERED INDEX [IX_Table1_1]
ON [dbo].[Table1] ([Col2])
INCLUDE ([Col1],[Col3],[Col4])
GO

-> After the above index is created, execute the query below with the actual execution plan enabled:

set statistics time on
set statistics IO on
select col1,col2,col3,col4 from table1
where col2 <=100
set statistics time off
set statistics IO off

Blog20_8.png

Blog20_9

-> Once we remove the key lookup, the query completes in 1 ms with a lower cost and fewer logical reads.

-> So does this mean we can create a covering index for every query that has a key lookup? Creating a covering index removes the key lookup and will certainly improve the query's performance, but it also increases the space used by the index, database maintenance time, and backup size. So it is wise to test before making a decision.
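-> To put a number on the space trade-off for this demo, a quick sketch like the one below (using sys.dm_db_partition_stats; the exact figures will depend on your data) compares the size of the original narrow index IX_Table1 with the new covering index IX_Table1_1.

-- Sketch: compare the on-disk size of the indexes on dbo.Table1 (8 KB per page).
SELECT i.name AS index_name,
       ps.used_page_count,
       ps.used_page_count * 8 / 1024.0 AS used_mb
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
    AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.Table1')
ORDER BY ps.used_page_count DESC;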

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.


Column data may be deleted when you update another variable-length column in a table of a database upgraded from SQL Server 2005

-> <Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

When you update a value in a variable-length column in a table of a database that was upgraded from Microsoft SQL Server 2005, data in another variable-length column in the same row is deleted.

This issue occurs in Microsoft SQL Server 2014 and Microsoft SQL Server 2012. The following scenario may expose a database to this issue. This list represents a sequence of events:

-> A database is created in SQL Server 2005.
-> In SQL Server 2005, you alter a table, and then you drop one or more variable-length columns at the end of the table definition.
-> You add new variable-length columns that have a similar data type and length of the dropped columns.
-> The database is upgraded from SQL Server 2005 to SQL Server 2014 or SQL Server 2012.
-> You later update the value of another variable-length column, and the updated data has to overflow to a new page.

-> </Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

-> Let's follow the demo below to reproduce the issue. Create the database and objects below in SQL Server 2005.

use master
go
create database test_dropCol
go

use test_dropCol
go

CREATE TABLE [dbo].[test](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [col1] [nvarchar](18) NOT NULL,
    [col2] [datetime] NOT NULL,
    [col3] [varchar](max) NULL,
    [col4] [varchar](max) NULL,
    [col5] [varchar](max) NULL,
    CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    )
) ON [PRIMARY]

-- Drop the column
BEGIN TRANSACTION
GO
ALTER TABLE dbo.test
DROP COLUMN col5
GO
COMMIT

-- Looking at the leaf_offset
SELECT DISTINCT OBJECT_NAME(sp.[object_id]) AS TableWithDroppedCols
FROM sys.system_internals_partition_columns sipc1 WITH (NOLOCK)
INNER JOIN sys.system_internals_partition_columns sipc2 WITH (NOLOCK)
    ON sipc1.[partition_id] = sipc2.[partition_id]
    AND sipc1.leaf_offset = sipc2.leaf_offset
    AND sipc1.leaf_offset < 0
    AND sipc1.is_dropped < sipc2.is_dropped
INNER JOIN sys.partitions sp WITH (NOLOCK) ON sipc1.[partition_id] = sp.[partition_id];
go
select sipc.partition_column_id, c.name, c.max_length, c.system_type_id, sipc.is_dropped, sipc.leaf_offset
from sys.system_internals_partition_columns sipc
left outer join sys.partitions p on sipc.partition_id = p.partition_id
left outer join sys.columns c on c.object_id = p.object_id and c.column_id = sipc.partition_column_id
where p.object_id = object_id('test')
go

 Blog19_1
-- Creating another column
BEGIN TRANSACTION
GO

ALTER TABLE dbo.test ADD
col6 varchar(MAX) NULL
GO
COMMIT

-- Looking at the leaf_offset again
SELECT DISTINCT OBJECT_NAME(sp.[object_id]) AS TableWithDroppedCols
FROM sys.system_internals_partition_columns sipc1 WITH (NOLOCK)
INNER JOIN sys.system_internals_partition_columns sipc2 WITH (NOLOCK)
    ON sipc1.[partition_id] = sipc2.[partition_id]
    AND sipc1.leaf_offset = sipc2.leaf_offset
    AND sipc1.leaf_offset < 0
    AND sipc1.is_dropped < sipc2.is_dropped
INNER JOIN sys.partitions sp WITH (NOLOCK) ON sipc1.[partition_id] = sp.[partition_id];
go
select sipc.partition_column_id, c.name, c.max_length, c.system_type_id, sipc.is_dropped, sipc.leaf_offset
from sys.system_internals_partition_columns sipc
left outer join sys.partitions p on sipc.partition_id = p.partition_id
left outer join sys.columns c on c.object_id = p.object_id and c.column_id = sipc.partition_column_id
where p.object_id = object_id('test')
go
-> I can see a leaf_offset of -4 for both the newly created column (col6) and the dropped column. The SELECT query listed under “More Information” in KB3120595 now returns the table test.

Blog19_3

-> Execute the insert below (note that we are also populating the last column, which has the same leaf_offset as the dropped column):


insert into test values (replicate('a', 18), getdate(), replicate('a', 8000), replicate('a', 8000), replicate('a', 8000))
go
select * from test
go

Blog19_2

-> Take a backup of this database and restore it on SQL Server 2012 or SQL Server 2014 (on a build that does not have the fix mentioned in KB3120595).
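-> A minimal backup/restore sketch for this step is shown below. The paths are placeholders, and the logical file names assume the defaults that CREATE DATABASE test_dropCol produces; adjust both to your environment.

-- On the SQL Server 2005 instance (path is a placeholder).
BACKUP DATABASE test_dropCol
TO DISK = N'C:\Backup\test_dropCol.bak'
WITH INIT;
GO
-- On the SQL Server 2012/2014 instance (adjust the MOVE targets to your data/log folders).
RESTORE DATABASE test_dropCol
FROM DISK = N'C:\Backup\test_dropCol.bak'
WITH MOVE 'test_dropCol' TO N'C:\Data\test_dropCol.mdf',
     MOVE 'test_dropCol_log' TO N'C:\Data\test_dropCol_log.ldf',
     RECOVERY;
GO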

-> Now execute the query below to reproduce the data loss issue on SQL Server 2012 or SQL Server 2014 (note that we are updating the column just before the new column that has the same leaf_offset as the dropped column):

begin tran
select * from test
go

update test
set col4 = replicate('a', 8000)
go

select * from test
go

Blog19_4

-> <Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>

Workaround
Important: Apply this workaround when you upgrade the database from SQL Server 2005 to SQL Server 2014 or SQL Server 2012, and before you let UPDATE statements be run in the upgraded database.
To work around this issue if you drop columns in a table, make sure that you do either of the following before you update or delete rows:

-> Rebuild the clustered index by using ALTER INDEX (…) REBUILD
-> Rebuild the heap by using ALTER TABLE (…) REBUILD

Resolution
Note: This fix only prevents future occurrences of the issue.

Important: If you are upgrading the database from SQL Server 2005 to SQL Server 2014 or SQL Server 2012, make sure that you refer to the “Workaround” section.

This issue was first fixed in the following cumulative update for SQL Server:

-> Cumulative Update 13 for SQL Server 2014
-> Cumulative Update 6 for SQL Server 2014 Service Pack 1
-> Cumulative Update 10 for SQL Server 2012 Service Pack 2
-> Cumulative Update 2 for SQL Server 2012 Service Pack 3

-> </Copied from https://support.microsoft.com/en-us/help/3120595/fix-column-data-may-be-deleted-when-you-update-another-variable-length-column-in-a-table-of-a-database-upgraded-from-sql-server-2005>
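-> Applied to the demo above, the workaround is a single rebuild on the upgraded instance, run before any UPDATE statements are allowed. A sketch for dbo.test (which has a clustered primary key) is below; the commented-out ALTER TABLE form is what you would use for a heap.

USE test_dropCol
GO
-- dbo.test has a clustered primary key, so rebuild the clustered index.
ALTER INDEX [PK_ID] ON dbo.test REBUILD
GO
-- For a heap, the equivalent would be:
-- ALTER TABLE dbo.test REBUILD
GO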

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

Table Variable or Temp Table

-> Create the objects below. Initially we start with a small number of rows.

create database JB_TableVariable
go
use JB_TableVariable
go
create table Table1(
Col1 int primary key identity (1,1)
, Col2 char(2000)
, Col3 int
, Col4 int
)
create table Table2(
Col1 int primary key identity (1,1)
, Col2 char(2000)
, Col3 int
, Col4 int
)
create table Table3(
Col1 int primary key identity (1,1)
, Col2 char(2000)
, Col3 int
, Col4 int
)
set nocount on
go
insert into Table1 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 500
insert into Table1 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),0)
GO 500
insert into Table2 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 500
insert into Table2 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),0)
GO 500
insert into Table3 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 500
insert into Table3 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),0)
GO 500

-> Create the stored procedures below. Stored procedure sp_tableVariable uses a table variable and stored procedure sp_temptable uses a temporary table.

-- Stored procedure using a table variable
create proc sp_tableVariable
WITH RECOMPILE
as
begin
DECLARE @Table4 TABLE(
Col1 int primary key
, Col2 char(2000)
, Col3 int
, Col4 int
)
insert into @Table4
select a.col1, b.col2, a.col3,b.col4
from Table1 a INNER JOIN Table2 b
ON a.col1 = b.col1
select a.col1, b.col2, a.col3
from Table1 a INNER JOIN @Table4 b
ON a.col1 = b.col1
where b.col4 = 1
order by b.col3 desc
end
-- Stored procedure using a temporary table
create proc sp_temptable
WITH RECOMPILE
as
begin
create TABLE #table5(
Col1 int primary key
, Col2 char(2000)
, Col3 int
, Col4 int
)
insert into #table5
select a.col1, b.col2, a.col3,b.col4
from Table1 a INNER JOIN Table2 b
ON a.col1 = b.col1
select a.col1, b.col2, a.col3
from Table1 a INNER JOIN #table5 b
ON a.col1 = b.col1
where b.col4 = 1
order by b.col3 desc
drop table #table5
end

-> Executing stored procedure sp_tableVariable, which uses a table variable. The query completed within 1 second.

blog17_1

-> Executing stored procedure sp_temptable, which uses a temporary table. The query completed within 1 second.

Blog17_2.PNG

-> In this case, with the initial small data set, the stored procedure using a temporary table and the one using a table variable perform about the same.

-> Now insert 5,000 more rows into each table and update the statistics:

use JB_TableVariable
go
set nocount on
go

insert into Table1 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 5000
insert into Table2 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 5000
insert into Table3 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 5000

update statistics Table1 with fullscan
update statistics Table2 with fullscan
update statistics Table3 with fullscan

-> Executing stored procedure sp_tableVariable, which uses a table variable. The query completed in 3 seconds.

Blog17_3.PNG

-> Executing stored procedure sp_temptable, which uses a temporary table. The query completed in 1 second.

Blog17_4.PNG

-> Now insert 50,000 more rows into each table and update the statistics:

use JB_TableVariable
go
set nocount on
go
insert into Table1 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 50000
insert into Table2 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 50000
insert into Table3 values (SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 8),CONVERT(INT,rand()*1000000),1)
GO 50000
update statistics Table1 with fullscan
update statistics Table2 with fullscan
update statistics Table3 with fullscan

-> Executing stored procedure sp_tableVariable, which uses a table variable. The query completed in 1 minute 37 seconds.

Blog17_5.PNG

-> Not a good sign: the query this time took close to 97 seconds. From the execution plan, it is clear that most of the time is spent sorting the data, and it is also evident that the sort is spilling to tempdb. Let us look at the sort operator and see why it is spilling:

Blog17_5_1.PNG

-> The sort operator shows that the optimizer estimated 1 row but actually processed 55,500 rows. Because of this bad estimate, the sort spilled heavily to tempdb. The warnings section shows that the optimizer granted only 1,024 KB of memory, which forced close to 33,713 pages (roughly 263 MB at 8 KB per page) to be written to disk (tempdb), since there was nowhere near enough memory to sort 55,500 rows in memory.

-> Further, we see a nested loops join because the estimates for the clustered index scan on @Table4 and the subsequent sort operator are both 1 row. The problem with table variables is that the optimizer estimates 1 row for a table variable regardless of how many rows it actually contains (this is the behaviour prior to SQL Server 2019's table variable deferred compilation, unless the statement is recompiled after the variable is populated).

Blog17_5_2.PNG
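-> To see this 1-row estimate in isolation, outside the stored procedure, a small sketch like the one below can be run with the actual execution plan enabled. The table and column names are made up for illustration. This reflects the behaviour before SQL Server 2019's table variable deferred compilation; a statement-level OPTION (RECOMPILE) hint, as in the last statement, lets the optimizer see the real row count.

-- Sketch: compare estimated vs actual rows for a table variable.
DECLARE @t TABLE (Col1 int PRIMARY KEY);

INSERT INTO @t (Col1)
SELECT TOP (50000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Compiled before @t is populated: estimated rows = 1, actual rows = 50000 (pre-2019 behaviour).
SELECT Col1 FROM @t;

-- With a statement-level recompile, the optimizer sees the real row count.
SELECT Col1 FROM @t OPTION (RECOMPILE);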

-> These 1-row estimates and the resulting nested loops join caused the stored procedure to perform very badly. Let's look at the other stored procedure, which uses a temporary table.

-> Executing stored procedure sp_temptable, which uses a temporary table. The query completed in 11 seconds.

blog17_6

-> Stored procedure sp_temptable is considerably faster; the query took close to 11 seconds. From the execution plan, most of the time is spent scanning data from the temporary table #table5 and from Table1. We see a merge join followed by a sort. Let's look at the sort operator: there is a warning symbol on it indicating that the sort is spilling to tempdb, so let's see how serious that is.

blog17_6_1

-> The sort operator shows that the optimizer estimated 55,546.3 rows and actually processed 55,500 rows, so the estimate is pretty decent. The sort still spilled to tempdb: the warnings section shows that the optimizer granted 139,040 KB of memory and close to 13,875 pages (roughly 108 MB) were still written to disk (tempdb). A sort spilling to tempdb is definitely an issue and should be resolved wherever possible, but in our case the spill is comparatively small and the query completed fine. Removing the spill could make this query even faster.

-> We see a merge join because the estimates for the clustered index scan on #table5 and the subsequent sort operator are all accurate.

Blog17_6_2.PNG

-> This is why stored procedure sp_temptable performs comparatively faster.

-> So does this tell us that we should always use temporary tables over table variables? No, not at all. Each query should be tested with different combinations, and we should come to a conclusion based on the results.

-> Table variables and temporary tables both work efficiently when used appropriately. Temporary tables allow you to add indexes after the table is created, in addition to the indexes created implicitly by constraints such as a primary key. Table variables, on the other hand, only support indexes declared as part of the definition itself, through PRIMARY KEY or UNIQUE constraints (and, from SQL Server 2014 onwards, inline INDEX definitions); you cannot add an index to a table variable after it is declared, as illustrated below.
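-> For example, the difference looks roughly like the sketch below (the table names are made up for illustration; the inline INDEX syntax on the table variable requires SQL Server 2014 or later).

-- Temporary table: indexes can be added after the table exists.
CREATE TABLE #orders (OrderId int PRIMARY KEY, CustomerId int, Amount money);
CREATE NONCLUSTERED INDEX IX_orders_customer ON #orders (CustomerId);
DROP TABLE #orders;

-- Table variable: indexes must be declared as part of the definition.
-- (Inline INDEX syntax requires SQL Server 2014 or later.)
DECLARE @orders TABLE (
    OrderId int PRIMARY KEY,
    CustomerId int INDEX IX_customer NONCLUSTERED,
    Amount money
);
-- There is no way to add an index to @orders after this point.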

-> Table variables and temporary tables each have their own disadvantages. For example, heavy use of temporary tables can cause metadata contention in tempdb, which table variables largely avoid.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.