Saturday, September 20, 2008

Filtered Index and Statistics in SQL Server 2008

Filtered Index and Statistics in SQL Server 2008
By : Kasim Wirama, MCDBA

Up to SQL Server 2005, there is no way to create index for only subset rows in a table though some value is needed to query. Imagine that a table has million number of rows and only certain value in a column is queried in filtering using WHERE clause of SELECT query. It needs additional space to keep tracking for those million of rows and rebuild or reorganize those index will take longer to finish. SQL Server 2008 introduces filtered index and statistics that will give you better option to generate index only subset of rows that are needed to satisfy query with certain criteria. The syntax of index declaration in SQL Server 2008 is :
CREATE [UNIQUE][CLUSTERED|NONCLUSTERED] INDEX
ON . (,….)
[INCLUDE ()]
[WHERE ]
Operand here includes >, >=, <, <=, IS NULL, IS NOT NULL, IN, <>, =. BETWEEN currently isn’t supported in SQL Server 2008 filtered index. Some limitation in filtered index is that it doesn’t support computed column, indexed view, UDT, hierarchyid data type, and geometry/geography data type.
To demonstrate use of filtered index, I use AdventureWorks database for SQL Server 2008.
Example :
1. To filter currencyRateID column only for not null value in sales.salesorderheader table, create the following filtered index here :

create index ix_currencyrateid on sales.salesorderheader(currencyrateid)
where currencyrateid is not null

This index will be used by optimizer for this following query :
select * from Sales.SalesOrderHeader as soh
where soh.CurrencyRateID = 6486

Execution plan for the query is :
|--Compute Scalar(DEFINE:([soh].[SalesOrderNumber]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderNumber] as [soh].[SalesOrderNumber], [soh].[TotalDue]=[AdventureWorks].[Sales].[SalesOrderHeader].[TotalDue] as [soh].[TotalDue]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([soh].[SalesOrderID]))
|--Compute Scalar(DEFINE:([soh].[SalesOrderNumber]=isnull(N'SO'+CONVERT(nvarchar(23),[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [soh].[SalesOrderID],0),N'*** ERROR ***')))
| |--Index Seek(OBJECT:([AdventureWorks].[Sales].[SalesOrderHeader].[ix_currencyrateid] AS [soh]), SEEK:([soh].[CurrencyRateID]=(6486)) ORDERED FORWARD)
|--Compute Scalar(DEFINE:([soh].[TotalDue]=isnull(([AdventureWorks].[Sales].[SalesOrderHeader].[SubTotal] as [soh].[SubTotal]+[AdventureWorks].[Sales].[SalesOrderHeader].[TaxAmt] as [soh].[TaxAmt])+[AdventureWorks].[Sales].[SalesOrderHeader].[Freight] as [soh].[Freight],($0.0000))))
|--Clustered Index Seek(OBJECT:([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [soh]), SEEK:([soh].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [soh].[SalesOrderID]) LOOKUP ORDERED FORWARD)

Notice the bold section, optimizer uses ix_1 index for query with currencyrateid filter.

2. Filtered index can also covered range, for example a query just query freight cost between 300 and 1000, so this index will satisfy the query condition.

create index ix_300_to_1000 on sales.salesorderheader(freight)
where freight >= 300 and freight <= 1000;

Here is query sample where optimizer will use ix_300_to_1000 index.
select * from Sales.SalesOrderHeader as soh
where soh.Freight between 500 and 700;

Execution plan for above query is :
|--Compute Scalar(DEFINE:([soh].[SalesOrderNumber]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderNumber] as [soh].[SalesOrderNumber], [soh].[TotalDue]=[AdventureWorks].[Sales].[SalesOrderHeader].[TotalDue] as [soh].[TotalDue]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([soh].[SalesOrderID], [Expr1004]) WITH UNORDERED PREFETCH)
|--Compute Scalar(DEFINE:([soh].[SalesOrderNumber]=isnull(N'SO'+CONVERT(nvarchar(23),[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [soh].[SalesOrderID],0),N'*** ERROR ***')))
| |--Index Seek(OBJECT:([AdventureWorks].[Sales].[SalesOrderHeader].[ix_300_to_1000] AS [soh]), SEEK:([soh].[Freight] >= ($500.0000) AND [soh].[Freight] <= ($700.0000)) ORDERED FORWARD)
|--Compute Scalar(DEFINE:([soh].[TotalDue]=isnull(([AdventureWorks].[Sales].[SalesOrderHeader].[SubTotal] as [soh].[SubTotal]+[AdventureWorks].[Sales].[SalesOrderHeader].[TaxAmt] as [soh].[TaxAmt])+[AdventureWorks].[Sales].[SalesOrderHeader].[Freight] as [soh].[Freight],($0.0000))))
|--Clustered Index Seek(OBJECT:([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [soh]), SEEK:([soh].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [soh].[SalesOrderID]) LOOKUP ORDERED FORWARD)

Notice the bold section, it seems that optimizer uses filtered index (ix_300_to_1000).

Not only SQL Server 2008 could create filtered index, but also similar syntax applies to statistics. Here is the syntax for filtered statistics :
CREATE STATISTICS
ON .
(,….)
[WHERE

Extended Event, Unified Way for Performance Tracing

Extended Event, Unified Way for Performance Tracing
By : Kasim Wirama, MCDBA

SQL Server has provided multiple tools for tracing performance related data, such as profiler, system monitor, Windows event log file and SQL Server log file. Each tools will collect different data (and sometimes overlapping data) with their own characteristics. There are issues regarding to multiple tools here :
1. Each tools has different format, data collection, data analysis. So you need to use that tools to analyze different data characteristics and format.
2. It is difficult to grab correct information because of difficulty to see big picture of database performance for the existing tool doesn’t integrate data from multiple source such as windows operating system and application that accessing database. As we are aware that SQL Server is not separate entity from operating system and application so that performance issue might not solely come from database.
To address this issue, SQL Server 2008 introduces extended event as part of database engine with generic event handling capability.
Extended events offers same functionality as Profiler has; but with fewer resource overhead. The minus point is that there is no GUI for extended event currently. So it is my wish list that Microsoft will create GUI for extended event and service broker.

Saturday, September 13, 2008

Exploring Change Data Capture (SQL Server 2008)

Exploring Change Data Capture (SQL Server 2008)
By : Kasim Wirama, MCDBA

This posting, I would like to share about CDC in SQL Server 2008. Before SQL Server 2008 version, if you need information about record changes in a table, you need to create custom solution such as writing trigger, or use timestamp column, etc. These solutions will negate your SQL Server performance on OLTP database. CDC feature comes for this requirement and it will give little overhead to SQL Server performance on OLTP database.
I would show you how I use CDC feature and hopefully you will get insight after going through my walk through and realize the power and flexibility of CDC.
First I create database for exploring CDC.
Create database cdc_db
Go
Use cdc_db
Go
Next step is to enable cdc on cdc_db database

exec sys.sp_cdc_enable_db
Sometimes when restore a database it gives error :
Msg 22830, Level 16, State 1, Procedure sp_cdc_enable_db_internal, Line 186
Could not update the metadata that indicates database cdc_db is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15517: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db_internal, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 3998, Level 16, State 1, Line 1
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.

It is because mapping between login at SQL server level and user at database level has broken, so issue this commad should resolve this broken mapping :
exec sp_changedbowner 'sa','dbo'
now create these table here :
create table testno
(
col1 int not null,
col2 varchar(20) not null,
col3 varchar(30) not null
);
Go

create table testyes
(
col1 int not null,
col2 varchar(20) not null,
col3 varchar(30) not null,
constraint pk_test primary key clustered (col1)
);
Go
Testno represents a table without primary key, and tableyes represents a table with primary key

insert into testno values (1,'satu', 'one');
insert into testyes values (1,'satu', 'one');
next, I enable cdc on testno and testyes table
exec sys.sp_cdc_enable_table @source_schema = 'dbo',
@source_name = 'testno', @role_name = null;

exec sys.sp_cdc_enable_table @source_schema = 'dbo',
@source_name = 'testyes', @role_name = null;
Go to SSMS (SQL Server Management Studio), expand cdc_db database folder, Programmability folder, Functions folder, table-valued functions, there are 3 table valued functions created :
1. fn_cdc_get_all_changes_dbo_testno
2. fn_cdc_get_all_changes_dbo_testyes
3. fn_cdc_get_net_changes_dbo_testyes
Notice that these functions has prefix fn_cdc_get_[all|net]_changes_ where capture instance are tracked tables. “All” gives information all changes for each row; and “net” gives information last changes for each row.
From my observation, it seems that a table with no primary key has only “all” cdc function created while a table with primary key has either “all” and “net” cdc function. In most situation net changes cdc function gives equal or smaller resultset than all changes cdc function so it is beneficial to have a table with primary key applied on it.
After enabling cdc on the two tables, I apply some DML statements on them :
insert into dbo.testno values (2,'dua','two');
insert into dbo.testno values (3,'tiga','three');
delete from dbo.testno where col1 = 1;
update dbo.testno set col2 = 'dua1' where col1 = 2;

insert into dbo.testyes values (2,'dua','two');
insert into dbo.testyes values (3,'tiga','three');
delete from dbo.testyes where col1 = 1;
update dbo.testyes set col2 = 'dua1' where col1 = 2;
now I would like to query changes on testno table, by running cdc script below:
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testno');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_dbo_testno(@begin,@end,'all update old');
go

it gives resultset here :
__$start_lsn __$seqval __$operation __$update_mask col1 col2 col3
0x000000 0x0000003500000BE40002 2 0x07 2 dua two
0x000000 0x0000003500000BE50002 2 0x07 3 tiga three
0x000000 0x0000003500000BE60002 1 0x07 1 satu one
0x000000 0x0000003500000BE70002 3 0x02 2 dua two
0x000000 0x0000003500000BE70002 4 0x02 2 dua1 two

Notice that for insert operation, __$operation = 2. For delete operation, __$operation = 1. For update operation, __$operation = 3 to old value and __$operation = 4 to new value.


Issue this cdc script below :

declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testno');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_dbo_testno(@begin,@end,'all');
go

it gives resultset here :

__$start_lsn __$seqval __$operation __$update_mask col1 col2 col3
0x000000 0x0000003500000BE40002 2 0x07 2 dua two
0x000000 0x0000003500000BE50002 2 0x07 3 tiga three
0x000000 0x0000003500000BE60002 1 0x07 1 satu one
0x000000 0x0000003500000BE70002 4 0x02 2 dua1 two

It seems that with ‘all’ value of third parameter of the cdc function above eliminates __$operation = 3.

The same pattern applies to testyes table.

Now I turn my attention to cdc.fn_cdc_get_net_changes_dbo_testyes cdc function
The third parameter value of cdc.fn_cdc_get_net_changes cdc function is “ALL”, “ALL WITH MASK”, “ALL WITH MERGE”

Run this first script below
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testyes');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_testyes(@begin,@end,'all');
go

the resultset is :

__$start_lsn __$operation __$update_mask col1 col2 col3
0x0000003500000BFA0003 2 NULL 3 tiga three
0x0000003500000BFB0005 1 NULL 1 satu one
0x0000003500000BFC0004 2 NULL 2 dua1 two

Option “All” will give end changes delete and insert only operation (__$operation), so update will be viewed as insert operation.

Run this second script below
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testyes');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_testyes(@begin,@end,'all with mask');
go

the resultset is :
__$start_lsn __$operation __$update_mask col1 col2 col3
0x0000003500000BFB0005 1 NULL 1 satu one
0x0000003500000BFC0004 2 NULL 2 dua1 two
0x0000003500000BFA0003 2 NULL 3 tiga three

If you read at BOL, it says that for UPDATE (operation = 4) operation, “all with mask” option will gives value in __$update_mask column for affected column(s). But it is not clear on this sample.


Run this third script below
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testyes');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_testyes(@begin,@end,'all with merge');
go

the resultset is :
__$start_lsn __$operation __$update_mask col1 col2 col3
0x0000003500000BFA0003 5 NULL 3 tiga three
0x0000003500000BFB0005 1 NULL 1 satu one
0x0000003500000BFC0004 5 NULL 2 dua1 two

For “all with merge” operation, __$operation column will be 5 for insert and update operation, 5 here means that the operation will be insert/update on destination table to make sure the corresponding record on source will synchronize on destination.

Currently, CDC feature could be found only at SQL Server 2008 Enterprise Edition