Saturday, September 13, 2008

Exploring Change Data Capture (SQL Server 2008)

Exploring Change Data Capture (SQL Server 2008)
By : Kasim Wirama, MCDBA

This posting, I would like to share about CDC in SQL Server 2008. Before SQL Server 2008 version, if you need information about record changes in a table, you need to create custom solution such as writing trigger, or use timestamp column, etc. These solutions will negate your SQL Server performance on OLTP database. CDC feature comes for this requirement and it will give little overhead to SQL Server performance on OLTP database.
I would show you how I use CDC feature and hopefully you will get insight after going through my walk through and realize the power and flexibility of CDC.
First I create database for exploring CDC.
Create database cdc_db
Go
Use cdc_db
Go
Next step is to enable cdc on cdc_db database

exec sys.sp_cdc_enable_db
Sometimes when restore a database it gives error :
Msg 22830, Level 16, State 1, Procedure sp_cdc_enable_db_internal, Line 186
Could not update the metadata that indicates database cdc_db is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15517: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db_internal, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 3998, Level 16, State 1, Line 1
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.

It is because mapping between login at SQL server level and user at database level has broken, so issue this commad should resolve this broken mapping :
exec sp_changedbowner 'sa','dbo'
now create these table here :
create table testno
(
col1 int not null,
col2 varchar(20) not null,
col3 varchar(30) not null
);
Go

create table testyes
(
col1 int not null,
col2 varchar(20) not null,
col3 varchar(30) not null,
constraint pk_test primary key clustered (col1)
);
Go
Testno represents a table without primary key, and tableyes represents a table with primary key

insert into testno values (1,'satu', 'one');
insert into testyes values (1,'satu', 'one');
next, I enable cdc on testno and testyes table
exec sys.sp_cdc_enable_table @source_schema = 'dbo',
@source_name = 'testno', @role_name = null;

exec sys.sp_cdc_enable_table @source_schema = 'dbo',
@source_name = 'testyes', @role_name = null;
Go to SSMS (SQL Server Management Studio), expand cdc_db database folder, Programmability folder, Functions folder, table-valued functions, there are 3 table valued functions created :
1. fn_cdc_get_all_changes_dbo_testno
2. fn_cdc_get_all_changes_dbo_testyes
3. fn_cdc_get_net_changes_dbo_testyes
Notice that these functions has prefix fn_cdc_get_[all|net]_changes_ where capture instance are tracked tables. “All” gives information all changes for each row; and “net” gives information last changes for each row.
From my observation, it seems that a table with no primary key has only “all” cdc function created while a table with primary key has either “all” and “net” cdc function. In most situation net changes cdc function gives equal or smaller resultset than all changes cdc function so it is beneficial to have a table with primary key applied on it.
After enabling cdc on the two tables, I apply some DML statements on them :
insert into dbo.testno values (2,'dua','two');
insert into dbo.testno values (3,'tiga','three');
delete from dbo.testno where col1 = 1;
update dbo.testno set col2 = 'dua1' where col1 = 2;

insert into dbo.testyes values (2,'dua','two');
insert into dbo.testyes values (3,'tiga','three');
delete from dbo.testyes where col1 = 1;
update dbo.testyes set col2 = 'dua1' where col1 = 2;
now I would like to query changes on testno table, by running cdc script below:
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testno');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_dbo_testno(@begin,@end,'all update old');
go

it gives resultset here :
__$start_lsn __$seqval __$operation __$update_mask col1 col2 col3
0x000000 0x0000003500000BE40002 2 0x07 2 dua two
0x000000 0x0000003500000BE50002 2 0x07 3 tiga three
0x000000 0x0000003500000BE60002 1 0x07 1 satu one
0x000000 0x0000003500000BE70002 3 0x02 2 dua two
0x000000 0x0000003500000BE70002 4 0x02 2 dua1 two

Notice that for insert operation, __$operation = 2. For delete operation, __$operation = 1. For update operation, __$operation = 3 to old value and __$operation = 4 to new value.


Issue this cdc script below :

declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testno');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_dbo_testno(@begin,@end,'all');
go

it gives resultset here :

__$start_lsn __$seqval __$operation __$update_mask col1 col2 col3
0x000000 0x0000003500000BE40002 2 0x07 2 dua two
0x000000 0x0000003500000BE50002 2 0x07 3 tiga three
0x000000 0x0000003500000BE60002 1 0x07 1 satu one
0x000000 0x0000003500000BE70002 4 0x02 2 dua1 two

It seems that with ‘all’ value of third parameter of the cdc function above eliminates __$operation = 3.

The same pattern applies to testyes table.

Now I turn my attention to cdc.fn_cdc_get_net_changes_dbo_testyes cdc function
The third parameter value of cdc.fn_cdc_get_net_changes cdc function is “ALL”, “ALL WITH MASK”, “ALL WITH MERGE”

Run this first script below
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testyes');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_testyes(@begin,@end,'all');
go

the resultset is :

__$start_lsn __$operation __$update_mask col1 col2 col3
0x0000003500000BFA0003 2 NULL 3 tiga three
0x0000003500000BFB0005 1 NULL 1 satu one
0x0000003500000BFC0004 2 NULL 2 dua1 two

Option “All” will give end changes delete and insert only operation (__$operation), so update will be viewed as insert operation.

Run this second script below
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testyes');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_testyes(@begin,@end,'all with mask');
go

the resultset is :
__$start_lsn __$operation __$update_mask col1 col2 col3
0x0000003500000BFB0005 1 NULL 1 satu one
0x0000003500000BFC0004 2 NULL 2 dua1 two
0x0000003500000BFA0003 2 NULL 3 tiga three

If you read at BOL, it says that for UPDATE (operation = 4) operation, “all with mask” option will gives value in __$update_mask column for affected column(s). But it is not clear on this sample.


Run this third script below
declare @begin binary(10), @end binary(10);
set @begin = sys.fn_cdc_get_min_lsn('dbo_testyes');
set @end = sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_testyes(@begin,@end,'all with merge');
go

the resultset is :
__$start_lsn __$operation __$update_mask col1 col2 col3
0x0000003500000BFA0003 5 NULL 3 tiga three
0x0000003500000BFB0005 1 NULL 1 satu one
0x0000003500000BFC0004 5 NULL 2 dua1 two

For “all with merge” operation, __$operation column will be 5 for insert and update operation, 5 here means that the operation will be insert/update on destination table to make sure the corresponding record on source will synchronize on destination.

Currently, CDC feature could be found only at SQL Server 2008 Enterprise Edition

1 comment:

FredM said...

Go to SSMS (SQL Server Management Studio), expand cdc_db database folder,...

I set up everything correctly in SQL Server 2008 SP1, but I don't see the cdc_db folder. I used the templates to confirm that the one database and one table are enabled for CDC, but I don't see the cdc_cb database folder. Could you be more specific as to where to look for it?