Sunday, August 23, 2009

Change Data Capture - Part 2

As I mention in my previous post on Change Data Capture in SQL Server 2008, Microsoft provides table valued functions (TVF's) to access changed data. The main TVFs used to query the change data are:

cdc.fn_cdc_get_all_changes_capture_instance
and
cdc.fn_cdc_get_net_changes_capture_instance

Each of these TVF's require two LSN's or log sequence numbers to identify the beginning and ending of the range of changes to be queried. Two functions are provided to aid in determining the min and max LSN available. They are:

sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn


Get All Changes

The first TVF, cdc.fn_cdc_get_all_changes_capture_instance
,will return one row per changed row on the source table. This means that if more than one column is modified in a transaction then one row is returned in the result set for the given data modification.

The TVF accepts the following paramters when utilized.

  • Min LSN

  • Max LSN

  • Row Filter option



The min LSN is obtained using the sys.fn_cdc_get_min_lsn function. Using Employees table from the previous post as an example, the min lsn can be obtained as follows

SELECT sys.fn_cdc_get_min_lsn ('dbo_Employees')AS min_lsn;


The max LSN is obtained using the sys.fn_cdc_get_max_lsn function. This function does not require any inputs and returns the max lsn for the captured data of the source table. To get the max LSN from the Employees table:

SELECT sys.fn_cdc_get_max_lsn()AS max_lsn;


The row filter option has two settings, All and All Update Old. The All option will return all changes made to the source table in one row. The All Update Old option will return one row for an update showing the previous value and one row showing the changed value.

Putting it all together to get the changes:

DECLARE @minLSN BINARY(10),@maxLSN BINARY(10)
SELECT @minLSN=sys.fn_cdc_get_min_lsn ('dbo_Employees');
SELECT @maxLSN=sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_all_changes_dbo_Employees(@minLSN,@maxLSN,N'All');



Get Net Changes

The second TVF available to work with captured change data is sys.fn_cdc_get_min_lsn. This TVF returns a row that specifies the whole change to a source row if more than one transaction has made modifications to that data. To use this feature the @supports_net_changes parameter must be set to 1 when enabling CDC on a given table.


The input parameters are the same as cdc.fn_cdc_get_all_changes_capture_instance
however the row filter option is slightly different. The options available are All, All With Mask, and All With Merge. All will return all net changes. All With Mask will return the final row plus the operation needed to apply the row. The All With Merge will return the final row with the resturn column _$oprtation being either a 1 ( indicating a delete ) or a 5 ( indicating either an insert or update is needed to apply the change )


Here is an example utilizing the net changes.


DECLARE @minLSN BINARY(10),@maxLSN BINARY(10)
SELECT @minLSN=sys.fn_cdc_get_min_lsn ('dbo_Employees');
SELECT @maxLSN=sys.fn_cdc_get_max_lsn();
select * from cdc.fn_cdc_get_net_changes_dbo_Employees(@minLSN,@maxLSN,N'all');



One last item to mention is that Microsoft provides a function that correlate time to LSN's. This is useful if you have a specific time period in which you would like to capture changes made to data. The function is sys.fn_cdc_map_time_to_lsn and can be used to determine lsn boundries based on time to be used in acquiring the change data.

No comments:

Post a Comment