Saturday, August 15, 2009

Change Data Capture in SQL Server 2008

Some applications require the ability to understand the full history of how data has been modified over time. A new feature included in SQL Server 2008 is Change Data Capture. Change Data Capture (CDC) allows for the ability to query data that has been modified through INSERT, UPDATE, or DELETE statements on a database table.

The SQL Server 2008 database engine provides the necessary framework to implement Change Data Capture. It is currently available in SQL Server 2008 Enterprise, Developer and Evaluation editions. The CDC process is accomplished through the use of the SQL Server transaction logs. As data modifications are recorded in the database transaction log, the CDC process reads the log files and captures the changes for those tables that have been identified to participate in CDC. It then writes those modifications to the Change Capture Data tables. Each table participating in CDC has a corresponding capture table that follows the naming standard and is owned by the database user cdc to hold the modified data. SQL Server allows for the placement of the table containing the modified data to be placed on a separate filegroup. This is a recommended configuration however the default location for the table will be the default filegroup for the database.

Once a database has been enabled for CDC, two SQL Agent jobs are created to manage the capture process. One job is for the purpose of capturing data on those tables that have been identified for CDC. The other job is created for the purpose of cleaning up the collected change data. Two procedures are provided to view and modify configuration parameters of these jobs. They are named sys.sp_cdc_change_job and sys.sp_cdc_help_jobs. Changes made through these procedures do not go into effect until the job is stopped and restarted.

The following is a simple demo that walks through enabling CDC and querying changes made.

Set Up

Create a test table to execute INSERT, UPDATE, and DELETE.


CREATE TABLE [dbo].[EMPLOYEES](
[EMP_ID] [int] IDENTITY(1,1) NOT NULL,
[EMP_NAME] [varchar](100) NOT NULL,
[EMP_NUMBER] [int] NOT NULL,
[EMP_TITLE] [varchar](50) NOT NULL,
[EMP_HIRE_DATE] [date] NOT NULL,
[EMP_TERM_DATE] [date] NULL,
CONSTRAINT [PK_EMPLOYEES] PRIMARY KEY CLUSTERED
(
[EMP_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY])


Create database role for the management of the cdc tables.

CREATE ROLE [ChangeData_Admin] AUTHORIZATION [cdc] 


To utilize CDC, it must be enabled for the database. This is accomplished using the following stored procedure.

EXECUTE sys.sp_cdc_enable_db;


Once CDC has been enabled for the database, the tables you wish to participate in CDC can then be identified through sys.sp_cdc_enable_table.


EXECUTE sys.sp_cdc_enable_table
@source_schema = N'dbo'
, @source_name = N'Employees'
, @role_name = N'ChangeData_Admin';


The cdc agent jobs are created as a result of this procedure


Results
------------
Job 'cdc.TestDB_1_capture' started successfully.
Job 'cdc.TestDB_1_cleanup' started successfully.


Now we are ready to insert data into our table.


INSERT INTO [Test_db1].[dbo].[EMPLOYEES]
([EMP_NAME]
,[EMP_NUMBER]
,[EMP_HIRE_DATE]
,[EMP_TERM_DATE]
,[EMP_TITLE])
VALUES
('Steve Stark'
,1001
,'05/15/2008'
,null
,'Associate Developer')
GO

INSERT INTO [Test_db1].[dbo].[EMPLOYEES]
([EMP_NAME]
,[EMP_NUMBER]
,[EMP_HIRE_DATE]
,[EMP_TERM_DATE]
,[EMP_TITLE])
VALUES
('Mary Doe'
,1002
,'10/24/2003'
,null
,'Database Administrator')
GO


To see the changes we can query the cdc.dbo_employees_CT table. Microsoft does provide table valued functions to work with captured change data. For the sake of this post, I will directly query the capture table to view the results. For more information on how to use the provided table value functions, go here.


SELECT * FROM cdc.dbo_employees_CT


Here you will see two rows for the previous insert statements.

Now update some data:


UPDATE dbo.EMPLOYEES
SET EMP_TITLE = 'Development Director'
WHERE EMP_ID = 1


Again query cdc.dbo_employees_CT to observe the changes. Since we performed an update there will be two rows captured. The first is the old value and the second is the new value. Captured delete data will also only contain one row displaying the value prior to the deletion.


Clean Up


DROP TABLE dbo.EMPLOYEES;
EXECUTE sys.sp_cdc_disable_db;

No comments:

Post a Comment