Tuesday 19 July 2011

DBCC CHECKDB (Transact-SQL) for Very Large Data Base


CHECKDB Consistency Checking Options for a VLDB (Very Large Data Base) This is a question that comes up a lot - how to run consistency checks on a VLDB
Nowadays hundreds of GBs or 1 TB or more databases are now common on SQL Server 2000 and 2005. Any experienced DBA knows the value of running consistency checks, even when the system is behaving perfectly and the hardware is rock-solid. The two problems that people have with running a full CHECKDB on their VLDB are:
  • It takes a long time to run.
  • It uses lots of resources – memory, CPU, IO bandwidth, tempdb space.
Even with a decent sized maintenance window, the CHECKDB may run over into normal operations. There's also the case of a system that's already pegged in more or more resource dimensions. Whatever the case, there are a number of options:
  • Don't run consistency checks
  • Run CHECKDB using the WITH PHYSICAL_ONLY option
  • Use SQL Server 2005's partitioning feature and devise a consistency checking plan around that
  • Figure out your own scheme to divide up the consistency checking work over several days
  • Offload the consistency checks to a separate system
Use WITH PHYSICAL_ONLY
A full CHECKDB does a lot of stuff - see previous posts in this series for more details. You can vastly reduce the run-time and resource usage of CHECKDB by using the WITH PHYSICAL_ONLY option. With this option, CHECKDB will:
  • Run the equivalent of DBCC CHECKALLOC (i.e. check all the allocation structures)
  • Read and audit every allocated page in the database
So it skips all the logical checks, inter-page checks, and things like DBCC CHECKCATALOG. The fact that all allocated pages are read means that:
  • If page checksums are enabled in SQL Server 2005, any corruptions caused by the IO subsystem will be discovered as the page checksum will be checked as part of reading the page into the buffer pool
So there's a trade-off of consistency checking depth against run-time and resource usage - but this option will pick up problems caused by the IO subsystem as long as page checksums are enabled and present.
Use the SQL Server 2005 partitioning feature
If you're using the partitioning feature in SQL Server 2005 then you're already setup for this. Given that you've hopefully got your partitions stored on separate filegroups, you can use the DBCC CHECKFILEGROUP command.
It makes sense that you don't need to check the read-only filegroups as often as the current month's filegroup so an example consistency checking scheme could be:
  • Run a DBCC CHECKFILEGROUP on each read-only filegroup every week or two
  • Run a DBCC CHECKFILEGROUP on the read-write filegroup every day or two (depending on the stability of the hardware, the criticality of the data, and the frequency and comprehensiveness of your backup strategy).
I know of several companies who've made the decision to move to SQL Server 2005 in part because of this capability to easily divide up the consistency checking.
Beware that until SP2 of SQL Server 2005, DBCC CHECKFILEGROUP would not check a table at all if it was split over multiple filegroups. This is now fixed and DBCC CHECKFILEGROUP will check partitions on the specified filegroup even if the table is now completely contained on the filegroup.
Figure out your own way to partition the checks
If you're on SQL Server 2000, or you just haven't partitioned your database on SQL Server 2005, then there are ways you can split up the consistency checking workload so that it fits within a maintenance window. Here's one scheme that I've recommended to several customers:
  • Figure out your largest tables (by number of pages) and split the total number into 7 buckets, such that there are a roughly equal number of database pages in each bucket.
  • Take all the remaining tables in the database and divide them equally between the 7 buckets (using number of pages again)
  • On Sunday:
    • Run a DBCC CHECKALLOC
    • Run a DBCC CHECKCATALOG
    • Run a DBCC CHECKTABLE on each table in the first bucket
  • On Monday, Tuesday, Wednesday:
    • Run a DBCC CHECKTABLE on each table in the 2nd, 3rd, 4th buckets, respectively
  • On Thursday:
    • Run a DBCC CHECKALLOC
    • Run a DBCC CHECKTABLE on each table in the 5th bucket
  • On Friday and Saturday:
    • Run a DBCC CHECKTABLE on each table in the 6th and 7th buckets, respectively
In pre-RTM builds of SQL Server 2005, DBCC CHECKTABLE could not bind to the critical system tables, just like with T-SQL - but that's fixed so you can cover all system tables in SQL Server 2000 and 2005 using the method above.
There's one drawback to this method - a new internal database snapshot is created each time you start a new DBCC command, even for a DBCC CHECKTABLE. If the update workload on the database is significant, then there could be a lot of transaction log to recover each time the database snapshot is created - leading to a long total run-time Use a separate system
This alternative is relatively simple - restore your backup (you are taking regular backups, right?) on another system and run a full CHECKDB on the restored database. This offloads the consistency checking burden from the production system and also allows you to check that your backups are valid.
  • If the production database is several TB, you need the same several TB on the spare box. This equates to a non-trivial amount of money - initial capital investment plus ongoing storage management costs. (Hopefully a future release will alleviate this – while at Microsoft I invented and patented a mechanism for consistency checking a database in a backup without restoring it.)
  • If the consistency checks find an error, you don't know for sure that the database is corrupt on the production system. The only way to know for sure is to run a consistency check on the production system. This is a small price to pay though, because most of the time the consistency checks on the spare system will be ok, so you know the production database was clean at the time the backup was taken.
Summary
We have many choices to allow you to run consistency checks, so there's really no excuse for not knowing  that something's gone wrong with your database....Cyaa next time...

1 comment: