Understanding SQL Server Data Files, Pages & Extents
SQL Server data files, pages & extents are very much related to each other. These are architectural terms which are used to store data physically inside the database. First let’s start with data files then we will understand how data files are related to pages and then to extents.
- Manage Transaction Log File during Data Load
- Why Should You Always Turn Off Database Auto Shrink Property?
- What is the Best Value for Database File Autogrowth?
- SQL Server Architecture Interview Questions & Answers
SQL Server Data Files
When we create any SQL Server database, few operating system files are created and placed in to their respective directories. These files are known as database files. SQL Server databases have three types of files to handle data and log separately.
- Primary data file
- Secondary data files
- Log files
Every database has only one primary data file but none to multiple secondary data files whereas same database has minimum one log file and can have multiple log files depending on the requirements. Log files hold all the log information that is used to recover the database whereas all data files keeps or stores the actual data of the database.
The recommended file extension for primary, secondary data file and log file are .mdf, .ndf and .ldf respectively although it is not mandatory to keep same extension names but it helps you identify the different kinds of files and their use.
Each database file has two names one is logical file name another one is physical file name. The logical file name is the name used to refer to the physical file in all Transact-SQL statements whereas physical file name is the name that appear on the OS directory path.
Pages & Extents
We read in above section that every database has data files to store actual data. Data stores inside these data files in form of Pages so Page is the fundamental unit of data storage in SQL Server but Log files do not store any data, they contain a series of log records so log files do not have any pages.
The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Pages in a SQL Server data file are numbered sequentially, starting with zero (0) for the first page in the file. Each file in a database has a unique file ID number. To uniquely identify a page in a database, both the file ID and the page number are required.
The size of page is 8 KB. Each page begins with 96-byte header as shown in above image that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.
Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.
Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.
The first page in each file is a file header page that contains information about the attributes of the file. Several of the other pages at the start of the file also contain system information, such as allocation maps. Below are the list of page types used in the data files of a SQL Server database.
- Data: Data Page stores data rows with all data, except large object data types like text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.
- Index: Stores Index data.
- Text/Image: Stores large object data types like text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data
- Global Allocation Map: GAM pages record what extents have been allocated. Each GAM covers 64,000 extents, or almost 4 GB of data. The GAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is free; if the bit is 0, the extent is allocated.
- Shared Global Allocation Map: SGAM pages record which extents are currently being used as mixed extents and also have at least one unused page. Each SGAM covers 64,000 extents, or almost 4 GB of data.
The SGAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is being used as a mixed extent and has a free page. If the bit is 0, the extent is not used as a mixed extent, or it is a mixed extent and all its pages are being used.
- Page Free Space: Page Free Space (PFS) pages record the allocation status of each page, whether an individual page has been allocated, and the amount of free space on each page. The PFS has one byte for each page, recording whether the page is allocated, and if so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent full, or 96 to 100 percent full.
After an extent has been allocated to an object, the Database Engine uses the PFS pages to record which pages in the extent are allocated or free. This information is used when the Database Engine has to allocate a new page.
- Index Allocation Map: Information about extents used by a table or index per allocation unit. When the SQL Server Database Engine has to insert a new row and no space is available in the current page, it uses the IAM and PFS pages to find a page to allocate, or, for a heap or a Text/Image page, a page with sufficient space to hold the row. The Database Engine uses the IAM pages to find the extents allocated to the allocation unit.
- Bulk Changed Map: Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit. This tracks the extents that have been modified by bulk logged operations since the last BACKUP LOG statement.
If the bit for an extent is 1, the extent has been modified by a bulk logged operation after the last BACKUP LOG statement. If the bit is 0, the extent has not been modified by bulk logged operations.
Although BCM pages appear in all databases, they are only relevant when the database is using the bulk-logged recovery model. In this recovery model, when a BACKUP LOG is performed, the backup process scans the BCMs for extents that have been modified. It then includes those extents in the log backup. This lets the bulk logged operations be recovered if the database is restored from a database backup and a sequence of transaction log backups.
BCM pages are not relevant in a database that is using the simple recovery model, because no bulk logged operations are logged. They are not relevant in a database that is using the full recovery model, because that recovery model treats bulk logged operations as fully logged operations.
- Differential Changed Map: Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit. This tracks the extents that have changed since the last BACKUP DATABASE statement.
If the bit for an extent is 1, the extent has been modified since the last BACKUP DATABASE statement. If the bit is 0, the extent has not been modified.
Differential backups read just the DCM pages to determine which extents have been modified. This greatly reduces the number of pages that a differential backup must scan. The length of time that a differential backup runs is proportional to the number of extents modified since the last BACKUP DATABASE statement and not the overall size of the database.
Sequence of Pages in Data file
A PFS page is the first page after the file header page in a data file (page number 1). This is followed by a GAM page (page number 2), and then an SGAM page (page 3). There is a PFS page approximately 8,000 pages in size after the first PFS page. There is another GAM page 64,000 extents after the first GAM page on page 2, and another SGAM page 64,000 extents after the first SGAM page on page 3. The DCM and BCM pages are located behind the GAM and SGAM pages in a physical file and interval between DCM pages and BCM pages is the same as the interval between GAM and SGAM page, 64,000 extents.
I hope this article helps you to understand data files, pages and extents. The source of this article is MSDN Book Online. If you like this article, you can follow us on our facebook page and on Twitter handle to get latest updates.
- SQL Server Update Statistics Best Practices - April 15, 2020
- Understanding Hybrid Buffer Pool in SQL Server - December 30, 2018
- Fix:VIEW SERVER STATE permission was denied on object ‘server’, database ‘master’ - August 30, 2018