SQL Server data structures
Pages
In SQL Server, data is organized in pages. A page has a fixed size (8 KB). Each page contains records. The number of records that can be stored in a page depends on the size of the records. The operation of reading data from a page is called a logical IO. The smaller the size of a record is, the more records can be read with the same number of logical IOs.
A page can be identified by a 6 bytes page pointer: 2 bytes for the file ID and 4 bytes for the page number. A row in a page can be identified by an 8 bytes row ID: 2 bytes for the file ID, 4 bytes for the page number and 2 bytes for the row number.
There are two types of non system pages: data pages, composed of data rows, and index pages, composed of index rows.
Structure of a table: heaps and clustered tables
There are only two types of tables in SQL Server: heaps and clustered tables.
Heaps
Heaps are tables that have no clustered index. They are entirely composed of data pages. The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.
A heap.
Clustered tables
A clustered table is a table whose rows are kept in sorted order. The clustered index is the structure that stores and maintains the rows of a clustered table in sorted order. The rows are ordered using the clustering key, which is defined by one or more columns of the table. Like in a heap, the data rows are stored in data pages, and the clustered index additionally uses index pages to navigate in the data.
SQL Server organizes indexes as trees, with one page at the root level, multiple pages at the leaf level, and zero or more levels in between.
Leaf level:
The leaf level is the data of the table. It is composed of data pages. They store all the columns of the table for every row of the table.
Non-leaf levels:
The non-leaf levels are composed of index pages.
At every non-leaf level, each index row corresponds to a page in the next level. Each index row in every index page contains two things: the index key value, which is the first key value of the corresponding page in the next level of the index, and a 6 bytes page pointer to that corresponding page.
At each level (including leaf level), the pages have a pointer to the next and the previous page in the same level, so that each level forms a doubly-linked list. At each level, the pages and the rows within a page are ordered using the clustering key.
A clustered index.
Blue pages are index pages and green pages are data pages.
Nonclustered indexes
Nonclustered indexes are always associated with a table (it can be either a heap of a clustered table), but they have a structure completely separate from the data rows. Nonclustered indexes only contain a partial copy of the data of the associated table, sorted in a different order than the actual data. The rows are ordered using the nonclustered key, which is defined by one or more columns of the associated table. There can be multiple nonclustered indexes per table.
Nonclustered indexes have the same B-tree structure as clustered indexes, however the leaf level is different.
Leaf level:
In a nonclustered index, the leaf level is composed of index pages instead of data pages. Each index row in the leaf level of a nonclustered index contains a nonclustered key value along with a row locator that points to the corresponding data row in the actual table structure (heap or clustered table).
If the associated table is a heap, the row locator is an 8 bytes row ID. If the table is a clustered table, the row locator is the clustering key identifying the row.
Non-leaf levels:
The non-leaf levels of a nonclustered index are used for navigation identically as in a clustered index.
At each level of the nonclustered index, the pages and the rows within a page are ordered using the nonclustered key.
A nonclustered index for the clustered table above.
A nonclustered index for the heap above.
Covered columns
The columns stored at the leaf of an index are called the columns covered by this index.
A clustered index always covers all the columns.
A nonclustered index covers all the columns of its non clustering key. If it is a nonclustered index on a clustered table, it also covers the columns of the clustering key of the table, since they constitute the row locators.
Clustered indexes are created by default to match the primary key, which is defined on tables in SQL Server. However, you can create a clustered index on any column and then define a primary key on a separate column or columns. At this point, the primary key would be created as a unique non-clustered index. Typically, a clustered index will match the primary key, but not necessarily, so be careful. Given the variety of situations that can arise, I'll be discussing the clustered indexes themselves, and for now ignore whether you choose to make them primary keys.
different index configurations a table can have?
- No indexes
- A clustered index
- A clustered index and many nonclustered indexes
- A nonclustered index
- Many nonclustered indexes
CREATE [UNIQUE] [ CLUSTERED | NONCLUSTERED ]
INDEX index_name
ON table (column [ASC
DESC] [ ,...n ] )
[WITH {IGNORE_DUP_KEY
DROP_EXISTING
SORT_IN_TEMPDB}]
[ON filegroup ]
CREATE TABLE dummytable1 (
empid INT,
empname VARCHAR(700))
INSERT INTO dummytable1
VALUES (4,
Replicate('d',700))
GO
INSERT INTO dummytable1
VALUES (6,
Replicate('f',700))
GO
INSERT INTO dummytable1
VALUES (1,
Replicate('a',700))
GO
INSERT INTO dummytable1
VALUES (3,
Replicate('c',700))
GO
SELECT empid
FROM dummytable1
GO
CREATE CLUSTERED INDEX ix_dummytable ON dbo.dummytable1 (
empid)
SELECT empid
FROM dummytable1
GO
CREATE NONCLUSTERED INDEX ix_dummytable1 ON dbo.dummytable1 (
empname)
GO
SELECT *
FROM dummytable1
WHERE empname LIKE '%a%'
AND empid = 1
GO
The Clustered/Non Clustered index can be enabled disabled
primary key creates a clustered index on the column, where as unique key creates a nonclustered index by default.