Need For Speed: Database Indexes
Indexes are one of the most important speed optimization techniques you can use on your database queries. An index can provide a fast way to retrieve data from a table containing considerable amount of records.
For example, if you create an index on the primary key of a table and then perform a search on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which is almost guaranteed to be slower.
It is a not a silver bullet for fixing all your slow queries though. You will need to understand the structure of your tables and queries when creating an index so that you can gain a speed boost. If a table is not properly indexed and contains too many unnecessary indexes, it may actually slow down other database operations like INSERT, DELETE and UPDATE.
There are two types of indexes, Clustered and Non-Clustered. Of these two, the non-clustered indexes are the ones we use most of the time. But let's start with an explanation about clustered indexes because they are created by the database by default, whereas non-clustered must be created by us.
A clustered index changes the physical order of the records in a table according to the column that was used to create the clustered index. There can be only one clustered index for a table.
When a new table is created and a primary key is assigned to it, SQL Server automatically creates a clustered index using the primary key. So if your primary key is of type Integer, then all your records in the table are ordered by that Integer primary key. Or if the primary key is a column called Fullname and is of type varchar, then the clustered index created on that column will make all the records in that table to be physically stored and sorted by the Fullname of the records. Because the clustered index is rearranging the actual order in which the rows are stored in the database, it is clear why there can only be one.
It's important the column that is chosen for the clustered index to be as unique as possible so that the index is more effective. Good examples are Id columns, Fullname, phone number, email etc.
A bad example is a column that is of type boolean or such that it has only a few possible values.
A non-clustered index is quite different in multiple regards compared to a clustered index. We can have many non-clustered indexes. They don't physically rearrange the records in a table.
Instead, they create a separate structure, a search tree, which holds data about where each row can be found in the actual table. The leaves of the tree are pointers to the data in the actual table.
Here is a graphic that illustrates how the tree that holds the index data may look.
As previously mentioned, there can be multiple non-clustered indexes. That might be a smart idea if we have multiple different queries that are using different columns in the table.
For example, let’s look at the following query and how we can index our Product table which has a few tens of thousands of records.
SELECT * FROM Product
WHERE Price < 20 AND Weight < 55 AND Stock > 2
The primary key for the Product table is the ID column which is also the clustered index and by that column the records are sorted on the disk. However, we are not using the ID column in our query and because of that the clustered index is not of much help.
We will speed up our query if we can also have the records in the table be ordered by Price, Weight and Stock. For that we can create a non-clustered composite index on those three columns which will create a separate search tree in which the records will be sorted by those three columns. That will considerably speed up our query.
What if our query was a bit different and instead of AND clauses, we had OR clauses and looked like this?
SELECT * FROM Product
WHERE Price < 20 OR Weight < 55 OR Stock > 2
In this case, creating a single composite non-clustered index on the three columns will not be of that big help like in the previous case. That is because in the case of a composite index created on multiple columns, only one of them is used as the main one by which the sort is done and the other ones are used in case of ties on the first column. So in reality, in our original composite index, the records are only actually sorted by Price, then partially sorted by Weight and even less partially by Stock value.
For a query like this we will need three separate indexes, one on each OR column in the where clause. That way the database will have three different indexes all with data sorted by a different primary column compared to one composite index on three columns where the data is primarily sorted by one column and the other two are partially sorted.
So to summarize, if we have composite index, it will be only used if the n left-most columns are being queried. In our example, the composite index on (Price, Weight, Stock) might be used in the following cases:
- when we’re searching for Price alone
- when we're searching for Price and Weight
- when we're searching for all three columns
But the composite index will never be used if we are doing queries on:
- only Weight, the second column in the index
- only Stock, the third column in the index
So remember, just because a composite index contains a column that you need to query, it doesn’t mean the database will use that index. You might actually need to break that index into a few smaller ones, each containing one or a few of the columns you are querying.
These are only a few aspects on database indexing that I’ve learned during my work and I’m in no way expert in database administration. Here is a great site I recommend which goes in much more depth about indexing. It has been of much help to me. http://use-the-index-luke.com/