Indexes are one of the most important aspects of database design, yet one of the most underused in web applications. A database index can speed up a query by hundreds or thousands of times. Imagine, for example, if the phone book was not alphabetized and you had to find a name. It would takes hours or days to find a name. This is exactly the kind of inefficiency that a database with no index has to contend with. In a busy database server, a query that takes one second to execute can cause a huge problem on a busy site. This article will cover the basics of indexes and give you a few common situations where you will want to add an index to your tables. Also, at the end of the article I'll show some sample queries showing the execution times for an indexed table vs. an unindexed table. The results may surprise you. The article covers SQL Server, but the concepts apply to any database server that has indexes.

An index is put in place as a way to look up records without having to do a table scan. When the database does a table scan, it essentially starts at the top and looks at each row until it finds the data. If there is an index on the field, the database uses the index to locate the records more quickly without needing to scan the entire table.

There are essentially two distinct types of indexes: clustered and non-clustered. A clustered index corresponds to the physical order of records. Because of that, you can only have one clustered index on a table. A non-clustered index, therefore, is any other index. Obviously a clustered index is more efficient, because the index points directly at the data. A non-clustered index contains pointers to the data so that the index can be physically ordered.

Clustered indexes

A clustered index is an index that corresponds to the physical order of records. For example, if your data is ordered by an integer which is also a primary key, and the data is stored in order, the index is said to be a clustered index. You can only have one clustered index on a table, as your data can only be stored in one physical order. Clustered indexes are generally stored along with the data, making access to the data faster. Here are some rules-of-thumb for clustered indexes:

Non-clustered indexes (assuming you have a clustered index)

A non-clustered index should be put on any other fields that you feel would benefit from an index, but should also not be overused. You would not put an index on every field in your table "just because." Every index on your table will slow down insert, update, and delete statements. Why? Because when you modify or add items to your table, each index also has to be updated with the new information. Each index takes up space, and takes up time when doing these operations. For that reason, the use of indexes should be balanced against the costs of using the index. Also, the use of an index should be weighed on the type of data you are storing. You should use an index if the use of that index is faster than using a table scan.

Fields used in ORDER BY clauses should be indexed. You will get a greater benefit from data that is more greatly varied. For example, if your field contains a value of 1, 2, or 3 and you have 1000 rows of data, it does not make sense to add an index because it might take the same amount of time or longer to do the index lookups and order the field than it would be to do a table scan and order the field. On the other hand, if your field contains mostly unique data with a few duplicate pieces of data (such as in a LastName field), it makes sense to index it because ordering using the index will be quicker than ordering using a table scan.

Fields used in a WHERE clause using = should be indexed. For example, if your statement reads WHERE City = 'Buffalo', the City field might be a good candidate for an index because you are matching it exactly. However, if your City field only contains a few different city names, the field would not be a good candidate for an index.

Similarly, when your WHERE clause has a < or > in it, an index can be helpful. For example, if your statement reads WHERE OrderDate > '07/05/2004', an index on the OrderDate column can speed up the operation.

Fields used in a WHERE clause using LIKE do not make good candidates for indexes. For example, if your statement reads WHERE City LIKE '%BUF%', the index will not be used by the statement, and is therefore only wasting space and impeding performance of inserts, updates, and deletes without providing any benefit. However, if that same field is used in other queries that use =, it still makes sense to index it because the index does not slow down the LIKE operation.

Fields used in JOINS should be indexed. For example, a statement like this has two potential indexes:

SELECT * FROM mytable
INNER JOIN mytable2
ON mytable.City = mytable2.City

In that case, the City fields from both tables would benefit from an index. This is true for any kind of join.

Time Savings

What kind of time savings are we talking about? On a table with very little data (100 rows or less) you might not notice a difference, and may not even be able to time the difference because of the speed of the database. However, as your table grows, indexes become more and more effective over simple table scans. Below is a sample table, with a script to generate a million rows of random data. Using this data you can do some simple timings in the Query Analyzer. First, run the script to create the table and populate it with data. The data population might take about 5-10 minutes.

Tip: For a good article on using the Query Analyzer, check Adrian Senior's article Working with Microsoft's Query Analyser

CREATE TABLE TestTable (
id int IDENTITY (1, 1) NOT NULL ,
test1 varchar (50) NULL,
test2 varchar (50) NULL
)

DECLARE @loop int
SET @loop = 0
WHILE @loop < 1000000

BEGIN
INSERT testtable (test1, test2)
VALUES (left(newid(),5), left(newid(),5))

SET @loop = @loop + 1
END

Next, write a SELECT statement using a value that you know exists in the test1 field, like this:

SELECT * FROM TestTable WHERE test1 = 'C9198'

Execute the query several times within a loop to get a more accurate result, as the results are returned in seconds (the following loop simulates 20 hits to the database using the same query):

DECLARE @loop int
SET @loop = 0

WHILE @loop < 20
BEGIN

SELECT * FROM TestTable WHERE test1 = 'C9198'

SET @loop = @loop + 1
END

When I run the statement, the results are completed in about 15-20 seconds each time, which equates to about a second per query -- quite a long time in a web application that might have several simultaneous users. The next section will show how to add an index to speed up the operation.

Adding an Index

Now, put an index on the test1 field. You can do that within the Enterprise Manager by right-clicking on your table, then click All Tasks > Manage Indexes or from the Query Analyzer using Tools > Manage Indexes. Click New to create a new index, then on the next screen you can give the index a name and choose the field that you want indexed (test1, shown in Figure 1). After that, click OK and the index will be created. It might take a few seconds or a minute for the index to be created.

Adding an index
Figure 1: Adding an index in the Enterprise Manager

Tip: An index can also be added with a script, using the syntax
CREATE INDEX [index name]
ON [table name]([field name])
.

After adding the index, try running the query within the loop again. The query will execute so fast that 20 queries will run in less than a second. In fact, if you up the number in the loop to 200, the query will still only take about a second.

Here is another query to test. Now you know that the test1 field has an index, while the test2 field does not. Each field has similar data in it, however. Run this looped query on the test2 field:

DECLARE @loop int
SET @loop = 0
WHILE @loop < 20
BEGIN

SELECT top 10 *
FROM testtable ORDER BY test2

SET @loop = @loop + 1
END

That simulates 20 people coming to your website and ordering the table by the test2 field, while only displaying the first 10 records in the table. The query takes about 20-25 seconds to run on my machine. Run the same query, substituting field test1 for test2:

DECLARE @loop int
SET @loop = 0
WHILE @loop < 20
BEGIN

SELECT top 10 *
FROM testtable ORDER BY test1

SET @loop = @loop + 1
END

This runs in less than a second now. In fact, once again you can increase the loop to 200 and it will still return results in less than a second. Running the loop 20,000 times still only takes about 9 seconds. The results are not scientific, but by my count the query using the index is executing about 2500 times faster.

Query Analyzer

If you are ever unsure which fields to index, use the Query Analyzer to help. The Index Tuning Wizard, located in the Query menu, will give you recommendations based on a query you want to test. To try this out, write the following query in the Query Analyzer, based on the sample table:

SELECT * FROM TestTable WHERE test2 = '11111'

Now, follow these steps to use the Index Tuning Wizard:

Choose Query > Index Tuning Wizard to open the dialog box. Click Next.

step 1

Choose the correct database, and make sure the box is checked to "Keep all existing indexes". If you have views in your database, the Add Indexed Views box will also be available. We will not be dealing with views, so you can ignore this box. Click Next.

step 2

The next screen allows you to specify workload, which is simply the "work" that the Query Analyzer has to analyze to be able to recommend an index (usually a query). Choose SQL Query Analyzer selection, which is simply the query you wrote in the Query Analyzer. Click Next.

step 3

Choose your table and click Next.

step 4

You'll see a dialog that tells you that the wizard is evalutating the data.

step 5
Finally, a list of recommedations is shown. Existing indexes are shown along with recommended indexes. Also, the improvement in performance is estimated (here showing 99%, which is actually a low estimate).

step 6

Finally, you can choose to apply the changes now, or schedule them for a later date.

step 7

Multiple Field Indexes

An index can be applied to a composite field -- that is, a field made up of more than one physical field. For example, in the phone book analogy you would not index only on LastName, because you might have pages and pages of data for specific last names, such as Smith or Wong. In those cases, it makes sense to index a composite field made up of FirstName, MiddleName, and LastName. This way, the index is more effective because the data is more unique. The general rule of thumb is that the more unique your data is, the better the index will perform. When you add an index to a database table using the visual index tools of SQL Server, you will be able to choose the fields that you want included in the index. To build a multiple field index using SQL code, you would write it like this:

CREATE CLUSTERED INDEX myIndexName
ON myTableName(field1, field2,etc)

Indexes on Views

As of SQL Server 2000, indexes can also be placed on Views. A View, in SQL Server, is simply a stored query that you can run queries against. Although many of the same principles apply to indexing views, there are also many rules that apply to indexing views in SQL Server. Consult SQL Server's Books Online for information about indexing views.

Conclusion

Indexes are a must for a database, no matter what the size. You can get speed improvements of an order of magnitude over your un-indexed tables. Indexes should be tuned to the queries that you are using in your application, but should also not be overused. A properly indexed table will make your application run smoother and more efficiently.