Ordering Tricks with SQL

A database is like a book with no index--you store information in it, but unless you know how to get it out in a particular order, it's not going to be very useful. Ordering is one of the most often used aspects of SQL, yet one of the most under-used parts of the SQL language at the same time. You can often do some really cool things through simple ordering of your data, and present the data in a different way. This article addresses ordering for database queries in any situation, whether it's in a stored procedure, or in inline SQL code in an ASP, ASP.NET, PHP, ColdFusion, or JSP page.

Note: The code shown has been tested in MS SQL Server, but will work in other databases that support the syntax. In many cases, if your database does not support certain keywords, you might be able to find an similar method of doing the same thing using equivalent functionality. MS Access queries are also shown when possible.

You might be able to improve the accuracy of the ordering, such as in this listing of titles:

'A Tale of Two Cities'
'Dreamweaver MS: The Complete Reference'
'The Andromedia Strain'

Obviously, this listing is ordered alphabetically, but it is all wrong because of the insignificant words ('A', 'The') in the example. I'll address correct ordering of titles in this article.

Another area that is often neglected is the ordering of numbers that are stored as varchar, nvarchar, text, or other textual data type values. Take this listing as an example:

ID,Name
'1','Tom'
'10','Jack'
'100','Steve'
'2','Jim'
'20','Frank'

The numbers are ordered alphabetically, which puts them out of order as numbers. This will be addressed in the article.

Forcing an order will be addressed: you can force data to be ordered based on some pre-defined criteria that you set up. There are many uses for this type of ordering, and I will show you a few examples.

Ordering within a subgroup will be addressed. What do I mean by that? Well, consider this list of data:

'10','1','Tom','Muck'
'20','1','ColdFusion'
'20','1','Flash Remoting'
'10','2','Ray','West'
'20','2','Dreamweaver MX'
'20','2','ASP.NET'
'20','2','SQL Server'
'20','2','Content Management'
'10','3','Massimo','Foti'
'20','3','ColdFusion'
'20','3','JavaScript'
'20','3','Dreamweaver MX'

This set of data does not follow the standard relational data model -- it is transactional data. The rows that begin with a '10' are the key rows and the rows that begin with '20' are the data (or line item rows) for each key row. They are linked by the record numbers in the second column. If we want to order the data by the last name of the '10' row, then by the third field in the '20' row, we might have a hard time coming up with one SQL statement that will do the trick.

Database tools are not going to help you build sorts like this, nor are the query building tools of Dreamweaver MX, CF Studio, or other web development programs. They each require hand-coding of your SQL. After having built the query, however, you can easily create a stored procedure out of it or paste it into Dreamweaver MX when creating a recordset.

Ordering by Titles

The proper way to organize a series of book, article, or movie titles involves alphabetizing by the first significant word, or removing certain common words that may be at the beginning of your title. The English words commonly removed from a sort would be: a, an, the. Other languages may vary, but the concept is the same. If your data is not stored in this way, however, you will have a hard time coming up with a sort order that can be easily browsed by your end user.

Enter SQL.

Using Structured Query Language you can carefully structure the way that the title is sorted and displayed. Some variations on SQL do not allow this type of query (Access, for example) but the majority of RDBMS do. There are three main principles involved with this type of sort:

  1. Using the CASE statement*.
  2. Using string manipulation.
  3. Using an alias for a column.

*CASE is not supported in MS Access. Sometimes you can use IIF instead of CASE.

These are three of the basic constructs in SQL coding that will make the job easy.

I will be using the SQL Server Pubs database as an example, but you can use any table in any database that has a title column. Using the Titles table as an example, I can write the following SQL statement:

SELECT title, price
FROM titles
ORDER BY title

That gives me the following results:

Title Price
But Is It User Friendly? 22.9500
Computer Phobic AND Non-Phobic Individuals: Behavior Variations 21.5900
Cooking with Computers: Surreptitious Balance Sheets 11.9500
Emotional Security: A New Algorithm 7.9900
Fifty Years in Buckingham Palace Kitchens 11.9500
Is Anger the Enemy? 10.9500
Life Without Fear 7.0000
Net Etiquette NULL
Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean 20.9500
Prolonged Data Deprivation: Four Case Studies 19.9900
Secrets of Silicon Valley 20.0000
Silicon Valley Gastronomic Treats 19.9900
Straight Talk About Computers 19.9900
Sushi, Anyone? 14.9900
The Busy Executive's Database Guide 19.9900
The Gourmet Microwave 2.9900
The Psychology of Computer Cooking NULL
You Can Combat Computer Stress! 2.9900

You can see the the word "The" is in 3 of the titles, which completely nullify the sort order. Using a CASE statement and string manipulation, we can pull the first several characters off the front of the title and do things if the word is "a", "an", or "the". First, the string manipulation. Different databases will have different syntax, but this works in SQL Server. What you want to do is test the first two characters for "a ", the first three characters for "an ", and the first four characters for "the ", and perform a different action for each of these. You'll pull the word off the beginning of the title and put it at the end, using a preceding comma. You'll then set the new "title" field to an alias named "newtitle". Here is the code:

SELECT NewTitle =

CASE Left(Title, 4)
  WHEN 'The '
    Then Right(Title, len(Title)-4) + ', The'
  WHEN 'An _'
    Then Right(Title, len(Title)-2) + ', An'
  WHEN 'A __'
    Then Right(Title, len(Title)-2) + ', A'
ELSE
  Title
END

, price
FROM titles
ORDER BY NewTitle

Basically we are looking at the first 4 characters of the title, which is pulled off the Title column with a LEFT(Title, 4). You can see what follows next. Each variation on insignificant words (the, a, an) is given a separate condition, with the _ character filling in for a single character that can be anything. Finally, the default case is to use the title with no alteration. Lastly, we order by the column alias NewTitle rather than Title. This gives us the results we want:

Title Price
Busy Executive's Database Guide, The
19.9900
But Is It User Friendly?
22.9500
Computer Phobic AND Non-Phobic Individuals: Behavior Variations
21.5900
Cooking with Computers: Surreptitious Balance Sheets
11.9500
Emotional Security: A New Algorithm
7.9900
Fifty Years in Buckingham Palace Kitchens
11.9500
Gourmet Microwave, The
2.9900
Is Anger the Enemy?
10.9500
Life Without Fear
7.0000
Net Etiquette
NULL
Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean
20.9500
Prolonged Data Deprivation: Four Case Studies
19.9900
Psychology of Computer Cooking, The
NULL
Secrets of Silicon Valley
20.0000
Silicon Valley Gastronomic Treats
19.9900
Straight Talk About Computers
19.9900
Sushi, Anyone?
14.9900
You Can Combat Computer Stress!
2.9900

Picking the first 4 characters allowed us to simplify the CASE statement, rather than picking first 4 for "the ", first 3 for "an ", and first 2 for "a ".

In MS Access you would not be able to code a query easily that accomplished this same thing, because the basic construct of the IIF statement only allows two possible outcomes, whereas CASE gives you an infinite number of options. To create a similar query in MS Access that only checks for the word "The ", the following will work:

SELECT IIF(Left(Title,4) = 'The ',Right(Title, len(Title)-4) + ', The', title)
   as newtitle
, price
FROM dbo_titles
ORDER BY
IIF(Left(Title,4) = 'The ',Right(Title, len(Title)-4) + ', The', title)

Ordering a Text Field as Numeric

Once in a while you will find that you have a database column stored as text when in fact your values are all numbers. If you try to sort the field with a SQL statement like this

SELECT ID, Name from MyTable
ORDER BY ID


you will end up with a sort order like this:

ID Name
1 Tom
10 Jack
100 Steve
2 Jim
20 Frank

To be able to sort the ID field numerically, you need to be able to covert the data into numeric data. Again, databases such as MS Access will not support this on-the-fly conversion of data types, but most modern RDBMS systems do. To sort the ID column numerically rather than alphabetically, you can simply CAST the resulting column in your ORDER BY statement as a number:

SELECT ID, Name from MyTable
ORDER BY CAST(ID as int)

Now your sort looks like this:

ID Name
1 Tom
2 Jim
10 Jack
20 Frank
100 Steve

To implement a workaround in MS Access, you can use SUM on the ORDER BY clause, but you also have to group your query by all fields:

SELECT ID, Name
FROM MyTable
GROUP BY id, Name
ORDER BY sum(ID);

Forcing Numerics to Sort When Combined with Text

What if your data contains some numbers and some text characters? Well, unfortunately in basic SQL you are out of luck because there is no way to extract parts of fields easily. The SQL language does not contain regular expressions. You could create a function that removes the numeric characters and puts them into another field, then sorts numerically, but it is not an easy proposition, given the possible varying lengths of the numeric portion of the number.

If you know that the text portion of the data is the same length, however, the SQL can be easily written. Take a look at this sample data:

id Name user_id
1 Tom user1
3 Jess user10
5 Jim user11
4 Mike user12
2 Frank user2
6 Bruce user20
7 Clint user21
8 Tony user8

You can see the user_id field is part numeric and part text. If you wanted to sort on the text portion, but also keep the numeric portion in numeric order, this would be easy in SQL, because the text portion is 4 characters in every row:

SELECT id, Name, user_id
FROM SampleNumeric
ORDER BY CAST(RIGHT(user_id, len(user_id)-4) as int)

In this query we are simply taking the portion of the field that begins at the 5th character by choosing the RIGHT portion of the field using the length of the field minus 4. The MS Access version of the same query would look like this:

SELECT id, Name, user_id
FROM SampleNumeric
GROUP BY id, name, user_id
ORDER BY SUM(RIGHT(user_id, len(user_id)-4));

What if the text portion was varying in length, however? We can do this in SQL Server and other databases that support PATINDEX and REVERSE (or their equivalents) but not MS Access. The following table contains a user_id field that has varying length text followed by number. The user_id field is in alphabetical order:

id Name user_id
2 Frank alexandria2
6 Bruce alexandria20
7 Clint alexandria21
1 Tom buffalo1
3 Jess buffalo10
5 Jim buffalo11
4 Mike buffalo12
8 Tony buffalo8

Now it becomes tricky without regular expressions. The SQL to pull the numeric portion out of the field is a little trickier, but it can be done. The following SQL will sort the data on the numeric portion of user_id only:

SELECT id, Name, user_id
FROM SampleNumeric2
ORDER BY
CAST(
RIGHT(user_id, PATINDEX('%[0-9][^0-9]%', REVERSE(user_id))
) as int)

This gives the result:

id Name user_id
1 Tom buffalo1
2 Frank alexandria2
8 Tom buffalo8
3 Jess buffalo10
5 Jim buffalo11
4 Mike buffalo12
6 Bruce alexandria20
7 Clint alexandria21

Let's examine the SQL: The CAST() function is just like you've seen in the other examples. We will cast the result of the inner expression as an integer. The PATINDEX and REVERSE functions are doing something strange though:

CAST(
RIGHT(user_id, PATINDEX('%[0-9][^0-9]%', REVERSE(user_id))
) as int)

Look at REVERSE first: this reverses the string in the field so that the numbers are first. Because we don't have regular expressions in SQL, it is difficult to pick the first occurrence of a range of characters. After reversing the characters, the PATINDEX function returns the LAST numeric character it finds. This is made possible by the use of the wildcard % in the first character position to find any character, the range of characters to search for in the next position [0-9] (which finds one character), and the range of characters NOT to search for [^0-9] (one character, once again.)

That is good if you need it sorted ONLY numerically, but what if you want it sorted on the numeric AND text portion? The following SQL will do that:

SELECT id, Name, user_id
FROM SampleNumeric2 /**/
ORDER BY
LEFT(user_id, len(user_id)-patindex('%[0-9][^0-9]%',reverse(user_id))),
CAST(
RIGHT(user_id, patindex('%[0-9][^0-9]%',reverse(user_id))
) as int)

The query will return the results intended:

id Name user_id
2 Frank alexandria2
6 Bruce alexandria20
7 Clint alexandria21
1 Tom buffalo1
8 Tony buffalo8
3 Jess buffalo10
5 Jim buffalo11
4 Mike buffalo12

The query is based on the previous query, but includes the exact opposite functionality in the first ORDER BY clause to return the alphabetic portion of the field:

LEFT(user_id, len(user_id)-patindex('%[0-9][^0-9]%',reverse(user_id)))

The number of numeric characters is subtracted from the length of the field to yield a result of ONLY text characters that we know are in the field.

These types of queries only work if you know what kind of data you can expect in the field, such as text characters followed by numeric characters. Any mixing of characters that you can't predict makes a query like this fail.

Forcing Numeric Characters to the Bottom of a Sort

One thing that you can do easily though is to force the fields that begin with numeric characters to the bottom of your listing, rather than display at the top. If you want to force the rows to the bottom, try this:

SELECT ProductName
FROM Products
ORDER BY ISNUMERIC(left(productname,1))
, productname

Here you are creating a new unnamed column to sort by: the code ISNUMERIC(left(productname, 1)) will return a 1 or a 0 depending on whether the first letter of the field is a number or not. If it is, the return value is 1. If it isn't, the return value is 0. Since you are sorting on that field first, all the 0s will rise to the top. The 1s will fall to the bottom. An MS Access version of this query might look like this:

SELECT ProductName
FROM Products
ORDER BY IIF(left(ProductName,1) < 'a',1,0), ProductName

Forcing a Specific Value to the Top or Bottom

Just as you can force the numeric values to the bottom of your resultset in the previous example, you can force any specific value to the top or bottom of the results in the same way. For example, the Employee table in the Pubs database contains employee names (the fname and lname fields) with employee id numbers (the emp_id field). The IDs have two different formats, if you examine the data. Some of the emp_id fields contain a letter followed by a dash character and several other letters. Some are made up entirely of characters. To order on emp_id, you would use this SQL:

SELECT emp_id, fname, lname
FROM employee
ORDER BY emp_id,lname, fname

You would get these results:

emp_id fname lname
A-C71970F Aria Cruz
A-R89858F Annette Roulet
AMD15433F Ann Devon
ARD36773F Anabela Domingues
CFH28514M Carlos Hernadez
CGS88322F Carine Schmitt
DBT39435M Daniel Tonini
DWR65030M Diego Roel
ENL44273F Elizabeth Lincoln
F-C16315M Francisco Chang
GHT50241M Gary Thomas
H-B39728F Helen Bennett
HAN90777M Helvetius Nagy
HAS54740M Howard Snyder
JYL26161F Janine Labrune
KFJ64308F Karin Josephs
KJJ92907F Karla Jablonski
L-B31947F Lesley Brown
LAL21447M Laurence Lebihan
M-L67958F Maria Larsson
M-P91209M Manuel Pereira
M-R38834F Martine Rance
MAP77183M Miguel Paolino
MAS70474F Margaret Smith
MFS52347M Martin Sommer
MGK44605M Matti Karttunen
MJP25939M Maria Pontes
MMS49649F Mary Saveley
PCM98509F Patricia McKenna
PDI47470M Palle Ibsen
PHF38899M Peter Franken
PMA42628M Paolo Accorti
POK93028M Pirkko Koskitalo
PSA89086M Pedro Afonso
PSP68661F Paula Parente
PTC11962M Philip Cramer
PXH22250M Paul Henriot
R-M53550M Roland Mendel
RBM23061F Rita Muller
SKO22412M Sven Ottlieb
TPO55093M Timothy O'Rourke
VPA30890F Victoria Ashworth
Y-L77953M Yoshi Latimer

Suppose you wanted to sort by emp_id, but you wanted all the emp_id fields with the dash character to magically rise to the top. Use string manipulation and the CASE statement once again:

SELECT emp_id, fname, lname
FROM employee
ORDER BY

CASE WHEN LEFT(emp_id, 2) LIKE '%-' THEN
  0
ELSE
  1
END

, emp_id, lname, fname

The equivalent query in MS Access would be:

SELECT emp_id, fname, lname
FROM Employee
ORDER BY
IIF(left(emp_id,2) LIKE '*-',0,1)
,emp_id, lname, fname;

The Access IIF statement is saying "If the left 2 characters contain any number of characters followed by a - sign, use 0 to sort by, otherwise use 1."

In SQL Server you can also simply use a PATINDEX function :

SELECT emp_id, fname, lname
FROM employee
ORDER BY
patindex('_-%',emp_id) desc
, emp_id, lname, fname

The PATINDEX function returns a 1 or a 0. If the second character is a '-' character, it will return 1. We order the field in DESC order because we want the 1 at the top, and the 0 at the bottom. Notice the emp_id field is still in perfect alphabetical order otherwise. The resulting sort order looks like this:

emp_id fname lname
A-C71970F Aria Cruz
A-R89858F Annette Roulet
F-C16315M Francisco Chang
H-B39728F Helen Bennett
L-B31947F Lesley Brown
M-L67958F Maria Larsson
M-P91209M Manuel Pereira
M-R38834F Martine Rance
R-M53550M Roland Mendel
Y-L77953M Yoshi Latimer
AMD15433F Ann Devon
ARD36773F Anabela Domingues
CFH28514M Carlos Hernadez
CGS88322F Carine Schmitt
DBT39435M Daniel Tonini
DWR65030M Diego Roel
ENL44273F Elizabeth Lincoln
GHT50241M Gary Thomas
HAN90777M Helvetius Nagy
HAS54740M Howard Snyder
JYL26161F Janine Labrune
KFJ64308F Karin Josephs
KJJ92907F Karla Jablonski
LAL21447M Laurence Lebihan
MAP77183M Miguel Paolino
MAS70474F Margaret Smith
MFS52347M Martin Sommer
MGK44605M Matti Karttunen
MJP25939M Maria Pontes
MMS49649F Mary Saveley
PCM98509F Patricia McKenna
PDI47470M Palle Ibsen
PHF38899M Peter Franken
PMA42628M Paolo Accorti
POK93028M Pirkko Koskitalo
PSA89086M Pedro Afonso
PSP68661F Paula Parente
PTC11962M Philip Cramer
PXH22250M Paul Henriot
RBM23061F Rita Muller
SKO22412M Sven Ottlieb
TPO55093M Timothy O'Rourke
VPA30890F Victoria Ashworth

Forcing NULL to the Bottom

Another time you might want to force a column to the bottom is if the column contains a NULL. The easiest way to do this is to use the ISNULL function in SQL Server (other databases have similar functions).

SELECT Col001, Col002, Col003, Col004
FROM testtable
ORDER BY col002
, isnull(col004,'zzzzzzz')
, col001
, col003

Here we are just forcing the column to the bottom with a string of characters that is guaranteed to be last in a sort order.

Another way to do it is like this:

SELECT Col001, Col002, Col003, Col004
FROM testtable
ORDER BY col002
, CASE WHEN col004 IS NULL THEN 1 else 0 end
, col001
, col003

The Microsoft Access approach would be to use the IIF construct, once again:

SELECT Col001, Col002, Col003, Col004
FROM testtable
ORDER BY col002
, IIF(isnull(col004),1,0)
, col001;
, col003;

Ordering within a Subgroup

The last item I want to talk about is perhaps the most complex, as it deals with transactional data. Transactional data does not conform to standard relational database structure. Your field names and field mappings will not describe the data, because you have two different types of rows, identified by a key field. Because the fields do not have the same type of data in each row (and the "20" rows contain one less field), the naming of the fields becomes insignificant. The first column could be named "key" and the second column could be named "recordNumber", but the remaining columns have differing data. For that reason, I'll simply refer to them as col001, col002, col003, and col004. This is how they are named automatically by SQL Server as the data is imported.

Transactional data is typically stored in a text file as an output from another database system, data received from a client for processing by you, or data that has been generated as a report.

'10','1','Tom','Muck'
'20','1','ColdFusion'
'20','1','Flash Remoting'
'10','2','Ray','West'
'20','2','Dreamweaver MX'
'20','2','ASP.NET'
'20','2','SQL Server'
'20','2','Content Management'
'10','3','Massimo','Foti'
'20','3','ColdFusion'
'20','3','JavaScript'
'20','3','Dreamweaver MX'

As you can see, the highlighted fields are the key fields in the data. Data is stored initially in sequential order in a text file, but when imported to SQL, the only way to retrieve the results in any kind of order is by using an ORDER BY clause and ordering by the second column (the record number). However, if you do this, your transactional order will be lost, because there is no line number. We can order on subgroups as long as our original record number (col002) is the column that we want to sort on for the results. Try the following statement:

SELECT Col001, Col002, Col003, Col004
FROM TransactionalTable
ORDER BY col002
, col001
, col003

Your result will look like this:

Col001 Col002 Col003 Col004
10 1 Tom Muck
20 1 ColdFusion  
20 1 Flash Remoting  
10 2 Ray West
20 2 ASP.NET  
20 2 Content Management  
20 2 Dreamweaver MX  
20 2 SQL Server  
10 3 Massimo Foti
20 3 ColdFusion  
20 3 Dreamweaver MX  
20 3 JavaScript  

The third column in the subgroup of the "20" records is now sorted. If you want to sort by some other field, however, such as col004 (the last name in the "10" rows), you would have a hard time trying to use standard SQL syntax. Sorting on last name is easy -- carrying the "20" records along with that name is the hard part.

It turns out the easiest way to do this is to create another field on the fly to act as the last name field. We'll populate the field with the last name in col004 for the "10" records, and populate the new field with that name for the "20" records. Sound easy? It's not, but once you have the code you can use it for any situation where you have to group records. The SQL is as follows:

SELECT t.col001, t.col002, col003, col004
FROM TransactionalTable t
ORDER BY
CASE when col001 = '10' THEN
  col004
ELSE
  (SELECT col004 from TransactionalTable where col002 = t.col002 and col001 = '10')
end
, col001, col003

We created the new field on the fly in the ORDER BY clause using CASE. In the "10" rows, we merely use col004 as is. In the "20" rows, we pull the col004 from the matching record in the "10" row. That fills up the field so that the record remains grouped together in the ORDER BY clause. Then we merely sort on col001, which floats the "10" records to the top of each group, and then col003, which orders the items within the "20" group.

In MS Access you can't create this query in one pass, but you can turn it into two queries: one can be a saved query named Transaction1:

SELECT t.col001, t.col002, col003, col004
, (SELECT col004 from TransactionalTable where col002 = t.col002 and col001 = '10') as DUMMY
FROM TransactionalTable t;

This creates a temporary view of the data with the added column holding the last name field (col004). Then you can run a query against this view and order it the way you want:

SELECT col001, col002, col003, col004
FROM Transaction1
ORDER BY dummy, col001, col003;

Conclusion

Ordering your results can be enhanced with a few simple tricks. Many web programmers are content to learn a few SQL keywords, but there are a lot of useful functions in the SQL language that make it easy to return the results that you need, and in any order that you need them.