Combining TOP and GROUP BY with SQL Server 2005

SQL No Comments »

It’s an every day task: Select the ten top-selling widgets from each category. The obvious approach is to combine TOP and GROUP BY, which usually comes with some annoying “Column ‘xxx’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause” errors you try to work around. The culprit is a misunderstanding of the GROUP BY clause. It will NOT give you multiple result sets, but rather ONE result set with a single row for each group. From Books Online:

GROUP BY produces a single result set. There is one row for each group containing only the grouping columns and aggregate functions that show the subaggregate for that group. The select list can contain only the grouping columns and aggregate functions.

Here’s how to achieve the desired “TOP X PER GROUP” results using a variety of features introduced with SQL Server 2005.

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

Using Regular Expressions in MS SQL Server 2005

SQL No Comments »

Regular Expressions are an extremely powerful tool for parsing and querying textual data; so using them in a database environment looks like a good match. But DBMS vendors have been rather slow building RegEx functionality into their offerings. Oracle has it since 10g and some open source databases also have a head start.

Microsoft obviously put the emphasis on XML/XPath since SQL Server 2000, and regular expression support is not on the horizon, even for SQL Server 2008.

A MSDN article gives you all you need to get you started with regular expressions in SQL Server 2005. Be sure to download the source code which includes a sample project. You have to set the database connection in the project properties and then start debugging. The assembly will then be automatically deployed to the server and you’ll have regular expression support via CLR user-defined functions. You must have the compatibility level set at 90 (SQL Server 2005), so you better be absolutely sure that sp_dbcmptlevel won’t break your stored procedures. Good luck!

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

Finding Asymmetrical Duplicates with Transact-SQL

SQL No Comments »

Finding duplicates over all involved columns with Transact-SQL is done with the query

select column1 (, column2…)
from table
group by column1 (, column2…)
having count(*) > 1

This technique is especially important for finding duplicates before inserting into tables with primary keys made up of multiple columns.

But what if the duplicate is only in one column, call it half- or semi-duplicate, incomplete duplicate or asymmetrical duplicate or whatever. For example, I tried to come up with a query that finds hotels with identical names at different destinations. This is what I came up with:

SELECT  DISTINCT HotelName FROM
(SELECT DISTINCT HotelName, Destination FROM HotelTable) tmp
GROUP BY HotelName
HAVING COUNT(*) > 1

This query only works with ONE column that’s different, but you can expand the selection of columns that you expect to be identical (duplicates).

So the abstracted query is:

SELECT  DISTINCT expectedDuplicateColumn (, expectedDuplicateColumn2 …) FROM
(SELECT DISTINCT expectedDuplicateColumn (,expectedDuplicateColumn2 …), nonDuplicateColumn FROM MyTable) tmp
GROUP BY expectedDuplicateColumn, (, expectedDuplicateColumn2 …)
HAVING COUNT(*) > 1

With just one expected duplicate column you can get (of course) the entire rows like this: 

SELECT * FROM HotelTable WHERE HotelName  IN

(

SELECT  DISTINCT HotelName FROM
(SELECT DISTINCT HotelName, Destination FROM HotelTable) tmp
GROUP BY HotelName
HAVING COUNT(*) > 1

)

With multiple identical columns you can use nested subqueries like so:

SELECT * FROM HotelTable WHERE HotelName IN
(
SELECT HotelName FROM
(
SELECT  DISTINCT HotelName,  HotelChain FROM
(SELECT DISTINCT HotelName,  HotelChain, Destination FROM HotelTable) tmp
GROUP BY HotelName, HotelChain
HAVING COUNT(*) > 1
)tmp2
)

This latter approach avoids the use of EXISTS, which can get really messy. I’m sure SQL experts will find a more elegant solution but this might work for you.

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

100% View but no Order (at least not in SQL Server 2005)

SQL No Comments »

In SQL Server 2000 you could add ORDER BY to the query defining a view that started with SELECT TOP (100) PERCENT… which is the default query when you create a view with SQL Server Enterprise Manager for SQL Server 2000.

The TOP statement would pretend to limit the resultset (nothing is actually limited when 100 percent are returned) and ORDER BY could be applied. However, things changed with SQL Server 2005. The query optimizer now removes both TOP and ORDER BY clauses if the query is for the TOP 100 PERCENT. From SQL Server Books Online:

When ORDER BY is used in the definition of a view, inline function, derived table, or subquery, the clause is used only to determine the rows returned by the TOP clause. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself.

This means that each (outer) query issued against the view must contain the ORDER BY statement to guarantee an ordered resultset. There is absolutely no warning in the query designer if you verify the syntax of your view:

image

 

 

 

 

 

Microsoft recently released a hotfix for legacy applications relying on the deprecated behaviour which remains unsupported.

Here’s an in-depth view (no pun intended) of this issue and a workaround/hack (issuing TOP (Very Large Integer)).

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]
WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in