Unique/Distinctive Selection

From Hashmysql
Jump to: navigation, search

Tutorial: How to select unique or distinctive results

Let's start with a sample table:

 mysql> CREATE TEMPORARY TABLE cars (
     ->   manufacturer VARCHAR(50),
     ->   brand VARCHAR(50),
     ->   model VARCHAR(50),
     ->   trim VARCHAR(50),
     ->   msrp MEDIUMINT UNSIGNED
     -> );
 Query OK, 0 rows affected (0.01 sec)
 
 mysql> INSERT INTO cars ( manufacturer, brand, model, trim, msrp ) VALUES 
     ->   ('General Motors', 'Chevrolet', 'Silverado', '1500', 18000),
     ->   ('General Motors', 'Chevrolet', 'Silverado', '2500', 25000),
     ->   ('General Motors','GMC','Sierra', '1500', 19000),
     ->   ('General Motors', 'Chevrolet','Corvette', 'Z06', 70000), 
     ->   ('Toyota','Toyota','Tundra', 'Limited', 37000),
     ->   ('Toyota','Lexus','IS', '350', 36000 );
 Query OK, 6 rows affected (0.00 sec)
 Records: 6  Duplicates: 0  Warnings: 0
 
 mysql> SELECT manufacturer, brand, model FROM cars;
 +----------------+-----------+-----------+---------+-------+
 | manufacturer   | brand     | model     | trim    | msrp  |
 +----------------+-----------+-----------+---------+-------+
 | General Motors | Chevrolet | Silverado | 1500    | 18000 |
 | General Motors | Chevrolet | Silverado | 2500    | 25000 |
 | General Motors | GMC       | Sierra    | 1500    | 19000 |
 | General Motors | Chevrolet | Corvette  | Z06     | 70000 |
 | Toyota         | Toyota    | Tundra    | Limited | 37000 |
 | Toyota         | Lexus     | IS        | 350     | 36000 |
 +----------------+-----------+-----------+---------+-------+
 6 rows in set (0.00 sec)
 
 mysql>


Using SELECT DISTINCT

So let's say we want all unique vehicles.

 mysql> SELECT DISTINCT * FROM cars;
 +----------------+-----------+-----------+---------+-------+
 | manufacturer   | brand     | model     | trim    | msrp  |
 +----------------+-----------+-----------+---------+-------+
 | General Motors | Chevrolet | Silverado | 1500    | 18000 |
 | General Motors | Chevrolet | Silverado | 2500    | 25000 |
 | General Motors | GMC       | Sierra    | 1500    | 19000 |
 | General Motors | Chevrolet | Corvette  | Z06     | 70000 |
 | Toyota         | Toyota    | Tundra    | Limited | 37000 |
 | Toyota         | Lexus     | IS        | 350     | 36000 |
 +----------------+-----------+-----------+---------+-------+
 6 rows in set (0.00 sec)
 
 mysql>

Wait a minute! Why did the Silverado show up twice? Well, that's because DISTINCT works on a per-row basis, meaning MySQL will return unique rows. You cannot use DISTINCT on a per-column basis. In this case, because Silverado is in there twice (one for the 1500, and one for the 2500), you get two rows back.

The better way is to define the columns you want. Because we didn't really care for trim, we should just not select it.

 mysql> SELECT DISTINCT manufacturer, brand, model FROM cars;
 +----------------+-----------+-----------+
 | manufacturer   | brand     | model     |
 +----------------+-----------+-----------+
 | General Motors | Chevrolet | Silverado |
 | General Motors | GMC       | Sierra    |
 | General Motors | Chevrolet | Corvette  |
 | Toyota         | Toyota    | Tundra    |
 | Toyota         | Lexus     | IS 350    |
 +----------------+-----------+-----------+
 5 rows in set (0.00 sec) 
 
 mysql>

Ah! That works! This is just one more reason *NOT* to use SELECT *.


Using GROUP BY

Using GROUP BY*, you can do the same thing as well.

mysql> SELECT * FROM cars GROUP BY model; 
+----------------+-----------+-----------+---------+-------+
| manufacturer   | brand     | model     | trim    | msrp  |
+----------------+-----------+-----------+---------+-------+
| General Motors | Chevrolet | Corvette  | Z06     | 70000 |
| Toyota         | Lexus     | IS        | 350     | 36000 |
| General Motors | GMC       | Sierra    | 1500    | 19000 |
| General Motors | Chevrolet | Silverado | 1500    | 18000 |
| Toyota         | Toyota    | Tundra    | Limited | 37000 |
+----------------+-----------+-----------+---------+-------+
5 rows in set (0.08 sec)

mysql>
  • Caveat :: Notice that with GROUP BY we got back the '1500' trim. You are not guaranteed any order to the results. For example, the 2500 trim could have just as easily have been returned. The '1500' trim was in fact only returned because it was the first row that MySQL encountered when grouping.

Try this:

mysql> CREATE TEMPORARY TABLE cars2 LIKE cars;
Query OK, 0 rows affected (0.08 sec) 
 
mysql> INSERT INTO cars2 SELECT * FROM cars ORDER BY trim DESC
Query OK, 6 rows affected (0.08 sec)
Records: 6  Duplicates: 0  Warnings: 0

Now we have an additional table, cars2 that is modeled after the original table cars.

mysql> SELECT * FROM cars;
+----------------+-----------+-----------+---------+-------+
| manufacturer   | brand     | model     | trim    | msrp  |
+----------------+-----------+-----------+---------+-------+
| General Motors | Chevrolet | Silverado | 1500    | 18000 |
| General Motors | Chevrolet | Silverado | 2500    | 25000 |
| General Motors | GMC       | Sierra    | 1500    | 19000 |
| General Motors | Chevrolet | Corvette  | Z06     | 70000 |
| Toyota         | Toyota    | Tundra    | Limited | 37000 |
| Toyota         | Lexus     | IS        | 350     | 36000 |
+----------------+-----------+-----------+---------+-------+
6 rows in set (0.00 sec) 

mysql> SELECT * FROM cars2;
+----------------+-----------+-----------+---------+-------+
| manufacturer   | brand     | model     | trim    | msrp  |
+----------------+-----------+-----------+---------+-------+
| General Motors | Chevrolet | Corvette  | Z06     | 70000 |
| Toyota         | Toyota    | Tundra    | Limited | 37000 |
| Toyota         | Lexus     | IS        | 350     | 36000 |
| General Motors | Chevrolet | Silverado | 2500    | 25000 |
| General Motors | Chevrolet | Silverado | 1500    | 18000 |
| General Motors | GMC       | Sierra    | 1500    | 19000 |
+----------------+-----------+-----------+---------+-------+
6 rows in set (0.00 sec)

mysql> 

Notice that the two tables contain the exact same data, but it is organized a little differently, but look at how the GROUP BY query behaves differently on the two tables:

mysql> SELECT * FROM cars GROUP BY model;
+----------------+-----------+-----------+---------+-------+
| manufacturer   | brand     | model     | trim    | msrp  |
+----------------+-----------+-----------+---------+-------+
| General Motors | Chevrolet | Corvette  | Z06     | 70000 |
| Toyota         | Lexus     | IS        | 350     | 36000 |
| General Motors | GMC       | Sierra    | 1500    | 19000 |
| General Motors | Chevrolet | Silverado | 1500    | 18000 |
| Toyota         | Toyota    | Tundra    | Limited | 37000 |
+----------------+-----------+-----------+---------+-------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM cars2 GROUP BY model;
+----------------+-----------+-----------+---------+-------+
| manufacturer   | brand     | model     | trim    | msrp  |
+----------------+-----------+-----------+---------+-------+
| General Motors | Chevrolet | Corvette  | Z06     | 70000 |
| Toyota         | Lexus     | IS        | 350     | 36000 |
| General Motors | GMC       | Sierra    | 1500    | 19000 |
| General Motors | Chevrolet | Silverado | 2500    | 25000 |
| Toyota         | Toyota    | Tundra    | Limited | 37000 |
+----------------+-----------+-----------+---------+-------+
5 rows in set (0.00 sec)

mysql>

On the second table, cars2, the '2500' trim is returned for the 'Silverado', so even when using GROUP BY to query distinct or unique results, you should always consider if it is important what value is returned for the columns that aren't grouped.

Choosing Your Results

So now we have two ways to return unique or distinctive rows within MySQL. The next step is choosing which row or rows we want to view.

To be continued after a short nap.


\* - Yes, I know SELECT * and GROUP BY is bad. But it's here to illustrate a point many people make.