Finding duplicate records is just a matter of counting how many times a column value occurs. In these examples I’ll be using a table called
documents and will be finding documents that have duplicate
The first step is easy. Simply count how many times a title occurs in the table. This can be done with the following query.
SELECT COUNT(*) AS `count`,`title` FROM `documents` GROUP BY `title`;
To find only titles that occur more than once you might try this query. It seems like a logical first step, and why not we use conditions on columns all the time.
SELECT COUNT(*) AS `count`,`title` FROM `documents` WHERE `count` > 1 GROUP BY `title`;
This query does not work, but why? I can explain this by breaking down sections of the query into 3 parts. The
<context> and the
SELECT <output> FROM <input> WHERE <context> GROUP BY <context>;
<input> is the source data that the query will process (that part is obvious), but at the time the
where condition are executed there is no
<output> generated. So it’s not possible to reference the alias
count from the
where statement. So the
<context> is only relative to the
<input> and at the time of the
<context> there is nothing to
So we could solve this problem by using a subquery that would output the
SELECT * FROM (SELECT COUNT(*) AS `count`,`title` FROM `documents` GROUP BY `title`) AS `d` WHERE `count` > 1;
So the above will work but it’s not efficient, difficult to read and won’t work with views.
MySQL solves this problem by adding an extension called
GROUP BY HAVING that allows you to define a clause that uses the
<output> from the
GROUP BY even when working at the current
<context>. Sounds confusing right? Well it’s easier to understand if you think of
GROUP BY as a micro-tiny subquery and
HAVING as the
WHERE condition for that subquery.
SELECT COUNT(*) AS `count`,`title` FROM `documents` GROUP BY `title` HAVING COUNT(`title`) > 1;
COUNT(title) since it’s a condition based on the
So the above will work. It will list all the documents that have duplicate titles and how many duplicates there are.