This tutorial will explain how to delete duplicate records in a MySQL table while picking which record will be used as the single record.
Since MySQL 5.1 there has been a feature for
IGNORE duplicate records when creating a unique index, but there are a couple of problems with using this extension.
- It doesn’t allow you to choose which records will stay and which will be deleted.
- It’s being removed from MySQL after version 5.5.4
- It requires an unique index to remove duplicate records.
- It doesn’t work with some
In the following examples I’ll be referring to a table called
documents. This table contains a
title column that has to change from duplicates to unique values only, and in the process of doing this we want to keep the
documents records with the most
Here is an example of how to use the
IGNORE extension to delete records from
documents for records that have duplicate
ALTER IGNORE TABLE `documents` ADD UNIQUE INDEX `title`;
The problem here is that we can’t control which
documents records were removed. So we don’t know if we kept the records with the most
page_views or not,.
Recreating the table is a faster approach than using
ALTER IGNORE to remove specific records (especially if the table is very large). The following method uses a
SELECT statement to pick which records are copied to the new table (the extra duplicates are deleted by a process of exclusion). Using a
SELECT statement gives us a lot more control over which records are excluded.
CREATE TABLE `documents_tmp` LIKE `documents`; INSERT `documents_tmp` SELECT * FROM `documents` GROUP BY `title` ORDER BY `page_views` DESC; DROP TABLE `documents`; RENAME TABLE `documents_tmp` TO `documents`;
In the above example, duplicate
title records have been dropped from the table and only the records with the highest
page_views values were kept.