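MySQL: Delete Duplicate Rows
Summary: in this tutorial, you will learn several ways to delete duplicate rows in MySQL. The previous tutorial showed how to find duplicate values in a table. Once you have identified the duplicate rows, you may want to delete them to clean up your data.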
Prepare sample data
The following script creates the contacts_test table and inserts sample data into it for the demonstration.

DROP TABLE IF EXISTS contacts_test;

CREATE TABLE contacts_test (
    id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(255) NOT NULL
);

INSERT INTO contacts_test (first_name,last_name,email)
VALUES ('Carine ','Schmitt','carine.schmitt@verizon.net'),
       ('Jean','King','jean.king@me.com'),
       ('Peter','Ferguson','peter.ferguson@google.com'),
       ('Janine ','Labrune','janine.labrune@aol.com'),
       ('Jonas ','Bergulfsen','jonas.bergulfsen@mac.com'),
       ('Janine ','Labrune','janine.labrune@aol.com'),
       ('Susan','Nelson','susan.nelson@comcast.net'),
       ('Zbyszek ','Piestrzeniewicz','zbyszek.piestrzeniewicz@att.net'),
       ('Roland','Keitel','roland.keitel@yahoo.com'),
       ('Julie','Murphy','julie.murphy@yahoo.com'),
       ('Kwai','Lee','kwai.lee@google.com'),
       ('Jean','King','jean.king@me.com'),
       ('Susan','Nelson','susan.nelson@comcast.net'),
       ('Roland','Keitel','roland.keitel@yahoo.com');

You can run this script again after executing a DELETE statement to recreate the test data.
The following query returns the duplicate emails in the contacts_test table:

SELECT
    email,
    COUNT(email)
FROM contacts_test
GROUP BY email
HAVING COUNT(email) > 1;

+--------------------------+--------------+
| email                    | COUNT(email) |
+--------------------------+--------------+
| janine.labrune@aol.com   |            2 |
| jean.king@me.com         |            2 |
| roland.keitel@yahoo.com  |            2 |
| susan.nelson@comcast.net |            2 |
+--------------------------+--------------+
4 rows in set (0.00 sec)

As you can see, four email addresses each appear twice in the table.
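The grouped query shows only the email values. To see the individual rows behind each duplicate, including their ids, the grouped result can be joined back to the table. The following is a minimal sketch against the contacts_test sample table; the aliases c and d are illustrative.

```sql
-- List every row whose email occurs more than once, with its id.
SELECT c.id, c.first_name, c.last_name, c.email
FROM contacts_test AS c
INNER JOIN (
    -- One row per duplicated email.
    SELECT email
    FROM contacts_test
    GROUP BY email
    HAVING COUNT(*) > 1
) AS d ON d.email = c.email
ORDER BY c.email, c.id;
```

On the sample data this lists eight rows, two for each duplicated email, which makes it easier to decide which id to keep.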
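A) Delete duplicate rows using the DELETE JOIN statement
MySQL provides the DELETE JOIN statement, which you can use to remove duplicate rows quickly.
The following statement deletes the duplicate rows and, for each email, keeps the row with the highest id:

DELETE t1 FROM contacts_test t1
INNER JOIN contacts_test t2
WHERE
    t1.id < t2.id AND
    t1.email = t2.email;

The output is:

Query OK, 4 rows affected (0.10 sec)

It indicates that four rows were deleted. To verify the deletion, run the query that finds duplicate emails again: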
SELECT
    email,
    COUNT(email)
FROM contacts_test
GROUP BY email
HAVING COUNT(email) > 1;

The query returns an empty set, which means the duplicate rows have been deleted.
Let's verify the data in the contacts_test table:

SELECT * FROM contacts_test;

+----+------------+-----------------+---------------------------------+
| id | first_name | last_name       | email                           |
+----+------------+-----------------+---------------------------------+
|  1 | Carine     | Schmitt         | carine.schmitt@verizon.net      |
|  3 | Peter      | Ferguson        | peter.ferguson@google.com       |
|  5 | Jonas      | Bergulfsen      | jonas.bergulfsen@mac.com        |
|  6 | Janine     | Labrune         | janine.labrune@aol.com          |
|  8 | Zbyszek    | Piestrzeniewicz | zbyszek.piestrzeniewicz@att.net |
| 10 | Julie      | Murphy          | julie.murphy@yahoo.com          |
| 11 | Kwai       | Lee             | kwai.lee@google.com             |
| 12 | Jean       | King            | jean.king@me.com                |
| 13 | Susan      | Nelson          | susan.nelson@comcast.net        |
| 14 | Roland     | Keitel          | roland.keitel@yahoo.com         |
+----+------------+-----------------+---------------------------------+
10 rows in set (0.00 sec)

The rows with ids 2, 4, 7, and 9 have been deleted.
To delete the duplicate rows and keep the row with the lowest id instead, use the following statement:
DELETE t1 FROM contacts_test t1
INNER JOIN contacts_test t2
WHERE
    t1.id > t2.id AND
    t1.email = t2.email;

Note: you can run the script that creates the contacts_test table again before testing this query. The following output shows the data in the contacts_test table after the duplicate rows have been removed.

+----+------------+-----------------+---------------------------------+
| id | first_name | last_name       | email                           |
+----+------------+-----------------+---------------------------------+
|  1 | Carine     | Schmitt         | carine.schmitt@verizon.net      |
|  2 | Jean       | King            | jean.king@me.com                |
|  3 | Peter      | Ferguson        | peter.ferguson@google.com       |
|  4 | Janine     | Labrune         | janine.labrune@aol.com          |
|  5 | Jonas      | Bergulfsen      | jonas.bergulfsen@mac.com        |
|  7 | Susan      | Nelson          | susan.nelson@comcast.net        |
|  8 | Zbyszek    | Piestrzeniewicz | zbyszek.piestrzeniewicz@att.net |
|  9 | Roland     | Keitel          | roland.keitel@yahoo.com         |
| 10 | Julie      | Murphy          | julie.murphy@yahoo.com          |
| 11 | Kwai       | Lee             | kwai.lee@google.com             |
+----+------------+-----------------+---------------------------------+
10 rows in set (0.00 sec)
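Because a DELETE JOIN removes rows immediately, it can help to preview the affected rows first and to run the delete inside a transaction. The sketch below does this for the "keep the highest id" variant. It assumes contacts_test uses a transactional storage engine such as InnoDB (the MySQL default), since ROLLBACK has no effect on non-transactional tables.

```sql
-- Preview: the rows that the "keep the highest id" DELETE JOIN would remove.
-- DISTINCT guards against a row appearing several times when an email
-- occurs three or more times.
SELECT DISTINCT t1.*
FROM contacts_test t1
INNER JOIN contacts_test t2
    ON t1.email = t2.email
   AND t1.id < t2.id
ORDER BY t1.id;

-- Run the delete inside a transaction so that it can be undone.
START TRANSACTION;

DELETE t1
FROM contacts_test t1
INNER JOIN contacts_test t2
    ON t1.email = t2.email
   AND t1.id < t2.id;

-- Inspect the remaining rows, then end the transaction.
SELECT * FROM contacts_test ORDER BY id;

ROLLBACK;   -- replace with COMMIT; once the result looks right
```

On the freshly created sample data, the preview returns the rows with ids 2, 4, 7, and 9. Swapping `<` for `>` in both statements previews and deletes the rows for the "keep the lowest id" variant.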
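B) Delete duplicate rows using an intermediate table
Deleting duplicate rows with an intermediate table takes three steps:
1. Create a new table with the same structure as the original table.
2. Insert the distinct rows from the original table into the new table.
3. Drop the original table and rename the new table to the original name.
The following queries illustrate these steps.
Step 1.
CREATE TABLE source_copy LIKE source;
Step 2.
INSERT INTO source_copy SELECT * FROM source GROUP BY col; -- col is the column that contains duplicate values
Note that MySQL accepts SELECT * together with GROUP BY col only when the ONLY_FULL_GROUP_BY SQL mode is disabled; that mode is enabled by default since MySQL 5.7.5.
Step 3.
DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;
For example, the following statements delete the rows with duplicate emails from the contacts_test table: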
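-- step 1: create an empty table with the same structure
CREATE TABLE contacts_temp LIKE contacts_test;

-- step 2: copy one row per email (the row with the lowest id)
INSERT INTO contacts_temp
SELECT *
FROM contacts_test
WHERE id IN (
    SELECT MIN(id)
    FROM contacts_test
    GROUP BY email
);

-- step 3: replace the original table with the deduplicated copy
DROP TABLE contacts_test;
ALTER TABLE contacts_temp RENAME TO contacts_test;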
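Because step 3 drops the original table, it can be worth checking the copy before that point. The sketch below compares the number of rows in contacts_temp with the number of distinct emails in contacts_test, and then swaps the tables with a single RENAME TABLE statement rather than DROP followed by ALTER TABLE. It assumes steps 1 and 2 have already run and step 3 has not; contacts_old is a hypothetical name for the retired table.

```sql
-- The two counts should be equal before the original table is replaced.
SELECT
    (SELECT COUNT(*) FROM contacts_temp) AS rows_in_copy,
    (SELECT COUNT(DISTINCT email) FROM contacts_test) AS distinct_emails;

-- Swap the tables in one statement; contacts_old is a hypothetical name.
RENAME TABLE contacts_test TO contacts_old,
             contacts_temp TO contacts_test;

-- Drop the old table once the new one has been checked.
DROP TABLE contacts_old;
```

If rows_in_copy is larger than distinct_emails, the copy still contains duplicates and step 2 should be revisited before anything is dropped. Keeping the old table under another name until the end also leaves a way back if the new table turns out to be wrong.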
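C) Delete duplicate rows using the ROW_NUMBER() function
Note that the ROW_NUMBER() function has been supported since MySQL 8.0.2, so check your MySQL version before using it.
The following statement uses the ROW_NUMBER() function to assign a sequential integer to each row within each group of equal emails. Rows are ordered by id inside each group, so the row with the lowest id gets row number 1, and every further row with the same email gets a row number greater than 1.

SELECT
    id,
    email,
    ROW_NUMBER() OVER (
        PARTITION BY email
        ORDER BY id
    ) AS row_num
FROM contacts_test;

The following statement returns the list of ids of the duplicate rows:

SELECT id
FROM (
    SELECT
        id,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY id
        ) AS row_num
    FROM contacts_test
) t
WHERE row_num > 1;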
+----+
| id |
+----+
|  6 |
| 12 |
| 14 |
| 13 |
+----+
4 rows in set (0.00 sec)

You can then delete the duplicate rows from the contacts_test table using a DELETE statement with a subquery in the WHERE clause:

DELETE FROM contacts_test
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY email
                ORDER BY id
            ) AS row_num
        FROM contacts_test
    ) t
    WHERE row_num > 1
);

MySQL issues the following message:
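4 row(s) affected

In this tutorial, you have learned how to delete duplicate rows in MySQL using the DELETE JOIN statement, an intermediate table, or the ROW_NUMBER() function.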
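The ROW_NUMBER() ranking from section C can also feed a DELETE JOIN like the one in section A, which avoids the nested subquery in the WHERE clause. The following is a minimal sketch, assuming MySQL 8.0.2 or later and the contacts_test sample table. The aliases c and ranked are illustrative, and ordering by id is an assumption that keeps the lowest id for each email.

```sql
-- Delete every row that is not the first (lowest id) row for its email.
DELETE c
FROM contacts_test AS c
INNER JOIN (
    -- Rank the rows within each email; the lowest id gets row_num = 1.
    SELECT
        id,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY id
        ) AS row_num
    FROM contacts_test
) AS ranked ON ranked.id = c.id
WHERE ranked.row_num > 1;
```

On the freshly created sample data this removes the rows with ids 6, 12, 13, and 14. Changing the window ordering to ORDER BY id DESC would keep the highest id for each email instead.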