The larger the dataset, the higher are the chances of encountering duplicate records. It can be problematic if they are not identified and handled correctly.
We will illustrate how to remove duplicates in Excel using a Sports dataset. This dataset contains information about Olympic medalists for the year 2012.
Using Remove Duplicates Option on Data Tab
Excel has a built-in tool that helps delete repeated entries in your dataset. Let’s have a look at the steps to be followed to remove duplicates in Excel.
- First, click on any cell or a specific range in the dataset from which you want to remove duplicates. If you click on a single cell, Excel automatically determines the range for you in the next step.
- Next, locate the ‘Remove Duplicates’ option and select it.
DATA tab → Data Tools section → Remove Duplicates
- A dialog box appears, as shown below. You can select the columns you want to compare and check for duplicate data.
In case your data consists of column headers, select the ‘My data has headers’ option, and then click on OK.
On checking the header option, the first row will not be considered for removing duplicate values.
- Excel will now delete the duplicate rows and display a dialog box. The dialog box shows a summary of how many duplicate values are found and removed along with the count of unique values.
- As you can notice, the duplicate records are removed.
Let’s move forward and understand how to remove duplicates in Excel using the Advanced Filter option.
Using the Advanced Filter Option
The Advanced Filter option in Excel helps you filter duplicate values and copy the unique values to a different location. Look at the following steps to find out how the Advanced Filter option works.
- First, click on a cell or range in the dataset from which you want to remove duplicates. If you click on a single cell, Excel automatically determines the range when you click on Advanced Filter.
- Locate the Advanced Filter option.
DATA tab → Sort & Filter section → Advanced and click on it.
- You will see a dialog box. It consists of a list of options for advanced filtering.
- Select the ‘Copy to another location’ option to copy the unique values to a different location.
- Check the range of your records in the ‘List Range’ field and make sure it is the range that you have specified.
- In the ‘Copy to:’ field, enter the range where the resultant unique values must be copied.
- Check the ‘Unique records only’ box. This step is the most crucial.
- Click on OK.
- The unique values will be copied to cell G1.
These were the in-built functionalities in Excel that help us in removing duplicates. Now, let’s move forward and learn how we can create our own function to do the same.
How to Use Formulas to Remove Duplicates in Excel?
We will be using a simple example containing the columns: type of sport, athlete name, and medal won to demonstrate this approach.
This method involves combining the columns using an Excel formula and finding out the count. We will then filter out the duplicate values (ones that have a count greater than 1).
- Let’s combine the columns A, B, and C by using the concatenation operator “&“. So, the Excel formula would be:
This formula is entered into the cell D2 and then copied down to all the rows.
- Now, we will need another column named ‘Count’ to find out the duplicates in Column D. Hence, we use the COUNTIF function on cell E2. The formula will be:
This formula helps count the number of occurrences of each value in column D.
If the value of Count is “1”, then it has only appeared once and is unique. If the value is “2” or more, then it is considered a duplicate value.
- Now add a filter to the Count column by selecting the Filter option.
You will find it in the DATA tab → Sort & Filter section → Filter
Click on the filter at the top of Column E. Select “1 ” to keep only the unique values and remove the duplicates.
- On clicking OK, the duplicate values will be removed from the table. You can copy these resultant unique records and paste them elsewhere.
Let’s proceed by understanding the final approach to delete duplicates: Using Power Query.
How to Use the Power Query Tool to Remove Duplicates in Excel?
Power Query in Excel lets you import data from various sources, clean and transform your data. This tool makes it effortless to remove duplicates in Excel.
- Select a cell or range, locate Data Tab → Get & Transform Data section → From Table/Range and click on it.
- On clicking, you will see a dialog box to create a power query table. Make sure that the range of values is correctly specified. Click on OK.
- The following Power Query editor window appears.
Now you have two options. You can remove duplicates based on:
- One or more columns
- Entire table
To remove duplicates based on one or more columns, right-click on the specific column header. You can select more than one column using the CTRL button and remove the duplicates accordingly.
To remove the duplicate records based on the entire table, click on the button present on the top left corner of the data preview. And then select the ‘Remove Duplicates’ option.
This way, the data will be free from duplicate values.
On clicking the ‘Close & Load’ option, the data will be loaded onto your spreadsheet.