How to remove duplicates from a list?

In Python, a list holds a sequence of data items enclosed within square brackets. A list is ordered, mutable, and can contain duplicate elements.

Consider a list list_1 = [1, 4, 3, 1, 2, 3]. After the duplicate elements are removed, list_1 becomes [1, 4, 3, 2]. This article discusses various methods of removing duplicate elements from a list.

Removing duplicates from a list by converting it to a set

A set is a built-in data type in Python that holds a collection of data items enclosed within curly brackets. A set is unordered and doesn't allow duplicate data items.

To remove duplicates, the list is converted to a set and then converted back to a list. The main drawback of this method is that the original ordering of the elements is lost.

Consider a list list_1 = [1, 'red', 2, 1, 'red', 4]. The set() constructor converts the list to a set, which is stored in the variable set_1, for example {1, 2, 4, 'red'}. This removes the duplicate elements of list_1. set_1 is then converted back to a list using the list() constructor.

list_1 = [1, "red", 2, 1, "red", 4]
print("list_1: ", list_1)
set_1 = set(list_1)
print("set_1: ", set_1)
list_2 = list(set_1)
print("list_2: ", list_2)

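The above code prints output such as the following. Because a set is unordered, the order of the elements in set_1 and list_2 may differ between runs.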

list_1:  [1, 'red', 2, 1, 'red', 4]
set_1:  {1, 2, 4, 'red'}
list_2:  [1, 2, 4, 'red']
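
If the original order matters, one possible workaround is to sort the set by the position at which each element first appears in the list. The following is a minimal sketch; the helper name unique_in_original_order is hypothetical and not part of the method described above.

```python
def unique_in_original_order(items):
    # set() drops the duplicates; list.index() returns the position of the
    # first occurrence, so sorting by it restores the original order.
    return sorted(set(items), key=items.index)

list_1 = [1, "red", 2, 1, "red", 4]
list_2 = unique_in_original_order(list_1)
print("list_2: ", list_2)  # list_2:  [1, 'red', 2, 4]
```

Each call to index() scans the list from the start, so this sketch is best suited to short lists.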

Removing duplicates from a list using OrderedDict and fromkeys()

A dictionary doesn't allow duplicate keys. Hence, using the items of a sequence as dictionary keys removes the duplicate data items.

OrderedDict can be imported from the collections module. It preserves the order in which the keys are inserted. Since Python 3.7, the built-in dict also preserves insertion order, so OrderedDict is mainly needed on older versions.

The fromkeys() method takes two parameters, a sequence and an optional value, and returns a dictionary. Each element of the sequence becomes a key of the dictionary. If the value argument is provided, every key is set to that value. By default, the value parameter is set to None.

The syntax for fromkeys() is given below

#syntax:
dict.fromkeys(sequence, value)
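
As a short illustration of the two parameters, the sketch below calls fromkeys() with and without a value. The list colors and the variable names are made up for this example, and the key order shown assumes Python 3.7 or later.

```python
colors = ["red", "green", "red"]

# Without a value, every key maps to None
keys_only = dict.fromkeys(colors)
# With a value, every key maps to that same value
keys_with_zero = dict.fromkeys(colors, 0)

print(keys_only)       # {'red': None, 'green': None}
print(keys_with_zero)  # {'red': 0, 'green': 0}
```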

Consider a list list_1 = [1, 'red', 2, 1, 'red', 4]. OrderedDict.fromkeys(list_1) returns an OrderedDict whose keys are the elements of list_1 in the order of their first occurrence, each mapped to None. It is stored in dict_1. The list() constructor converts the keys of dict_1 back to a list and stores the data in the variable list_2 = [1, 'red', 2, 4].

from collections import OrderedDict
list_1 = [1, "red", 2, 1, "red", 4]
print("list_1: ", list_1)
# Call fromkeys() on OrderedDict itself so that the result is an OrderedDict
dict_1 = OrderedDict.fromkeys(list_1)
print("dict_1 items: ", list(dict_1.items()))
# Iterating over a dictionary yields its keys
list_2 = list(dict_1)
print("list_2: ", list_2)

The above code returns the output as

list_1:  [1, 'red', 2, 1, 'red', 4]
dict_1 items:  [(1, None), ('red', None), (2, None), (4, None)]
list_2:  [1, 'red', 2, 4]
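
On Python 3.7 and later, where the built-in dict keeps insertion order, the same idea can be sketched without importing anything. The helper name remove_duplicates_keep_order is hypothetical, and the sketch assumes every element is hashable.

```python
def remove_duplicates_keep_order(items):
    # dict.fromkeys() keeps one key per distinct element, in first-seen order
    # (guaranteed for the built-in dict from Python 3.7 onwards).
    return list(dict.fromkeys(items))

list_1 = [1, "red", 2, 1, "red", 4]
list_2 = remove_duplicates_keep_order(list_1)
print("list_2: ", list_2)  # list_2:  [1, 'red', 2, 4]
```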

Removing duplicates from a list using a for loop and in operator

Consider a list list_1 = [1, 'red', 2, 1, 'red', 4]. The variable list_2 is initialized to an empty list. A for loop iterates over list_1. In each iteration, the program checks whether the element is already present in list_2. If it is not present, the element is added to list_2 using the append() method.

As a result, list_2 holds only the first occurrence of each data item and doesn't contain any duplicate values.

list_1 = [1, "red", 2, 1, "red", 4]
print("list_1: ", list_1)
list_2 = []
for item in list_1:
    if item not in list_2:
        list_2.append(item)
print("list_2: ", list_2)

Output

list_1:  [1, 'red', 2, 1, 'red', 4]
list_2:  [1, 'red', 2, 4]
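
The in operator scans list_2 from the start on every iteration, which can become slow for long lists. A common variation, sketched below, keeps the elements seen so far in a set so that each membership check is fast on average. The function name remove_duplicates and the variable seen are hypothetical, and the sketch assumes every element is hashable.

```python
def remove_duplicates(items):
    seen = set()    # elements encountered so far
    result = []     # first occurrences, in original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

list_1 = [1, "red", 2, 1, "red", 4]
list_2 = remove_duplicates(list_1)
print("list_2: ", list_2)  # list_2:  [1, 'red', 2, 4]
```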

Removing duplicates from a list using enumerate()

The enumerate() function takes two parameters, an iterable and an optional start, and returns an enumerate object. The enumerate object pairs each data item in the sequence with a counter, yielding (count, item) tuples. By default, the value of the start parameter is 0.

Consider a list list_1 = [1, 'red', 2, 1, 'red', 4]. The variable list_2 is initialized to an empty list. A for loop iterates over enumerate(list_1) and checks whether the current element already occurs before its index in list_1. If an earlier occurrence is found, the element is not added to list_2. Otherwise, the element is added to list_2.
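
The effect of the start parameter can be seen in a small sketch. The list colors and the variable names are made up for this illustration.

```python
colors = ["red", "green", "blue"]

# Counting starts at 0 by default
default_pairs = list(enumerate(colors))
# start=1 makes the counter begin at 1 instead
pairs_from_one = list(enumerate(colors, start=1))

print(default_pairs)   # [(0, 'red'), (1, 'green'), (2, 'blue')]
print(pairs_from_one)  # [(1, 'red'), (2, 'green'), (3, 'blue')]
```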

list_1 = [1, "red", 2, 1, "red", 4]
print("list_1: ", list_1)
list_2 = []
print("enumerate(list_1): ", list(enumerate(list_1)))
for count, item in enumerate(list_1):
    if item not in list_1[:count]:
        list_2.append(item)
print("list_2: ", list_2)

Output

list_1:  [1, 'red', 2, 1, 'red', 4]
enumerate(list_1):  [(0, 1), (1, 'red'), (2, 2), (3, 1), (4, 'red'), (5, 4)]
list_2:  [1, 'red', 2, 4]

Removing duplicates from a list using numpy.unique()

The numpy.unique() function takes a sequence as a parameter and returns a sorted array of its unique elements. The array is converted back to a list using the tolist() method.

Consider a list list_1 = [1, 'red', 2, 1, 'red', 4]. numpy.unique(list_1) returns a sorted unique array, stored in the variable array, as ['1' '2' '4' 'red']. array.tolist() converts the array to the list ['1', '2', '4', 'red'].

import numpy
list_1 = [1, 'red', 2, 1, 'red', 4]
print("list_1: ", list_1)
array = numpy.unique(list_1)
print("Array object: ", array)
list_2 = array.tolist()
print("list_2: ", list_2)

Output

list_1:  [1, 'red', 2, 1, 'red', 4]
Array object:  ['1' '2' '4' 'red']
list_2:  ['1', '2', '4', 'red']

There are two drawbacks of using numpy.unique():

  • It returns a sorted array. Hence, the original ordering of the elements is not preserved.
  • Passing a list of mixed data types to unique() converts every element to a string, so the returned array has the string data type.
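
For a list that holds a single data type, the first drawback can be worked around with the return_index parameter of numpy.unique(), which also returns the index of the first occurrence of each unique value. The sketch below is one possible approach, not the only one. The integer-only list and the variable names are chosen for this illustration.

```python
import numpy

list_1 = [3, 1, 3, 2, 1]
array = numpy.array(list_1)

# return_index=True also returns the index of each value's first occurrence
unique_values, first_positions = numpy.unique(array, return_index=True)

# Sorting those indices and indexing the array restores the original order
list_2 = array[numpy.sort(first_positions)].tolist()
print("list_2: ", list_2)  # list_2:  [3, 1, 2]
```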