How to remove punctuation from a string in Python?

Punctuation refers to the characters used to separate sentences and their parts. ! () - [] {} ; : ' " and so on are examples of punctuation marks. This article discusses different methods to remove punctuation marks from a string.

Removing punctuation from string by using for loop and in operator

The in operator checks whether an element exists in an iterable and returns True if the element is found. Otherwise, the in operator returns False.
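As a quick illustration of this membership test, here is a minimal sketch. The names sample_marks and is_mark are made up for this example and do not appear in the rest of the article.

```python
# Hypothetical names, used only to illustrate the in operator.
sample_marks = "!?."

def is_mark(ch):
    # True when ch occurs in sample_marks, False otherwise
    return ch in sample_marks

print(is_mark("!"))             # True
print(is_mark("a"))             # False
print("a" not in sample_marks)  # True
```

The not in form in the last line is the negated test, which is what the loop in this section relies on.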

Consider a string string_1 = "Welcome !!! to 'PyThon' ; { Programming }". A string punc_string is initialized to "!()-[]{};:'\"\\,<>./?@#$%^&*_~", which contains the punctuation marks to be removed, and string_2 is initialized to an empty string. A for loop iterates over string_1.

In each iteration, the in operator checks whether the current character is one of the punctuation marks. If the character is not a punctuation mark, it is appended to string_2.

# Punctuation marks to remove (the backslash is written as \\)
punc_string = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
string_1 = "Welcome !!! to 'PyThon' ; { Programming }"
string_2 = ''
print("string_1: ", string_1)
print("punc_string: ", punc_string)
# Keep every character that is not a punctuation mark
for i in string_1:
    if i not in punc_string:
        string_2 += i
print("string_2: ", string_2)

The above code produces the following output:

string_1:  Welcome !!! to 'PyThon' ; { Programming }
punc_string:  !()-[]{};:'"\,<>./?@#$%^&*_~
string_2:  Welcome  to PyThon   Programming 

Instead of initializing punc_string with a hand-typed list of punctuation marks, we can also use string.punctuation. string.punctuation is a constant in the string module, not a function, so it is used without parentheses or arguments. It holds the ASCII punctuation characters as a single string.
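The same loop can be written against string.punctuation roughly as follows. This is only a sketch; the helper name strip_punctuation is an assumption introduced here for illustration.

```python
import string

def strip_punctuation(text):
    # Keep only the characters that are not in string.punctuation
    result = ""
    for ch in text:
        if ch not in string.punctuation:
            result += ch
    return result

string_1 = "Welcome !!! to 'PyThon' ; { Programming }"
string_2 = strip_punctuation(string_1)
print("string_2: ", string_2)
```

Because string.punctuation covers all 32 ASCII punctuation characters, this variant also removes marks such as + and = that a hand-typed list can easily miss.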

Removing punctuation from string by using translate() method

The translate() method returns a new string in which each character has been mapped through a translation table. The table is created with maketrans(), which returns a dictionary describing which characters to replace and which characters to remove.

The maketrans() method takes up to three arguments: sub_str1, sub_str2, and sub_str3. In Python 3 it is a static method of str. The syntax for maketrans() is given below.

#syntax:
str.maketrans(sub_str1, sub_str2, sub_str3)
  • sub_str1: This parameter takes a string specifying the characters to be replaced.
  • sub_str2: This parameter takes a string of the same length as sub_str1. Each character of the first parameter will be replaced with the corresponding character in the second parameter.
  • sub_str3: This optional parameter takes a string specifying the characters to be removed from the original string.
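To make the mapping table concrete, here is a small sketch using Python 3's built-in str.maketrans(). The sample strings are arbitrary. The table is a dictionary keyed by Unicode code points, and characters marked for removal map to None.

```python
# Map 'a' -> 'x' and 'b' -> 'y', and mark '!' for deletion.
table = str.maketrans("ab", "xy", "!")
print(table)  # {97: 120, 98: 121, 33: None}

# translate() applies the table character by character.
translated = "abc!".translate(table)
print(translated)  # xyc
```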

Consider a string string_1 = "Welcome !!! to 'PyThon' ; { Programming }". We don't want to replace any characters, so we pass an empty string for both sub_str1 and sub_str2. string.punctuation is passed as sub_str3.

import string
string_1 = "Welcome !!! to 'PyThon' ; { Programming }"
print("string_1: ", string_1)
# Map every punctuation character to None so that translate() deletes it
mapping_table = str.maketrans("", "", string.punctuation)
string_2 = string_1.translate(mapping_table)
print("string_2: ", string_2)

The above code produces the following output:

string_1:  Welcome !!! to 'PyThon' ; { Programming }
string_2:  Welcome  to PyThon   Programming 

Removing punctuation from string by using re.sub() function

The re.sub() function belongs to the re (regular expressions) module. re.sub() returns a string in which every match of a pattern is replaced with a replacement string. The syntax for re.sub() is given below.

#syntax:
re.sub(pattern, repl, string)
  • pattern: The regular expression describing the text to be replaced.
  • repl: The string that replaces each match.
  • string: The original string to search.
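A short sketch of the three arguments in use, with a deliberately simple pattern. The sample text and variable names are arbitrary choices for this illustration.

```python
import re

# Replace every digit with '#'
masked = re.sub(r"\d", "#", "a1b22c")
print(masked)  # a#b##c

# An empty replacement string deletes the matches instead
digits_removed = re.sub(r"\d", "", "a1b22c")
print(digits_removed)  # abc
```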

Consider a string string_1 = "Welcome !!! to 'PyThon' ; { Programming }". The re.sub() function takes the pattern [^\w\s] and an empty string as the replacement. This pattern matches every character except \w (letters, digits, and the underscore) and \s (whitespace), so each matched character is removed. Note that underscores are kept, unlike with the previous two methods.

import re
string_1 = "Welcome !!! to 'PyThon' ; { Programming }"
print("string_1: ", string_1)
string_2 = re.sub(r'[^\w\s]', '', string_1)
print("string_2: ", string_2)

The above code produces the following output:

string_1:  Welcome !!! to 'PyThon' ; { Programming }
string_2:  Welcome  to PyThon   Programming 
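One difference from the earlier methods is worth checking: \w matches the underscore, so [^\w\s] leaves underscores in the string. The sketch below compares the behaviours on an arbitrary sample string; adding |_ to the pattern is one possible way to remove underscores as well, not the only one.

```python
import re
import string

sample = "snake_case, kebab-case!"

# [^\w\s] keeps the underscore because \w matches it
kept_underscore = re.sub(r"[^\w\s]", "", sample)
print(kept_underscore)  # snake_case kebabcase

# Adding "_" as an alternative removes underscores too
no_underscore = re.sub(r"[^\w\s]|_", "", sample)
print(no_underscore)  # snakecase kebabcase

# translate() with string.punctuation also removes the underscore
via_translate = sample.translate(str.maketrans("", "", string.punctuation))
print(via_translate)  # snakecase kebabcase
```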