How to count the number of words in a string in Python?

Consider the string "Python Programming". The number of words in this string is 2. This article discusses different ways to count the words in a string in Python.

Counting the words using the split() method

The split() method breaks the string at a specified separator and returns a list of strings. It takes two optional parameters, sep and maxsplit.

  • sep is the separator at which the string is broken. If it is not provided, the string is split on whitespace.
  • maxsplit is the maximum number of splits to perform. If it is not provided, there is no limit on the number of splits.

The syntax for the split() method is as follows.

#syntax:
string.split(sep, maxsplit)
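
As a small illustration of how the two parameters change the result, the sketch below splits a sample string (chosen here only for demonstration) with no arguments, with a separator, and with a separator plus maxsplit.

```python
text = "one,two,three,four"  # sample input, chosen for illustration

# No arguments: split on whitespace. There is none here, so one item is returned.
no_args = text.split()

# Explicit separator: split at every comma.
by_comma = text.split(",")

# maxsplit=2: perform at most two splits; the rest stays in the last item.
limited = text.split(",", 2)

print(no_args)   # ['one,two,three,four']
print(by_comma)  # ['one', 'two', 'three', 'four']
print(limited)   # ['one', 'two', 'three,four']
```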

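Consider a string string_1 = "Welcome To Python Programming". string_1.split() breaks the string at whitespace and returns the list list_1 = ['Welcome', 'To', 'Python', 'Programming']. The length of the list gives the total count of words. Called without arguments, split() treats any run of consecutive whitespace as a single separator, so the extra spaces in the example below do not produce empty strings.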

string_1 = "Welcome To      Python Programming"
print("string_1: ", string_1)
list_1 = string_1.split()
print("list_1: ", list_1)
print("Total words: ", len(list_1))

Output

string_1:  Welcome To      Python Programming
list_1:  ['Welcome', 'To', 'Python', 'Programming']
Total words:  4

This approach fails when standalone special characters or punctuation appear in the string. For example, consider a string string_1 = "Welcome To Python & Java Programming". The number of words in string_1 is 5.

The split() method breaks the string at whitespace and returns the list list_1 = ['Welcome', 'To', 'Python', '&', 'Java', 'Programming']. Because split() treats the special character '&' as a word, the count comes out as 6.

string_1 = "Welcome To Python & Java Programming"
print("string_1: ", string_1)
list_1 = string_1.split()
print("list_1: ", list_1)
print("Total words: ", len(list_1))

Output

string_1:  Welcome To Python & Java Programming
list_1:  ['Welcome', 'To', 'Python', '&', 'Java', 'Programming']
Total words:  6

To overcome this drawback of the split() method, we can use either a regular expression with findall() or a combination of split() and strip().

Counting the words using the findall() method

The findall() function belongs to the re (regular expressions) module, which must be imported first. findall() searches the string for all occurrences that match a given pattern and returns them as a list. The syntax for findall() is given below.

re.findall(pattern, string)

Consider a string string_1 = "Welcome To-> Python & Java Programming". Calling re.findall(r'\w+', string_1) returns the list list_1 = ['Welcome', 'To', 'Python', 'Java', 'Programming'].

The pattern \w+ matches one or more consecutive word characters (letters, digits, and underscores). The pattern is written as a raw string, with the r prefix, so that Python does not treat the backslash as an escape character.
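
To see what \w+ does and does not treat as part of a word, here is a minimal sketch on a made-up sentence. Digits and underscores are kept, while a hyphen or an apostrophe ends the match, so a word containing one is returned as two pieces.

```python
import re

sample = "user_1 has 2 e-mails, doesn't he?"  # made-up input for illustration

# \w+ matches runs of letters, digits, and underscores; everything else
# (spaces, hyphens, apostrophes, other punctuation) ends the current match.
tokens = re.findall(r"\w+", sample)

print(tokens)       # ['user_1', 'has', '2', 'e', 'mails', 'doesn', 't', 'he']
print(len(tokens))  # 8
```

If hyphenated words or contractions should count as single words, the pattern would need to be adjusted for that input.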

import re
string_1 = "Welcome To-> Python & Java Programming"
print("string_1: ", string_1)
list_1 = re.findall(r'\w+', string_1)
print("list_1: ", list_1)
print("Total words: ", len(list_1))

The above code returns the output as

string_1:  Welcome To-> Python & Java Programming
list_1:  ['Welcome', 'To', 'Python', 'Java', 'Programming']
Total words:  5

Counting the words using split() + strip()

Consider a string string_1 = "Welcome To-> Python & Java Programming". The string is broken into a list of strings using the string_1.split() method, and the result is stored in a variable list_1 = ['Welcome', 'To->', 'Python', '&', 'Java', 'Programming'].

A for loop traverses list_1, and the punctuation is removed from each string using the strip() method. strip() removes leading and trailing characters; the characters to remove are given by the argument passed to it.

string.punctuation is passed as the argument to strip() to remove leading and trailing punctuation from each string. string.punctuation is a string constant containing all ASCII punctuation characters; it is not a function, so it is used without parentheses. It is available after importing the string module.
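
The behaviour of strip() with string.punctuation can be checked on a few tokens. This is only a sketch; the variable names and the "e-mail!" token are illustrative. Note that only leading and trailing characters are removed, and that a token made up entirely of punctuation becomes an empty string.

```python
import string

# string.punctuation is a plain str constant, not a function.
print(string.punctuation)  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

stripped_arrow = "To->".strip(string.punctuation)     # trailing "->" removed
stripped_amp = "&".strip(string.punctuation)          # nothing is left
stripped_inner = "e-mail!".strip(string.punctuation)  # inner "-" is kept

print(repr(stripped_arrow))  # 'To'
print(repr(stripped_amp))    # ''
print(repr(stripped_inner))  # 'e-mail'
```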

After the leading and trailing punctuation is removed, the isalpha() method checks whether every character in the remaining string is alphabetic. If isalpha() returns True, the variable count is incremented by 1. An empty string, such as the one left after stripping '&', returns False and is not counted. After the loop finishes, count holds the total number of words in string_1.

With this method, any token that contains digits (for example "3" or "Python3") is not counted as a word, because isalpha() returns False for it.

import string
string_1 = "Welcome To-> Python & Java Programming"
print("string_1: ", string_1)
list_1 = string_1.split()
print("list_1: ", list_1)
count = 0
for word in list_1:
    # Count the token only if it is purely alphabetic once punctuation is stripped.
    if word.strip(string.punctuation).isalpha():
        count += 1
print("Total words: ", count)

Output

string_1:  Welcome To-> Python & Java Programming
list_1:  ['Welcome', 'To->', 'Python', '&', 'Java', 'Programming']
Total words:  5
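
The same steps can be wrapped in a reusable function. The sketch below is one possible way to do it; the name count_alpha_words is hypothetical and the second input string is made up to show that tokens containing digits are skipped.

```python
import string


def count_alpha_words(text):
    """Count whitespace-separated tokens that are purely alphabetic
    once leading and trailing punctuation is stripped."""
    return sum(
        1
        for token in text.split()
        if token.strip(string.punctuation).isalpha()
    )


print(count_alpha_words("Welcome To-> Python & Java Programming"))  # 5
print(count_alpha_words("Chapter 3 covers 10 topics"))              # 3
```

In the second call only "Chapter", "covers", and "topics" are counted; "3" and "10" are ignored.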