Find the Most Repeated Word in a data set using Power Query

Most Repeated Word Excel

MODE function in Excel will return the most frequently occurring number in a data set. But for finding the most frequently occurring text string in a data set, we don’t have an Excel function for that.

In this article, I will show you how to find the most repeated word in a data set using Power Query in Excel.

Following is the list of 100 English Proverbs.

To list out the Most Used Words in these 100 sentences,

Select a cell in the data set > go to the Data tab in Excel ribbon > From Table/Range > Click OK in the Create Table dialog

Selected data is loaded into the Power Query Editor of Excel.

Now, we have to split the sentences in the second column called Proverb into words.

In the Home tab of Power Query Editor > Split Column > By Delimiter

In the dialog called Split Column by Delimiter, Select Space from the options available under Select or enter delimiter and Click OK

Each word of each sentence in 100 Rows, is now in a separate column. For example, the first sentence “Actions speak louder than words” is split into 5 columns.

Now that we have the words in different columns, we have to UnPivot these columns into Rows. For that…

Right-click on the first column header > Select Unpivot Other Columns

Data in columns came into 659 Rows (that means there are 659 words in those 100 sentences).

We don’t need the first two columns containing serial numbers and column headers. To remove those columns, Select the first two columns > Right-click on the column header > Remove Columns

To get the count of each word, right-click on the column header of the remaining column containing words > select Group By…

In the Group By dialog, we have to define 3 parameters.

  1. The column containing values to group – In this case, the column called Value contains the values to group
  2. New column nameCount will be the name of the new column
  3. Operation – We need the count of words, so Count Rows should be the operation to perform

When I click OK, we have a new column called Count containing the count of words. Note that the words are grouped and the number of rows have become 383.

To find the Most Repeated Words, select the column called Count > Click on Sort Descending in the Home tab

Here we have the list of most repeated words in our data set.

The word ‘the’ can be found 34 times in the whole data set. ‘is’ 26 times, ‘to’ 15 times etc.

I just want the 10 Most repeated words from this list. So, to remove all other rows, xlick on the split button for Keep Rows in the Home tab > Select Keep Top Rows

Type in 10, in the Keep Top Rows dialog and click OK

All other rows except the Top 10 rows are gone. Before loading this data into the Excel sheet, I will change the column header of the first column to Words

To load thois data into the Excel sheet, Click on the Split button for Close & Load in the Home tab > Close & Load To…

In the Import Data dialog > Select Existing worksheet > Select the cell where you want to place the data (Here, I have selected the cell D2) and Click OK

Here, we have the list of the 10 Most repeated words in our data set.

Whenever you add something new to the source data, Right-click on the output table and select Refresh to update it.

See what happens, when I add 50 new sentences to the source table.


2 thoughts on “Find the Most Repeated Word in a data set using Power Query

    1. Glad to hear that.
      I recommend to check my YouTube video on 12 methods to Clean Data using Power Query.

Comments are closed.