Clustering keywords based on SERP similarity with Python

Written by: Frank

Publication date: 7 March 2024

Keyword clustering remains a time-consuming task for many specialists and, in most cases, one that is not done well. They often decide which keywords might belong together based on assumptions, whereas clustering based on SERP similarity is more data-driven and can be automated. In this blog, I will take you through how to use Python to cluster keywords in bulk. This saves time and improves the quality of your clustering.

How keyword clustering was done previously

When you start defining your SEO strategy, it is important to map the search landscape. Keyword research shows you where the needs of your potential target audience lie: what is being searched for, and which terms your website should also be visible for.

Creating a longlist

You start by creating a longlist. You can do this based on your own understanding (or the client’s), Google Search Console, Keyword Planner and an external tool such as Ahrefs or Semrush, for example. You put all these keywords in one big list and filter out competitor names you don’t want to include and other irrelevant keywords.
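
The filtering step above can be sketched in a few lines of Python. This is only an illustration, not the script from this blog: the function name build_longlist, the sample keywords and the competitor name "acme" are all made up for the example.

```python
def build_longlist(keywords, excluded_terms):
    """Normalise, de-duplicate and filter a raw keyword list.

    `excluded_terms` is a hypothetical list of competitor names or other
    terms you do not want in the longlist.
    """
    excluded = [term.lower() for term in excluded_terms]
    seen = set()
    longlist = []
    for keyword in keywords:
        # Lowercase and collapse repeated whitespace so variants match.
        cleaned = " ".join(keyword.lower().split())
        if not cleaned or cleaned in seen:
            continue  # skip empty rows and duplicates
        if any(term in cleaned for term in excluded):
            continue  # skip keywords containing an excluded term
        seen.add(cleaned)
        longlist.append(cleaned)
    return longlist


raw_keywords = [
    "Running shoes",
    "running  shoes",        # duplicate after normalisation
    "acme running shoes",    # contains a (made-up) competitor name
    "trail running shoes",
    "",
]
longlist = build_longlist(raw_keywords, excluded_terms=["acme"])
print(longlist)  # ['running shoes', 'trail running shoes']
```

A simple substring check like this can also remove keywords you want to keep (a short excluded term may occur inside an unrelated word), so it is worth reviewing what was filtered out.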

Creating a longlist is important because it collects all potential keywords. The list should only get shorter as the process goes on; if you have to add keywords afterwards, there is a chance that you will overlook keyword variants.

Clustering the keywords (we are going to automate this part)

Next comes by far the most time-consuming part of keyword research: clustering your longlist. You have a list of keywords that you want to group before you can turn it into an actual content plan. During this process, you determine which page should be findable for which keywords. This step is hugely important for avoiding internal competition.

When you cluster manually, this is often done on the basis of intent: you go through the keywords one by one and determine whether they mean the same thing, so that the same page can be made findable for them. It is therefore useful to divide your longlist into main clusters to keep an overview. It also helps the script if you make this distinction and label the main clusters.

At Maatwerk Online we investigated how much time we spend on clustering keywords: with a longlist of 1,000 keywords, you will quickly spend 3 hours (exact figures follow later). In addition, you decide whether keywords belong together based on intuition. That can work if you are very familiar with the industry, but even then it remains an error-prone method.

Turning clustered keywords into a content plan

Once you know which pages can be optimized for which keywords, it’s time to make this concrete and turn it into a content plan. In this content plan, you specify which pages should be written without creating internal competition.

Why automating keyword clustering is ideal

Automating keyword clustering is ideal because it lets you work in a data-driven way without losing your feel for the search landscape. A common argument is that a human should remain involved in this process, because otherwise you have no idea where the opportunities lie. By creating the longlist yourself and building the content plan largely manually, you still know what the search landscape looks like.

In addition, clustering keywords in this way saves an enormous amount of time. To show that it is worth the investment, I analyzed exactly how much time manual clustering takes. The analysis was carried out on a small group of specialists, but it still gives a good indication: the average specialist needs 3 hours and 12 minutes to cluster 1,000 keywords.

Clustering based on SERP similarity

What gave me the idea of clustering keywords based on SERP similarity is that it can be done in bulk. You can use a data source to actually show that keywords belong together. If this can be done once for two keywords, there is no limit and we can do it repeatedly.

Keyword clustering is often not data-driven

If we cluster keywords manually, or have this done purely by an LLM, the disadvantage is that we rely on knowledge we already have rather than on actual search behavior. We (humans) or an LLM might think that two keywords have the same search intent, but we don’t know for sure.

SERP similarity is a reliable data source

We can use SERP similarity to determine whether the search intent is similar. If the SERPs of two keywords are similar, that is a signal to merge the keywords. In my experience, about 3 matching results is the “sweet spot” for deciding that two keywords have enough overlap to cluster them. Sometimes slightly more or fewer works better, but this is industry-dependent and something you can easily play with.
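
As a minimal sketch of this overlap check, assuming you already have the ranking URLs for each keyword: the function names and example URLs below are invented for illustration, and the default threshold of 3 is only a starting point that you should tune for your industry.

```python
def serp_overlap(urls_a, urls_b):
    """Count the URLs that appear in both result lists."""
    return len(set(urls_a) & set(urls_b))


def same_intent(urls_a, urls_b, min_overlap=3):
    """Treat two keywords as one cluster when enough results match.

    min_overlap=3 is an assumed default, not a fixed rule.
    """
    return serp_overlap(urls_a, urls_b) >= min_overlap


# Made-up ranking URLs for three keywords.
serp_a = ["https://example.com/a", "https://example.com/b",
          "https://example.com/c", "https://example.com/d"]
serp_b = ["https://example.com/b", "https://example.com/c",
          "https://example.com/d", "https://example.com/e"]
serp_c = ["https://example.com/a", "https://example.com/x",
          "https://example.com/y", "https://example.com/z"]

print(serp_overlap(serp_a, serp_b), same_intent(serp_a, serp_b))
print(serp_overlap(serp_a, serp_c), same_intent(serp_a, serp_c))
```

Comparing sets of URLs ignores the exact ranking position. If position matters for your use case, you would need a different similarity measure.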

In some cases the output will be no better than manual clustering, because you have the knowledge to do it yourself. But weighed against the time you save by automating the process, this margin of error is easy to correct afterwards.

How the script works, in plain language

Built in Python, the script retrieves the current SERP data and converts it into a dataset with the relevant information for each keyword. It then compares the SERP data of two keywords, and if they have more than three matching ranking pages it merges them into a cluster. The output indicates which keywords belong together and the overlap score, so you can easily make corrections. Through Google Colab, you can use the script without any Python knowledge and without having Python installed.
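
To make the compare-and-merge step concrete, here is a simplified sketch. It is not the script from the Colab: the greedy strategy (each keyword joins the existing cluster it overlaps with most, or starts a new one), the function name cluster_keywords and the sample data are my own assumptions for illustration. The threshold is a parameter; set min_overlap=4 if you want "more than three" matches.

```python
def cluster_keywords(serps, min_overlap=3):
    """Greedy clustering of keywords by shared ranking URLs.

    `serps` maps each keyword to its list of ranking URLs. A keyword is
    compared with the first ("seed") keyword of every existing cluster.
    """
    clusters = []
    for keyword, urls in serps.items():
        url_set = set(urls)
        best, best_score = None, 0
        for cluster in clusters:
            score = len(url_set & cluster["urls"])
            if score >= min_overlap and score > best_score:
                best, best_score = cluster, score
        if best is None:
            # No cluster overlaps enough: this keyword seeds a new one.
            # The seed's score is its overlap with itself.
            clusters.append({
                "seed": keyword,
                "urls": url_set,
                "members": [(keyword, len(url_set))],
            })
        else:
            best["members"].append((keyword, best_score))
    return clusters


# Made-up SERP data: four keywords with four ranking URLs each.
serps = {
    "running shoes": ["u1", "u2", "u3", "u4"],
    "best running shoes": ["u2", "u3", "u4", "u5"],
    "marathon training plan": ["p1", "p2", "p3", "p4"],
    "marathon schedule": ["p1", "p2", "p3", "u1"],
}
clusters = cluster_keywords(serps)
for cluster in clusters:
    for keyword, score in cluster["members"]:
        print(cluster["seed"], "|", keyword, "|", score)
```

Because every keyword is compared with the seed only, the result depends on the order of the input. Sorting the longlist by search volume first is one way to let the most important keyword become the seed.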

Get started with the script yourself

The script is free to use via the Google Colab I built for this purpose. You can use your own SerpAPI API key to cluster your own keywords. The first time will feel a bit cumbersome, but once you notice how much time it can save, it’s something you’ll reach for in every keyword research project. If you have any questions or spot an innovative addition, please be sure to let me know.
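
If you want to work with the raw API responses yourself, the sketch below shows one way to turn a response into a list of ranking URLs. It assumes the JSON contains an "organic_results" list whose items have "position" and "link" fields; check the SerpAPI documentation for the exact response format. The helper name top_urls and the sample response are made up, and no API call is made here.

```python
def top_urls(response, limit=10):
    """Return ranking URLs in position order from a SERP response dict.

    Assumes an "organic_results" list with "position" and "link" fields;
    results without a link are skipped.
    """
    results = response.get("organic_results", [])
    ordered = sorted(results, key=lambda r: r.get("position", float("inf")))
    return [r["link"] for r in ordered if "link" in r][:limit]


# Hand-written sample in the assumed response shape.
sample_response = {
    "organic_results": [
        {"position": 2, "link": "https://example.com/b"},
        {"position": 1, "link": "https://example.com/a"},
        {"position": 3, "title": "Result without a link"},
    ]
}
urls = top_urls(sample_response)
print(urls)
```

Lists produced this way, one per keyword, are the input you need for an overlap comparison between keywords.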

About the author

My career as an online marketer began broadly, but my passion for SEO eventually led me to found my own agency, Dificem, in 2019. After almost two years, I decided to further specialize in technical SEO at Maatwerk Online, where I am now the point of contact for technical SEO issues. My interest in automation and AI, especially through the use of Python and ChatGPT, has greatly enhanced my work, which even gave me the opportunity to speak at BrightonSEO in 2024. In addition to my work, I am an avid traveler and marathon runner, regularly working from different countries as a digital nomad.
