A few months ago I tweeted a complex (and tedious) Excel formula on how to classify keywords:
For the #seo who insists on completing their keyword/intent research in excel
— Christopher Yee (@Eeysirhc) April 24, 2019
Philosophy: keyword intent is not absolute so it won't fall neatly into an assigned bucket. For this reason a keyword can live under multiple conversion funnels since we can't be 100% certain. pic.twitter.com/JcTl9P11mC
I then ended it with:
Notes
— Christopher Yee (@Eeysirhc) April 24, 2019
1) This will probably crap out after 30k keywords (solve: copy+paste values only)
2) You can supplement with GSC CTR data or SEM conversion rates (separate workflow)
3) Consider ditching excel for R/Python in the future (https://t.co/GoiyfAiFYN)
In hindsight the third comment could be interpreted as “gud luk lulz!”
For this walkthrough we’ll use the {fuzzyjoin} #rstats package to replicate the aforementioned Excel method. In a future post we’ll build a neural net instead to achieve this for 1M+ keywords.
Load packages
library(tidyverse)
Download data
df <- read_csv("https://raw.githubusercontent.com/Eeysirhc/random_datasets/master/keywords_digital_marketing.csv")
Inspect the data
df
## # A tibble: 29 x 1
## Keyword
## <chr>
## 1 digital marketing agency
## 2 what is digital marketing
## 3 digital marketing jobs
## 4 digital marketing salary
## 5 digital marketing manager
## 6 digital marketing course
## 7 digital marketing strategy
## 8 digital marketing services
## 9 digital marketing specialist salary
## 10 digital marketing certificate
## # … with 19 more rows
Introducing {fuzzyjoin}
Fuzzyjoin is an amazing package “that allows matching not just on values that match between columns, but on inexact matching.”
This is exactly what we need to identify and match keyword strings together so let’s first load the package.
# install.packages("fuzzyjoin")
library(fuzzyjoin)
To understand how this works we’ll start small by building out the consideration identifier segment.
consideration <- as_tibble(c("what is", "blog", "specialist",
"near me", "agency", "compan",
"service", "example")) %>%
rename(consideration = value)
Then we (fuzzy) match the two dataframes with the regex_left_join function to produce the following result:
regex_left_join(df, consideration, by = c("Keyword" = "consideration")) %>%
replace(., is.na(.), "-") # REPLACE NA VALUES
## # A tibble: 29 x 2
## Keyword consideration
## <chr> <chr>
## 1 digital marketing agency agency
## 2 what is digital marketing what is
## 3 digital marketing jobs -
## 4 digital marketing salary -
## 5 digital marketing manager -
## 6 digital marketing course -
## 7 digital marketing strategy -
## 8 digital marketing services service
## 9 digital marketing specialist salary specialist
## 10 digital marketing certificate -
## # … with 19 more rows
Not bad - we were able to assign each keyword to the respective consideration segment while ignoring those that do not apply.
Now let’s combine everything for the other categories…
# TRANSACTIONAL
transactional <- as_tibble(c("agency", "compan", "consult",
"service")) %>%
rename(transactional = value)
# EVALUATION
evaluation <- as_tibble(c("blog", "consult", "agency",
"compan", "service", "example")) %>%
rename(evaluation = value)
# CONSIDERATION
consideration <- as_tibble(c("what is", "blog", "specialist",
"near me", "agency", "compan",
"service", "example", "strategy")) %>%
rename(consideration = value)
# AWARENESS
awareness <- as_tibble(c("what is", "tool", "definition",
"channel", "near me", "blog",
"course", "new", "trend", "tip",
"strategy")) %>%
rename(awareness = value)
# OUT OF FUNNEL
out_of_funnel <- as_tibble(c("degree", "institute", "course",
"certif", "skill", "for dummi",
"training", "quote", "job",
"salary", "intern", "manager",
"resume", "analyst", "strategist",
"director", "specialist")) %>%
rename(out_of_funnel = value)
And finally match them with the keywords dataset…
df_parsed <- regex_left_join(df, transactional, by = c("Keyword" = "transactional")) %>%
regex_left_join(evaluation, by = c("Keyword" = "evaluation")) %>%
regex_left_join(consideration, by = c("Keyword" = "consideration")) %>%
regex_left_join(awareness, by = c("Keyword" = "awareness")) %>%
regex_left_join(out_of_funnel, by = c("Keyword" = "out_of_funnel")) %>%
replace(., is.na(.), "-") %>% # REPLACE NA VALUES
head(10) # FILTER ON FIRST 10
Step 5) Win.
Keyword | transactional | evaluation | consideration | awareness | out_of_funnel |
---|---|---|---|---|---|
digital marketing agency | agency | agency | agency |
|
|
what is digital marketing |
|
|
what is | what is |
|
digital marketing jobs |
|
|
|
|
job |
digital marketing salary |
|
|
|
|
salary |
digital marketing manager |
|
|
|
|
manager |
digital marketing course |
|
|
|
course | course |
digital marketing strategy |
|
|
strategy | strategy |
|
digital marketing services | service | service | service |
|
|
digital marketing specialist salary |
|
|
specialist |
|
salary |
digital marketing specialist salary |
|
|
specialist |
|
specialist |
Wrapping Up
What I love about this is the reproducibility and speed.
Although Excel did the job for me in the past when I needed it, the most frustrating thing was the cognitive load to get the lengthy array formula correct. It was (and still is) prone to error every time I had to 1) add new keyword identifiers or 2) wanted to expand the number of categories.
Transitioning to R has saved me countless headaches where I can just write, reuse or edit a few lines of code.
I hope this example using the {fuzzyjoin} package will help your keyword research workflow as much as it has helped mine!