My friend John-Henry Scherck recently tweeted his process for refreshing stale content:
Put together a quick video on how to refresh stale content using nothing more than Google Search Console and a word doc.
Check out the full video here: https://t.co/Vva4Zm4mNn pic.twitter.com/74Fm2oIz4c
— John-Henry Scherck (@JHTScherck) January 21, 2020
I imagine this can be broken down into five distinct parts:
- Stale content selection
- Understanding keyword intent
- Actually refreshing the content
- Internal link optimization
- Publish
This short guide focuses on the first part, where we’ll use #rstats to remove the manual work of selecting stale candidates.
That's it! Fairly manual, but hopefully straightforward. Let me know what you think or if you have any questions.
— John-Henry Scherck (@JHTScherck) January 21, 2020
Load packages
library(tidyverse)
library(searchConsoleR)
scr_auth()
Download data
The code below requests up to 100K rows covering roughly the last five weeks of data (ending three days ago), but feel free to revise as you see fit.
df <- as_tibble(search_analytics("https://www.christopheryee.org/",
                                 Sys.Date() - 35, # START DATE
                                 Sys.Date() - 3,  # END DATE
                                 c("page", "query"),
                                 searchType = "web",
                                 rowLimit = 1e5))
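If you want the window to cover exactly five weeks, the dates can be computed up front. This is a minimal sketch that assumes the same three-day offset used in the code above; `lag_days`, `n_weeks`, `start_date`, and `end_date` are illustrative names and not part of searchConsoleR.

```r
# Assumption: we stop three days before today, as in the download code above
lag_days <- 3
n_weeks  <- 5

end_date <- Sys.Date() - lag_days

# Both endpoints are included in the window, so subtract one day less
# than the full span to land on exactly n_weeks * 7 days
start_date <- end_date - (n_weeks * 7 - 1)

# Number of days covered, counting both endpoints
n_days <- as.numeric(end_date - start_date) + 1
```

The two dates can then be passed in place of `Sys.Date() - 35` and `Sys.Date() - 3`.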
Identify keywords
This is where we’ll exclude brand terms and keep only keywords with at least 2K impressions and an average position from 5 up to (but not including) 15.
keywords <- df %>%
  group_by(query) %>%
  summarize(impressions = sum(impressions),
            position = mean(position)) %>%
  filter(!grepl("brand_term", query)) %>% # EXCLUDE BRAND TERMS HERE
  arrange(desc(impressions)) %>%
  filter(impressions >= 2000,
         position >= 5 & position < 15) %>%
  select(query)
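One caveat: `mean(position)` gives every page/query row equal weight, regardless of how many impressions that row received. An impression-weighted average is one alternative. The sketch below compares the two on made-up rows; the `toy` data and its values are invented for illustration and are not Search Console output.

```r
library(dplyr)

# Invented example rows: one query ranking on two different pages
toy <- tibble(
  query       = c("r guide", "r guide"),
  page        = c("/a", "/b"),
  impressions = c(7500, 2500),
  position    = c(6, 26)
)

toy_summary <- toy %>%
  group_by(query) %>%
  summarize(
    # Unweighted: (6 + 26) / 2
    simple_position   = mean(position),
    # Weighted by impressions: 6 * 0.75 + 26 * 0.25
    weighted_position = weighted.mean(position, impressions),
    # Summed last so the per-row impressions are still available above
    impressions       = sum(impressions)
  )
```

With these invented numbers the simple average is 16 and the weighted average is 11, so the query would fail the `position < 15` filter under one definition and pass it under the other.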
Dedupe landing pages
There may be instances where a keyword has more than one page ranking for it, and a page may rank for multiple keywords.
We can reduce the duplicates here by ranking pages by clicks within each keyword and keeping only the top page.
pages <- df %>%
  inner_join(keywords, by = "query") %>% # JOIN OUR KEYWORDS DATASET
  group_by(query) %>%
  arrange(desc(clicks)) %>%
  mutate(candidate = row_number()) %>%
  ungroup() %>%
  filter(candidate == 1) %>%
  select(page)
Fun fact: I often use candidate = row_number() as a quick hack to keep the “top” or “bottom” rows of each group in a dataset.
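For comparison, dplyr 1.0.0 and later also provide `slice_max()`, which expresses the same top-row-per-group filter directly. Here is a small sketch of both approaches on invented data; `toy` and its values are made up for illustration.

```r
library(dplyr)

# Invented example rows: two pages competing for each query
toy <- tibble(
  query  = c("r guide", "r guide", "ggplot tips", "ggplot tips"),
  page   = c("/a", "/b", "/c", "/d"),
  clicks = c(10, 40, 7, 3)
)

# row_number() approach: sort by clicks, number rows within each query, keep row 1
top_row_number <- toy %>%
  group_by(query) %>%
  arrange(desc(clicks)) %>%
  mutate(candidate = row_number()) %>%
  ungroup() %>%
  filter(candidate == 1) %>%
  select(query, page)

# slice_max() approach: keep the single highest-click row per query
top_slice_max <- toy %>%
  group_by(query) %>%
  slice_max(clicks, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(query, page)
```

Both results contain the same two pages, though the row order can differ between them. Setting `with_ties = FALSE` keeps exactly one row per group even when clicks are tied.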
Final candidates
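df %>%
  inner_join(pages, by = "page") %>%
  mutate(ctr = (clicks / impressions) * 100) %>% # STANDARDIZE CTR AS A PERCENTAGE
  arrange(page, desc(impressions)) %>% # SORT BY PAGE, THEN HIGHEST IMPRESSIONS FIRST
  distinct() # DROP ROWS DUPLICATED BY THE JOIN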
From here you can take the keywords and move on to the next phase: understanding keyword intent.
Resources
- Full script can be found on GitHub
- If you enjoyed this post, you may be interested in my getting started with R guide using Google Search Console data