My friend John-Henry Scherck recently tweeted his process on how to refresh stale content:

I imagine this can be broken down into five distinct parts:

  1. Stale content selection
  2. Understanding keyword intent
  3. Actually refreshing the content
  4. Internal link optimization
  5. Publish

This short guide focuses on the first part, where we’ll use #rstats to remove the manual work of selecting stale candidates.

Load packages
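
A minimal setup sketch, assuming the data comes from the searchConsoleR package (which provides search_analytics()) and that dplyr supplies the pipe, the data verbs, and as_tibble(). Swap in library(tidyverse) if that is what you normally load.

```r
# Assumed packages: searchConsoleR for the Search Console API calls,
# dplyr for %>%, group_by(), summarize(), filter(), arrange(), and as_tibble().
library(searchConsoleR)
library(dplyr)

# Authenticate with Google before requesting data.
# Only runs in an interactive session because it opens a browser prompt.
if (interactive()) {
  scr_auth()
}
```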



Download data

The code below grabs up to 100K rows covering roughly the last five weeks (from 35 days ago through 3 days ago), but feel free to revise as you see fit.

df <- as_tibble(search_analytics("", # YOUR SITE URL GOES IN THE QUOTES
                                 Sys.Date() - 35, # START DATE
                                 Sys.Date() - 3, # END DATE
                                 c("page", "query"),
                                 searchType = "web",
                                 rowLimit = 1e5))

Identify keywords

This is where we’ll exclude brand terms and keep only keywords with at least 2K impressions and an average position of 5 or more but below 15.

keywords <- df %>% 
  group_by(query) %>% 
  summarize(impressions = sum(impressions),
            position = mean(position)) %>% 
  filter(!grepl("brand_term", query)) %>% # EXCLUDE BRAND TERMS HERE
  arrange(desc(impressions)) %>% 
  filter(impressions >= 2000,
         position >= 5 & position < 15)
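
To see what this filter keeps, here is a small sketch on made-up data. The rows, the toy object names, and the "acme" brand term are all hypothetical; only the column names mirror what Search Console returns.

```r
library(dplyr)

# Made-up rows, one per page/query pair, in the shape of the Search Console export.
toy <- tibble(
  page        = c("/a", "/b", "/a", "/c", "/d"),
  query       = c("refresh content", "refresh content", "acme pricing",
                  "seo audit", "stale posts"),
  impressions = c(1500, 900, 5000, 3000, 800),
  position    = c(6, 10, 2, 22, 9)
)

toy_keywords <- toy %>%
  group_by(query) %>%
  summarize(impressions = sum(impressions),
            position = mean(position)) %>%
  filter(!grepl("acme", query)) %>%          # "acme" is a hypothetical brand term
  filter(impressions >= 2000,
         position >= 5 & position < 15) %>%
  arrange(desc(impressions))

toy_keywords
```

Only "refresh content" survives: its two pages add up to 2,400 impressions at an average position of 8. "acme pricing" is dropped as a brand term, "seo audit" ranks too low, and "stale posts" has too few impressions. Note that mean(position) here is a plain average across rows, not weighted by impressions.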

Dedupe landing pages

There may be instances where a page ranks for multiple keywords.

We can remove these duplicates by ranking each page's keywords by clicks and keeping only the top one.

pages <- df %>% 
  inner_join(select(keywords, query), by = "query") %>% # JOIN OUR KEYWORDS DATASET
  group_by(page) %>% # ONE ROW PER LANDING PAGE
  arrange(desc(clicks)) %>% 
  mutate(candidate = row_number()) %>% 
  ungroup() %>% 
  filter(candidate == 1)

Fun fact: I often use candidate = row_number() as a quick hack to pull the “top” or “bottom” rows of each group in a dataset.
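
As a quick illustration of that hack, the sketch below keeps the top two keywords per page on made-up data (the toy values and object names are hypothetical). It also shows slice_max(), which does the same job in dplyr 1.0.0 and later.

```r
library(dplyr)

toy <- tibble(
  page   = c("/a", "/a", "/b", "/b", "/b"),
  query  = c("q1", "q2", "q3", "q4", "q5"),
  clicks = c(10, 40, 5, 25, 15)
)

# Rank rows within each page by clicks, then keep the top 2 per page.
top_two <- toy %>%
  group_by(page) %>%
  arrange(desc(clicks), .by_group = TRUE) %>%
  mutate(candidate = row_number()) %>%
  ungroup() %>%
  filter(candidate <= 2)

# Same rows without the helper column.
top_two_alt <- toy %>%
  group_by(page) %>%
  slice_max(clicks, n = 2, with_ties = FALSE) %>%
  ungroup()
```

For the "bottom" rows, sort ascending with arrange(clicks) before numbering, or use slice_min().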

Final candidates

df %>% 
  inner_join(select(pages, page, query), by = c("page", "query")) %>% # KEEP CANDIDATE PAGE/KEYWORD PAIRS
  mutate(ctr = (clicks / impressions) * 100) %>% # STANDARDIZE CTR AS A PERCENTAGE
  arrange(page, desc(impressions))
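
A small sketch of the CTR and sorting steps on made-up data (values and object names are hypothetical). One thing to keep in mind: desc() wraps a single column, so each sort key goes into arrange() as its own argument.

```r
library(dplyr)

toy <- tibble(
  page        = c("/b", "/a", "/a"),
  query       = c("q3", "q1", "q2"),
  clicks      = c(30, 12, 50),
  impressions = c(1000, 400, 2500)
)

toy_final <- toy %>%
  mutate(ctr = (clicks / impressions) * 100) %>%  # CTR as a percentage
  arrange(page, desc(impressions))                # pages A-Z, most impressions first

toy_final
```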

From here, you can take these keywords and move on to the next phase: understanding keyword intent.
