Recreating plots in R: the power of tiny gains

Once in a while I find a great chart that makes me stop and think. Below is one example: "1% better every day is all it takes to completely change your life." pic.twitter.com/kYbgGOzzZv (Santiago, @svpino, September 10, 2021) I really appreciate the inspirational message behind it: being deliberate about the change you wish to see in the world. With that in mind, I want to recreate that chart below using the R programming language (regardless of whether the trajectory is realistic). ...
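The arithmetic behind the "1% better every day" claim can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the post: it assumes the chart compounds a fixed 1% gain daily over 365 days, and the names `DAILY_RATE`, `DAYS` and `compound` are made up for the example.

```python
# Assumption: "1% better every day" means multiplying the current level
# by 1.01 once per day, for 365 days, starting from a level of 1.0.
DAILY_RATE = 0.01
DAYS = 365


def compound(rate: float, days: int) -> list[float]:
    """Return the level after each day, with day 0 fixed at 1.0."""
    levels = [1.0]
    for _ in range(days):
        levels.append(levels[-1] * (1 + rate))
    return levels


trajectory = compound(DAILY_RATE, DAYS)
print(f"After {DAYS} days: {trajectory[-1]:.2f}x the starting level")
```

Under that assumption the final level is 1.01 raised to the 365th power, a little under 38 times the starting point, which is why the curve looks flat for months and then bends sharply upward.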

September 21, 2021 · Christopher Yee

[Updated] US firearm sales in 2020

My original exploratory analysis on the topic can be found at Firearm Sales: How are Americans coping with 2020? This post is a quick #rstats follow-up to visualize the final tally for 2020 data.

Load libraries

    library(tidyverse)
    library(lubridate)
    library(scales)

Download & parse data

    df_raw <- read_csv("https://raw.githubusercontent.com/BuzzFeedNews/nics-firearm-background-checks/master/data/nics-firearm-background-checks.csv")
    df <- df_raw

    df_clean <- df %>%
      filter(month >= "2016-01" & month < "2021-01") %>%
      select(month, state, handgun, long_gun) %>%
      arrange(month) %>%
      mutate(month = as.Date(paste0(month, "-01"))) %>%
      group_by(month) %>%
      summarize(handgun = sum(handgun), long_gun = sum(long_gun)) %>%
      mutate(index_month = as.factor(month(month, label = TRUE)),
             index_year = as.factor(year(month))) %>%
      ungroup()

Visualize data

    df_clean %>%
      group_by(index_year) %>%
      mutate(handgun = cumsum(handgun), long_gun = cumsum(long_gun)) %>%
      ungroup() %>%
      select(month, index_month, index_year, handgun, long_gun) %>%
      pivot_longer(handgun:long_gun, names_to = "type") %>%
      ggplot(aes(index_month, value, color = index_year, group = index_year)) +
      geom_line() +
      geom_point() +
      scale_y_continuous(labels = comma_format()) +
      scale_color_brewer(palette = 'Paired') +
      expand_limits(y = 0) +
      facet_grid(type ~ .) +
      labs(color = NULL, x = NULL, y = NULL,
           title = "NICS Firearm Background Checks: monthly cumulative per year by type",
           caption = "by: @eeysirhc\nsource: Federal Bureau of Investigation") +
      theme_bw() +
      theme(legend.position = 'top')

...
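The key step in that pipeline is the cumulative sum that restarts every year (`group_by(index_year)` followed by `cumsum()`). The same idea can be sketched in plain Python; the numbers below are made-up toy values, not NICS figures, and `cumulative_by_year` is a hypothetical helper name.

```python
from itertools import accumulate, groupby

# Toy monthly totals as (year, month, checks). These values are invented
# for illustration and are NOT real NICS background-check counts.
monthly = [
    (2019, 1, 100), (2019, 2, 150), (2019, 3, 120),
    (2020, 1, 200), (2020, 2, 250), (2020, 3, 400),
]


def cumulative_by_year(rows):
    """Return (year, month, running_total), restarting the total each year."""
    out = []
    ordered = sorted(rows)  # sort by year, then month
    for year, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        totals = accumulate(r[2] for r in group)
        out.extend((year, r[1], total) for r, total in zip(group, totals))
    return out


result = cumulative_by_year(monthly)
for row in result:
    print(row)
```

Restarting the running total per year is what lets each year be drawn as its own line over the same January-to-December axis.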

March 5, 2021 · Christopher Yee

Visualizing FB spend: image vs video creative

Objective: plot the comparison of total Facebook spend between image and video creatives for a small sample of DTC brands. The original piece without any visualization (i.e. tabulated data) can be found here but the main takeaway: "Though it can be tempting to go all in on video assets, I intend to use this data as added inspiration to continue investing in and testing Images."

Load modules

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set(style='darkgrid')

Encode data

    labels = ['brand', 'total_spend', 'pct_image_spend', 'image_cpa',
              'pct_video_spend', 'video_cpa']

    df = [['Brand 1', 1880000, 17, 773, 83, 805],
          ['Brand 2', 1630000, 57, 350, 44, 463],
          ['Brand 3', 1610000, 34, 179, 66, 188],
          ['Brand 4', 1300000, 12, 132, 88, 169],
          ['Brand 5', 1230000, 63, 46, 37, 40],
          ['Brand 6', 800000, 15, 22, 85, 24],
          ['Brand 7', 690000, 7, 120, 93, 127],
          ['Brand 8', 590000, 87, 18, 13, 28],
          ['Brand 9', 400000, 3, 47, 97, 0.63],
          ['Brand 10', 230000, 24, 48, 75, 114],
          ['Brand 11', 220000, 20, 25, 80, 21],
          ['Brand 12', 180000, 40, 57, 59, 51],
          ['Brand 13', 170000, 3, 47, 95, 59],
          ['Brand 14', 120000, 13, 17, 90, 13]]

    df = pd.DataFrame(df)
    df.columns = labels

Define function

We will use this simple method to categorize the brands and their different ad spend levels on Facebook. ...
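To compare image and video spend in dollars rather than percentages, the share columns can be applied to `total_spend`. The sketch below is an illustration only, not the function the excerpt goes on to define: it assumes `pct_image_spend` and `pct_video_spend` are percentages of `total_spend`, uses just the first three rows of the data above, and `split_spend` is a hypothetical helper.

```python
# First three rows from the excerpt:
# (brand, total_spend, pct_image_spend, pct_video_spend)
rows = [
    ("Brand 1", 1880000, 17, 83),
    ("Brand 2", 1630000, 57, 44),
    ("Brand 3", 1610000, 34, 66),
]


def split_spend(total, pct_image, pct_video):
    """Assumption: each pct column is a percentage of total_spend."""
    return total * pct_image / 100, total * pct_video / 100


spend = {}
for brand, total, pct_image, pct_video in rows:
    image, video = split_spend(total, pct_image, pct_video)
    spend[brand] = (image, video)
    print(f"{brand}: image ${image:,.0f} vs video ${video:,.0f}")
```

The two shares are converted independently because the published percentages do not always sum to exactly 100 (Brand 2 lists 57 and 44).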

February 10, 2021 · Christopher Yee

California Wildfires: cumulative acres burned over time

Wildfires are raging across California (again). "Always knew I would end up in hell but I imagined it was more of a spontaneous combustion type of event rather than a gradual descent into the infernal #everythingisfine" pic.twitter.com/gl6otozX6f (Christopher Yee, @Eeysirhc, September 8, 2020) What I noticed over the years of “doom watching” is that the news only reports tabulated data, without any sort of visualization to underscore the impact of these fires. ...

September 16, 2020 · Christopher Yee

Visualizing the relationship between quality score & CPC

The SEM industry has published a lot of information about the importance of improving quality score to lower average cost per click (CPC). Most of those articles, however, just share a table with quality score in one column and its associated % increase/decrease to average CPC in the other. Although helpful, I think that misses the mark on conveying just how much QS can help CPC. We will do something different: the Python code below will take that data and visualize the impact on average CPC for a given quality score. ...

August 11, 2020 · Christopher Yee

Recreating plots in R: intro to bootstrapping

Objective: recreate and visualize the 500K sampling distribution of means from this intro to bootstrapping in statistics post using R.

Load libraries

    library(tidyverse)
    library(rsample)

Download data

    df <- read_csv("https://statisticsbyjim.com/wp-content/uploads/2017/04/body_fat.csv")

Bootstrap resampling 500K

    df_bs <- df %>%
      bootstraps(times = 500000) %>%
      mutate(average = map_dbl(splits, ~ mean(as.data.frame(.)$`%Fat`)))

Visualize sampling distribution of means

    df_bs %>%
      ggplot(aes(average)) +
      geom_histogram(binwidth = 0.1, alpha = 0.75, color = 'white', fill = 'steelblue') +
      scale_x_continuous(limits = c(25, 32)) +
      scale_y_continuous(labels = scales::comma_format()) +
      labs(title = "Histogram of % Fat",
           subtitle = "500K bootstrapped samples with 92 observations in each",
           x = "Average Mean", y = "Frequency") +
      theme_minimal()

...
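For readers who want to see what the resampling step does under the hood, here is a minimal standard-library Python sketch of a bootstrap distribution of means. It does not use the body-fat dataset: the sample is synthetic (92 draws from a normal distribution whose mean and spread are arbitrary choices), the number of resamples is cut to 2,000 to keep it quick, and `bootstrap_means` is a hypothetical helper.

```python
import random
import statistics


def bootstrap_means(sample, times, seed=0):
    """Resample `sample` with replacement `times` times; return each mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(times)]


# Synthetic stand-in data: 92 observations, matching the sample size in the
# plot subtitle. The mean (28) and spread (7) are arbitrary assumptions.
data_rng = random.Random(42)
sample = [data_rng.gauss(28, 7) for _ in range(92)]

means = bootstrap_means(sample, times=2000)
print(f"sample mean:             {statistics.fmean(sample):.2f}")
print(f"mean of bootstrap means: {statistics.fmean(means):.2f}")
print(f"bootstrap std. error:    {statistics.stdev(means):.2f}")
```

Each resample has the same size as the original sample and is drawn with replacement, which is what makes the spread of the resulting means an estimate of the standard error.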

June 1, 2020 · Christopher Yee

Script to track COVID-19 cases in the US

A couple of weeks ago I shared an #rstats script to track global coronavirus cases by country. The New York Times also released COVID-19 data for new cases in the United States, both at the state and county level. You can run the code below on a daily basis to get the most up-to-date figures. Feel free to modify for your own needs:

    library(scales)
    library(tidyverse)
    library(gghighlight)

    state <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
    county <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")

State

    state %>%
      # one row per date/state, so cumsum() leaves `cases` as-is
      # (the NYT file already reports cumulative counts)
      group_by(date, state) %>%
      mutate(total_cases = cumsum(cases)) %>%
      ungroup() %>%
      filter(total_cases >= 100) %>% # MINIMUM 100 CASES
      group_by(state) %>%
      mutate(day_index = row_number(), n = n()) %>%
      ungroup() %>%
      filter(n >= 12) %>% # MINIMUM 12 DAYS
      ggplot(aes(day_index, total_cases, color = state, fill = state)) +
      geom_point() +
      geom_smooth() +
      gghighlight() +
      scale_y_log10(labels = comma_format()) +
      facet_wrap(~state, ncol = 4) +
      labs(title = "COVID-19: cumulative daily new cases by US states (log scale)",
           x = "Days since 100th reported case", y = NULL,
           fill = NULL, color = NULL,
           caption = "by: @eeysirhc\nSource: New York Times") +
      theme_minimal() +
      theme(legend.position = 'none') +
      expand_limits(x = 30)

...
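The "days since 100th reported case" axis comes from two filters and a per-state row number. A rough Python sketch of that indexing logic follows; the counts are toy values rather than NYT data, `index_since_threshold` is a hypothetical helper, and the minimum number of days is lowered to 3 so the small example has something to show.

```python
def index_since_threshold(series, threshold=100, min_days=12):
    """Re-index each state's cumulative counts from the first day at or
    above `threshold`, dropping states with fewer than `min_days` such days.

    `series` maps a state name to cumulative counts ordered by date.
    """
    indexed = {}
    for state, counts in series.items():
        kept = [c for c in counts if c >= threshold]
        if len(kept) >= min_days:
            # day_index starts at 1, mirroring row_number() in the R code
            indexed[state] = list(enumerate(kept, start=1))
    return indexed


# Toy cumulative counts, invented for illustration (not NYT figures).
toy = {
    "State A": [40, 80, 120, 150, 200],
    "State B": [10, 20, 150],
}

indexed = index_since_threshold(toy, threshold=100, min_days=3)
print(indexed)
```

Aligning every state on its own day 1 is what makes the growth curves comparable even though outbreaks started on different calendar dates.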

March 30, 2020 · Christopher Yee

[Updated] Top Industries from Inc.5000 Companies

Changelog: originally published on September 10th, 2019; built a Shiny app for this; full code can be found on GitHub. One of my favorite online marketers, (the) Glen Allsopp, tweeted the following: "Over the past few weeks I've went through every site in the Inc. 5000. My mind has been blown multiple times. Don't click if you're easily distracted. Enjoy!" https://t.co/mHVK8rvb9X pic.twitter.com/BoEb3qQ7LZ (Glen Allsopp, @ViperChill, August 27, 2019) The public spreadsheet contains four fields: ...

January 7, 2020 · Christopher Yee