TardyThursday: College Tuition, Diversity & Pay

The differences between this unsanctioned #tardythursday and the official #tidytuesday:

- These will publish on Thursday (obviously)
- The dataset will come from a completely different week of TidyTuesday
- For a surprise, I’ll code with either #rstats or python (similar to #makeovermonday)

Load modules

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

Download and parse data

    df_raw = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/salary_potential.csv")

    # AVERAGE PAY BY STATE
    df = df_raw[['state_name', 'early_career_pay', 'mid_career_pay']].groupby('state_name').mean().reset_index()

Visualize dataset

    sns.set(style="darkgrid")
    plt.figure(figsize=(20,15))

    g = sns.regplot(x="early_career_pay", y="mid_career_pay", data=df)

    # LABEL EACH POINT WITH ITS STATE NAME
    for line in range(0, df.shape[0]):
        g.text(df.early_career_pay[line]+0.01, df.mid_career_pay[line], df.state_name[line],
               horizontalalignment='left', size='medium', color='black')

    plt.xlabel("Early Career Pay")
    plt.ylabel("Mid Career Pay")
    plt.title("Average Salary Potential by State: Early vs Mid Career", x=0.01, horizontalalignment="left", fontsize=16)

    plt.figtext(0.9, 0.09, "by: @eeysirhc", horizontalalignment="right")
    plt.figtext(0.9, 0.08, "Source: TuitionTracker.org", horizontalalignment="right")

    plt.show()

...

March 19, 2020 · Christopher Yee

Script to track global Coronavirus pandemic cases

The coronavirus (a.k.a. COVID-19) is taking the world by storm with the World Health Organization officially characterizing the situation as a pandemic. I’m not an infectious disease expert but I couldn’t resist writing a quick #rstats script to visualize the total number of cases by country. Feel free to use and modify for your own needs:

    # LOAD PACKAGES
    library(tidyverse)
    library(scales)
    library(gghighlight)

    # DOWNLOAD DATA
    df <- read_csv("https://covid.ourworldindata.org/data/ecdc/full_data.csv")

    # PARSE DATA
    df_parsed <- df %>%
      filter(total_cases >= 100) %>% # MINIMUM 100 CASES
      group_by(location) %>%
      mutate(n = n(),
             day_index = row_number()) %>%
      ungroup() %>%
      filter(n >= 25, # MINIMUM 25 DAYS
             !location %in% c('World', 'International')) # EXCLUDE

    # GRAPH
    df_parsed %>%
      ggplot(aes(day_index, total_cases, color = location, fill = location)) +
      geom_point() +
      geom_smooth() +
      gghighlight() +
      scale_y_log10(labels = comma_format()) +
      labs(title = "COVID-19: cumulative daily new cases by country (log scale)",
           x = "Days since 100th reported case",
           y = NULL, fill = NULL, color = NULL,
           caption = "by: @eeysirhc\nSource: Our World in Data") +
      facet_wrap(~location, ncol = 4) +
      expand_limits(x = 70) +
      theme_minimal() +
      theme(legend.position = 'none')

...
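In the spirit of #makeovermonday, here is a rough pandas sketch of the parsing step in the script above: keep rows with at least 100 cases, number the remaining rows per location, and drop locations with too few days. The function name `index_days_since_threshold` and the tiny demo frame are made up for illustration; only the column names `location` and `total_cases` come from the script. Like the R version, it assumes rows are already sorted by date within each location.

```python
import pandas as pd


def index_days_since_threshold(df, min_cases=100, min_days=25,
                               exclude=("World", "International")):
    """Hypothetical helper mirroring the PARSE DATA step of the R script."""
    # Keep only rows at or above the case threshold and drop aggregate rows.
    kept = df[df["total_cases"] >= min_cases].copy()
    kept = kept[~kept["location"].isin(exclude)]

    by_location = kept.groupby("location")["total_cases"]
    kept["day_index"] = by_location.cumcount() + 1   # like row_number()
    kept["n"] = by_location.transform("size")        # like n()

    # Keep only locations with enough days past the threshold.
    return kept[kept["n"] >= min_days].reset_index(drop=True)


# Made-up data, with min_days lowered so the small example keeps something.
demo = pd.DataFrame({
    "location": ["A"] * 4 + ["B"] * 2 + ["World"] * 3,
    "total_cases": [50, 120, 300, 700, 150, 90, 500, 900, 1500],
})
result = index_days_since_threshold(demo, min_days=2)
print(result)
```

Location A keeps its three rows past the threshold, B is dropped for having only one, and World is excluded outright.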

March 16, 2020 · Christopher Yee

Using R to calculate car lease payments

Purchasing a car is a significant time and financial commitment. There is so much at stake that the required song and dance with the sales manager doesn’t alleviate any fears about overpaying. Thus, it is difficult to determine the equilibrium point at which the dealer will accept your offer versus how much you are willing to pay.

I decided to write this for a few reasons:

- I am in the market for a new car
- Rather than doing actual car shopping I thought it would be more fun to procrastinate
- Online calculators are quite clunky when you want to compare and contrast monthly payments
- Hopefully, this will help others make more informed car buying decisions (TBD on Shiny app)

Note: this guide will focus only on leasing and not the auto financing aspect of it

...
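The excerpt stops before any formula, so the following is not the post's own code. It is a minimal Python sketch of the commonly quoted pre-tax lease calculation: a depreciation fee plus a finance fee based on the money factor. The function name, parameter names, and all dollar figures are hypothetical, and taxes and fees are ignored.

```python
def monthly_lease_payment(cap_cost, residual_value, money_factor, term_months):
    """Pre-tax monthly payment under the common lease formula (an assumption here)."""
    # Depreciation: the value the car loses over the term, spread evenly.
    depreciation_fee = (cap_cost - residual_value) / term_months
    # Finance (rent) charge: money factor applied to cap cost plus residual.
    finance_fee = (cap_cost + residual_value) * money_factor
    return depreciation_fee + finance_fee


# Compare a few made-up negotiated prices side by side.
offers = [31000, 30000, 29000]
payments = {
    price: round(monthly_lease_payment(price, 18000, 0.00125, 36), 2)
    for price in offers
}
for price, payment in payments.items():
    print(f"cap cost {price}: {payment} per month")
```

Looping over candidate prices like this is the kind of side-by-side comparison that online calculators make awkward.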

March 12, 2020 · Christopher Yee

How to interact with Slack from R

I think my tweet speaks for itself:

    Words can not express how excited I am to use this :D
    Christopher Yee (@Eeysirhc), March 10, 2020

The goal of this article is to document how to send #rstats code and plots directly to Slack.

Load packages

    library(slackr)
    library(slackteams)
    library(slackreprex)

Slack credentials

Member ID

You can easily grab that from this guide here.

Slack Key ID

To retrieve your Slack key ID, login here and then follow the prompts.

...

March 11, 2020 · Christopher Yee

MakeoverMonday: Women in the Workforce

The goal of #makeovermonday is to transform some of my #rstats articles and visualizations to their python equivalent. The original plot for this #tidytuesday dataset can be found here.

Load modules

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

Download and parse data

    # on_bad_lines='skip' REPLACES error_bad_lines=False (REMOVED IN PANDAS 2.0)
    df_raw = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/jobs_gender.csv",
                         sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')

    # FILTER ONLY FOR 2016
    df_raw = df_raw[df_raw['year']=='2016']

    df_raw = df_raw[['major_category', 'total_earnings_male', 'total_earnings_female', 'total_earnings',
                     'total_workers', 'workers_male', 'workers_female']]

    # REMOVE NULL VALUES
    df_raw = df_raw.dropna()

Clean data

Because the file was read with dtype='unicode', every column is a string, so we need to transform our data from objects to numerical values.

...
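The excerpt cuts off before the conversion itself, so this is only one possible way to do the object-to-numeric step, using `pd.to_numeric`. The helper name `to_numeric_columns` and the two-row demo frame are invented for illustration; the column names are taken from the selection above.

```python
import pandas as pd


def to_numeric_columns(df, columns):
    """Return a copy of df with the given string columns converted to numbers."""
    out = df.copy()
    for col in columns:
        # errors="coerce" turns unparseable strings into NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Made-up rows stored as strings, as they would be after dtype='unicode'.
demo = pd.DataFrame({
    "major_category": ["Service", "Sales and Office"],
    "total_earnings": ["50000", "42000"],
    "workers_female": ["1200", "n/a"],
})
converted = to_numeric_columns(demo, ["total_earnings", "workers_female"])
print(converted.dtypes)
```

Coercing bad values to NaN keeps the conversion from failing on a stray string; whether to drop or inspect those rows afterwards is a separate decision.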

February 17, 2020 · Christopher Yee

Deciphering Hopper's Data Puzzle

I like to browse company career pages once in a while to see what positions they have open. In my opinion, this provides a glimpse into what they are investing in for the next few years. Hopper is one company that stands out, but the reason I am writing this is a puzzle they included in the job description:

    At Hopper, every dataset tells a story. Do you have what it takes to decipher the clues? bit.ly/2q6U8dq

...

February 7, 2020 · Christopher Yee

Using R & GSC data to identify stale content

My friend John-Henry Scherck recently tweeted his process on how to refresh stale content:

    Put together a quick video on how to refresh stale content using nothing more than Google Search Console and a word doc. Check out the full video here: https://t.co/Vva4Zm4mNn pic.twitter.com/74Fm2oIz4c
    John-Henry Scherck (@JHTScherck), January 21, 2020

I imagine this can be broken down into five distinct parts:

1. Stale content selection
2. Understanding keyword intent
3. Actually refreshing the content
4. Internal link optimization
5. Publish

This short guide will focus on the first aspect where we’ll use #rstats to remove the manual work associated with stale candidate selection.

...

January 21, 2020 · Christopher Yee

[Updated] Top Industries from Inc.5000 Companies

Changelog

- Originally published on September 10th, 2019
- Built a Shiny app for this
- Full code can be found on GitHub

One of my favorite online marketers, (the) Glen Allsopp, tweeted the following:

    Over the past few weeks I've went through every site in the Inc. 5000. My mind has been blown multiple times. Don't click if you're easily distracted. Enjoy! https://t.co/mHVK8rvb9X pic.twitter.com/BoEb3qQ7LZ
    Glen Allsopp (@ViperChill), August 27, 2019

The public spreadsheet contains four fields:

...

January 7, 2020 · Christopher Yee