TidyTuesday: Cocktails pt.2
This is part 2 of TidyTuesday: Cocktails. Below, we use #rstats to write a cocktail recommendation system that takes in a drink and returns a few other cocktails made with similar ingredients.

Load libraries

```r
library(tidyverse)
library(recommenderlab)
```

Download and parse data

Note: please check out part 1 for details on the processing steps.

```r
bc_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv')

# lower-case ingredient names, drop duplicate rows, keep only cocktail + ingredient
bc <- bc_raw %>%
  mutate(ingredient = str_to_lower(ingredient)) %>%
  distinct() %>%
  select(name, ingredient)

# rows that already list a single ingredient
bc_tidy <- bc %>%
  filter(!str_detect(ingredient, ","))

# rows that pack several ingredients into one comma-separated string:
# split them into one row per ingredient
bc_untidy <- bc %>%
  filter(str_detect(ingredient, ",")) %>%
  mutate(ingredient = str_split(ingredient, ", ")) %>%
  unnest(ingredient)

bc_clean <- rbind(bc_tidy, bc_untidy) %>%
  distinct()

# snake_case the names and strip the brand prefixes "old mr. boston" and "old thompson"
df <- bc_clean %>%
  mutate(ingredient = str_replace_all(ingredient, "-", "_"),
         ingredient = str_replace_all(ingredient, " ", "_"),
         ingredient = str_replace_all(ingredient, "old_mr._boston_", ""),
         ingredient = str_replace_all(ingredient, "old_thompson_", ""))

# one row per ingredient, one column per cocktail:
# 1 = the cocktail uses the ingredient, 0 = it does not
df_processed <- df %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = name) %>%
  replace(is.na(.), 0)
```

Recommendation algorithm

Transform data to binary rating matrix

```r
cocktails_matrix <- df_processed %>%
  select(-ingredient) %>%
  as.matrix() %>%
  as("binaryRatingMatrix")
```

Create evaluation scheme

```r
# 5-fold cross-validation; a negative `given` is an all-but scheme,
# so given = -1 withholds one item from each test row
scheme <- cocktails_matrix %>%
  evaluationScheme(method = "cross", k = 5, train = 0.8, given = -1)
```

Input customer cocktail preference

Let’s check the ingredients for a very simple cocktail: ...
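The comma-splitting step in the parsing code turns each multi-ingredient string into one row per ingredient. The base-R sketch below shows that step on a tiny made-up table. The cocktail names and ingredients are hypothetical and do not come from the Boston dataset. `strsplit()` plus `rep()` stands in for `str_split()` plus `unnest()`, and `unique()` plays the role of `distinct()`. Single-ingredient rows pass through unchanged, so this sketch handles the "tidy" and "untidy" rows in one pass.

```r
# Hypothetical toy data: one row per (cocktail, ingredient string)
toy <- data.frame(
  name       = c("Toy Sour", "Toy Sour", "Toy Fizz", "Toy Fizz"),
  ingredient = c("gin", "lemon juice, sugar", "gin, soda water", "gin"),
  stringsAsFactors = FALSE
)

# Split every string on ", " (the same separator the tidyverse code uses);
# strings without a comma come back as a single piece
parts <- strsplit(toy$ingredient, ", ", fixed = TRUE)

# Repeat each cocktail name once per piece, then flatten the pieces
split_df <- data.frame(
  name       = rep(toy$name, lengths(parts)),
  ingredient = unlist(parts),
  stringsAsFactors = FALSE
)

# Equivalent of distinct(): the duplicate ("Toy Fizz", "gin") pair is dropped
split_df <- unique(split_df)
print(split_df)
```

The result has five rows and no commas left in `ingredient`.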
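The `pivot_wider()` step reshapes the long (cocktail, ingredient) pairs into an ingredient × cocktail table of 0s and 1s, and that table is what gets coerced with `as("binaryRatingMatrix")`. In recommenderlab terms, ingredients sit in the rows (the "users") and cocktails in the columns (the "items"). Here is a base-R sketch of the same reshaping using `table()`. The pairs are hypothetical toy data.

```r
# Hypothetical long-format pairs, as produced by the cleaning step
pairs <- data.frame(
  name       = c("Toy Sour", "Toy Sour", "Toy Sour", "Toy Fizz", "Toy Fizz"),
  ingredient = c("gin", "lemon_juice", "sugar", "gin", "soda_water"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows = ingredients, columns = cocktails
counts <- table(pairs$ingredient, pairs$name)

# Collapse counts to presence/absence (1 = the cocktail uses the ingredient);
# unclass() drops the "table" class so we are left with a plain numeric matrix
binary <- unclass(counts > 0) * 1
print(binary)
```

Absent combinations come out as 0 directly, so there is no `NA` to replace as there is after `pivot_wider()`.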
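For intuition about `evaluationScheme(method = "cross", k = 5, given = -1)`, consider what the protocol does. The rows (here, ingredients) are split into 5 folds. Each fold is held out once as the test set while the remaining 80% of rows train the model. For each test row, all but one of its known items are given to the recommender, which is then judged on whether it recovers the withheld item. The base-R sketch below illustrates only that splitting logic on a random toy matrix. It is a simplified assumption about the protocol, not recommenderlab's internal code, and the object names are hypothetical.

```r
set.seed(42)
# Hypothetical binary matrix: 20 ingredient rows x 8 cocktail columns
x <- matrix(rbinom(20 * 8, 1, 0.4), nrow = 20)
k <- 5

# Assign each row to one of k folds of (near) equal size
folds <- sample(rep(seq_len(k), length.out = nrow(x)))

splits <- lapply(seq_len(k), function(f) {
  test_rows  <- which(folds == f)
  train_rows <- which(folds != f)
  # all-but-one: for each test row, withhold one of its 1s at random
  # (NA if the row has no items at all)
  hidden <- vapply(test_rows, function(r) {
    items <- which(x[r, ] == 1)
    if (length(items) == 0) NA_integer_ else items[sample.int(length(items), 1)]
  }, integer(1))
  list(train = train_rows, test = test_rows, hidden = hidden)
})

# Training fraction per fold: 0.8 each, since 20 rows split evenly into 5 folds
print(sapply(splits, function(s) length(s$train) / nrow(x)))
```

`sample.int()` is used to pick the withheld item because `sample(items, 1)` would silently sample from `1:items` when a row has exactly one item.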
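Once cocktails are columns of a binary matrix, "made with similar ingredients" can be measured as the overlap of their ingredient sets. A common choice for binary data is the Jaccard index, |A ∩ B| / |A ∪ B|. The sketch below is a minimal item-to-item recommender built on that assumption. The helper names `jaccard_cols()` and `recommend_similar()` and the toy matrix are hypothetical, and this is not necessarily the model that the recommenderlab workflow in this post uses.

```r
# Hypothetical binary matrix: rows = ingredients, columns = cocktails
m <- matrix(
  c(1, 1, 1, 0,   # gin
    1, 0, 1, 0,   # lemon_juice
    1, 0, 0, 0,   # sugar
    0, 1, 0, 1),  # soda_water
  nrow = 4, byrow = TRUE,
  dimnames = list(c("gin", "lemon_juice", "sugar", "soda_water"),
                  c("Sour A", "Fizz B", "Gimlet C", "Soda D"))
)

# Jaccard similarity between every pair of columns (cocktails)
jaccard_cols <- function(x) {
  n_inter <- crossprod(x)                            # shared ingredients |A & B|
  sizes   <- colSums(x)                              # ingredients per cocktail |A|
  n_union <- outer(sizes, sizes, "+") - n_inter      # |A| + |B| - |A & B|
  n_inter / n_union
}

# Return the n cocktails most similar to `drink`, excluding the drink itself
recommend_similar <- function(x, drink, n = 2) {
  sims <- jaccard_cols(x)[drink, ]
  sims <- sims[names(sims) != drink]
  head(sort(sims, decreasing = TRUE), n)
}

# Gimlet C (2 shared of 3 total, 2/3) ranks first, then Fizz B (1/4)
recommend_similar(m, "Sour A")
```

A cocktail with no ingredients would give a 0/0 (`NaN`) similarity. The cleaned Boston data should not contain such columns, but a real implementation would need to guard against it.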