With the end of the latest Star Wars trilogy, I wanted to compare, contrast, and explore Lucas vs Disney era domestic box office revenue.
The analysis and python code below will parse weekly ticket sales from Box Office Mojo, adjust revenue numbers for inflation, visualize, and attempt to uncover insights from the data.
TL;DR The top 3 revenue generating films (inflation-adjusted) are the first movie for each trilogy Disney era films do not make it past week 20 compared to the Lucas era On average, Lucas era movies generate 80% of their revenue within the first 10 weeks of release while Disney takes 2.8 weeks Load modules import pandas as pd import matplotlib.pyplot as plt import seaborn as sns sns.set(style="darkgrid") Define function def movie_revenue(era, trilogy, movie, url): # RETRIEVE DATA FROM URL movie_data = pd.read_html(url)[0] # CUMULATIVE REVENUE: TRANSFORM TO FLOAT AND CALCULATE PER MILLION (1e6) movie_data['Cumulative_Revenue'] = movie_data['To Date'].str.replace(',', '').str.replace('$', '').astype(int) movie_data['Cumulative_Revenue'] = movie_data['Cumulative_Revenue'] / 1e6 # WEEKLY REVENUE: TRANSFORM TO PER MILLION (1e6) FLOAT movie_data['Weekly_Revenue'] = movie_data['Weekly'].str.replace(',', '').str.replace('$', '').astype(int) movie_data['Weekly_Revenue'] = movie_data['Weekly_Revenue'] / 1e6 # SELECT WEEK INDEX & REVENUE DATA movie_data = movie_data[['Week', 'Weekly_Revenue', 'Cumulative_Revenue']] # ADD ADDITIONAL COLUMNS movie_data['era'] = era movie_data['trilogy'] = trilogy movie_data['movie'] = movie return(movie_data) Set parameters # LIST: PRODUCER ERA, TRILOGY, MOVIE NAME, URL TO CRAWL sw_list = [['Lucas','Prequel','The Phantom Menace','https://www.boxofficemojo.com/release/rl2742257153/weekly/'], ['Lucas','Prequel','Attack of the Clones','https://www.boxofficemojo.com/release/rl2809366017/weekly/'], ['Lucas','Prequel','Revenuge of the Sith','https://www.boxofficemojo.com/release/rl2943583745/weekly/'], ['Lucas','Original','A New Hope','https://www.boxofficemojo.com/release/rl2759034369/weekly/'], ['Lucas','Original','The Empire Strikes Back','https://www.boxofficemojo.com/release/rl2775811585/weekly/'], ['Lucas','Original','Return of the Jedi','https://www.boxofficemojo.com/release/rl2792588801/weekly/'], ['Disney','Sequel','The Force Awakens','https://www.boxofficemojo.com/release/rl2691925505/weekly/'], ['Disney','Sequel','The Last Jedi','https://www.boxofficemojo.com/release/rl2708702721/weekly/'], ['Disney','Sequel','The Rise of Skywalker','https://www.boxofficemojo.com/release/rl3305145857/weekly/'], ['Disney','SW Story','Rogue One','https://www.boxofficemojo.com/release/rl2557707777/weekly/'], ['Disney','SW Story','Solo','https://www.boxofficemojo.com/release/rl1954383361/weekly/']] Retrieve and parse data star_wars = [] for m in sw_list: data = movie_revenue(m[0], m[1], m[2], m[3]) star_wars.append(data) star_wars = pd.concat(star_wars) Spot check from random import randint star_wars.iloc[randint(0,len(star_wars))] ## Week 10 ## Weekly_Revenue 0.762516 ## Cumulative_Revenue 514.326 ## era Disney ## trilogy Sequel ## movie The Rise of Skywalker ## Name: 9, dtype: object Inflation-adjusted revenue # MOVIE TITLE m = [] for t in sw_list: title = t[2] m.append(title) # YEAR OF MOVIE y = [2000, 2002, 2005, 1977, 1980, 1983, 2016, 2018, 2019, 2017, 2018] # INFLATION RATE (https://www.bls.gov/data/inflation_calculator.htm) i = [0.497, 0.433, 0.32, 3.254, 2.129, 1.588, 0.074, 0.027, 0.008, 0.052, 0.027] # JOIN ALL THREE LISTS inflation = list(zip(m, y, i)) # CONVERT TO PANDAS DATAFRAME inflation = pd.DataFrame(inflation) # CREATE & APPLY COLUMN NAMES labels = ["movie", "year", "inflation_rate"] inflation.columns = labels # COMBINE DATAFRAMES star_wars_adjusted = star_wars.merge(inflation, how='left', on='movie') # INFLATION-ADJUSTED REVENUE: WEEKLY & CUMULATIVE star_wars_adjusted['Adjusted Weekly Revenue'] = star_wars_adjusted['Weekly_Revenue'] * (1 + star_wars_adjusted['inflation_rate']) star_wars_adjusted['Adjusted Cumulative Revenue'] = star_wars_adjusted['Cumulative_Revenue'] * (1 + star_wars_adjusted['inflation_rate']) Final spot check star_wars_adjusted.iloc[randint(0,len(star_wars_adjusted))] ## Week 46 ## Weekly_Revenue 0.792809 ## Cumulative_Revenue 213.955 ## era Lucas ## trilogy Original ## movie A New Hope ## year 1977 ## inflation_rate 3.254 ## Adjusted Weekly Revenue 3.37261 ## Adjusted Cumulative Revenue 910.166 ## Name: 101, dtype: object What is the total ticket sales for the entire franchise? star_wars_adjusted.agg({'Adjusted Weekly Revenue':'sum'})[0] ## 6125.216885688999 Over $6.6 billion in gross revenue (inflation-adjusted) was made from the US box office alone.
...