Monte Carlo Business Case Analysis using pandas
This package provides some tools and examples to analyse high-impact business decisions, and dodge many of the problems of using spreadsheets.
Pandas is increasingly becoming a standard tool in scientific computing. Could it also have a role in the CFO's office? CFOs regularly need to analyse the impact of different projects or business cases, and they almost universally do this using spreadsheets. Spreadsheets have many advantages - they have a low barrier to entry and are easy for most people to understand. However as they get more complicated, disadvantages start to appear; in particular, they can be inflexible and highly error-prone.
The application here is made more interesting and useful by being 'Monte Carlo' analysis. Traditional business case analysis takes single point estimates of sales, costs and prices, and calculates a single profit forecast. Everybody knows the profit will not turn out to be exactly equal to the forecast. But it is not clear what the range of profits might be, or how likely a loss is. 'Monte Carlo' analysis solves this problem by allowing ranges or distributions on the assumptions; the forecast is then a range of outcomes.
For more background, see the talk.
Slides and an ipython notebook are included in the
talks directory. The talk, titled "Monte Carlo Business Case Analysis using pandas", was presented at the Science and Data mini-conference for PyCon Australia (1 Aug 2014), as well as the Sydney Python group on 5 June 2014. It is an introduction to both Monte Carlo Business Case Analysis and pandas. You can find it online at racingtadpole.com/blog/business-case-python-pandas. The slides have been prepared using the html5 presentation engine Shower.
montepylib directory includes:
io.py- routines to simplify data input
manip.py- routines to manipulate DataFrames
utils.py- utility routines like
sim.py- the core simulation routines
extend.py- adds some useful methods to
examples directory for more, including testing your model.
First install the package:
pip install montepylib
Then, in python, try:
from pandas import Series, DataFrame import pandas as pd from montepylib.sim import Simulator volume = DataFrame([['Factory A','Widget X',200,30.0],['Factory A','Widget Y',100,15.0],['Factory B','Widget X',300,50.0]], columns=['factory','widget','mean','sd']) costs = DataFrame([['Factory A','Widget X',1,.2],['Factory A','Widget Y',.6,.1],['Factory B','Widget X',1.2,.25],['Factory B','Widget Y',.7,.12]], columns=['factory','widget','mean','sd']) price = DataFrame([['Widget X',5,.5], ['Widget Y',3,.3]], columns=['widget','mean','sd']) mc = Simulator() N = 5000 sim_vol = mc.sim_from_params(N, volume, 'volume') sim_costs = mc.sim_from_params(N, costs, 'cost') sim_price = mc.sim_from_params(N, price, 'price') sim = mc.merge(sim_vol, sim_costs, sim_price, how='left') sim['margin'] = sim['price'] - sim['cost'] sim['profit'] = sim['volume'] * sim['margin'] sim_factory = sim.groupby(['factory', 'iteration']).sum()[['profit','volume']] total_per_iteration = sim_factory.groupby(level='iteration').sum() total_per_iteration.hist(xrot=90); summary_per_factory = sim_factory.groupby(level='factory').describe().unstack(level=0) summary_per_factory # profit volume # factory Factory A Factory B Factory A Factory B # count 5000.000000 5000.000000 5000.000000 5000.000000 # mean 1041.392873 1142.445923 299.692542 300.461622 # std 167.320088 255.161823 33.146621 50.388224 # min 411.263495 370.741291 150.234686 127.378759 # 25% 929.578744 965.513249 277.282007 266.361845 # 50% 1032.065933 1129.047618 299.882982 300.278990 # 75% 1148.330681 1308.399140 321.756739 334.676815 # max 1808.573213 2167.622599 420.227083 478.421165
For the base case, use
N = 0:
N = 0 # include the same code as above to calculate sim_factory sim_factory # profit volume # factory iteration # Factory A base case 1040 300 # Factory B base case 1140 300
For sensitivity analysis, use
N = -1:
N = -1 # include the same code as above to calculate total_per_iteration total_per_iteration # profit volume # iteration # base case 2180 600 # high cost 2055 600 # high price 2460 600 # high volume 2526 695 # low cost 2305 600 # low price 1900 600 # low volume 1834 505
examples directory for a more detailed example, including testing your model.
Pandas is not the only way to do this. For another approach, I have also explored using R - see this post for some examples, and see the book "Business Case Analysis with R" by Robert D. Brown, 2013. Or if you want to free yourself from using simulations and deal directly with mathematical distributions, you could even use Mathematica...