Predictive Analysis with Monte Carlo Simulation
I got interested in Monte Carlo simulation during one of my weekend reading, the more I read the more I appreciated the computational algorithm. This method is used to obtain a result (with a degree of probability) using random input with a knowledge of boundaries. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam, while he was working on nuclear weapons projects at the Los Alamos National Laboratory. It was named by Nicholas Metropolis, after the Monte Carlo Casino, where Ulam’s uncle often gambled.  In this post I will briefly cover the basics of Monte Carlo Simulation and then I will detail a scenario where I was able to apply the basic method using MS-Excel for business capacity planning for a fictitious company.
Monte Carlo Simulation
Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; typically one runs simulations many times over in order to obtain the distribution of an unknown probabilistic entity.  This is a stochastic simulation as there is a degree of randomness involved. The outcome of the simulation most likely will not be same every time. Since the outcome varies with every simulation, the question that begs an answer would be; how many times does the simulation needs to be run to gain confidence in the outcome that is generated. A randomly selected samples will exhibit the same behavior as the entire population from which it is selected is the assumption behind this method. By running the simulation using more number of times the experiment becomes more reproducible. Inferential statistics obtained by running the simulation several thousand times will produce a generalized result.. The range of the results gets narrower as the number of random samples increases. Using all the results of the simulations we can run perform statistical analysis to generate the various outcomes and the corresponding probability of occurrence. Following are the four steps involved in applying this model for analysis of a given situation:
- Identify a mathematical model of the activity or process you want to explore.
- Define the parameters (like mean and standard deviation) for each factor in your model.
- Create random data according to those parameters.
- Simulate and analyze the output of your process.
In this example, we have a online business that has been in running for the last few years. This company has been seeing an increase in the number of orders it has been receiving year-over-year. The business would like to perform capacity planning so that they can prepare their operations expansion to handle the transaction load for the next five years. The key information they need at this time is the total number of orders that their business could potentially generate in the fifth year from now. Let us look at how we can use Monte Carlo as a computation model using using Excel to provide this key information.
The following table shows the number of orders that the business received every year for the last 10 years. The table also shows the rate of increase in the orders for the same period. We will use this input information to determine mean rate of increase in the orders and also the standard deviation in the year-over-year order count. The mean is computed to be the average of the growth rate for the last 10 years and has been computed to be at 17.95%. The Standard Deviation, which is the the amount of variation or dispersion from the average, was calculated to be 53.98 %.
Let us assume that rate of increase follows a normal distribution. I have used the NORM.INV(RAND(), Mean, SD) to introduce the randomness to the distribution. Now we can apply the same distribution for the next five years to obtain our first possible outcome using the mean and the standard deviation. The order count of 2014 has been computed using the formula p (1 + r) where “p” is the previous years order count and “r” is the computed rate of growth for the year. When we apply the same formula for the next five years we get the following results and these results vary every time we run the simulation with a new randomness.
Now we can simulate running the same experiment several hundred or thousand times using the Data Table feature of “What-if” analysis in MS-Excel. We can then compute the mean of all the outcomes from the simulations to arrive at a possible order count which could be the very close to the actual value, if the same rate of growth is sustained for the next five years. In the following two pictures \some results from the simulation of 1000 iterations have been displayed with actual values and a graphical representation of the iterations plotted against the 1000 outcomes.
The above graph shows the results of the 1000 iterations plotted against the computed order count for the year 2018 which is the output of each iteration. Analyzing the output of the simulation we can determine that the mean of all the 1000 outcomes appears to be spread between 1.0 and 1.1 million orders. I had later run the same simulation with 10,000 iterations and I had noticed that the mean outcome spread has tightened further. Depending on the application the number of iterations can be very high only limited by the computational power at-hand. This information could be used by the business to plan for handling future work load. We have seen how Monte Carlo simulation is one of the tools that can be used to predict future values from a known set of uncertainty in input with boundaries. This computation model is used in many fields of Mathematics, Physical Sciences, Engineering, Finance, Project Management, Computational Biology to name a few.