Approaches to Choosing Probability Distributions

As noted earlier, the most challenging task in creating a simulation model is often selecting appropriate probability distributions (and associated parameters) to describe the behavior of each uncertain input variable. In some cases, you will have data on the past (historical) behavior of your uncertain input variables that can assist in this task; in other cases, you won't. Following are some suggestions for choosing probability distributions in each of these cases.

When Past Data is Available

If you have, or can collect, data on the past performance of the uncertain variable -- and if you believe that past performance is likely to be representative of future performance -- you have three options:

  • If you have a reasonably large number of observations of past performance of the variable, compared to the number of simulation trials you want to run, you can use the past data itself for simulation trials.  This is sometimes called a “trace-driven simulation.”  In Risk Solver, you can use past data in the form of a SIP (Stochastic Information Packet), and use the PsiSip() or PsiSlurp() distribution function for the uncertain variable.
  • If -- as is often the case -- you have a relatively small number of observations of past performance compared to the number of trials you want to run, you may be better off resampling the past performance data.  Instead of using all of the past observations (one per simulation trial), you randomly sample the past observations on each trial.  In Risk Solver, you can place past data in a cell range or SIP, and use the PsiDisUniform() function (single values) or the PsiResample() function (multiple values) for the uncertain variable.
  • If you can fit the data (past observations) to a specific type of analytic distribution and its parameters, and if there is reason to believe that the underlying process that the uncertain variable is measuring is consistent with the assumptions from which the analytic distribution is derived, you can use this distribution (for example PsiNormal, PsiWeibull, etc.) for the variable.  You can use Risk Solver to automatically find the best-fitting analytic distribution.
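The three options above can be sketched outside the spreadsheet in a few lines of Python. This is only an illustration using the standard library, not how Risk Solver implements its PSI functions; the observations in `past`, the seed, and the choice of a Normal fit by matching mean and standard deviation are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical past observations (for example, daily demand); illustrative only.
past = [102, 97, 110, 105, 93, 99, 108, 101, 95, 104]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_trials = 1000

# Option 1: trace-driven. Replay the observations in order, one per trial.
# This limits the run to as many trials as there are observations.
trace_trials = list(past)

# Option 2: resampling. On every trial, draw one past observation at random
# (with replacement), so the number of trials can exceed the data size.
resampled = [rng.choice(past) for _ in range(n_trials)]

# Option 3: fit an analytic distribution and sample from the fitted form.
# Here a Normal is fitted by matching the sample mean and standard deviation.
mu = statistics.mean(past)
sigma = statistics.stdev(past)
fitted = [rng.gauss(mu, sigma) for _ in range(n_trials)]

print(f"resampled mean: {statistics.mean(resampled):.2f}")
print(f"fitted mean:    {statistics.mean(fitted):.2f}")
```

Note that resampling can only ever return values that were actually observed, while the fitted distribution can return values between and beyond the observations. That is an advantage only if the Normal assumption is reasonable for the underlying process.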

When Past Data is Not Available

If you don’t have, and cannot easily collect, data on the past performance of the uncertain variable -- or if past performance is not likely to be representative of future performance -- you must tackle the problem in a different way:

  • Consult the literature for your industry, if available, to find examples of applications like yours where simulation models were built.  Find out -- by contacting the authors if necessary -- what kinds of distributions were used for the uncertain variables, and the rationale for choosing them.
  • If you cannot find reports on industry-specific applications like yours, consult the publications of professional societies like INFORMS, where simulation applications are reported.  One rich source is the past proceedings of the Winter Simulation Conference (www.wintersim.org).
  • The Risk Solver User Guide has a chapter, PSI Function Reference, with descriptions of 40 different analytic and custom distribution functions, including brief comments on the types of applications where each distribution has been used in the past.
  • To learn more about analytic distributions, consult textbooks such as Simulation Modeling and Analysis, 4th Ed. by Averill Law, Statistical Distributions, 3rd Ed. by Evans, Hastings and Peacock, Univariate Discrete Distributions, 3rd Ed. by Johnson, Kemp and Kotz, or Continuous Univariate Distributions, Vol. 1 & 2, 2nd Ed. by Johnson, Kotz and Balakrishnan.

You are well-advised to keep it simple!  Many physical, social and biological phenomena are well described by the Normal distribution, or -- if the possible values are equally likely to occur, as in a coin flip or single die -- the Uniform distribution.  Bear in mind that when many independent uncertain quantities, each with finite variance, are summed, the distribution of the sum tends (often quickly) towards the Normal distribution, whatever the shapes of the individual distributions.  This is the Central Limit Theorem.
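The tendency of sums towards the Normal distribution can be checked with a short experiment. The sketch below, which assumes 12 independent Uniform(0, 1) inputs per trial and a fixed seed chosen for repeatability, sums the inputs and compares the result with what a Normal distribution would give. In theory the sum has mean 12 × 0.5 = 6 and variance 12 × (1/12) = 1, and about 68% of a Normal distribution lies within one standard deviation of its mean.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable
n_trials = 20000
n_terms = 12  # independent Uniform(0, 1) inputs summed on each trial

# Each trial sums n_terms independent uniform draws.
sums = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_trials)]

mean_sum = statistics.mean(sums)
sd_sum = statistics.stdev(sums)

# Fraction of trials within one standard deviation of the mean;
# a Normal distribution gives roughly 0.68.
within_one_sd = sum(abs(s - mean_sum) <= sd_sum for s in sums) / n_trials

print(f"mean = {mean_sum:.3f}, sd = {sd_sum:.3f}, within 1 sd = {within_one_sd:.3f}")
```

The same experiment with strongly dependent inputs, or with very heavy-tailed inputs, would not necessarily behave this way, which is why independence and finite variance matter.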

Applications that involve queuing -- customers arriving or departing, parts awaiting assembly, etc. -- have been well studied, so you can often find appropriate distributions in the literature.  Applications that use the Program Evaluation and Review Technique (PERT) can often use the PsiPert() function in Risk Solver to model uncertainty.
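For readers unfamiliar with PERT-style inputs, the following is a minimal sketch of one common textbook formulation: a Beta distribution rescaled to the interval between a minimum and a maximum estimate, with shape parameters chosen so that the mean equals (min + 4 × most likely + max) / 6. The function name `pert_sample` and the task estimates are hypothetical, and this is not necessarily the exact parameterization that PsiPert() uses; consult the PSI Function Reference for that.

```python
import random
import statistics


def pert_sample(rng, low, mode, high):
    """Draw one value from a Beta-based PERT distribution on [low, high].

    Assumes low < high and low <= mode <= high. Uses the common
    parameterization with a weight of 4 on the most likely value.
    """
    span = high - low
    alpha = 1 + 4 * (mode - low) / span
    beta = 1 + 4 * (high - mode) / span
    # betavariate returns a value in [0, 1]; rescale it to [low, high].
    return low + span * rng.betavariate(alpha, beta)


rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical task duration estimates, in days.
low, mode, high = 2.0, 5.0, 14.0
durations = [pert_sample(rng, low, mode, high) for _ in range(10000)]

# Classic PERT estimate of the expected duration.
pert_mean = (low + 4 * mode + high) / 6

print(f"PERT mean estimate: {pert_mean:.2f}")
print(f"simulated mean:     {statistics.mean(durations):.2f}")
```

A three-point estimate like this is convenient when no past data exist but an expert can supply optimistic, most likely, and pessimistic values.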