Probability Distributions for Simulation

For experienced modelers, the most challenging task in creating a simulation model is usually not identifying the key inputs and outputs, but selecting an appropriate probability distribution and parameters to model the uncertainty of each input variable.  For example, Risk Solver software provides over 40 probability distributions -- so which one should you use?  The answer depends on your application, but some general guidelines can be given.

Discrete Vs. Continuous Distributions

If you must choose or create your own distribution, the first step is to determine whether to use a discrete or continuous form.

  • If there are a small number of possible values for the uncertain variable, you may be able to use a discrete analytic distribution, or construct a discrete custom distribution. If the underlying physical process involves discrete, countable entities -- such as the number of customers arriving at a service window -- you can use a discrete distribution.
  • If the possible values are highly divisible -- such as most prices, volumes, interest rates, exchange rates, weights, distances, etc. -- you will likely use a continuous distribution.  In some cases, you may use a continuous distribution to approximate a discrete distribution.

Bounded Vs. Unbounded Distributions

Another characteristic that distinguishes probability distributions is the range of sample values they can generate.

  • Some distributions are intrinsically bounded -- samples are guaranteed to lie between a known minimum and maximum value.  Examples are the Uniform, Triangular, Beta, and Binomial distributions.
  • Other distributions are unbounded -- sample values may cluster around the distribution’s mean, but may sometimes have extreme negative or positive values.  Examples are the Normal, Logistic, and Extreme Value distributions.
  • Still other distributions are partially bounded, with a known minimum such as zero, but no maximum value.  Examples are the Exponential, Poisson, and Weibull distributions.

At times, you may find that the most appropriate distribution (say the Normal) is unbounded, but you know that the realistic values of the physical process are bounded, or your model is designed to handle values only up to some realistic limit.  Your software may allow you to truncate an unbounded distribution.  For example, in Risk Solver you can impose bounds on any distribution by passing the PsiTruncate property function as an argument to the distribution function.

Analytic Vs. Custom Distributions

A third characteristic of probability distributions is whether they are analytic (also called parametric) or custom (sometimes called non-parametric) distributions.

  • An analytic distribution has a form derived from certain theoretical assumptions about the problem.  For example, a Poisson distribution is derived from an assumption that events are independent and occur at a known average rate, and an Exponential distribution is derived from an assumption of a constant rate of decay in some process.
  • A custom distribution has a form dictated by either past data or expert opinion about the range and frequency of sample values.  Risk Solver software provides five general-purpose functions -- PsiCumul, PsiDiscrete, PsiDisUniform, PsiGeneral and PsiHistogram -- to help you model custom distributions.

Generally speaking, you should choose an analytic distribution if -- and only if -- the theoretical assumptions truly apply in your situation.

More Hints and Warnings

Using a Triangular Distribution.  If you only have estimates of the minimum, maximum, and most likely values of an uncertain variable -- and no other past data or literature references -- a popular approach is to create a Triangular distribution from these three numbers.  This is unlikely to be a highly accurate representation of the uncertainty, but it will allow you to get started, and it is far better than a single average that is subject to the Flaw of Averages.  If your ‘minimum’ and ‘maximum’ values are really low- and high-percentile estimates rather than the absolute lowest and highest values that can occur, consider using a 'generalized Triangular' distribution (PsiTriangGen in Risk Solver) instead.

Define Each Uncertain Variable Only Once.  Often, you’ll need to use the same uncertain variable in several different formulas in your model.  A very common error is to enter the same distribution function, with the same parameters (say PsiNormal(100, 10) in Risk Solver), several times in a model -- in a belief that these instances will yield the same results on each trial.  This is incorrect -- by doing this, you’ve actually defined several independent uncertain variables that may well sample different values on each trial.  You should instead define =PsiNormal(100, 10) only once (for example in a cell such as A1), and use A1 in every formula where the uncertain variable is needed.