Families of Discrete Distributions

  • A family of probability distributions is a collection of probability distributions sharing a functional form that is parameterized by a well-defined set of parameters.
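  • In the discrete case, the PMF, $p(x;\theta) = P(X = x)$, is parameterized by the parameter $\theta \in \Theta$, where $\Theta$ is called the parameter space. The parameter $\theta$ (a scalar or a vector) affects the actual form of the PMF, possibly including the support of the random variable. Hence, a family of distributions is the collection of PMFs $p(\cdot;\theta)$ for all $\theta \in \Theta$.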
using Distributions
dists = [
    DiscreteUniform(10,20),
    Binomial(10, 0.5),
    Geometric(0.5),
    NegativeBinomial(10,0.5),
    Hypergeometric(30,40,10),
    Poisson(5.5)]

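# Print a header, then build a table with one row per distribution:
# the distribution, its parameters, and the endpoints of its support.
println("Distribution \t\t\t\t Parameters \t Support")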
reshape([dists; params.(dists);
    ((d) -> (minimum(d), maximum(d))).(dists)],
    length(dists),3)
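
To make the idea of a family concrete, the following minimal sketch fixes one functional form (the binomial with 10 trials) and varies only the parameter. The helper name `familyMember` and the chosen values of `theta` are illustrative assumptions, not part of the original notes.

```julia
using Distributions

# Hypothetical helper: one member of the binomial family with n fixed at 10,
# indexed by the parameter theta (the success probability).
familyMember(theta) = Binomial(10, theta)

thetas = [0.2, 0.5, 0.8]   # illustrative points in the parameter space (0, 1)

for theta in thetas
    d = familyMember(theta)
    xs = minimum(d):maximum(d)     # support of this member, here 0:10
    pmf = pdf.(d, xs)              # PMF evaluated over the whole support
    println("theta = ", theta,
            "  most likely value = ", xs[argmax(pmf)],
            "  total mass = ", sum(pmf))
end
```

Each value of `theta` selects a different PMF, yet every member has the same form and sums to one over its support.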

Discrete Uniform Distribution

  • The discrete uniform distribution is simply a probability distribution that assigns equal probability to each possible outcome.
  • One example is the outcome of a die toss. For a fair, six-sided die, the probability of each possible outcome is $P(X = x) = \frac{1}{6}$ for $x = 1, \ldots, 6$.
  • Simulation in Julia
using StatsBase, Plots; pyplot()

faces, N = 1:6, 10^6
mcEstimate = counts(rand(faces,N),faces)/N

plot(faces, mcEstimate,
    line=:stem, marker=:circle,
    c=:blue, ms=10, msw=0, lw=4, label="MC estimate")
plot!([i for i in faces], [1/6 for _ in faces],
    line=:stem, marker=:xcross, c=:red,
    ms=6, msw=0, lw=2, label="PMF",
    xlabel="Face number", ylabel="Probability", ylims=(0,0.22))
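
As a quick cross-check that needs no simulation, the die can also be represented as a distribution object. This is a small sketch; the variable names `die` and `diePmf` are chosen here for illustration only.

```julia
using Distributions

die = DiscreteUniform(1, 6)            # fair six-sided die as a distribution object
faces = minimum(die):maximum(die)      # support, 1:6
diePmf = pdf.(die, faces)              # PMF at each face

println("PMF values: ", diePmf)
println("Mean face value: ", mean(die))
```

Every entry of `diePmf` should agree with 1/6 up to floating-point error, which is the red reference level drawn in the plot above.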

Binomial Distribution

  • The binomial distribution is a discrete distribution that arises when multiple identical and independent yes/no, true/false, or success/failure trials (also known as Bernoulli trials) are performed. Each trial has only two possible outcomes, and the success probability must be the same for every trial.
  • As an example, consider a coin that is flipped $n$ times in a row. If the probability of obtaining a head in a single flip is $p$, then the probability of obtaining $x$ heads in total is given by the PMF $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$.
  • Simulation in Julia
using StatsBase, Distributions, Plots; pyplot()

binomialRV(n,p) = sum(rand(n) .< p)

p,n,N = 0.5, 10, 10^6

bDist = Binomial(n,p)
xGrid = 0:n
bPmf = [pdf(bDist, i) for i in xGrid]
data = [binomialRV(n,p) for _ in 1:N]
pmfEst = counts(data, 0:n)/N

plot(xGrid, pmfEst, 
    line=:stem, marker=:circle,
    c=:blue, ms=10, msw=0, lw=4, label="MC estimate")
plot!(xGrid, bPmf,
    line=:stem, marker=:xcross, c=:red,
    ms=6, msw=0, lw=2, label="PMF", xticks=(0:1:10),
    ylims=(0,0.3), xlabel="x", ylabel="Probability")
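
The binomial PMF can also be evaluated directly from its closed form, C(n, x) p^x (1-p)^(n-x), and compared with the library values. The sketch below does this; `binomialPmf` is a hypothetical helper name introduced only for this comparison.

```julia
using Distributions

# Binomial PMF written out from the closed form C(n, x) p^x (1-p)^(n-x).
# `binomialPmf` is a hypothetical helper, not a Distributions.jl function.
binomialPmf(n, p, x) = binomial(n, x) * p^x * (1 - p)^(n - x)

n, p = 10, 0.5
manual = [binomialPmf(n, p, x) for x in 0:n]   # closed-form values
library = pdf.(Binomial(n, p), 0:n)            # values from Distributions.jl

println("Largest absolute difference: ", maximum(abs.(manual .- library)))
```

The two vectors should agree up to floating-point rounding.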

Geometric Distribution

  • In this case, consider an infinite sequence of independent trials, each with success probability $p$, and let $X$ be the index of the first successful trial. From first principles, $x - 1$ failures must be followed by one success, so the PMF is $P(X = x) = p(1-p)^{x-1}$ for $x = 1, 2, \ldots$
  • An alternative version of the geometric distribution is the distribution of the random variable $\tilde{X}$, counting the number of failures until the first success. For every sequence of trials, $\tilde{X} = X - 1$.
  • In the Julia Distributions package, Geometric stands for the distribution of $\tilde{X}$, not $X$.
  • Example: Roulette is a game of chance in which a ball is spun on the inside edge of a horizontal wheel. As the ball loses momentum, it eventually falls and lands on one of 37 spaces, numbered $0$ to $36$. There are 18 black spaces, 18 red spaces, and a single green space ("zero"). Each spin of the wheel is independent, and each of the 37 possible outcomes is equally likely. Now assume that a gambler goes to the casino and plays a series of roulette spins. There are various ways to bet on the outcome of roulette, but in this case he always bets on black (if the ball lands on black he wins, otherwise he loses). Say that the gambler plays until his first win. In this case, the number of plays is a geometric random variable with success probability $p = 18/37$ and support $\{1, 2, \ldots\}$.
using StatsBase, Distributions, Plots; pyplot()

function rouletteSpins(p)
    x = 0
    while true
        x += 1
        if rand() < p
            return x
        end
    end
end

p, xGrid, N = 18/37, 1:7, 10^6
mcEstimate = counts([rouletteSpins(p) for _ in 1:N], xGrid)/N
gDist = Geometric(p)
gPmf = [pdf(gDist,x-1) for x in xGrid]
plot(xGrid, mcEstimate, line=:stem, marker=:circle,
    c=:blue, ms=10, msw=0, lw=4, label="MC estimate")
plot!(xGrid,gPmf, line=:stem, marker=:xcross,
    c=:red, ms=6, msw=0, lw=2, label="PMF",
    ylims=(0,0.5), xlabel="x", ylabel="Probability")
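
The shift by one between the two versions of the geometric distribution is easy to get wrong, so here is a minimal sketch that checks it numerically. The helper name `geomPmf` is an assumption made for this example.

```julia
using Distributions

p = 18/37   # probability of winning a single bet on black

# PMF of X, the index of the first successful trial:
# x - 1 failures followed by one success. Hypothetical helper name.
geomPmf(p, x) = (1 - p)^(x - 1) * p

xs = 1:7
manual = [geomPmf(p, x) for x in xs]
# Geometric in Distributions.jl counts failures before the first success,
# so the number of plays x corresponds to x - 1 failures.
library = [pdf(Geometric(p), x - 1) for x in xs]

println("Largest absolute difference: ", maximum(abs.(manual .- library)))
println("Mean number of plays until the first win: ", mean(Geometric(p)) + 1)
```

The mean number of plays equals 1/p, which is the library mean (of the failure count) plus one.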

Negative Binomial Distribution

  • Recall the roulette gambler from the previous example. Assume now that the gambler plays until he wins for the $r$-th time. The negative binomial distribution describes this situation. The PMF is $P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}$ for $x = r, r+1, \ldots$
  • Notice that with $r = 1$ the expression reduces to the geometric PMF. As in the geometric case, there is an alternative version of the negative binomial distribution. Let $\tilde{X}$ denote the number of failures until the $r$-th success. Here, as in the geometric case, when both random variables are coupled on the same sequence of trials, we have $\tilde{X} = X - r$.
  • We simulate a gambler who bets consistently on black, much as in the previous example, and determine the PMF for $r = 5$. That is, we determine the probability that $x$ plays are needed to reach the $r$-th success (or win).
using StatsBase, Distributions, Plots

function rouletteSpins(r,p)
    x = 0
    wins = 0
    while true
        x += 1
        if rand() < p
            wins += 1
            if wins == r
                return x                
            end
        end
    end
end

r, p, N = 5, 18/37, 10^6
xGrid = r:r+15

mcEstimate = counts([rouletteSpins(r,p) for _ in 1:N],xGrid)/N

nbDist = NegativeBinomial(r,p)
nbPmf = [pdf(nbDist,x-r) for x in xGrid]

plot(xGrid, mcEstimate,
    line=:stem, marker=:circle, c=:blue,
    ms=10, msw=0, lw=4, label="MC estimate")
plot!(xGrid, nbPmf, line=:stem,
    marker=:xcross, c=:red, ms=6, msw=0, lw=2, label="PMF",
    xlims=(0,maximum(xGrid)), ylims=(0,0.2),
    xlabel="x", ylabel="Probability")
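
As a check on the parameterization, the closed form C(x-1, r-1) p^r (1-p)^(x-r) for the number of plays can be compared with the library's failure-count version. This is a sketch; `negBinPmf` is a hypothetical helper name.

```julia
using Distributions

# PMF of the number of plays x needed to reach the r-th win:
# the last play is a win, and r - 1 wins fall among the first x - 1 plays.
negBinPmf(r, p, x) = binomial(x - 1, r - 1) * p^r * (1 - p)^(x - r)

r, p = 5, 18/37
xs = r:r+15
manual = [negBinPmf(r, p, x) for x in xs]
# NegativeBinomial in Distributions.jl counts failures before the r-th success,
# so x plays correspond to x - r failures.
library = [pdf(NegativeBinomial(r, p), x - r) for x in xs]

println("Largest absolute difference: ", maximum(abs.(manual .- library)))
println("With r = 1 and x = 3: ", negBinPmf(1, p, 3), " vs geometric ", (1 - p)^2 * p)
```

The last line illustrates that setting r to 1 recovers the geometric PMF.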

Hypergeometric Distribution

  • Consider a fishing problem in which we fish without replacement. In this scenario, each time we sample from the population, the population decreases, and hence the probability of success changes with each subsequent sample. The hypergeometric distribution describes this situation. The PMF is given by $p(x) = \frac{\binom{K}{x}\binom{L-K}{n-x}}{\binom{L}{n}}$ for $x = \max(0, n+K-L), \ldots, \min(n, K)$.
    • Here the parameter $L$ is the population size, and $K$ is the number of successes present in the population (this implies that $L-K$ is the number of failures present in the population). The parameter $n$ is the number of samples taken from the population, and the input argument $x$ is the number of successful samples observed. Hence, a hypergeometric random variable $X$ with $P(X = x) = p(x)$ describes the number of successful samples when sampling without replacement.
    • To understand the support of the distribution, first consider the least possible value, $\max(0, n+K-L)$. It is either $0$, or $n+K-L$ if $n > L-K$. The latter case stems from a situation where the number of samples $n$ is greater than the number of failures present in the population, so at least $n-(L-K)$ of the samples must be successes. As for the upper value of the support, it is $\min(n, K)$ because if $K < n$ then it isn't possible for every sample to be a success.
  • Example: Consider a pond which contains a combination of gold and silver fish. In this example, there are $L = 500$ fish in total, and we define catching a gold fish as a success and catching a silver fish as a failure. Now say that we sample $n = 30$ fish without replacement. We consider several such cases, where the only difference between them is the number of successes, $K$, (gold fish) in the population.
  • Simulation in Julia
    • Hypergeometric(numberOfSuccesses::Int, numberOfFailures::Int, sampleSize::Int)
using Distributions, Plots; pyplot()

L, K, n = 500, [450, 400, 250, 100, 50], 30
hyperDists = [Hypergeometric(k, L-k, n) for k in K]
xGrid = 0:1:n
pmfs = [pdf.(dist, xGrid) for dist in hyperDists]
labels = "Successes = ".* string.(K)

bar(xGrid, pmfs,
    alpha=0.8, c=[:orange :purple :green :red :blue],
    label=hcat(labels...), ylims=(0,0.25),
    xlabel="x", ylabel="Probability", legend=:top)
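
The support argument is easiest to see when the sample is larger than the number of failures. The sketch below uses a deliberately small, made-up pond (20 fish, 15 gold, 10 caught) so that the binomial coefficients fit in ordinary integers; these numbers and the helper name `hyperPmf` are assumptions for illustration, not values from the example above.

```julia
using Distributions

# Hypergeometric PMF from the closed form: choose x successes and n - x failures
# out of all ways to choose n items. Hypothetical helper name.
hyperPmf(L, K, n, x) = binomial(K, x) * binomial(L - K, n - x) / binomial(L, n)

L, K, n = 20, 15, 10                     # illustrative pond: 15 gold, 5 silver, catch 10
lo, hi = max(0, n + K - L), min(n, K)    # endpoints of the support

d = Hypergeometric(K, L - K, n)          # argument order: successes, failures, sample size
manual = [hyperPmf(L, K, n, x) for x in lo:hi]
library = pdf.(d, lo:hi)

println("Support from the formula: ", lo, ":", hi)
println("Support from the library: ", minimum(d), ":", maximum(d))
println("Largest absolute difference: ", maximum(abs.(manual .- library)))
```

With only 5 silver fish in the pond, a catch of 10 must contain at least 5 gold fish, which is why the support starts above zero in this case.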

Poisson Distribution and Poisson Process

  • The Poisson process is a stochastic process (random process) which can be used to model occurrences of events over time. It may be used to model the arrival of customers to a system, the emission of particles from radioactive material, or packets arriving to a communication router. The Poisson process is the canonical example of a point process capturing the most sensible model for completely random occurrences over time.
  • In a Poisson process, during an infinitesimally small time interval, $\Delta t$, it is assumed that (as $\Delta t \to 0$) there is one occurrence with probability $\lambda \Delta t$, and no occurrence with probability $1 - \lambda \Delta t$. Furthermore, as $\Delta t \to 0$, it is assumed that the chance of $2$ or more occurrences during an interval of length $\Delta t$ tends to $0$. Here $\lambda > 0$ is the intensity of the Poisson process, and has the property that when multiplied by an interval of length $T$, the mean number of occurrences during the interval is $\lambda T$.
  • For a Poisson process over the interval $[0,T]$, the number of occurrences satisfies $P(x \text{ occurrences during } [0,T]) = e^{-\lambda T}\frac{(\lambda T)^x}{x!}$ for $x = 0, 1, 2, \ldots$
  • The PMF of the Poisson distribution is $p(x) = e^{-\lambda}\frac{\lambda^x}{x!}$ for $x = 0, 1, 2, \ldots$
    • The mean of the Poisson distribution is $\lambda$; hence, the number of occurrences in a Poisson process during $[0,T]$ is Poisson distributed with parameter (and mean) $\lambda T$.
  • A property that can be used to generate Poisson random variables:
    • Consider the random variable $X$ such that $\prod_{i=1}^{X} U_i > e^{-\lambda} \ge \prod_{i=1}^{X+1} U_i$,
      • where $U_1, U_2, \ldots$ is a sequence of i.i.d. uniform(0,1) random variables and $\prod_{i=1}^{0} U_i \equiv 1$.
      • Seeking such a random variable $X$ produces an efficient recipe for generating a Poisson random variable. That is, the $X$ defined above is Poisson distributed with mean $\lambda$. Notice that the recipe dictated by the above formula is to continue multiplying uniform random variables into a "running product" until the product goes below the desired level $e^{-\lambda}$.
    • Simulating this property in Julia
using StatsBase, Distributions, Plots; pyplot()

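function prn(lambda)
    k, p = 0, 1.0
    threshold = exp(-lambda)   # level the running product must drop to
    while p > threshold
        k += 1
        p *= rand()            # multiply in another uniform(0,1) draw
    end
    return k-1                 # draws made before the one that crossed the threshold
end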

xGrid, lambda, N = 0:16, 5.5, 10^6

pDist = Poisson(lambda)
bPmf = pdf.(pDist, xGrid)
data = counts([prn(lambda) for _ in 1:N], xGrid)/N

plot(xGrid, data,
    line=:stem, marker=:circle,
    c=:blue, ms=10, msw=0, lw=4, label="MC estimate")
plot!(xGrid, bPmf, line=:stem,
    marker=:xcross, c=:red, ms=6, msw=0, lw=2, label="PMF",
    ylims=(0,0.2), xlabel="x", ylabel="Probability of x events")
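
The small-interval description of the Poisson process suggests a simple numerical check: split the interval into m short pieces, treat each piece as a Bernoulli trial with success probability lambda*T/m, and compare the resulting binomial count with the Poisson PMF. The sketch below does this under that discretization assumption; the helper name `binomialGap` and the chosen values of m are illustrative.

```julia
using Distributions

lambda, T = 5.5, 1.0
xs = 0:16
poissonPmf = pdf.(Poisson(lambda * T), xs)

# Hypothetical helper: largest pointwise gap between the Poisson PMF and the
# binomial PMF obtained by splitting [0, T] into m pieces of length T/m,
# each containing an occurrence with probability lambda * T/m.
function binomialGap(m)
    dt = T / m
    approx = pdf.(Binomial(m, lambda * dt), xs)
    return maximum(abs.(approx .- poissonPmf))
end

for m in [10, 100, 10_000]
    println("m = ", m, "  max |Binomial - Poisson| = ", binomialGap(m))
end
```

The gap should shrink as m grows, which is the discrete counterpart of letting the interval length tend to zero.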
