Families of Continuous Distributions
- Typically, the PDF, , is parameterized by the parameter . Hence, technically a family of continuous distributions is the collection of PDFs for all .
- Create a distribution object for each of the continuous distributions we cover.
using Distributions
dists = [
Uniform(10,20),
Exponential(3.5),
Gamma(0.5,7),
Beta(10,0.5),
Weibull(10,0.5),
Normal(20,3.5),
Rayleigh(2.4),
Cauchy(20,3.5)]
println("Distributions \t\t\t Parameters \t Support")
reshape([dists; params.(dists); ((d) -> (minimum(d), maximum(d))).(dists)],
length(dists),3)
Continuous Uniform Distribution
- The continuous uniform distribution describes the case where the outcome of a continuous random variable has a constant likelihood of occuring over some finite intervel. Since the integral of the PDF must equal one, given an interval , the PDF is given by
- Example consider the case of a fast spinning circular disk, such as a hard drive. Imagine now there is a small defect on the disk, and we define as the clockwise angle (in radians) the defect makes with the read head at an arbitary time. In this case, is modeled by the continous uniform distribution over .
using Distributions, Plots, LaTeXStrings; gr()
cUnif = Uniform(0,2π)
xGrid, N = 0:0.01:2π, 10^6
stephist(rand(N)*2π, bins=xGrid,
normed=:true, c=:blue,
label="MC estimate")
plot!(xGrid, pdf.(cUnif, xGrid),
c=:red, ylims=(0,0.2),label="PDF",ylabel="Density",xticks=(collect(0:π/2:2π),
["0", L"\dfrac{\pi}{2}", L"\pi", L"\dfrac{3\pi}{2}", L"2\pi"]))
Exponential Distribution
- Exponential Distribution is used to model random durations between occurrences. A non-negative random variable , exponentially distributed with a rate parameter , has PDF
- mean
- variance
- CCDF
- In Julia, exponentially distribution is parameterized by the mean, rather than by
- Exponential random variables has a lack of memory propety.
- A similar property holds for geometric random variables. This hint at the fact that exponential random variables are the continuous analogs of geometric random variables. Consider a transform of an exponential random variable ( represents the mathematical floor funciton.) In this case, is no longer a continuous random variable, but is discrete in nature.
- If we set , we observe that is a geometric random variable which starts at and has success parameter .
- A similar property holds for geometric random variables. This hint at the fact that exponential random variables are the continuous analogs of geometric random variables. Consider a transform of an exponential random variable ( represents the mathematical floor funciton.) In this case, is no longer a continuous random variable, but is discrete in nature.
using StatsBase,Distributions, Plots; pyplot()
lambda, N = 1, 10^6
xGrid = 0:6
expDist = Exponential(1/lambda)
floorData = counts(convert.(Int, floor.(rand(expDist, N))), xGrid) / N
geomDist = Geometric(1-MathConstants.e^(-lambda))
plot(xGrid, floorData,
line=:stem, marker=:circle,
c=:blue, ms=10, msw=0, lw=4,
label="Floor of Exponential")
plot!(xGrid, pdf.(geomDist, xGrid),
line=:stem, marker=:xcross,
c=:red, ms=6, msw=0, lw=2,
label="Geometric", ylims=(0,1),
xlabel="x", ylabel="Probability")
Gamma Distribution and the Squared Coefficient of Variation
- The gamma distribution is commonly used for modeling asymmetric non-negative data.It generalizes the exponential distribution and chi-squared distribution.
- Example The lifetimes of light bulbs are exponentially distributed with mean . Now imagine we are lighting a room continuously with a single bulb, and that we replace the bulb with a new one when it burns out. If we starts at time , what is the distribution of time until bulbs are replaced?
- One way to describe this time is by the random variable , where
- The PDF of the gamma distribution is a function (in ) proportional to , where the non-negative parameters and are called the rate parameter and shape parameter, respectively. In order to normalize the function, we need to divide by
- this integral can be represented by , where is a well-known mathmatical special function called the gamma function
- thus the PDF of gamma distribution is
- In the light bulbs case, we have that , with shape parameter . In general for a gamma random variable, , the shape parameter does not have to be a whole number. It can analytically be evaluated that
- The squared coefficient of variation
- Simulation by Julia
- set the rate parameter for the light bulbs at
- Note that the Julia funciton
Gamma()is not parameterized by , but by
using Distributions, Plots; pyplot()
lambda, N = 1/3, 10^5
bulbs = [1,10,50]
xGrid = 0:0.1:10
C = [:blue :red :green]
dists = [Gamma(n, 1/(n*lambda)) for n in bulbs]
function normalizeData(d::Gamma)
sh = Int64(shape(d))
data = [sum(-(1/(sh*lambda))*log.(rand(sh))) for _ in 1:N] # using inverse probability transform to generating the random values which from the exponential distribution
end
L = ["Shape = "*string.(shape.(i))*", Scale = "*
string.(round.(scale.(i), digits=2)) for i in dists]
stephist(normalizeData.(dists), bins=50,
normed=:true, c=C, xlims=(0,maximum(xGrid)), ylims=(0,1),
xlabel="x", ylabel="Density", label="")
plot!(xGrid, [pdf.(i,xGrid) for i in dists], c=C, label=reshape(L,1,:))
Beta Distribution and Mathematical Special Functions
Beta Distribution
- The beta distribution is a commonly used distribution when seeking a parameterized shape over a finite support. It is parameterized by two non-negative parameters, and . It has a density proportional to for . By using different positive values of and , a variety of shapes can be produced. One common example is , in which cas the distribution defaults to the uniform(0,1) distribution.
- In order to normalize the distribution, we use
- Hence the PDF is .
- Using the gamma function above, we can represent by (mathematically, this is called the inverse of the beta function)
Gamma function
- The gamma function is a type of special function, and is defined as
- It is a continuous generalization of factorial. We know that for positive integer ,
- The gamma function exhibits similar propertiex, and one can evaluate it via integration by parts,
- Note furthermore that, . Hence, we see that for integer values of ,
- In Julia,
gamma()andbeta()are special mathematical functions, butGamma()andBeta()are the constructor for the distribution.
using SpecialFunctions, Distributions
a, b = 2, 7
x = 0.75
betaAB1 = beta(a,b)
betaAB2 = (gamma(a)gamma(b))/gamma(a+b)
betaAB3 = (factorial(a-1)factorial(b-1))/factorial(a+b-1)
betaPDFAB1 = pdf(Beta(a,b),x)
betaPDFAB2 = (1/beta(a,b))*x^(a-1)*(1-x)^(b-1)
println("beta($a, $b) = $betaAB1,\t$betaAB2,\t$betaAB3")
println("betaPDF($a, $b) at $x = $betaPDFAB1,\t$betaPDFAB2")
- Another property of the gamma function is that
using QuadGK, SpecialFunctions
g(x) = x^(0.5-1)*MathConstants.e^(-x)
quadgk(g, 0, Inf)[1], sqrt(pi), gamma(1/2)
Weibull Distribution and Hazard Rates
- The Weibull distribution and hazard rate function is often used in reliability analysis and survival analysis. For a random variable , representing the lifetime of an individual or a component, an interesting quantity if the instantaneous chance of failure at any time, given that the component has been operating without failure up to time . This can be expressed as
- Alernatively, by using the conditional probability and the PDF for small , we can express the above as
- Here the function is called the hazard rate, and it is a common method of viewing the distribution for lifetime random variable . In fact, we can reconstruct the CDF by
- Hence, every continuous non-negative random variable can be described uniquely by its hazard rate. The Weibull distribution is naturally defined through the hazard rate by considering hazard rate function that have a specific form. It is a distribution with (where is positive and takes on any real value.)
- If then the hazard rate is constant, in which case the Weibull distribution is actually an exponential distribution with rate . If , then the hazard increases over time. This depicts a situation of "aging components", i.e. the longer a component has live, the higher the instantaneous chance of failure. This is called Increasing Failure Rate (IFR). Conversely, depicts a situation where the longer a component has lasted, the lower the chance of its failing. This is called Decreasing Failure Rate (DFR).
- Based on the formulas above, we can obtain the CDF and PDF
- In Julia, the distribution is parameterized slightly differently via
- In this case, is called the scale parameter and is the shape parameter.
using Distributions, Plots, LaTeXStrings; pyplot()
alphas = [0.5, 1.5, 1]
lam = 2
toLambda(dist::Weibull) = shape(dist)*scale(dist)^(-shape(dist))
theta(lam, alpha) = (alpha/lam)^(1/alpha)
dists = [Weibull.(a, theta(lam, a)) for a in alphas]
hA(dist,x) = pdf(dist, x)/ccdf(dist, x)
hB(dist, x) = toLambda(dist)*x^(shape(dist)-1)
xGrid = 0.01:0.01:10
hazardA = [hA.(d, xGrid) for d in dists]
hazardB = [hB.(d, xGrid) for d in dists]
println("Maximum difference between two implementations of hazard: ",
maximum(maximum.(hazardA-hazardB)))
Cl = [:blue :red :green]
Lb = [L"\lambda=" * string(toLambda(d)) * ", " * L"\alpha=" * string(shape(d))
for d in dists]
plot(xGrid, hazardA, c=Cl, label=reshape(Lb,1,:), xlabel="x",
ylabel="Instantaneous failure rate", xlims=(0,10), ylims=(0,10))
Gaussian (Normal) Distribution
- The Gaussian distribution is a symmetric "bell curved"-shaped distritbution. Examples include the distribution of heights among adult humans and noise disturbances of electrical signals.
- The Guassian distribution is defined by two parameters and , which are the mean and variance respectively. Standard normal signifies the case of a normal distribution with and , the PDF is
- The CDF of the normal distribution is not available as a simple expression. However, it is frequently needed and hence stastical tables or software are often used. The CDF of the standard normal variable is
- The is the error function. It is a mathematical special function defined as
- With tabulated, one can move to a general normal random variable with mean and variance . In this case, the CDF is available via
- Plot standard normal PDF, along with its first and second derivatives in Julia
using Distributions, Calculus, SpecialFunctions, Plots, LaTeXStrings; pyplot()
xGrid = -5:0.01:5
PhiA(x) = 0.5*(1+erf(x/sqrt(2)))
PhiB(x) = cdf(Normal(),x)
println("Maximum difference between two CDF implementations: ",
maximum(PhiA.(xGrid) - PhiB.(xGrid)))
normalDensity(z) = pdf(Normal(),z)
d0 = normalDensity.(xGrid)
d1 = derivative.(normalDensity, xGrid)
d2 = second_derivative.(normalDensity, xGrid)
plot(xGrid, [d0 d1 d2], c=[:blue :red :green], label=[L"f(x)" L"f'(x)" L"f''(x)"])
plot!([-5,5],[0,0], color=:black, lw=0.5, xlabel="x", xlims=(-5, 5), label="")
Rayleigh Distribution and Box-Muller Transform
Rayleigh Distribution
- Consider an exponentially distributed random variable, , with rate parameter where . If we set a new random variable,
- Thus, by differentiating, we get the density (Rayleigh Distribution with parameter )
- Consider two indepenent normally distributed random variables, and , each with mean and standard deviation . Thee is Rayleigh distributed just as above. This property yields a method for generating normal random variables. It also yields a stastical model often uesed in radio communication called Rayleigh fading.
using Distributions, Random, Plots; pyplot()
Random.seed!(1)
N = 10^6
sig = 1.7
data1 = sqrt.(-(2*sig^2)*log.(rand(N)))
distG = Normal(0,sig)
data2 = sqrt.(rand(distG,N).^2 + rand(distG,N).^2)
distR = Rayleigh(sig)
data3 = rand(distR,N)
println(mean.([data1, data2, data3]))
p1 = histogram(data1, normed=:true, bins=50, label="", title="data1")
p2 = histogram(data2, normed=:true, bins=50, label="", title="data2")
p3 = histogram(data3, normed=:true, bins=50, label="", title="data3")
plot(p1,p2,p3, layout=(1,3),size=(1200,400), xlims=(-1,10),ylims=(0,0.4), grid=:none)
Box-Muler Transform — A way to generating Normally distributed random variables

- Consider this picture above representing the relationship between the pair and their polar coordinate counterparts, and . Assume now that the Cartesian coordinates of points are identically normally distributed, with independent of and set . In this case, by representing and in polar coordinates we have that the angle is uniformly distributed on and the radius is distributed as a Rayleigh random variable.
- Often is not needed. Hence, in practice, given two independent uniform(0,1) random variables and , we can get the standard normal distribution
- Simulation by Julia
using Distributions, Random, Plots; plotly()
Random.seed!(1)
Z() = sqrt(-2*log(rand()))*cos(2π*rand())
xGrid = -4:0.01:4
histogram([Z() for _ in 1:10^6], bins=50,
normed=:true, label="MC estimate")
plot!(xGrid, pdf.(Normal(), xGrid),
c=:red, lw=4, label="PDF",
xlims=(-4,4), ylims=(0,0.5), xlabel="x", ylabel="f(x)")
Cauchy Distribution
- The Cauchy Distribution, also known as the Lorentz distribution. At first glance, a plot of PDF looks very similar to the normal distribution. However, it is fundamentally different as its mean and standard deviation are undefined. The PDF of Cauchy distribution is (where is the location parameter at which the peak is observed and is the scale parameter.)
- Consider a drone hovering stationary in the sky at unit height. A pivoting laser is attached to its undercarriage, which pivots back and forth as it shoots pulses at the ground. At any point the laser fires, it makes an angle from the vertical . Since the laser fires at a high frequency as it is pivoting, we can assume that the angle is distributed uniformly on . For each shot from the laser, a point can be measured, , horizontally on the ground from the point above which the drong is hovering. We can consider the horizontal measurement as a new random variable, . Hence, the CDF is
- Thus the PDF is
- This is a special case of Cauchy distribution, with and .
- Importantly, the expectional integral,
- is not defines since each of the one-sided improper integrals does not converge. Hence, a Cauchy random vairable is an exaple of a distribution without a mean.
- Simulation by Julia to illustate the mena of Cauchy distribution is not converge.
julia