Functions Describing Distributions

Cumulative Probabilities

  • Consider first the CDF (Cumulative Distribution Function) of a random variable $X$, defined as $F(x) = \mathbb{P}(X \le x)$.
    • Here $X$ can be discrete, continuous, or a more general random variable. The CDF is a very popular descriptor because, unlike the PMF and PDF, it is not restricted to just the discrete or continuous case. A closely related function is the CCDF (Complementary Cumulative Distribution Function), $\bar{F}(x) = \mathbb{P}(X > x) = 1 - F(x)$.
    • From the definition of the CDF, $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
    • Furthermore, $F(\cdot)$ is a non-decreasing function. Any function with these properties constitutes a valid CDF and hence a probability distribution of a random variable.
    • In the case of a continuous random variable, the PDF and the CDF are related via $F(x) = \int_{-\infty}^{x} f(u)\,du$ and $f(x) = \frac{d}{dx}F(x)$.
      • Since the CDF is a non-decreasing function, $f(x) \ge 0$, and furthermore $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
    • In the case of a discrete random variable, the PMF $p(\cdot)$ is related to the CDF via $F(x) = \sum_{k \le x} p(k)$ (a discrete sketch follows the Julia code below).
  • In this Julia code, we look at an elementary example, where we consider the triangular PDF $f(x) = 1 - |x|$ on $[-1, 1]$ and integrate it via a crude Riemann sum to obtain the CDF:
using Plots, LaTeXStrings;

# Triangular PDF on (-1,1): f(x) = 1 - |x|
f2(x) = (x<0 ? x+1 : 1-x) * (abs(x)<1 ? 1 : 0)
a, b = -1.5, 1.5
delta = 0.01

# Crude Riemann sum approximation of the CDF F(x) = integral of f from a to x
F(x) = sum([f2(u)*delta for u in a:delta:x])

xGrid = a:delta:b
y = [F(u) for u in xGrid]
plot(xGrid, y, c=:blue, xlims=(a,b), ylims=(0,1),
 xlabel=L"x", ylabel=L"F(x)", legend=:none)
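
As a companion to the continuous example above, the following is a minimal sketch of the discrete relation described in the list; the fair six-sided die used here is an assumed example, not from the original text. The CDF is built from the PMF by a cumulative sum, and the CCDF is its complement:

pmfVals = fill(1/6, 6)       # assumed example: p(k) = 1/6 for k = 1,...,6 (fair die)
cdfVals = cumsum(pmfVals)    # F(k) = sum of p(j) for j <= k
ccdfVals = 1 .- cdfVals      # CCDF: P(X > k) = 1 - F(k)

for k in 1:6
    println("k = $k:  p = $(round(pmfVals[k], digits=3))",
        ",  F = $(round(cdfVals[k], digits=3))",
        ",  CCDF = $(round(ccdfVals[k], digits=3))")
end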

Inverse and Quantiles

  • Where the CDF answers the question "what is the probability of $X$ being less than or equal to $x$", a dual question often asked is "what value $x$ corresponds to a probability $u$ of the random variable being less than or equal to $x$". This is the inverse function of $F(\cdot)$, denoted $F^{-1}(u)$.
    • For example, take the sigmoid function as the CDF, which is a type of logistic function: $F(x) = \frac{1}{1 + e^{-x}}$.
    • Solving $u = F(x)$ for $x$ yields $x = F^{-1}(u) = \ln\left(\frac{u}{1-u}\right)$.
      • Solving process: $u = \frac{1}{1+e^{-x}} \Rightarrow 1 + e^{-x} = \frac{1}{u} \Rightarrow e^{-x} = \frac{1-u}{u} \Rightarrow x = \ln\left(\frac{u}{1-u}\right)$.
    • As $u \to 0$ we get $x \to -\infty$, while as $u \to 1$ we get $x \to \infty$. This is the inverse CDF for the distribution. Schematically, given a specified probability $u$, it allows us to find the value $x$ such that $F(x) = u$.
    • The value $x$ satisfying $F(x) = u$ is also called the $u$-th quantile of the distribution. If $u$ is given as a percent, then it is called a percentile.
      • The median is another related term, and is also known as the $0.5$-th quantile.
      • The quartiles are also related terms, with the first quartile at $u = 0.25$, the third quartile at $u = 0.75$, and the inter-quartile range (IQR), which is defined as the difference between the third and first quartiles (these quantities are computed for the logistic example in a short sketch following the Julia code below).
    • In more general cases, where the CDF is not necessarily strictly increasing and continuous, we may still define the inverse CDF via $F^{-1}(u) = \inf\{x : F(x) \ge u\}$, where $\inf$ is the infimum, the greatest lower bound of the set.
  • Example: Consider an arbitrary customer arriving at a queue where the server is utilized 80% of the time, and an average service takes 1 minute. How long does such a customer wait in the queue until service starts? Some customers won't wait at all (20% of the customers), whereas others will need to wait until those who arrived before them are serviced. Results from the field of queueing theory give rise to the following distribution function for the waiting time, with utilization $\rho = 0.8$ and a mean service time of 1 minute: $F(t) = 1 - \rho\, e^{-(1-\rho)t}$ for $t \ge 0$, and $F(t) = 0$ for $t < 0$.
    • Solving this numerically in Julia:
using Plots, LaTeXStrings

xGrid = 0:0.01:10      # grid of waiting-time values t
uGrid = 0:0.01:1       # grid of probability values u
busy = 0.8             # server utilization (rho)

# CDF of the waiting time: F(t) = 1 - rho*exp(-(1-rho)t) for t > 0
F(t) = t<=0 ? 0 : 1 - busy*exp(-(1-busy)t)

# Generalized inverse CDF: infimum of {x : F(x) >= u}, approximated on xGrid
infimum(B) = isempty(B) ? Inf : minimum(B)
invF(u) = infimum(filter(x -> F(x) >= u, xGrid))

p1 = plot(xGrid, F.(xGrid), xlims=(-0.1, 10), ylims=(0,1),
    xlabel=L"x", ylabel=L"F(x)")

p2 = plot(uGrid, invF.(uGrid), xlims=(0,0.95), ylims=(0, maximum(xGrid)),
    xlabel=L"u", ylabel=L"F^{-1}(u)")

plot(p1, p2, legend=:none, size=(800,400))
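
As referenced in the quantile discussion above, here is a minimal sketch (an added example, not from the original text) that checks the analytic inverse $F^{-1}(u) = \ln\left(\frac{u}{1-u}\right)$ of the logistic CDF against the quantile function of Distributions.jl's standard Logistic distribution, and uses it to compute the median, the quartiles, and the IQR:

using Distributions

Finv(u) = log(u/(1-u))            # analytic inverse of the standard logistic CDF

dist = Logistic()                 # standard logistic distribution (location 0, scale 1)
uVals = [0.25, 0.5, 0.75]         # first quartile, median, third quartile

analytic = Finv.(uVals)
builtin = quantile.(dist, uVals)  # Distributions.jl quantiles for comparison

println("Analytic inverse CDF:  ", round.(analytic, digits=4))
println("quantile() values:     ", round.(builtin, digits=4))
println("Inter-quartile range:  ", round(analytic[3] - analytic[1], digits=4))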

Integral Transforms

  • An integral transform of a probability distribution is a representation of the distribution on a different domain.
  • For a random variable $X$ and a real or complex fixed value $s$, consider the expectation $\mathbb{E}[e^{sX}]$. When viewed as a function of $s$, this is the moment generating function (MGF). We present this here for a continuous random variable with PDF $f(x)$: $M(s) = \mathbb{E}[e^{sX}] = \int_{-\infty}^{\infty} e^{sx} f(x)\,dx$.
    • Example of application: Consider two distributions with densities $f_1(x) = 2x$ and $f_2(x) = 2(1-x)$, both supported on $[0, 1]$,
    • where the respective random variables are denoted as $X_1$ and $X_2$. Computing the MGFs of these distributions, we obtain $M_1(s) = \frac{2\bigl(1 + (s-1)e^s\bigr)}{s^2}$ and $M_2(s) = \frac{2(e^s - 1 - s)}{s^2}$.
    • Define now a random variable $Z = X_1 + X_2$, where $X_1$ and $X_2$ are assumed independent. It is known that the MGF of $Z$ is the product of the MGFs of $X_1$ and $X_2$. That is, $M_Z(s) = M_1(s)\,M_2(s) = \frac{4\bigl(1 + (s-1)e^s\bigr)(e^s - 1 - s)}{s^4}$.
    • The new MGF fully specifies the distribution of $Z$.
  • A key property of any MGF of a random variable $X$ is that $\mathbb{E}[X^n] = \left.\frac{d^n}{ds^n} M(s)\right|_{s=0}$.
    • Hence to calculate the $n$-th moment, one can simply evaluate the $n$-th derivative of the MGF at $s = 0$ (a numerical check follows the code block below).
  • We estimate both the PDF and the MGF of $Z$ in Julia:
using Distributions, Statistics, Plots

dist1 = TriangularDist(0,1,1)    # density 2x on [0,1]
dist2 = TriangularDist(0,1,0)    # density 2(1-x) on [0,1]
N = 10^6

data1, data2 = rand(dist1,N), rand(dist2,N)
dataSum = data1 + data2          # samples of Z = X1 + X2

# Analytic MGF of Z (product of the two MGFs)
mgf(s) = 4(1 + (s-1)*MathConstants.e^s) * (MathConstants.e^s-1-s)/s^4

# Crude point estimate of the MGF from only 20 fresh samples of Z
mgfPointEst(s) = mean([MathConstants.e^(s*z) for z in rand(dist1,20)+rand(dist2,20)])

p1 = histogram(dataSum, bins=80, normed=true,
    ylims=(0,1.4), xlabel="z", ylabel="PDF")

sGrid = -1:0.01:1
p2 = plot(sGrid, mgfPointEst.(sGrid), ylims=(0,3.5), c=:blue)
p2 = plot!(sGrid, mgf.(sGrid), c=:red)
p2 = plot!([minimum(sGrid), maximum(sGrid)],    # straight line 1 + s, the tangent
    [minimum(sGrid), maximum(sGrid)] .+ 1,      # of the MGF at s = 0 (slope E[Z] = 1)
    xlabel="s", ylabel="MGF", c=:black)

plot(p1,p2,legend=:none, size=(800,400))
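
As a follow-up to the moment property stated above, here is a minimal sketch (an added check, not from the original text) comparing the first two sample moments of $Z$ against central finite-difference approximations of $M_Z'(0)$ and $M_Z''(0)$; the step size $h$ and the use of $M_Z(0) = 1$ are choices made for this sketch:

using Distributions, Statistics

dist1 = TriangularDist(0,1,1)
dist2 = TriangularDist(0,1,0)
N = 10^6
dataSum = rand(dist1,N) + rand(dist2,N)    # samples of Z = X1 + X2

mgf(s) = 4(1 + (s-1)*exp(s)) * (exp(s)-1-s)/s^4   # analytic MGF of Z, as above

# Central finite differences around s = 0 (using M(0) = 1 to avoid the 0/0 form)
h = 0.01
d1 = (mgf(h) - mgf(-h)) / (2h)             # approximates M'(0)  = E[Z]
d2 = (mgf(h) - 2 + mgf(-h)) / h^2          # approximates M''(0) = E[Z^2]

println("E[Z]:    sample = ", round(mean(dataSum), digits=4),
    ",  MGF derivative = ", round(d1, digits=4))
println("E[Z^2]:  sample = ", round(mean(dataSum.^2), digits=4),
    ",  MGF 2nd derivative = ", round(d2, digits=4))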
