NEXT: section 9, reading 17a
Set Equations
`P(A|B) = (P(A nn B)) / (P(B))` provided `P(B)!=0`
` = (|A nn B|)/(|B|)` (when all outcomes are equally likely)
`"Bayes Theorem" = P(B|A) = (P(A|B)*P(B))/(P(A))`
`"Bayes Theorem" = P(B|A) " " alpha " " P(A|B)*P(B)` where `alpha` means "is proportional to"
`(A uu B)^c = A^c nn B^c`
`(A nn B)^c = A^c uu B^c`
A and B are independent means any (equivalently, all) of the following hold:
`P(A nn B) = P(A)*P(B)`
`P(A|B) = P(A)`
`P(B|A) = P(B)`
`"Addition of probabilities" = P(A) = P(A|B) * P(B) + P(A|B^c)*P(B^c)`
`P(E^c) = "probability of the complement of E" = 1 - P(E)`
`O(E^c) = 1 / (O(E))`
`O(E) = "odds" = (P(E)) / (1 - P(E))`
`P(E) = (O(E)) / (1 + O(E))`
`BF = "Bayes factor" = (P(A|H))/(P(A|H^c)`
If C and D are conditionally independent given H (and given `H^c`):
- `O(H|C,D) = BF_C * BF_D * O(H)`
- `ln(O(H|C,D)) = ln(BF_C) + ln(BF_D) + ln(O(H))`
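A quick sketch of this update in R, with hypothetical prior odds and Bayes factors:

```r
# Hypothetical numbers: prior odds for H and Bayes factors for evidence C and D
prior_odds <- 0.5                         # O(H), assumed
BF_C <- 3                                 # P(C|H) / P(C|H^c), assumed
BF_D <- 2                                 # P(D|H) / P(D|H^c), assumed

post_odds <- BF_C * BF_D * prior_odds     # O(H|C,D) = 3
post_prob <- post_odds / (1 + post_odds)  # back to a probability: 0.75
```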
Bernoulli Distribution
Range X : 0, 1
`P(X=1) = p`
`P(X=0) = 1 - p`
`X ~ "Bernoulli"(p)` or `"Ber"(p)`
`"Ber"(p) = "Bin"(1,p)`
`E(X) = p`
`Var(X) = (1-p)p`
Binomial Distribution
Models the number of successes in n independent Bernoulli(p) trials, i.e., the sum of n independent Bernoulli(p) variables.
Range X : 0, 1, ..., n
`X ~ "Binomial"(n,p)`
`p(k) = "Bin"(n,k) = ((n),(k)) p^k (1-p)^(n-k)` = dbinom(k,n,p)
`"Bin"(n,n) = p^n`
`E(X) = np`
Var(X)=np(1-p)
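A quick R check of the binomial pmf, mean, and variance (numbers chosen arbitrarily):

```r
n <- 10; p <- 0.3; k <- 0:n
# pmf formula vs dbinom
all.equal(choose(n, k) * p^k * (1 - p)^(n - k), dbinom(k, n, p))   # TRUE
# mean and variance computed from the pmf match np and np(1-p)
sum(k * dbinom(k, n, p))               # 3   = n*p
sum((k - n * p)^2 * dbinom(k, n, p))   # 2.1 = n*p*(1-p)
```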
Geometric Distribution
Models the number of failures before the first success in a sequence of independent Bernoulli(p) trials (e.g., coin flips).
Range X : 0, 1, 2, ...
`X ~ "geometric"(p) or "geo"(p)`
`p(k) = P(X=k) = (1-p)^k p` = dgeom(k,p)
`E(X) = (1-p)/p`
`P(X=n+k | X >= n) = P(X=k)`
`Var(X) = (1-p) / p^2`
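A small R check of the geometric pmf and the memoryless property (p chosen arbitrarily):

```r
p <- 0.25; k <- 0:5; n <- 3
# pmf formula vs dgeom (dgeom counts failures before the first success)
all.equal((1 - p)^k * p, dgeom(k, p))            # TRUE
# memorylessness: P(X = n + k | X >= n) = P(X = k)
lhs <- dgeom(n + k, p) / (1 - pgeom(n - 1, p))   # P(X >= n) = 1 - P(X <= n - 1)
all.equal(lhs, dgeom(k, p))                      # TRUE
```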
Uniform Distribution
Models a situation where every value in the range is equally likely.
Range X : `[a, b]`
`X ~ "uniform"(a,b)` or `U(a,b)`
`f(x) = 1 / (b-a)` for `a <= x <= b`
`F(x) = (x-a)/(b-a)` for `a <= x <= b`
`E(X) = (a + b) / 2`
`Var(X) = (b - a)^2/12`
Exponential Distribution
Models: Waiting times
`X ~ "exponential"(lambda)` or `"exp"(lambda)`
Parameter: `lambda` (called the rate parameter)
Range: `[0,oo)`
Density: `f(x) = lambda e^(-lambda x)` for `x >= 0` = dexp(x,`lambda`)
`F(x) = 1 - e^(-lambda x)`
`E(X) = mu = 1/lambda`
`Var(X) = 1/lambda^2`
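A quick R sketch of the exponential cdf and mean (rate chosen arbitrarily):

```r
lambda <- 2                       # rate parameter, chosen arbitrarily
# F(1) from the formula and from pexp agree
1 - exp(-lambda * 1)              # 0.8646647
pexp(1, rate = lambda)            # 0.8646647
# mean waiting time is 1/lambda
mean(rexp(1e6, rate = lambda))    # approximately 0.5
```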
Normal Distribution
Models: Measurement error, intelligence/ability, height, averages of lots of data.
- Range: `(-oo,oo)`
- Parameters: `mu` `sigma`
- Notation: `"normal"(mu,sigma^2)` or `N(mu,sigma^2)`
- Density: `f(x)=1/(sigma sqrt(2pi)) e^(-(x-mu)^2/(2sigma^2))` = dnorm(x,μ,σ)
- Distribution: F(x) has no formula, so use tables or software such as pnorm in R to compute F(x).
- pnorm(.6,0,1) returns `P(Z <= .6) = F(.6)` for the standard normal; qnorm(.6,0,1) returns the .6 quantile.
- Standard normal distribution: `N(0,1)`, with mean 0 and variance 1; its cdf is written `Phi(z)`.
- Standard Normal Density: `phi(z) = 1/sqrt(2pi) e^(-z^2/2)`
- `N(mu,sigma^2)` has mean `mu`, variance `sigma^2`, and standard deviation `sigma`.
- `P(-1 <= Z <= 1) ~~ 0.6826895`, `P(-2 <= Z <= 2) ~~ 0.9544997`, `P(-3 <= Z <= 3) ~~ 0.9973002`
- `Phi(x) = P(Z <= x)`
- `Phi(1) = P(Z <= 1) ~~ 0.8413447 = "pnorm"(1,0,1)`
- `Phi(2) = P(Z <= 2) ~~ 0.9772499 = "pnorm"(2,0,1)`
- `Phi(3) = P(Z <= 3) ~~ 0.9986501 = "pnorm"(3,0,1)`
- `P(|Z|) = "pnorm"(Z) - "pnorm"(-Z)`
The normal distribution is its own conjugate prior (for the unknown mean, with known variance): if the prior and the likelihood are normal, the posterior is normal.
So, if prior is `N(mu_"prior", sigma_"prior"^2)` and likelihood is `N(theta, sigma^2)` then posterior is `N(mu_"post", sigma_"post"^2)`.
- `a = 1 / sigma_"prior"^2`
- `b = n / sigma^2`
- `bar x = (x_1 + … + x_n) / n`
- `mu_"post" = (a mu_"prior" + b bar x) / (a + b)`
- `sigma_"post"^2 = 1 / (a+b)`
Beta Distribution
`f(theta) = c theta^(a-1) (1-theta)^(b-1)` = dbeta(θ,a,b)
`c = ((a+b-1)!) / ((a-1)!(b-1)!)` for integer `a, b` (in general `c = (Gamma(a+b))/(Gamma(a)Gamma(b))`)
Beta distribution is a conjugate prior of the binomial distribution. This means if the prior is a beta and the likelihood is a binomial then the posterior is also beta.
So, if prior is dbeta(p,a,b) and likelihood is dbinom(k,n,p) then posterior is dbeta(p,a+k,b+n-k).
if prior is dbeta(p,a,b) and likelihood is dgeom(k,p) then posterior is dbeta(p,a+1,b+k).
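A sketch of the beta-binomial update in R, with made-up prior parameters and data; it checks the conjugate shortcut against a direct prior-times-likelihood calculation:

```r
a <- 2; b <- 2      # prior: dbeta(p, 2, 2), assumed
n <- 10; k <- 7     # made-up data: 7 successes in 10 trials
p <- 0.6            # point at which to compare the two computations

# conjugate update: posterior is dbeta(p, a + k, b + n - k)
post_direct <- dbeta(p, a + k, b + n - k)
# same density computed as prior * likelihood, normalized numerically
norm_const <- integrate(function(t) dbeta(t, a, b) * dbinom(k, n, t), 0, 1)$value
post_bayes <- dbeta(p, a, b) * dbinom(k, n, p) / norm_const
c(post_direct, post_bayes)   # equal up to numerical error
```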
Discrete Random Variables
Random variable X assigns a number to each outcome:
`X : Omega -> R`
`X = a " denotes event " {omega | X(omega) = a}`
`"probability mass function (pmf) of X is given by: " p(a) = P(X=a)`
`"Cumulative distribution function (cdf) of X is given by: " F(a) = P(X<=a)`
Continuous random variables
`"Cumulative distribution function (cdf)" = F(x) = P(X<=x) = int_-oo^x f(t) dt`
`"Probability density function (pdf)" = P(c<=x<=d) = int_c^d f(x) dx "for" f(x)>=0`
- cdf of X is `F_X(x)=P(X <= x)`
- pdf of X is `f_X(x)=F'_X(x)`
Properties of the cdf
(Same as for discrete distributions)
- (Definition) `F(x) = P(X<=x)`
- `0 <= F(x) <= 1`
- non-decreasing
- `lim_(x->-oo) F(x) = 0`
- `lim_(x->oo) F(x) = 1`
- `P(c < X <= d) = F(d) - F(c)`
- `F'(x) = f(x)`
Expected Value (mean or average)
- weighted average = `E(X) = sum_(i=1)^n x_i * p(x_i)`
- `E(X+Y) = E(X) + E(Y)`
- `E(aX+b) = a*E(X) + b`
- `E(h(X)) = sum_i h(x_i) * p(x_i)`
- `E(X-mu_x) = 0`
- For continuous X with density f(x): `E(X) = int_a^b x * f(x) dx` (units for f(x) are probability/dx).
- For jointly distributed X and Y with joint density f(x,y): `E(XY) = int_c^d int_a^b xy * f(x,y) dx dy`
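The continuous expectation can be checked numerically in R with integrate(); here using the exponential(2) density as a concrete example:

```r
# E(X) for X ~ exponential(rate = 2): integral of x * f(x) over [0, Inf)
integrate(function(x) x * dexp(x, rate = 2), lower = 0, upper = Inf)$value    # 0.5 = 1/lambda
# E(h(X)) with h(x) = x^2 gives E(X^2)
integrate(function(x) x^2 * dexp(x, rate = 2), lower = 0, upper = Inf)$value  # 0.5 = 2/lambda^2
```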
Variance
- `"mean" = E(X) = mu`
- variance of X = `Var(X) = E((X-mu)^2) = sigma^2 = sum_(i=1)^n p(x_i)(x_i-mu)^2`
- standard deviation = `sigma = sqrt(Var(X))`
- `Var(aX+b) = a^2 Var(X)`
- `Var(X) = E(X^2) - E(X)^2 = E(X^2) - mu^2`
- If X and Y are independent then: `Var(X+Y) = Var(X) + Var(Y)`
Covariance
Measure of how much two random variables vary together.
A positive value means that when one variable is above its mean, the other tends to be above its mean too; a negative value means the opposite. A value of 0 does not imply independence, though (e.g., X symmetric about 0 with `Y=X^2`: the covariance is 0 but Y is completely determined by X).
- `"Cov"(X,Y)="Covariance"=E((X-mu_X)(Y-mu_Y))`
- `"Cov"(X,Y)=int_c^d int_a^b (x-mu_x)(y-mu_y)f(x,y)dxdy`
- ` = (int_c^d int_a^b xy f(x,y) dxdy) - mu_x mu_y`
Properties:
- `"Cov"(aX+b, cY+d) = ac"Cov"(X,Y)`
- `"Cov"(X_1+X_2,Y) = "Cov"(X_1,Y) + "Cov"(X_2,Y)`
- `"Cov"(X,X) = "Var"(X)`
- `"Cov"(X,Y) = E(XY) - mu_X mu_Y`
- `"Var"(X+Y) = "Var"(X) + "Var"(Y) + 2"Cov"(X,Y)`
- If X and Y are independent then Cov(X,Y) = 0.
Correlation Coefficient
Measures the strength of the linear relationship between two variables; it does not capture higher-order relationships (like `Y=X^2`). See the R sketch after this list.
`rho` is the covariance of the standardizations of X and Y.
- `"Cor"(X,Y) = rho = ("Cov"(X,Y))/(sigma_X sigma_Y)`
- `rho` is dimensionless (it's a ratio)
- `-1 <= rho <= 1`
- `rho = +1` iff `Y=aX+b` with a > 0
- `rho = -1` iff `Y=aX+b` with a < 0
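A small R simulation illustrating covariance and correlation, including the `Y=X^2` caveat:

```r
set.seed(1)
x <- runif(1e5, -1, 1)    # X symmetric about 0
y <- x^2                  # Y is completely determined by X
cov(x, y)                 # approximately 0: zero covariance, yet not independent
cor(x, y)                 # approximately 0 as well

z <- 2 * x + rnorm(1e5, sd = 0.1)   # nearly linear function of X
cor(x, z)                 # close to +1
```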
Quantiles
The 60th percentile is the same as the 0.60 quantile and the 6th decile. The 3rd quartile would be 75th percentile.
- median is x for which `P(X<=x) = P(X>=x)`
- median is when cdf `F(x) = P(X<=x) = .5`
- The pth quantile of X is the value `q_p` such that `F(q_p)=P(X<=q_p)=p`. In this notation `q_.5` is the median.
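In R the q-functions invert the cdf, so they return quantiles (pnorm gives F, qnorm gives `q_p`):

```r
qnorm(0.50)          # 0:         the median of N(0,1)
qnorm(0.75)          # 0.6744898: the 3rd quartile = 75th percentile
qnorm(0.60)          # 0.2533471: the 0.60 quantile = 60th percentile = 6th decile
pnorm(qnorm(0.60))   # 0.6:       F(q_p) = p
```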
Central Limit Theorem & Law of Large Numbers
- LoLN: As n grows, the probability that the sample mean `bar X_n` is close to `mu` goes to 1.
- LoLN: for any `epsilon > 0`, `lim_(n->oo) P(|bar X_n - mu| < epsilon) = 1`
- CLT: As n grows, the distribution of `bar X_n` approaches the normal distribution `N(mu,sigma^2/n)`
- `S_n = sum_(i=1)^n X_i`
- `bar X_n = S_n / n`
- `E(S_n) = mu_(S_n) = n mu`
- `"Var"(S_n) = n sigma^2`
- `sigma_(S_n) = sqrt(n) sigma`
- For large n, Z = standardization = `(S_n - n mu) / (sqrt(n) sigma) = (bar X_n - mu) / (sigma/sqrt(n))`
- Z has mean 0, standard deviation 1
- If the `X_i` are themselves normal, then Z is exactly standard normal.
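A quick R simulation of the CLT: means of uniform(0,1) samples (`mu = 0.5`, `sigma^2 = 1/12`) are approximately `N(mu, sigma^2/n)`:

```r
set.seed(2)
n <- 50
xbar <- replicate(1e4, mean(runif(n)))   # 10,000 sample means of uniform(0,1) data
mean(xbar)    # approximately 0.5     = mu
var(xbar)     # approximately 1/12/n  = 0.00167
# standardized sample means behave like a standard normal
z <- (xbar - 0.5) / (sqrt(1/12) / sqrt(n))
mean(abs(z) <= 1)    # approximately 0.68
```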
Joint Distributions
- range [a,b] x [c,d]
- f(x,y) = joint density
- `F(x,y)="joint cdf"=P(X<=x,Y<=y)=int_c^y int_a^x f(u,v) du dv`
- `f(x,y)=(d^2 F)/(dx dy)(x,y)`
- marginal pdf = `f_X (x) = int_c^d f(x,y) dy`
- marginal cdf = `F_X (x) = F(x,d)`
- X and Y are independent if `f(x,y)=f_X (x) f_Y (y)`
- X and Y are independent if `F(x,y)=F_X (x) F_Y (y)`
Maximum Likelihood Estimates
likelihood function = `f(x_1,...,x_n|p)` = the joint density of the data given the parameter; for independent observations it is the product of the individual densities `f(x_i|p)`.
log likelihood = `ln(f(x_1,...,x_n|p))`
`hat p` = maximum likelihood estimate for p : the value of p where `(del "likelihood function") / (del p) = 0`
Equivalently (and usually easier), solve `(del "log likelihood function") / (del p) = 0`
For uniform distributions, max likelihood results from:
- `hat a = "min"(x_1, ..., x_n)`
- `hat b = "max"(x_1, ..., x_n)`
Notations:
- θ is the value of the hypothesis.
- p(θ) is the prior probability mass function of the hypothesis.
- p(θ|D) is the posterior probability mass function of the hypothesis given the data.
- p(D|θ) is the likelihood function. (This is not a pmf!)
| hypothesis | prior | likelihood | Bayes numerator | posterior | posterior predictive probability | posterior #2 |
|---|---|---|---|---|---|---|
| θ | P(θ) | P(D\|θ) | P(θ) * P(D\|θ) | P(θ\|D) | P(θ\|D) * P(D\|θ) | P(θ\|D2) |
| A | aP | aL | a = aP * aL | aP2 = a / SUM | a2 = aP2 * aL | a2 / SUM2 |
| B | bP | bL | b = bP * bL | bP2 = b / SUM | b2 = bP2 * bL | b2 / SUM2 |
| C | cP | cL | c = cP * cL | cP2 = c / SUM | c2 = cP2 * cL | c2 / SUM2 |
| total | 1 | | SUM | 1 | SUM2 | 1 |
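The same table computed in R with hypothetical priors aP, bP, cP and likelihoods aL, bL, cL:

```r
prior      <- c(A = 0.5, B = 0.3, C = 0.2)   # aP, bP, cP (assumed)
likelihood <- c(A = 0.1, B = 0.4, C = 0.7)   # aL, bL, cL (assumed)

bayes_numerator <- prior * likelihood
posterior <- bayes_numerator / sum(bayes_numerator)       # first update

# second update with the same likelihoods: the posterior becomes the new prior
bayes_numerator2 <- posterior * likelihood
posterior2 <- bayes_numerator2 / sum(bayes_numerator2)
rbind(prior, likelihood, posterior, posterior2)
```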
Law of total probability: prior predictive probability = `p(x) = int_a^b p(x|theta) f(theta) d theta`
| | hypothesis | prior | likelihood | Bayes numerator | posterior |
|---|---|---|---|---|---|
| | H | P(H) | P(D\|H) | P(D\|H)P(H) | P(H\|D) |
| Discrete θ | θ | p(θ) | p(x\|θ) | p(x\|θ)p(θ) | p(θ\|x) |
| Continuous θ | θ | f(θ) dθ | p(x\|θ) | p(x\|θ)f(θ) dθ | f(θ\|x) dθ |
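A sketch of the continuous case in R: the prior predictive probability of the data computed as an integral over θ, using an assumed Beta(2,2) prior and binomial likelihood:

```r
# p(x) = integral of p(x|theta) f(theta) dtheta
# assumed example: theta ~ Beta(2, 2) prior, data x = 3 successes in 10 trials
prior_predictive <- integrate(
  function(theta) dbinom(3, 10, theta) * dbeta(theta, 2, 2),
  lower = 0, upper = 1
)$value
prior_predictive    # approximately 0.11
```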