Bayesian Inference

Bayesian Inference Framework

We aim to extract information about $\Theta$ based on observing a collection $X = (X_1,…,X_n)$

  • Unknown $\Theta$
    • treated as a random variable
    • prior distribution $p_\Theta$ or $f_\Theta$
  • Observation $X$
    • observation model $p_{X|\Theta}$ or $f_{X|\Theta}$
  • Use the appropriate version of Bayes’ rule to find $p_{\Theta|X}(\cdot|X=x)$ (a small numerical sketch follows this list)
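
To make the last step concrete, here is a minimal numerical sketch (NumPy only) of Bayes’ rule on a grid, assuming a hypothetical $Beta(2,2)$ prior and a binomial observation model with made-up data; the grid posterior is $f_{\Theta|X}(\theta|x) \propto f_\Theta(\theta)\,p_{X|\Theta}(x|\theta)$, renormalized numerically.

```python
import numpy as np

# Hypothetical setup: Theta is a coin bias with a Beta(2, 2) prior,
# and we observe x = 7 heads in n = 10 independent tosses.
theta = np.linspace(0.001, 0.999, 999)          # grid over the parameter space
prior = theta**(2 - 1) * (1 - theta)**(2 - 1)   # Beta(2, 2) density, up to a constant
n, x = 10, 7
likelihood = theta**x * (1 - theta)**(n - x)    # binomial pmf at x, up to a constant

# Bayes' rule: posterior is proportional to prior * likelihood, normalized on the grid.
posterior = prior * likelihood
posterior /= posterior.sum() * (theta[1] - theta[0])

print("posterior mode ≈", theta[np.argmax(posterior)])
print("posterior mean ≈", np.sum(theta * posterior) * (theta[1] - theta[0]))
```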

Principal Bayesian Estimation Method

  • Maximum a posteriori probability (MAP) rule Select the parameter value with maximum conditional/posterior probability given the data
    • $p_{\Theta|X}(\theta^*|x)=\max_{\theta}p_{\Theta|X}(\theta|x)$
  • Least mean squares (LMS) estimation Select an estimator/function of the data that minimizes the mean squared error between the parameter and its estimate
    • $\hat{\theta}_{LMS} = E[\Theta|X=x]$

Example (Inferring the Unknown Bias of a Coin)

We wish to estimate the probability of heads, denoted by $p$. Suppose the prior is a beta density with parameters $a,b$, that is, $p \sim Beta(a,b)$. We consider $n$ independent tosses and let $X$ be the number of heads observed.

MAP Estimate The posterior PDF of $p$ has the form

$$f_{p|X}(p\,|\,k) \propto f_p(p)\, p_{X|p}(k\,|\,p) \propto p^{a-1}(1-p)^{b-1}\binom{n}{k}p^{k}(1-p)^{n-k} \propto p^{a+k-1}(1-p)^{b+(n-k)-1}$$

Hence the posterior density is beta with parameters $a+k$ and $b+(n-k)$

By the MAP rule, we select the estimator as

$$\hat{p}_{MAP} = \arg\max_{p\in[0,1]} f_{p|X}(p\,|\,k)$$

Let $g(p) = p^{a+k-1}(1-p)^{b+(n-k)-1}$, then we have

$$\frac{dg}{dp} = (a+k-1)\,p^{a+k-2}(1-p)^{b+(n-k)-1} - (b+n-k-1)\,p^{a+k-1}(1-p)^{b+(n-k)-2}$$

To find $p^*$, let

$$\frac{dg}{dp}\Big|_{p=p^*} = 0$$

which yields

$$p^* = \hat{p}_{MAP} = \frac{a+k-1}{a+b+n-2}$$

When the prior distribution of $p$ is $Unif(0,1)$, that is $a=1,b=1$, the estimator under the MAP rule is $\hat{p}_{MAP} = \frac{k}{n}$

LMS Estimate By Beta-Binomial conjugacy, the posterior is $p\,|\,X=k \sim Beta(a+k,b+n-k)$. Since the expectation of a random variable $Y\sim Beta(a,b)$ is $E(Y) = \frac{a}{a+b}$, we have

$$\hat{p}_{LMS} = E(p\,|\,X=k) = \frac{a+k}{a+b+n}$$

When the prior distribution of $p$ is $Unif(0,1)$, that is $a = 1,b = 1$, the estimator under the LMS rule is $\hat{p}_{LMS} = \frac{k+1}{n+2}$
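
A minimal sketch (NumPy only) checking the closed-form MAP and LMS estimates above against a grid computation of the $Beta(a+k,\,b+n-k)$ posterior; the prior parameters and data below are made-up for illustration.

```python
import numpy as np

a, b = 3.0, 2.0          # hypothetical Beta prior parameters
n, k = 20, 14            # hypothetical data: 14 heads in 20 tosses

# Closed-form estimates derived above.
p_map = (a + k - 1) / (a + b + n - 2)
p_lms = (a + k) / (a + b + n)

# Grid check: the posterior is proportional to p^(a+k-1) * (1-p)^(b+n-k-1).
p = np.linspace(1e-4, 1 - 1e-4, 10_000)
post = p**(a + k - 1) * (1 - p)**(b + (n - k) - 1)
post /= post.sum() * (p[1] - p[0])

print("MAP closed form:", p_map, " grid argmax:", p[np.argmax(post)])
print("LMS closed form:", p_lms, " grid mean  :", np.sum(p * post) * (p[1] - p[0]))
```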


Classical Inference

  • Classical Statistics: unknown constant $\theta$
    • also for vectors $X$ and $\theta$ : $p_{X_1,…,X_n}(x_1,…,x_n; \theta_1,…,\theta_m)$
    • $p_X(x;\theta)$ are NOT conditional probabilities; $\theta$ is not random
    • mathematically: many models, one for each possible value of $\theta$

For example, suppose the observation model is $X\sim Binomial(n,\theta)$. Then under each possible value of $\theta$, the candidate model is

$$p_X(k;\theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k},\quad k=0,1,\dots,n$$

Classical inference uses maximum likelihood to estimate $\theta$:

$$\hat{\theta}_{ML} = \arg\max_{\theta} p_X(x;\theta)$$
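
A minimal sketch (NumPy only, made-up numbers) of this viewpoint: each candidate $\theta$ gives a different binomial model, and maximum likelihood picks the $\theta$ under which the observed count is most probable, which for the binomial works out to $\hat{\theta}_{ML}=k/n$.

```python
import numpy as np
from math import comb

n, k = 50, 18                      # hypothetical observation: 18 successes in 50 trials

def likelihood(theta: float) -> float:
    """Binomial likelihood p_X(k; theta) of the fixed observation k."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Scan a grid of candidate (non-random) parameter values.
thetas = np.linspace(1e-4, 1 - 1e-4, 10_000)
theta_ml = thetas[np.argmax([likelihood(t) for t in thetas])]

print("grid MLE ≈", theta_ml, "  closed form k/n =", k / n)
```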


Sampling Moments

Definition (Moments)

Let $X$ be an r.v. with mean $\mu$ and variance $\sigma^2$. The $n^{th}$ moment of $X$ is $E(X^n)$, the $n^{th}$ central moment is $E((X-\mu)^n)$, and the $n^{th}$ standardized moment is $E\left(\left(\frac{X-\mu}{\sigma}\right)^n\right)$.

Definition (Sample Moments)

Let $X_1,…,X_n$ be i.i.d. random variables. The $k^{th}$ sample moment is the r.v.

$$M_k = \frac{1}{n}\sum_{j=1}^n X_j^k$$

The sample mean $\bar{X}_n$ is the first sample moment:

$$\bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j$$
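
A short sketch (NumPy only, simulated data from an assumed exponential distribution) computing the first two sample moments and comparing them with the corresponding population moments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # i.i.d. draws; E(X) = 2, E(X^2) = 8

def sample_moment(x: np.ndarray, k: int) -> float:
    """k-th sample moment M_k = (1/n) * sum_j x_j^k."""
    return float(np.mean(x**k))

print("M_1 (sample mean):", sample_moment(x, 1))   # should be close to 2
print("M_2              :", sample_moment(x, 2))   # should be close to 8
```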

Theorem (Mean and Var of Sample Mean)

Let $X_1,…,X_n$ be i.i.d. r.v.s with unknown mean $\mu$ and variance $\sigma^2$. Then the sample mean $\bar{X}_n$ is unbiased for estimating $\mu$. That is

$$E(\bar{X}_n) = \mu$$

The variance is

$$Var(\bar{X}_n) = \frac{\sigma^2}{n}$$

Definition (Sample Variance)

Let $X_1,…,X_n$ be i.i.d. random variables. The sample variance is the r.v.

$$S_n^2 = \frac{1}{n-1}\sum_{j=1}^n \left(X_j - \bar{X}_n\right)^2$$

Theorem (Unbiasedness of Sample Var)

Let $X_1,…,X_n$ be i.i.d. r.v.s with unknown mean $\mu$ and variance $\sigma^2$. The sample variance $S_n^2$ is unbiased for estimating $\sigma^2$, that is, $E(S_n^2) = \sigma^2$
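
A simulation sketch (NumPy only) illustrating why the $n-1$ denominator matters: averaged over many repetitions, $S_n^2$ is close to $\sigma^2$, while the variant that divides by $n$ comes out biased low by a factor $(n-1)/n$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 5, 200_000, 9.0               # small n makes the bias easy to see

samples = rng.normal(loc=0.0, scale=3.0, size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)        # divide by n - 1 (the sample variance S_n^2)
s2_biased = samples.var(axis=1, ddof=0)          # divide by n

print("true sigma^2       :", sigma2)
print("mean of S_n^2      :", s2_unbiased.mean())   # close to 9
print("mean of /n version :", s2_biased.mean())     # close to 9 * (n-1)/n = 7.2
```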

Definition (Convergence with Probability 1)

Let $X_1,X_2,…$ be random variables. $X_n$ converges almost surely (a.s.) to the random variable $X$ as $n\rightarrow \infty$ if and only if

$$P\left(\lim_{n\rightarrow\infty} X_n = X\right) = 1$$

Notation: $X_n \xrightarrow{a.s.} X \text{ as } n\rightarrow \infty$

Example

Definition (Convergence in Probability)

Let $X_1,X_2,…$ be random variables. $X_n$ converges in probability to the random variable $X$ as $n\rightarrow \infty$ if and only if for every $\epsilon >0$

$$\lim_{n\rightarrow\infty} P\left(|X_n - X| > \epsilon\right) = 0$$

Notation: $X_n \xrightarrow{P} X \text{ as } n\rightarrow \infty$
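
A simulation sketch (NumPy only) of convergence in probability, taking $X_n$ to be the sample mean of $n$ i.i.d. $Unif(0,1)$ r.v.s and $X$ the constant $0.5$: the estimated probability $P(|X_n - X| > \epsilon)$ shrinks toward $0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, reps = 0.05, 2_000

for n in (10, 100, 1_000, 10_000):
    xbar = rng.uniform(size=(reps, n)).mean(axis=1)   # X_n = sample mean of n Unif(0,1) draws
    prob = np.mean(np.abs(xbar - 0.5) > eps)          # estimate of P(|X_n - 0.5| > eps)
    print(f"n = {n:6d}   P(|X_n - 0.5| > {eps}) ≈ {prob:.4f}")
```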

Example


Law of Large Numbers

Definition

Let $X_1,…,X_n$ be i.i.d. r.v.s with finite mean $\mu$ and finite variance $\sigma^2$. The sample mean $\bar{X}_n$ is defined as

$$\bar{X}_n = \frac{X_1 + \dots + X_n}{n}$$

The sample mean $\bar{X}_n$ is itself an r.v. with mean $\mu$ and variance $\sigma^2/n$

Theorem (Strong Law of Large Numbers)

The event $\bar{X}_n \rightarrow \mu$ has probability $1$

Theorem (Weak Law of Large Numbers)

For all $\epsilon >0$, $P(|\bar{X}_n - \mu|>\epsilon) \rightarrow 0$ as $n\rightarrow \infty$
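
A simulation sketch (NumPy only) of the law of large numbers: along one realized sequence of i.i.d. $Bern(0.3)$ tosses (an assumed example), the running sample mean settles down to $\mu = 0.3$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 0.3                                           # assumed Bernoulli(0.3) tosses
x = rng.binomial(1, mu, size=1_000_000)            # one long realized sequence
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:9d}   sample mean = {running_mean[n - 1]:.5f}   (mu = {mu})")
```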

Definition (Time Average)

Definition (Ensemble Average)


Central Limit Theorem

Theorem (Central Limit)

As $n\rightarrow \infty$, the CDF of the standardized sample mean $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ converges to the standard normal CDF $\Phi$:

$$\lim_{n\rightarrow\infty} P(Z_n \leq z) = \Phi(z) \quad \text{for every } z$$

CLT Approximation For a large $n$, the distribution of $\bar{X}_n$ is approximately $N(\mu,\sigma^2/n)$
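
A simulation sketch (NumPy only) of the approximation: for i.i.d. $Expo(1)$ summands (a deliberately skewed choice, assumed here for illustration), the standardized sample mean behaves like a standard normal, checked through the tail probability $P(Z_n \leq 1) \approx \Phi(1) \approx 0.8413$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 100_000
mu, sigma = 1.0, 1.0                               # Expo(1): mean 1, standard deviation 1

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))             # standardized sample mean Z_n

# For a standard normal, P(Z <= 1) is about 0.8413.
print("empirical P(Z_n <= 1):", np.mean(z <= 1.0))
```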

Example (CLT Example)

  • Poisson Convergence to Normal Let $Y\sim Pois(n)$. Consider $Y$ as a sum of $n$ i.i.d. $Pois(1)$ r.v.s. For large $n$: $\frac{Y-n}{\sqrt{n}} \approx N(0,1)$, i.e. $Y \approx N(n,\,n)$
  • Gamma Convergence to Normal Let $Y\sim Gamma(n,\lambda)$. Consider $Y$ as a sum of $n$ i.i.d. $Expo(\lambda)$ r.v.s. For large $n$: $\frac{Y-n/\lambda}{\sqrt{n}/\lambda} \approx N(0,1)$, i.e. $Y \approx N(n/\lambda,\, n/\lambda^2)$
  • Binomial Convergence to Normal Let $Y\sim Bin(n,p)$. Consider $Y$ as a sum of $n$ i.i.d. $Bern(p)$ r.v.s. For large $n$: $\frac{Y-np}{\sqrt{np(1-p)}} \approx N(0,1)$, i.e. $Y \approx N(np,\, np(1-p))$ (a numerical check follows this list)
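
A numerical check of the three approximations (assuming SciPy's standard distribution objects are available), comparing an exact CDF value against the matching normal CDF:

```python
import numpy as np
from scipy.stats import poisson, gamma, binom, norm

n, lam, p = 400, 2.0, 0.5

# Pois(n) ~ N(n, n)
print(poisson.cdf(410, mu=n), norm.cdf(410, loc=n, scale=np.sqrt(n)))

# Gamma(n, lam) ~ N(n/lam, n/lam^2)   (rate lam corresponds to scale = 1/lam in SciPy)
print(gamma.cdf(210, a=n, scale=1 / lam), norm.cdf(210, loc=n / lam, scale=np.sqrt(n) / lam))

# Bin(n, p) ~ N(np, np(1-p))
print(binom.cdf(210, n=n, p=p), norm.cdf(210, loc=n * p, scale=np.sqrt(n * p * (1 - p))))
```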

De Moivre-Laplace Approximation

  • Poisson Approximation When $n$ is large and $p$ is small: $Bin(n,p) \approx Pois(np)$
  • Normal Approximation When $n$ is large and $p$ is around $1/2$: $Bin(n,p) \approx N(np,\, np(1-p))$ (see the sketch after this list)
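
A small sketch (SciPy assumed available) comparing the two regimes on a single pmf value; the continuity correction in the normal case is the standard $\pm 0.5$ adjustment, not something stated in the notes above.

```python
import numpy as np
from scipy.stats import binom, poisson, norm

# Regime 1: n large, p small  ->  Poisson(np) approximation.
n, p, k = 1000, 0.003, 5
print(binom.pmf(k, n, p), poisson.pmf(k, mu=n * p))

# Regime 2: n large, p near 1/2  ->  N(np, np(1-p)) with a +/- 0.5 continuity correction.
n, p, k = 1000, 0.5, 510
scale = np.sqrt(n * p * (1 - p))
approx = norm.cdf(k + 0.5, loc=n * p, scale=scale) - norm.cdf(k - 0.5, loc=n * p, scale=scale)
print(binom.pmf(k, n, p), approx)
```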

Inequality

Basic Inequalities

  • Cauchy-Schwarz Inequality $|E(XY)| \leq \sqrt{E(X^2)\,E(Y^2)}$

  • Jensen’s Inequality

    • If $g$ is a convex function and $X$ is an r.v., then $E(g(X)) \geq g(E(X))$
    • If $g$ is a concave function, then $E(g(X)) \leq g(E(X))$
  • Markov’s Inequality If $X \geq 0$ and $a > 0$, then $P(X \geq a) \leq \frac{E(X)}{a}$

  • Chebyshev’s Inequality Let $X$ have mean $\mu$ and variance $\sigma^2$. For any $a > 0$, $P(|X-\mu| \geq a) \leq \frac{\sigma^2}{a^2}$

  • Chernoff’s Inequality For any $a$ and any $s > 0$, $P(X \geq a) \leq e^{-sa}\,E(e^{sX})$ (a numerical check of Markov and Chebyshev follows this list)
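
A simulation sketch (NumPy only) comparing the Markov and Chebyshev bounds with empirical tail probabilities for an assumed $Expo(1)$ r.v. (mean $1$, variance $1$); both bounds hold but can be quite loose.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=1_000_000)   # Expo(1): mu = 1, sigma^2 = 1
mu, sigma2 = 1.0, 1.0

for a in (2.0, 4.0, 8.0):
    empirical = np.mean(x >= a)
    markov = mu / a                    # P(X >= a) <= E(X)/a, valid since X >= 0
    chebyshev = sigma2 / (a - mu)**2   # P(X >= a) <= P(|X - mu| >= a - mu) <= sigma^2/(a - mu)^2
    print(f"a = {a}: empirical {empirical:.4f}   Markov {markov:.3f}   Chebyshev {chebyshev:.3f}")
```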

Concentration Inequalities

Hoeffding Lemma Let the r.v. $X$ satisfy $E(X) = 0$ and $a\leq X\leq b$. Then for any $h>0$,

$$E\left(e^{hX}\right) \leq e^{h^2(b-a)^2/8}$$

Hoeffding Inequality Let the r.v.s $X_1,X_2,…,X_n$ be independent, with $a_k\leq X_k\leq b_k$ for each $k$. Let $S_n = \sum_{k=1}^n X_k$. Then for any $t>0$,

$$P\left(S_n - E(S_n) \geq t\right) \leq \exp\left(-\frac{2t^2}{\sum_{k=1}^n (b_k-a_k)^2}\right)$$
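
A simulation sketch (NumPy only) of the bound for $n$ independent $Unif(0,1)$ r.v.s, so $a_k = 0$, $b_k = 1$, $E(S_n) = n/2$, and $\sum_k (b_k - a_k)^2 = n$; the empirical tail probability sits well below the Hoeffding bound.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, t = 100, 100_000, 10.0

s = rng.uniform(size=(reps, n)).sum(axis=1)      # S_n for each repetition
empirical = np.mean(s - n / 2 >= t)              # estimate of P(S_n - E(S_n) >= t)
bound = np.exp(-2 * t**2 / n)                    # Hoeffding bound with sum (b_k - a_k)^2 = n

print("empirical tail:", empirical, "  Hoeffding bound:", bound)
```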
