Introduction to PGMs
Why Use a PGM
For a joint distribution $P(X_1,X_2,\dots,X_n)$, there are two extreme ways to write it:
- All dependent (chain rule): $P(X_1,X_2,\dots,X_n) = P(X_1)P(X_2|X_1)P(X_3|X_1,X_2)\cdots P(X_n|X_1,\dots,X_{n-1})$
- pro: the formula is correct for any distribution
- con: requires huge probability tables, $O(k^n)$ entries for $n$ variables with $k$ states each
- All independent: $P(X_1,X_2,\dots,X_n) = P(X_1)P(X_2)\cdots P(X_n)$
- pro: the probability tables take only $O(kn)$ memory
- con: full independence is a very restrictive hypothesis
PGMs target a middle ground between these two extremes.
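The table sizes for the two extremes can be compared with a quick sketch (binary variables, $k=2$; the helper names are made up for illustration):

```python
# Probability-table sizes for n variables with k states each (a sketch).
# Fully dependent (chain rule): the i-th CPT P(X_i | X_1..X_{i-1}) has
# k^(i-1) * k = k^i entries, so the total is sum_i k^i = O(k^n).
# Fully independent: n marginal tables of k entries each, O(kn).

def table_entries_dependent(n, k):
    return sum(k ** i for i in range(1, n + 1))

def table_entries_independent(n, k):
    return k * n

print(table_entries_dependent(10, 2))    # 2046 entries
print(table_entries_independent(10, 2))  # 20 entries
```

Even at $n=10$ binary variables the fully dependent representation needs two orders of magnitude more entries.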
Two types of GMs
- Directed edges encode causal relationships (Bayesian Network or Directed Graphical Model)
- Undirected edges simply encode correlations between variables (Markov Random Field or Undirected Graphical Model)
Bayesian Network
First, a Bayesian network is a DAG (directed acyclic graph).
Factorization Theorem: the joint probability of a BN factors over the graph as $P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$.
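The factorization can be sketched on a toy chain $A \rightarrow B \rightarrow C$ (the CPT numbers below are made up):

```python
# Minimal sketch of the BN factorization P(x1..xn) = prod_i P(x_i | Pa(x_i))
# for the toy chain A -> B -> C with binary variables.

P_A = {0: 0.6, 1: 0.4}                                     # P(A)
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(B|A)
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P(C|B)

def joint(a, b, c):
    # Each factor conditions only on the node's parents in the DAG.
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# The factored joint is a valid distribution: it sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```

Each CPT here has at most $2^2 = 4$ entries, instead of the $2^3$ entries of the full joint table.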
Graph Model and Bayesian Interpretation
Suppose the graph is $A\leftarrow B \rightarrow C$.
- Bayesian interpretation: $P(A,B,C) = P(B) P(A|B) P(C|B)$
- Graph-model interpretation: $I(G) = \{ A\bot C |B \}$
Now we want to show that the factorization implies the independence, i.e. that $P \models \{ A\bot C |B \}$.
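The claim can be verified numerically for a made-up set of CPTs:

```python
# Check that the fork factorization P(A,B,C) = P(B) P(A|B) P(C|B)
# implies A ⊥ C | B, i.e. P(A,C|B=b) = P(A|B=b) P(C|B=b) for each b.

P_B = {0: 0.3, 1: 0.7}
P_A_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}
P_C_given_B = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    return P_B[b] * P_A_given_B[b][a] * P_C_given_B[b][c]

for b in (0, 1):
    pb = sum(joint(a, b, c) for a in (0, 1) for c in (0, 1))  # P(B=b)
    for a in (0, 1):
        for c in (0, 1):
            p_ac_given_b = joint(a, b, c) / pb
            # Conditional joint equals product of conditional marginals.
            assert abs(p_ac_given_b - P_A_given_B[b][a] * P_C_given_B[b][c]) < 1e-12
print("A ⊥ C | B holds")
```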
Conditional independence:
For $X\leftarrow Z \rightarrow Y$, let $Z$ be the father's height and $X, Y$ the heights of two brothers. The relation between $X$ and $Y$:
- Dependent: if we do not know the father's height, then observing $Y$ changes our belief about the distribution of $Z$, which in turn influences $X$
- Independent: if we already know the father's height, then observing $Y$ tells us nothing more about $X$
Local Markov assumption:
each node $X_i$ is independent of its nondescendants given its parents.
_Once the parents are fixed, a node can depend only on its descendants._
Independencies (Three Basic Models)

- a: Cascade $X \rightarrow Y \rightarrow Z$
- $Y$ observed: $X, Z$ are independent
- $Y$ unobserved: $X, Z$ are dependent
- b: Common parent $X \leftarrow Y \rightarrow Z$
- $Y$ observed: $X, Z$ are independent
- $Y$ unobserved: $X, Z$ are dependent
- c: V-structure $X \rightarrow Y \leftarrow Z$
- $Y$ observed: $X, Z$ are dependent
- $Y$ unobserved: $X, Z$ are independent
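The V-structure behavior ("explaining away") can be checked numerically; here is a made-up example with $Z = X \lor Y$ and $X, Y$ independent fair coins:

```python
# Explaining away in a V-structure X -> Z <- Y, with Z = X OR Y and
# X, Y independent fair coins. Marginally X ⊥ Y, but once Z is observed
# they become dependent: learning Y=1 "explains away" the evidence for X.
from itertools import product

def p(x, y):
    return 0.25          # X and Y are independent fair coins

def z(x, y):
    return x | y         # deterministic common effect

# P(X=1 | Z=1): evidence Z=1 raises belief in X above its prior 0.5
num = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1 and x == 1)
den = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1)
print(num / den)         # 2/3

# P(X=1 | Z=1, Y=1): Y=1 already explains Z=1, so X falls back to its prior
num2 = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1 and y == 1 and x == 1)
den2 = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1 and y == 1)
print(num2 / den2)       # 0.5
```

Observing the extra parent $Y$ lowers the posterior on $X$ from $2/3$ back to $1/2$: the parents are dependent given the child.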
Let $P$ be a distribution over $X$, and let $I(P)$ be the set of independence assertions of the form $(X \bot Y | Z)$ that hold in $P$. For a graph $G$ that is an $I$-map of $P$, $I(G)$ is a subset of $I(P)$.
We say that $K$ is an $I$-map for a _set_ of independencies $I$ if $I(K) ⊆ I$
Active trail
- Causal Trail $X → Z → Y$ : active iff $Z$ is not observed.
- Evidential Trail $X ← Z ← Y$ : active iff $Z$ is not observed.
- Common Cause $X ← Z → Y$: active iff $Z$ is not observed.
- Common Effect $X → Z ← Y$ : active iff $Z$ (or any of its descendants) is observed.
_Here "active" means that a dependence between the endpoints is established._
Definition : Let $\textbf{X}, \textbf{Y} , \textbf{Z}$ be three sets of nodes in $G$. We say that $\textbf{X}$ and $\textbf{Y}$ are d-separated given $\textbf{Z}$, denoted $dsep_G(\textbf{X};\textbf{Y}|\textbf{Z})$, if there is no active trail between any node $X \in \textbf{X}$ and $Y \in \textbf{Y}$ given $\textbf{Z}$
Definition: $I(G) =$ all independence properties that correspond to d-separation: $I(G) = \{(\textbf{X} \bot \textbf{Y} \mid \textbf{Z}) : dsep_G(\textbf{X};\textbf{Y}|\textbf{Z})\}$
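The active-trail rules above can be turned into a brute-force d-separation checker for small DAGs (a sketch, not an efficient algorithm; all graph names are hypothetical):

```python
# Brute-force d-separation checker for small DAGs.
# A trail is active given Z iff every consecutive triple (a, m, b) is active:
#   chain a->m->b / a<-m<-b, or fork a<-m->b : active iff m is NOT in Z
#   collider a->m<-b : active iff m, or a descendant of m, IS in Z

def descendants(g, n):
    out, stack = set(), list(g.get(n, []))
    while stack:
        c = stack.pop()
        if c not in out:
            out.add(c)
            stack.extend(g.get(c, []))
    return out

def d_separated(g, x, y, z):
    nodes = set(g) | {c for cs in g.values() for c in cs}
    und = {n: set() for n in nodes}            # undirected adjacency
    for p, cs in g.items():
        for c in cs:
            und[p].add(c)
            und[c].add(p)

    def triple_active(a, m, b):
        if m in g.get(a, []) and m in g.get(b, []):   # collider a -> m <- b
            return m in z or bool(descendants(g, m) & z)
        return m not in z

    def active_trail(path):                    # DFS over simple trails to y
        cur = path[-1]
        if cur == y:
            return all(triple_active(path[i], path[i + 1], path[i + 2])
                       for i in range(len(path) - 2))
        return any(active_trail(path + [nxt])
                   for nxt in und[cur] if nxt not in path)

    return not active_trail([x])

g = {"X": ["Z"], "Z": ["Y"]}                   # chain X -> Z -> Y
print(d_separated(g, "X", "Y", {"Z"}))         # True: observing Z blocks the trail
print(d_separated(g, "X", "Y", set()))         # False: the trail is active
```

For a V-structure the behavior flips: the trail becomes active only when the collider (or one of its descendants) is observed.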
Undirected GM
An undirected graphical model represents a distribution $P(X_1,\dots,X_n)$ defined by an undirected graph $H$ and a set of positive-valued potential functions $\psi_c$, one for each clique $c \in C$ of $H$, such that:
$$P(x_1,\dots,x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c), \qquad Z = \sum_{x_1,\dots,x_n} \prod_{c \in C} \psi_c(\mathbf{x}_c)$$
where the normalizer $Z$ is known as the partition function.
A potential function can be a joint or a conditional probability function, or even just a table of positive values.
Clique Example
- max-cliques = $\{A,B,D\}, \{B,C,D\}$,
- sub-cliques = $\{A,B\}, \{C,D\}, \dots$ all edges and single nodes
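These cliques can be recovered by brute force, assuming the edge set A-B, A-D, B-D, B-C, C-D implied by the max-cliques above:

```python
# Brute-force clique enumeration for the example graph (diamond without A-C).
from itertools import combinations

edges = {frozenset(e) for e in [("A", "B"), ("A", "D"), ("B", "D"),
                                ("B", "C"), ("C", "D")]}
nodes = ["A", "B", "C", "D"]

def is_clique(s):
    # A set of nodes is a clique iff every pair is connected by an edge.
    return all(frozenset(p) in edges for p in combinations(s, 2))

cliques = [frozenset(s) for r in range(1, len(nodes) + 1)
           for s in combinations(nodes, r) if is_clique(s)]
# Max-cliques: cliques not strictly contained in any other clique.
max_cliques = sorted(sorted(c) for c in cliques
                     if not any(c < other for other in cliques))
print(max_cliques)  # [['A', 'B', 'D'], ['B', 'C', 'D']]
```

All singletons and edges appear as sub-cliques; only the two triangles survive as maximal cliques.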
Independence:
- Global Markov independencies: $I(H) = \{ A\bot C|B : sep_H(A,C|B) \}$
- for any disjoint $A, B, C$: if $B$ separates $A$ and $C$, then $A$ is independent of $C$ given $B$
- Local Markov independencies: $I_l(H) = \{X_i \bot V \setminus (\{X_i\}\cup MB_{X_i})|MB_{X_i} : \forall i \}$
- given its Markov blanket, $X_i$ is independent of all the nodes it is not adjacent to
- Pairwise Markov independencies: $I_p(H) = \{X \bot Y|V \setminus \{X,Y\}:\{X,Y\}\notin E\}$
- two non-adjacent nodes are independent given all the other nodes
- The Markov blanket of $X_i$, denoted $MB_{X_i}$, is the set of neighbors of $X_i$ in the graph
- $B$ separates $A$ and $C$, written $sep_H(A,C|B)$, if every path from $A$ to $C$ passes through $B$
Relation between local and global independencies
- Thm: $P\models I(H) \Rightarrow P \models I_l(H) \Rightarrow P \models I_p(H)$
- Corollary: for a positive distribution $P$, the global, local, and pairwise independencies are equivalent
Exponential Model
From the discussion above, we need clique potentials that guarantee a positive distribution; one such form is the negative exponential: $\psi_c (\mathbf{x}_c) = \exp\{ -\phi_c(\mathbf{x}_c) \}$
The exponential (Gibbs) form of the distribution is then:
$$P(\mathbf{x}) = \frac{1}{Z} \exp\Big\{ -\sum_{c\in C}\phi_c(\mathbf{x}_c) \Big\} = \frac{1}{Z} \exp\{ -H(\mathbf{x}) \}$$
where $H(\mathbf{x}) = \sum_{c\in C}\phi_c(\mathbf{x}_c)$ is the "free energy".
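A tiny sketch of the Gibbs form $P(\mathbf{x}) = \exp\{-H(\mathbf{x})\}/Z$ on a chain $A - B - C$, with made-up pairwise energies:

```python
# Gibbs distribution on a chain A - B - C with binary nodes and
# pairwise "agreement" energies (low energy = preferred configuration).
import math
from itertools import product

def phi_AB(a, b):
    return 0.0 if a == b else 1.0

def phi_BC(b, c):
    return 0.0 if b == c else 1.0

def H(a, b, c):
    # Free energy: sum of clique energies.
    return phi_AB(a, b) + phi_BC(b, c)

# Partition function: sum of unnormalized weights over all configurations.
Z = sum(math.exp(-H(a, b, c)) for a, b, c in product((0, 1), repeat=3))

def P(a, b, c):
    return math.exp(-H(a, b, c)) / Z

print(round(sum(P(*x) for x in product((0, 1), repeat=3)), 10))  # 1.0
print(P(0, 0, 0) > P(0, 1, 0))  # lower energy -> higher probability: True
```

The positivity of $P$ is automatic here, since $\exp$ is strictly positive for any energy values.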
Boltzmann Machines
A Boltzmann machine is a fully connected graph with pairwise potentials on binary-valued nodes. Its energy function is written in sub-clique (node and edge) form, a convention that comes from physics: $H(\mathbf{x}) = -\big(\sum_{i<j} w_{ij} x_i x_j + \sum_i b_i x_i\big)$
Restricted Boltzmann Machines
This model, inspired by the Boltzmann machine, is responsible for much of the deep learning craze. An RBM is a bipartite graph with two layers and no connections within a layer: a layer of hidden units (factors, $h_j$) and a layer of visible units ($x_i$). The probability function of an RBM is
$$P(\mathbf{x},\mathbf{h}) = \frac{1}{Z}\exp\Big\{ \sum_{i,j} w_{ij} x_i h_j + \sum_i b_i x_i + \sum_j c_j h_j \Big\}$$
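A sketch of a toy RBM with made-up parameters, checking the standard factorized conditional $P(h_j=1\mid\mathbf{x}) = \sigma(c_j + \sum_i w_{ij} x_i)$ against brute-force marginalization:

```python
# Toy RBM: 3 visible units, 2 hidden units, made-up parameters.
# Energy E(x, h) = -(sum_ij W_ij x_i h_j + sum_i b_i x_i + sum_j c_j h_j).
import math
from itertools import product

W = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]   # W[i][j] couples visible i, hidden j
b = [0.1, -0.2, 0.0]                          # visible biases
c = [0.3, -0.1]                               # hidden biases

def energy(x, h):
    return -(sum(W[i][j] * x[i] * h[j] for i in range(3) for j in range(2))
             + sum(b[i] * x[i] for i in range(3))
             + sum(c[j] * h[j] for j in range(2)))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Given x, the hidden units decouple, so P(h_j=1 | x) has a closed sigmoid
# form; verify it against brute-force marginalization over h.
x = (1, 0, 1)
px = sum(math.exp(-energy(x, h)) for h in product((0, 1), repeat=2))
for j in range(2):
    brute = sum(math.exp(-energy(x, h))
                for h in product((0, 1), repeat=2) if h[j] == 1) / px
    closed = sigmoid(c[j] + sum(W[i][j] * x[i] for i in range(3)))
    assert abs(brute - closed) < 1e-12
print("conditionals match")
```

This conditional independence of the hidden units given the visibles (and vice versa) is what makes block Gibbs sampling in RBMs cheap.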
Conditional Random Fields
…