Introduction to PGMs
Why Use a PGM
For a joint distribution $P(X_1,X_2,\dots,X_n)$, there are two extreme ways to write it:
- All dependent (chain rule): $P(X_1,X_2,\dots,X_n) = P(X_1)P(X_2|X_1)P(X_3|X_1,X_2)\cdots P(X_n|X_1,\dots,X_{n-1})$
- pro: the formula is correct for any distribution
- con: requires huge probability tables, $O(k^n)$ entries for $n$ variables with $k$ states each
- All independent: $P(X_1,X_2,\dots,X_n) = P(X_1)P(X_2)\cdots P(X_n)$
- pro: the probability tables take only $O(kn)$ memory
- con: full independence is a very restrictive hypothesis
PGMs target a middle ground between these two extremes.
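The table sizes for the two extremes can be compared with a quick sketch (binary variables, $k=2$; the helper names are made up for illustration):

```python
# Probability-table sizes for n variables with k states each (a sketch).
# Fully dependent (chain rule): the i-th CPT P(X_i | X_1..X_{i-1}) has
# k^(i-1) * k = k^i entries, so the total is sum_i k^i = O(k^n).
# Fully independent: n marginal tables of k entries each, O(kn).

def table_entries_dependent(n, k):
    return sum(k ** i for i in range(1, n + 1))

def table_entries_independent(n, k):
    return k * n

print(table_entries_dependent(10, 2))    # 2046 entries
print(table_entries_independent(10, 2))  # 20 entries
```

Even at $n=10$ binary variables the fully dependent representation needs two orders of magnitude more entries.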
Two types of GMs
- Directed edges encode causal relationships (Bayesian Network or Directed Graphical Model)
- Undirected edges simply encode correlations between variables (Markov Random Field or Undirected Graphical Model)
Bayesian Network
First, a Bayesian network is a DAG (directed acyclic graph).
Factorization Theorem: the joint probability of a BN factors over the graph as $P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$.
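The factorization can be sketched on a toy chain $A \rightarrow B \rightarrow C$ (the CPT numbers below are made up):

```python
# Minimal sketch of the BN factorization P(x1..xn) = prod_i P(x_i | Pa(x_i))
# for the toy chain A -> B -> C with binary variables.

P_A = {0: 0.6, 1: 0.4}                                     # P(A)
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(B|A)
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P(C|B)

def joint(a, b, c):
    # Each factor conditions only on the node's parents in the DAG.
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# The factored joint is a valid distribution: it sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```

Each CPT here has at most $2^2 = 4$ entries, instead of the $2^3$ entries of the full joint table.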
Graph Model and Bayesian Interpretation
Suppose the graph is $A\leftarrow B \rightarrow C$.
- Bayesian interpretation: $P(A,B,C) = P(B) P(A|B) P(C|B)$
- Graph-model interpretation: $I(G) = \{ A\bot C |B \}$
Now we want to show that the factorization implies the independence, i.e. that $P \models \{ A\bot C |B \}$.
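The claim can be verified numerically for a made-up set of CPTs:

```python
# Check that the fork factorization P(A,B,C) = P(B) P(A|B) P(C|B)
# implies A ⊥ C | B, i.e. P(A,C|B=b) = P(A|B=b) P(C|B=b) for each b.

P_B = {0: 0.3, 1: 0.7}
P_A_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}
P_C_given_B = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    return P_B[b] * P_A_given_B[b][a] * P_C_given_B[b][c]

for b in (0, 1):
    pb = sum(joint(a, b, c) for a in (0, 1) for c in (0, 1))  # P(B=b)
    for a in (0, 1):
        for c in (0, 1):
            p_ac_given_b = joint(a, b, c) / pb
            # Conditional joint equals product of conditional marginals.
            assert abs(p_ac_given_b - P_A_given_B[b][a] * P_C_given_B[b][c]) < 1e-12
print("A ⊥ C | B holds")
```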
Conditional independence:
For $X\leftarrow Z \rightarrow Y$, let $Z$ be the father's height and $X, Y$ the heights of two brothers. The relation between $X$ and $Y$:
- Dependent: if we do not know the father's height, then observing $Y$ changes our belief about the distribution of $Z$, which in turn influences $X$
- Independent: if we already know the father's height, then observing $Y$ tells us nothing more about $X$
Local Markov assumption:
each node $X_i$ is independent of its nondescendants given its parents.
_Once the parents are fixed, a node can depend only on its descendants._
Independencies (Three Basic Models)

- a: Cascade $X \rightarrow Y \rightarrow Z$
- $Y$ observed: $X, Z$ are independent
- $Y$ unobserved: $X, Z$ are dependent
- b: Common parent $X \leftarrow Y \rightarrow Z$
- $Y$ observed: $X, Z$ are independent
- $Y$ unobserved: $X, Z$ are dependent
- c: V-structure $X \rightarrow Y \leftarrow Z$
- $Y$ observed: $X, Z$ are dependent
- $Y$ unobserved: $X, Z$ are independent
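The V-structure behavior ("explaining away") can be checked numerically; here is a made-up example with $Z = X \lor Y$ and $X, Y$ independent fair coins:

```python
# Explaining away in a V-structure X -> Z <- Y, with Z = X OR Y and
# X, Y independent fair coins. Marginally X ⊥ Y, but once Z is observed
# they become dependent: learning Y=1 "explains away" the evidence for X.
from itertools import product

def p(x, y):
    return 0.25          # X and Y are independent fair coins

def z(x, y):
    return x | y         # deterministic common effect

# P(X=1 | Z=1): evidence Z=1 raises belief in X above its prior 0.5
num = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1 and x == 1)
den = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1)
print(num / den)         # 2/3

# P(X=1 | Z=1, Y=1): Y=1 already explains Z=1, so X falls back to its prior
num2 = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1 and y == 1 and x == 1)
den2 = sum(p(x, y) for x, y in product((0, 1), repeat=2) if z(x, y) == 1 and y == 1)
print(num2 / den2)       # 0.5
```

Observing the extra parent $Y$ lowers the posterior on $X$ from $2/3$ back to $1/2$: the parents are dependent given the child.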
Let $P$ be a distribution over $X$, and let $I(P)$ be the set of independence assertions of the form $(X \bot Y | Z)$ that hold in $P$. For a graph $G$ that is an $I$-map of $P$, $I(G)$ is a subset of $I(P)$.
We say that $K$ is an $I$-map for a _set_ of independencies $I$ if $I(K) ⊆ I$
Active trail
- Causal Trail $X → Z → Y$ : active iff $Z$ is not observed.
- Evidential Trail $X ← Z ← Y$ : active iff $Z$ is not observed.
- Common Cause $X ← Z → Y$: active iff $Z$ is not observed.
- Common Effect $X → Z ← Y$ : active iff $Z$ (or any of its descendants) is observed.
_Here "active" means that a dependence between the endpoints is established._
Definition : Let $\textbf{X}, \textbf{Y} , \textbf{Z}$ be three sets of nodes in $G$. We say that $\textbf{X}$ and $\textbf{Y}$ are d-separated given $\textbf{Z}$, denoted $dsep_G(\textbf{X};\textbf{Y}|\textbf{Z})$, if there is no active trail between any node $X \in \textbf{X}$ and $Y \in \textbf{Y}$ given $\textbf{Z}$
Definition: $I(G) =$ all independence properties that correspond to d-separation: $I(G) = \{(\textbf{X} \bot \textbf{Y} \mid \textbf{Z}) : dsep_G(\textbf{X};\textbf{Y}|\textbf{Z})\}$
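The active-trail rules above can be turned into a brute-force d-separation checker for small DAGs (a sketch, not an efficient algorithm; all graph names are hypothetical):

```python
# Brute-force d-separation checker for small DAGs.
# A trail is active given Z iff every consecutive triple (a, m, b) is active:
#   chain a->m->b / a<-m<-b, or fork a<-m->b : active iff m is NOT in Z
#   collider a->m<-b : active iff m, or a descendant of m, IS in Z

def descendants(g, n):
    out, stack = set(), list(g.get(n, []))
    while stack:
        c = stack.pop()
        if c not in out:
            out.add(c)
            stack.extend(g.get(c, []))
    return out

def d_separated(g, x, y, z):
    nodes = set(g) | {c for cs in g.values() for c in cs}
    und = {n: set() for n in nodes}            # undirected adjacency
    for p, cs in g.items():
        for c in cs:
            und[p].add(c)
            und[c].add(p)

    def triple_active(a, m, b):
        if m in g.get(a, []) and m in g.get(b, []):   # collider a -> m <- b
            return m in z or bool(descendants(g, m) & z)
        return m not in z

    def active_trail(path):                    # DFS over simple trails to y
        cur = path[-1]
        if cur == y:
            return all(triple_active(path[i], path[i + 1], path[i + 2])
                       for i in range(len(path) - 2))
        return any(active_trail(path + [nxt])
                   for nxt in und[cur] if nxt not in path)

    return not active_trail([x])

g = {"X": ["Z"], "Z": ["Y"]}                   # chain X -> Z -> Y
print(d_separated(g, "X", "Y", {"Z"}))         # True: observing Z blocks the trail
print(d_separated(g, "X", "Y", set()))         # False: the trail is active
```

For a V-structure the behavior flips: the trail becomes active only when the collider (or one of its descendants) is observed.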
Undirected GM
An undirected graphical model represents a distribution $P(X_1,\dots,X_n)$ defined by an undirected graph $H$ and a set of positive-valued potential functions $\psi_c$, one for each clique $c \in C$ of $H$, such that:
$$P(x_1,\dots,x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c), \qquad Z = \sum_{x_1,\dots,x_n} \prod_{c \in C} \psi_c(\mathbf{x}_c)$$
where the normalizer $Z$ is known as the partition function.
A potential function can be a joint or a conditional probability function, or even just a table of positive values.
Clique Example
- max-cliques = $\{A,B,D\}, \{B,C,D\}$,
- sub-cliques = $\{A,B\}, \{C,D\}, \dots$ all edges and single nodes
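These cliques can be recovered by brute force, assuming the edge set A-B, A-D, B-D, B-C, C-D implied by the max-cliques above:

```python
# Brute-force clique enumeration for the example graph (diamond without A-C).
from itertools import combinations

edges = {frozenset(e) for e in [("A", "B"), ("A", "D"), ("B", "D"),
                                ("B", "C"), ("C", "D")]}
nodes = ["A", "B", "C", "D"]

def is_clique(s):
    # A set of nodes is a clique iff every pair is connected by an edge.
    return all(frozenset(p) in edges for p in combinations(s, 2))

cliques = [frozenset(s) for r in range(1, len(nodes) + 1)
           for s in combinations(nodes, r) if is_clique(s)]
# Max-cliques: cliques not strictly contained in any other clique.
max_cliques = sorted(sorted(c) for c in cliques
                     if not any(c < other for other in cliques))
print(max_cliques)  # [['A', 'B', 'D'], ['B', 'C', 'D']]
```

All singletons and edges appear as sub-cliques; only the two triangles survive as maximal cliques.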
Independence:
- Global Markov independencies: $I(H) = \{ A\bot C|B : sep_H(A,C|B) \}$
- for any disjoint $A, B, C$: if $B$ separates $A$ and $C$, then $A$ is independent of $C$ given $B$
- Local Markov independencies: $I_l(H) = \{X_i \bot V \setminus (\{X_i\}\cup MB_{X_i})|MB_{X_i} : \forall i \}$
- given its Markov blanket, $X_i$ is independent of all the nodes it is not adjacent to
- Pairwise Markov independencies: $I_p(H) = \{X \bot Y|V \setminus \{X,Y\}:\{X,Y\}\notin E\}$
- two non-adjacent nodes are independent given all the other nodes
- The Markov blanket of $X_i$, denoted $MB_{X_i}$, is the set of neighbors of $X_i$ in the graph
- $B$ separates $A$ and $C$, written $sep_H(A,C|B)$, if every path from $A$ to $C$ passes through $B$
Relation between local and global independencies
- Thm: $P\models I(H) \Rightarrow P \models I_l(H) \Rightarrow P \models I_p(H)$
- Corollary: for a positive distribution $P$, the global, local, and pairwise independencies are equivalent
Exponential Model
From the discussion above, we need clique potentials that guarantee a positive distribution; one such form is the negative exponential: $\psi_c (\mathbf{x}_c) = \exp\{ -\phi_c(\mathbf{x}_c) \}$
The exponential (Gibbs) form of the distribution is then:
$$P(\mathbf{x}) = \frac{1}{Z} \exp\Big\{ -\sum_{c\in C}\phi_c(\mathbf{x}_c) \Big\} = \frac{1}{Z} \exp\{ -H(\mathbf{x}) \}$$
where $H(\mathbf{x}) = \sum_{c\in C}\phi_c(\mathbf{x}_c)$ is the "free energy".
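A tiny sketch of the Gibbs form $P(\mathbf{x}) = \exp\{-H(\mathbf{x})\}/Z$ on a chain $A - B - C$, with made-up pairwise energies:

```python
# Gibbs distribution on a chain A - B - C with binary nodes and
# pairwise "agreement" energies (low energy = preferred configuration).
import math
from itertools import product

def phi_AB(a, b):
    return 0.0 if a == b else 1.0

def phi_BC(b, c):
    return 0.0 if b == c else 1.0

def H(a, b, c):
    # Free energy: sum of clique energies.
    return phi_AB(a, b) + phi_BC(b, c)

# Partition function: sum of unnormalized weights over all configurations.
Z = sum(math.exp(-H(a, b, c)) for a, b, c in product((0, 1), repeat=3))

def P(a, b, c):
    return math.exp(-H(a, b, c)) / Z

print(round(sum(P(*x) for x in product((0, 1), repeat=3)), 10))  # 1.0
print(P(0, 0, 0) > P(0, 1, 0))  # lower energy -> higher probability: True
```

The positivity of $P$ is automatic here, since $\exp$ is strictly positive for any energy values.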
Boltzmann Machines
A Boltzmann machine is a fully connected graph with pairwise potentials on binary-valued nodes. Its energy function is written in sub-clique (node and edge) form, a convention that comes from physics: $H(\mathbf{x}) = -\big(\sum_{i<j} w_{ij} x_i x_j + \sum_i b_i x_i\big)$
Restricted Boltzmann Machines
This model, inspired by the Boltzmann machine, is responsible for much of the deep learning craze. An RBM is a bipartite graph with two layers and no connections within a layer: a layer of hidden units (factors, $h_j$) and a layer of visible units ($x_i$). The probability function of an RBM is
$$P(\mathbf{x},\mathbf{h}) = \frac{1}{Z}\exp\Big\{ \sum_{i,j} w_{ij} x_i h_j + \sum_i b_i x_i + \sum_j c_j h_j \Big\}$$
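A sketch of a toy RBM with made-up parameters, checking the standard factorized conditional $P(h_j=1\mid\mathbf{x}) = \sigma(c_j + \sum_i w_{ij} x_i)$ against brute-force marginalization:

```python
# Toy RBM: 3 visible units, 2 hidden units, made-up parameters.
# Energy E(x, h) = -(sum_ij W_ij x_i h_j + sum_i b_i x_i + sum_j c_j h_j).
import math
from itertools import product

W = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]   # W[i][j] couples visible i, hidden j
b = [0.1, -0.2, 0.0]                          # visible biases
c = [0.3, -0.1]                               # hidden biases

def energy(x, h):
    return -(sum(W[i][j] * x[i] * h[j] for i in range(3) for j in range(2))
             + sum(b[i] * x[i] for i in range(3))
             + sum(c[j] * h[j] for j in range(2)))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Given x, the hidden units decouple, so P(h_j=1 | x) has a closed sigmoid
# form; verify it against brute-force marginalization over h.
x = (1, 0, 1)
px = sum(math.exp(-energy(x, h)) for h in product((0, 1), repeat=2))
for j in range(2):
    brute = sum(math.exp(-energy(x, h))
                for h in product((0, 1), repeat=2) if h[j] == 1) / px
    closed = sigmoid(c[j] + sum(W[i][j] * x[i] for i in range(3)))
    assert abs(brute - closed) < 1e-12
print("conditionals match")
```

This conditional independence of the hidden units given the visibles (and vice versa) is what makes block Gibbs sampling in RBMs cheap.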
Conditional Random Fields
…