If a variable appears in two cliques, then it must appear in every clique on the path between those two cliques (the running intersection property).
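A minimal sketch of checking this running-intersection property on a hand-built clique tree (the cliques, edges, and helper functions below are illustrative assumptions, not from the original notes):

```python
from itertools import combinations

# Hypothetical clique tree: each clique is a frozenset of variables,
# and edges connect neighbouring cliques (a simple chain here).
cliques = {
    "C1": frozenset({"A", "B"}),
    "C2": frozenset({"B", "C"}),
    "C3": frozenset({"C", "D"}),
}
edges = [("C1", "C2"), ("C2", "C3")]

def path(tree_edges, src, dst):
    """Return the unique path between two cliques in a tree (DFS)."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack = [(src, [src])]
    while stack:
        node, p = stack.pop()
        if node == dst:
            return p
        for nxt in adj.get(node, []):
            if nxt not in p:
                stack.append((nxt, p + [nxt]))
    return []

def has_running_intersection(cliques, edges):
    """Every variable shared by two cliques must appear in every clique
    on the path between them."""
    for (n1, c1), (n2, c2) in combinations(cliques.items(), 2):
        shared = c1 & c2
        for mid in path(edges, n1, n2)[1:-1]:
            if not shared <= cliques[mid]:
                return False
    return True

print(has_running_intersection(cliques, edges))  # -> True for this chain
```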

learning is parameter estimation

a sufficient statistic $T(x)$ is $x$ itself or some transformation of $x$ that retains all the information in the data relevant to the parameter
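For example, for a univariate Gaussian the sufficient statistics are $T(x) = (\sum_n x_n, \sum_n x_n^2)$; a small sketch (made-up data, NumPy assumed) showing that the MLE depends on the data only through these statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # made-up data

# Sufficient statistics for a Gaussian: T(x) = (sum x, sum x^2), plus N.
N, s1, s2 = len(x), x.sum(), (x ** 2).sum()

# The MLE depends on the data only through T(x).
mu_hat = s1 / N
var_hat = s2 / N - mu_hat ** 2

print(mu_hat, var_hat)   # close to the true mean 2.0 and variance 1.5**2 = 2.25
```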

The prior probability can be understood as a statistical (unconditioned) probability, while the posterior probability can be understood as a conditional probability (conditioned on the observed data).


In Bayesian probability theory, if the posterior distributions $p(\theta | x)$ are in the same probability distribution family as the prior probability distribution $p(\theta)$, the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

For example, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian.
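Concretely, with known observation variance $\sigma^2$, prior $\mu \sim \mathcal{N}(\mu_0, \tau_0^2)$, and observations $x_1,\dots,x_N$, the standard closed-form posterior (notation mine) is

$$p(\mu \mid x_{1:N}) = \mathcal{N}\!\left(\frac{\sigma^2\mu_0 + \tau_0^2\sum_n x_n}{\sigma^2 + N\tau_0^2},\; \frac{\sigma^2\tau_0^2}{\sigma^2 + N\tau_0^2}\right),$$

which is again Gaussian, as claimed.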

Learning Graphical Models

The goal: given the assignments (the observed data), predict the best (most likely) structure of the network.

“Optimal” here means the employed algorithm is guaranteed to return a structure that maximizes the objective (e.g., the log-likelihood).

  • $\prod_n$ enumerates over the $N$ data samples
  • $\prod_i$ enumerates over all nodes of the network
  • $\mathbf{x}_{n,\pi_i(G)}$ is the assignment of node $i$'s parents under structure $G$ in sample $n$ (see the decomposition sketched below)
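Putting the notation together, the log-likelihood being maximized decomposes per node (a standard reconstruction; the exact symbols are assumed here):

$$\ell(\theta, G; D) = \log \prod_n \prod_i p\big(x_{n,i} \mid \mathbf{x}_{n,\pi_i(G)}\big) = \sum_i \sum_n \log p\big(x_{n,i} \mid \mathbf{x}_{n,\pi_i(G)}\big)$$

so the structure score splits into independent per-node (family) terms.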

M is the number of data samples; dividing the counts by M turns the count function into an empirical probability (frequency) representation

the right-hand part of the decomposition is the entropy term!
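Written out (my reconstruction of the standard result, with $\hat{p}$ the counts divided by $M$), the decomposed log-likelihood at the MLE is

$$\ell(\hat{\theta}_G, G; D) = M \sum_i \hat{I}\big(x_i;\, \mathbf{x}_{\pi_i(G)}\big) - M \sum_i \hat{H}(x_i),$$

where $\hat{I}$ is the empirical mutual information between a node and its parents and $\hat{H}(x_i)$ is the empirical entropy of node $i$; the entropy term on the right does not depend on the structure $G$.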