One of the learning problems for Bayesian networks is this: given the network structure but data with some missing values, how do we learn a good model for prediction?

Why is it a “Chicken-Egg” problem? If we had complete data, we could estimate the model parameters from it; however, our data has missing values. On the other hand, if we had the parameters of the model, we could infer the missing values. But again, we do not know the parameters yet. So which one comes first? That is why it is a “Chicken-Egg” problem.

EM stands for Expectation-Maximization, and it can be understood as a solver for exactly this “Chicken-Egg” problem. The high-level idea is:

1) First, don’t worry too much about the quality of the initial model. Just estimate the model parameters from the records that have complete values.

2) OK, now we have our toy initial model. Then we can do inference! We fill in the missing values using our current model.

3) Now we have complete data with no missing values, so we can fit a better model to it. From here, go back to the above two steps, again and again, until you see no change in your model.
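The three steps above can be sketched on the smallest possible network: two binary variables X → Y, where some X values are missing. The data, parameter names, and two-node setup below are my own toy example, not from the text; the E-step fills in each missing X with its posterior probability rather than a hard value, which is the standard soft version of “filling in”:

```python
import math

# Toy network X -> Y, both binary; X is missing (None) in some records.
# These data values are made up purely for illustration.
data = [(1, 1), (1, 1), (0, 0), (0, 1), (1, 0),
        (None, 1), (None, 1), (None, 0), (None, 1), (None, 0)]

# Step 1: initialize the parameters from the complete records only.
complete = [(x, y) for x, y in data if x is not None]
p_x = sum(x for x, _ in complete) / len(complete)   # P(X = 1)
p_y = [0.0, 0.0]                                    # p_y[x] = P(Y = 1 | X = x)
for x_val in (0, 1):
    ys = [y for x, y in complete if x == x_val]
    p_y[x_val] = sum(ys) / len(ys)

def loglik():
    """Log-likelihood of the data, marginalizing over missing X."""
    ll = 0.0
    for x, y in data:
        if x is None:
            p = sum((p_x if xv else 1 - p_x) *
                    (p_y[xv] if y else 1 - p_y[xv]) for xv in (0, 1))
        else:
            p = (p_x if x else 1 - p_x) * (p_y[x] if y else 1 - p_y[x])
        ll += math.log(p)
    return ll

history = [loglik()]
for _ in range(100):
    # Step 2 (E-step): for each record, the posterior weight w = P(X = 1 | Y = y)
    # under the current parameters; observed records keep their hard value.
    ws = []
    for x, y in data:
        if x is None:
            num = p_x * (p_y[1] if y else 1 - p_y[1])
            den = num + (1 - p_x) * (p_y[0] if y else 1 - p_y[0])
            ws.append(num / den)
        else:
            ws.append(float(x))
    # Step 3 (M-step): re-estimate the parameters from the expected counts.
    p_x = sum(ws) / len(ws)
    p_y[1] = sum(w * y for w, (_, y) in zip(ws, data)) / sum(ws)
    p_y[0] = (sum((1 - w) * y for w, (_, y) in zip(ws, data))
              / sum(1 - w for w in ws))
    history.append(loglik())
    if history[-1] - history[-2] < 1e-9:  # no change in the model: stop
        break
```

Running this, `history` records the data log-likelihood after each iteration; it never decreases, which is exactly the sense in which each round produces a model at least as good as the last.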

There are theoretical guarantees that, through the above iterative procedure, the data likelihood under our model never decreases, so the model gets better and better, converging to a (local) optimum. In other words, we do not need to worry too much about the initial model, and eventually we will arrive at a good one.
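The standard argument behind this guarantee (the notation below is mine, not from the text) decomposes the log-likelihood using any distribution $q$ over the hidden values $Z$:

```latex
% Decomposition: for any distribution q(Z),
\log p(D \mid \theta)
  = \mathcal{L}(q, \theta)
    + \mathrm{KL}\big(q(Z) \,\|\, p(Z \mid D, \theta)\big),
\qquad
\mathcal{L}(q, \theta)
  = \mathbb{E}_{q}\!\left[\log \frac{p(D, Z \mid \theta)}{q(Z)}\right].
% E-step: set q = p(Z | D, \theta_t). The KL term becomes zero, so
%   \mathcal{L}(q, \theta_t) = \log p(D \mid \theta_t).
% M-step: choose \theta_{t+1} to maximize \mathcal{L}(q, \theta).
% Since KL >= 0 always,
%   \log p(D \mid \theta_{t+1}) \ge \mathcal{L}(q, \theta_{t+1})
%     \ge \mathcal{L}(q, \theta_t) = \log p(D \mid \theta_t).
```

So each E-step plus M-step can only push the log-likelihood up or leave it unchanged, which is why the iterations converge; note this only guarantees a local optimum, so in practice the initialization can still matter for which model you end up with.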