One class classification

I came across a good learning resource about one-class classification.

In practice, Weka provides two ways of doing one-class classification: one is under the LibSVM package; the other is named OneClassClassifier, which is a meta classifier. In the latter case, you need to choose a base classifier to go with it.

From an experiment on a data set I collected, I noticed that OneClassClassifier with random forest as the base classifier performs better than one-class SVM.
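Weka is Java-based, but the same idea can be sketched in Python with scikit-learn's OneClassSVM (the analogue of LibSVM's one-class mode). The data here is synthetic and the hyperparameters are illustrative assumptions, not tuned values:

```python
# Minimal sketch of one-class classification: train only on "normal"
# points, then flag points far from that distribution as outliers.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # training data: one class only
outliers = rng.uniform(low=5.0, high=6.0, size=(10, 2))  # obvious outliers for testing

clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(normal)
pred = clf.predict(outliers)   # +1 = inlier, -1 = outlier
print((pred == -1).mean())
```

Note that the model never sees a negative class during training, which is the defining property of one-class classification.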


Bayesian network structure learning (1)

Problem: we have all the data but do not know the structure of the Bayesian network.


Determining whether there is an edge between two nodes (A, B): measure the likelihood p(B|A).

What’s the problem with using likelihood in structure learning?


Likelihood alone can never do better than the completely connected network: adding an edge never decreases the maximum likelihood, so the fully connected structure always wins and we overfit.

One way to avoid overfitting is to have a good estimate of the prior probability over structures.

A uniform prior is not good, because MAP reduces to maximum likelihood when the prior over hypotheses is uniform.

Another way is to penalize the number of edges in the network.
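One standard concrete form of such an edge penalty (not named in the original notes, but a common choice) is the BIC/MDL score, which subtracts a complexity term from the fitted log-likelihood:

```latex
\mathrm{BIC}(G) = \log p(D \mid \hat\theta_G, G) - \frac{d_G}{2}\,\log n
```

Here \(d_G\) is the number of free parameters of structure \(G\) (which grows as edges are added) and \(n\) is the number of data points, so denser networks must earn their extra edges with a real likelihood gain.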

Or put your knowledge into an initial Bayesian network; starting from there, estimate the effect of adding, adjusting, or removing edges.

In all, how to learn an unknown structure for a Bayesian network:

1. initial state: empty network or prior network

2. operators: add arc, delete arc, reverse arc

3. Evaluation: Posterior Probability
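The three-part recipe above can be sketched as a greedy hill-climbing search. This is an illustrative toy, not Weka's or any library's implementation: the variables, the synthetic data, the fixed per-edge penalty (standing in for a proper posterior/BIC score), and the scoring details are all assumptions.

```python
# Toy greedy structure search: start from an empty network, repeatedly
# apply add/delete/reverse-arc operators, keep the acyclic neighbor that
# most improves a penalized log-likelihood score, stop when nothing helps.
import math
import random
from itertools import product

random.seed(0)

# Synthetic data: B is a noisy copy of A; C is an independent coin flip.
data = []
for _ in range(500):
    a = random.randint(0, 1)
    b = a if random.random() < 0.9 else 1 - a
    data.append({"A": a, "B": b, "C": random.randint(0, 1)})
VARS = ["A", "B", "C"]

def parents(edges, node):
    return sorted(p for p, ch in edges if ch == node)

def has_cycle(edges):
    adj = {v: [] for v in VARS}
    for p, ch in edges:
        adj[p].append(ch)
    state = {v: 0 for v in VARS}  # 0=unvisited, 1=on stack, 2=done
    def dfs(v):
        state[v] = 1
        for w in adj[v]:
            if state[w] == 1 or (state[w] == 0 and dfs(w)):
                return True
        state[v] = 2
        return False
    return any(state[v] == 0 and dfs(v) for v in VARS)

def score(edges, penalty=5.0):
    # Log-likelihood under MLE parameters (+1 smoothing), minus a fixed
    # cost per edge -- a crude stand-in for a BIC-style penalty.
    ll = 0.0
    for node in VARS:
        ps = parents(edges, node)
        counts = {}
        for row in data:
            key = (tuple(row[p] for p in ps), row[node])
            counts[key] = counts.get(key, 0) + 1
        for row in data:
            ctx = tuple(row[p] for p in ps)
            n_ctx = counts.get((ctx, 0), 0) + counts.get((ctx, 1), 0)
            ll += math.log((counts[(ctx, row[node])] + 1) / (n_ctx + 2))
    return ll - penalty * len(edges)

def neighbors(edges):
    for p, ch in product(VARS, VARS):
        if p != ch and (p, ch) not in edges:
            yield edges | {(p, ch)}                # add arc
    for e in edges:
        yield edges - {e}                           # delete arc
        yield (edges - {e}) | {(e[1], e[0])}        # reverse arc

edges = frozenset()                                 # initial state: empty network
while True:
    best = max((n for n in neighbors(edges) if not has_cycle(n)),
               key=score, default=edges)
    if score(best) <= score(edges):
        break
    edges = frozenset(best)

print(sorted(edges))
```

On this data the search should recover the dependence between A and B while leaving the independent variable C unconnected, which is exactly the behavior the penalty term is there to enforce.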

EM Algorithm: The way to solve the problem of “Chicken-Egg” in Machine Learning

One of the learning problems for Bayesian networks is: given the network structure but with some missing values in the data, how do we learn a better model for prediction?

Why is it a “Chicken-Egg” problem? If we had complete data, we could estimate the model parameters from it; however, our data has missing values. On the other hand, if we had the parameters of our model, we could infer the missing values. But again, we do not know the parameters yet. So which one comes first? That is why it is like a “Chicken-Egg” problem.

EM stands for expectation maximization. EM can be understood as the solver for such “Chicken-Egg” problems. The high-level idea is:

1) First, do not care too much about the quality of the initial model. Just compute the model parameters based on the data with complete values.

2) OK, now we have our toy initial model, so we can do inference! We fill in the missing values using the current model.

3) Now we have complete data with no missing values, so we can build a better model from it. From here, go back to the above two steps, again and again, until you see no change in your models.

There are theoretical results showing that this iterative procedure never decreases the likelihood, so the model gets better and better. In other words, we do not need to care about the initial model too much, and eventually we will converge to a better one (though possibly only a local optimum).
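The loop above can be sketched on the classic “two coins” version of the chicken-and-egg problem: each trial flips one of two biased coins ten times, but which coin was used is never recorded (that label is the missing value). The data and starting guesses below are illustrative assumptions:

```python
# EM for two coins with unknown biases and unrecorded coin identity.
# E-step "fills in" the missing coin label as a soft responsibility;
# M-step re-estimates the biases from the completed data.
trials = [9, 8, 9, 1, 2, 8, 2, 1, 9, 2]  # heads out of 10 flips per trial
N = 10

theta_a, theta_b = 0.6, 0.4              # step 1: crude initial parameters

def binom_lik(heads, p):
    return (p ** heads) * ((1 - p) ** (N - heads))

for _ in range(50):
    # E-step: infer P(coin A | trial) under the current parameters.
    num_a = den_a = num_b = den_b = 0.0
    for h in trials:
        la, lb = binom_lik(h, theta_a), binom_lik(h, theta_b)
        ra = la / (la + lb)              # responsibility of coin A
        num_a += ra * h
        den_a += ra * N
        num_b += (1 - ra) * h
        den_b += (1 - ra) * N
    # M-step: re-estimate biases from the soft-completed data.
    theta_a, theta_b = num_a / den_a, num_b / den_b

print(round(theta_a, 2), round(theta_b, 2))
```

Starting from the rough guesses 0.6 and 0.4, the estimates separate toward the two clusters of trials, exactly the “infer missing values, then re-fit, then repeat” cycle described above.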

Gibbs Classifier V.S. Bayes Optimal Classifier

Bayes Optimal Classifier: considers all hypotheses, weighted by their posteriors, so it is hopelessly inefficient.

Gibbs Classifier: two steps

1. Choose just one hypothesis at random, according to the posterior P(h|D) (D: data set; h: hypothesis)

2. Use that hypothesis to classify the new instance

Error rate comparison: assume target concepts are drawn at random from H according to the priors on H. Then:

E[error of Gibbs] <= 2 * E[error of Bayes optimal]
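The contrast between the two classifiers can be made concrete on a tiny discrete hypothesis space. The three hypotheses and their posterior values below are illustrative assumptions, chosen only to show the mechanics:

```python
# Bayes optimal: weighted vote over ALL hypotheses (expensive for large H).
# Gibbs: sample ONE hypothesis from P(h|D) and use it alone.
import random

random.seed(0)

# (posterior P(h|D), hypothesis) pairs; each hypothesis maps x -> label.
hypotheses = [
    (0.4, lambda x: 1),
    (0.3, lambda x: 0),
    (0.3, lambda x: 0),
]

def bayes_optimal(x):
    vote_for_1 = sum(p for p, h in hypotheses if h(x) == 1)
    return 1 if vote_for_1 > 0.5 else 0

def gibbs(x):
    r, acc = random.random(), 0.0
    for p, h in hypotheses:
        acc += p
        if r < acc:
            return h(x)
    return hypotheses[-1][1](x)

print(bayes_optimal("any instance"))
print(sum(gibbs("any instance") for _ in range(1000)) / 1000)
```

Here the posterior-weighted vote for label 1 is only 0.4, so Bayes optimal always predicts 0, while Gibbs predicts 1 about 40% of the time; averaging over many draws shows why its expected error is worse, but the bound above guarantees it is at most twice as bad.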