Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood

Abstract

Bayesian networks are a powerful probabilistic representation, and their use for classification has received considerable attention. However, they tend to perform poorly when learned in the standard way. This is attributable to a mismatch between the objective function used (likelihood or a function thereof) and the goal of classification (maximizing accuracy or conditional likelihood). Unfortunately, the computational cost of optimizing structure and parameters for conditional likelihood is prohibitive. In this paper we show that a simple approximation – choosing structures by maximizing conditional likelihood while setting parameters by maximum likelihood – yields good results. On a large suite of benchmark datasets, this approach produces better class probability estimates than naïve Bayes, TAN, and generatively trained Bayesian networks.

1. Introduction

The simplicity and surprisingly high accuracy of the naïve Bayes classifier have led to its wide use, and to many attempts to extend it. In particular, naïve Bayes is a special case of a Bayesian network, and learning the structure and parameters of an unrestricted Bayesian network would appear to be a logical means of improvement. However, Friedman et al. found that naïve Bayes easily outperforms such unrestricted Bayesian network classifiers on a large sample of benchmark datasets. Their explanation was that the scoring functions used in standard Bayesian network learning attempt to optimize the likelihood of the entire data, rather than just the conditional likelihood of the class given the attributes. Such scoring results in suboptimal choices during the search process whenever the two functions favor differing changes to the network. The natural solution would then be to use conditional likelihood as the objective function. Unfortunately, Friedman et al. observed that, while maximum-likelihood parameters can be efficiently computed in closed form, this is not true of conditional likelihood: the latter must be optimized by numerical methods, and doing so at each search step would be prohibitively expensive. Friedman et al. thus abandoned this avenue, leaving the investigation of possible heuristic alternatives as an important direction for future research. In this paper, we show that the simple heuristic of setting the parameters by maximum likelihood while choosing the structure by conditional likelihood is both accurate and efficient.

Friedman et al. chose instead to extend naïve Bayes by allowing a slightly less restricted structure (one parent per variable in addition to the class) while still optimizing likelihood. They showed that the resulting algorithm, TAN, was indeed more accurate than naïve Bayes on benchmark datasets. We compare our algorithm to TAN and naïve Bayes on the same datasets, and show that it outperforms both in the accuracy of class probability estimates, while outperforming naïve Bayes and tying TAN in classification error.

2. Bayesian Networks

A Bayesian network encodes the joint probability distribution of a set of variables $X_1, \dots, X_d$ as a directed acyclic graph and a set of conditional probability tables (CPTs). Each node corresponds to a variable, and its CPT gives the probability of each state of the variable for every possible combination of states of its parents. Writing $\Pi_i$ for the set of parents of $X_i$, the structure encodes the assertion that each variable is independent of its non-descendants given its parents, so the joint distribution factors as

$$P(X_1, \dots, X_d) = \prod_{i=1}^{d} P(X_i \mid \Pi_i).$$
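
As a concrete illustration (ours, not the paper's; the three-variable network and its numbers are invented), the factorization reduces evaluating the joint probability of a complete assignment to a product of CPT lookups:

```python
# Hypothetical three-variable network: X0 has no parents,
# X1 and X2 each have X0 as their sole parent.
parents = {0: (), 1: (0,), 2: (0,)}

# CPT entries indexed by (variable, states-of-its-parents);
# each maps a state of the variable to its conditional probability.
cpts = {
    (0, ()):   {0: 0.6, 1: 0.4},
    (1, (0,)): {0: 0.7, 1: 0.3},
    (1, (1,)): {0: 0.2, 1: 0.8},
    (2, (0,)): {0: 0.9, 1: 0.1},
    (2, (1,)): {0: 0.5, 1: 0.5},
}

def joint_prob(x):
    """P(x) = product over i of P(x_i | state of X_i's parents)."""
    p = 1.0
    for i, pa in parents.items():
        parent_state = tuple(x[j] for j in pa)
        p *= cpts[(i, parent_state)][x[i]]
    return p

print(joint_prob((0, 1, 0)))  # 0.6 * 0.3 * 0.9 = 0.162
```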

2.1 Learning Bayesian Networks

Given an i.i.d. training set $D = \{x^{(1)}, \dots, x^{(N)}\}$, where each $x^{(j)}$ is a complete assignment to the variables $X_1, \dots, X_d$, the goal of learning is to find the network $B$ that best represents the joint distribution of the data. The standard generative approach is to maximize the likelihood of the training data, or equivalently its logarithm:

$$LL(B \mid D) = \sum_{j=1}^{N} \log P_B\big(x^{(j)}\big) = \sum_{j=1}^{N} \sum_{i=1}^{d} \log P_B\big(x_i^{(j)} \mid \pi_i^{(j)}\big)$$

where $\pi_i^{(j)}$ is the state of $X_i$'s parents in the $j$th example. When the structure of the network is known, this reduces to estimating the CPT parameters, and the maximum-likelihood estimates are simply the observed frequencies: the estimate of $P(X_i = k \mid \Pi_i = j)$ is $n_{ijk}/n_{ij}$, where $n_{ijk}$ is the number of training examples in which $X_i$ is in its $k$th state and its parents are in their $j$th state, and $n_{ij} = \sum_k n_{ijk}$. When the structure is unknown, it can be learned by searching over structures, guided by a scoring function.
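
As an illustration of how cheap this estimation is, the following sketch (our own; the data layout and names are assumptions, not the authors' code) computes all maximum-likelihood CPT entries in a single counting pass:

```python
from collections import Counter

def ml_cpts(data, parents):
    """Maximum-likelihood CPTs: p_ijk = n_ijk / n_ij, where n_ijk counts
    examples with X_i in state k and its parents in their j-th state."""
    n_ijk, n_ij = Counter(), Counter()
    for x in data:                       # one pass over the training set
        for i, pa in parents.items():
            j = tuple(x[p] for p in pa)  # joint parent state
            n_ijk[(i, j, x[i])] += 1
            n_ij[(i, j)] += 1
    return {(i, j, k): n / n_ij[(i, j)] for (i, j, k), n in n_ijk.items()}

# Tiny example: X1's sole parent is X0.
data = [(0, 0), (0, 1), (0, 0), (1, 1)]
print(ml_cpts(data, {0: (), 1: (0,)}))
# e.g. P(X1=0 | X0=0) = 2/3 and P(X1=1 | X0=1) = 1.0
```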

Since on average adding an arc never decreases likelihood on the training data, using the log likelihood as the scoring function can lead to severe overfitting. This problem can be overcome in a number of ways. The simplest one, which is often surprisingly effective, is to limit the number of parents a variable can have. Another alternative is to add a complexity penalty to the log likelihood. For example, the MDL method minimizes $MDL(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$, where $|B|$ is the number of parameters in the network. In both these approaches, the parameters of each candidate network are set by maximum likelihood, as in the known-structure case. Finally, the full Bayesian approach maximizes the Bayesian Dirichlet (BD) score:

$$P(B, D) = P(B) \prod_{i=1}^{d} \prod_{j=1}^{q_i} \frac{\Gamma(n'_{ij})}{\Gamma(n'_{ij} + n_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(n'_{ijk} + n_{ijk})}{\Gamma(n'_{ijk})}$$

where $P(B)$ is the prior probability of the structure, $\Gamma$ is the gamma function, $q_i$ is the number of states of $X_i$'s parents, $r_i$ is the number of states of $X_i$, the $n_{ijk}$ and $n_{ij}$ are the counts defined above, and $n'_{ijk}$ and $n'_{ij} = \sum_k n'_{ijk}$ are the corresponding Dirichlet prior counts.
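
As a sketch of how these scores can be evaluated (our own code, not the authors'; the nested `counts`/`priors` layout is an assumption), the MDL score is a one-liner and the BD score is best computed in log space with `math.lgamma`:

```python
import math

def mdl_score(ll, num_params, n):
    """MDL(B|D) = (log N / 2)|B| - LL(B|D); lower is better."""
    return (math.log(n) / 2) * num_params - ll

def log_bd_score(counts, priors, log_prior_b=0.0):
    """Log BD score. counts[i][j] lists n_ijk over the states k of X_i
    for its j-th parent state; priors holds the matching Dirichlet
    counts n'_ijk. Computed with lgamma for numerical stability."""
    score = log_prior_b  # log P(B)
    for i in counts:
        for j in counts[i]:
            n_ij, n_ij_p = sum(counts[i][j]), sum(priors[i][j])
            score += math.lgamma(n_ij_p) - math.lgamma(n_ij_p + n_ij)
            for n, n_p in zip(counts[i][j], priors[i][j]):
                score += math.lgamma(n_p + n) - math.lgamma(n_p)
    return score
```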

2.2 Bayesian Network Classifiers

The goal of classification is to correctly predict the value of a designated discrete class variable $Y$ given a vector of predictors or attributes $(X_1, \dots, X_d)$. The naïve Bayes classifier is a Bayesian network where the class has no parents and each attribute has the class as its sole parent. Friedman et al.'s TAN algorithm uses a variant of the Chow and Liu method to produce a network where each variable has one other parent in addition to the class. More generally, a Bayesian network learned using any of the methods described above can be used as a classifier. All of these are generative models in the sense that they are learned by maximizing the log likelihood of the entire data being generated by the model, $LL(B \mid D) = \sum_{j} \log P_B\big(y^{(j)}, x^{(j)}\big)$. In classification, however, what matters is predicting the class given the attributes, so the more appropriate objective function is the conditional log likelihood of the class given the attributes:

$$CLL(B \mid D) = \sum_{j=1}^{N} \log P_B\big(y^{(j)} \mid x_1^{(j)}, \dots, x_d^{(j)}\big)$$

Notice that $LL(B \mid D) = CLL(B \mid D) + \sum_{j=1}^{N} \log P_B\big(x_1^{(j)}, \dots, x_d^{(j)}\big)$. Maximizing $CLL(B \mid D)$ would be a form of discriminative learning, because it would focus on correctly discriminating between classes. The problem with this approach is that, unlike $LL(B \mid D)$, $CLL(B \mid D)$ does not decompose into a separate term for each variable; no closed form for its maximizing parameters is known, so they must be found by expensive numerical optimization.
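
For concreteness, here is a minimal sketch (ours, with illustrative names) of computing $CLL(B \mid D)$ from a joint model by normalizing over the class values, assuming a `joint_prob(y, x)` function in the spirit of the one sketched in Section 2:

```python
import math

def cll(data, class_values, joint_prob):
    """CLL(B|D) = sum_j log P_B(y_j | x_j), where
    P_B(y | x) = P_B(y, x) / sum over y' of P_B(y', x)."""
    total = 0.0
    for y, x in data:  # each example is a (class, attributes) pair
        evidence = sum(joint_prob(c, x) for c in class_values)
        total += math.log(joint_prob(y, x) / evidence)
    return total
```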

3. The BNC Algorithm

We now introduce BNC, an algorithm for learning the structure of a Bayesian network classifier by maximizing conditional likelihood. BNC is similar to the hill-climbing algorithm of Heckerman et al., except that it uses the conditional log likelihood of the class, $CLL(B \mid D)$, as the primary objective function. BNC starts from an empty network, and at each step considers adding each possible new arc (i.e., all those that do not create cycles) and deleting or reversing each current arc. BNC pre-discretizes continuous values and ignores missing values in the same way that TAN does.

We consider two versions of BNC. The first, BNC-$n$P, avoids overfitting by limiting its networks to a maximum of $n$ parents per variable; at each step it greedily applies the arc change that most increases $CLL(B \mid D)$, stopping when no change yields an improvement.

The second version, BNC-MDL, instead guards against overfitting with a conditional analog of the MDL score, maximizing $CMDL(B \mid D) = CLL(B \mid D) - \frac{\log N}{2}\,|B|$, where $N$ is the number of training examples and $|B|$ is the number of parameters in the network. In both versions, the parameters of every candidate structure are set to their maximum-likelihood values, which are available in closed form, so each search step remains cheap.

The goal of maximizing conditional likelihood is to produce accurate class probability estimates; classification accuracy, by contrast, requires only that the most probable class be the correct one. We therefore evaluate BNC on both the conditional log likelihood of test data and classification error.
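
A minimal sketch of the search loop follows (our own simplification, not the authors' implementation: arc reversal is omitted for brevity, and `score` is assumed to refit maximum-likelihood parameters and return $CLL(B \mid D)$ for BNC-$n$P or $CMDL(B \mid D)$ for BNC-MDL):

```python
from itertools import permutations

def creates_cycle(parents):
    """Detect a directed cycle by depth-first search over parent links."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}

    def visit(v):
        color[v] = GRAY
        for p in parents[v]:
            if color[p] == GRAY or (color[p] == WHITE and visit(p)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in parents)

def bnc(variables, data, score, max_parents=None):
    """Greedy hill climbing over structures: start from the empty network
    and repeatedly apply the single arc addition or deletion that most
    improves the discriminative score, stopping at a local optimum."""
    parents = {v: set() for v in variables}
    best = score(parents, data)
    while True:
        best_cand = None
        for u, v in permutations(variables, 2):
            cand = {w: set(ps) for w, ps in parents.items()}
            if u in cand[v]:
                cand[v].discard(u)   # candidate: delete arc u -> v
            else:
                if max_parents is not None and len(cand[v]) >= max_parents:
                    continue
                cand[v].add(u)       # candidate: add arc u -> v
            if creates_cycle(cand):
                continue
            s = score(cand, data)
            if s > best:
                best, best_cand = s, cand
        if best_cand is None:
            return parents           # no single change improves the score
        parents = best_cand
```

In these terms, BNC-2P would correspond to calling `bnc` with `max_parents=2` and a CLL-based `score`.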
