17 Great Machine Learning Libraries

Time: 2022-09-09 15:45:20

08 October 2013

After wonderful feedback on my previous post on Scikit-learn from the guys at /r/MachineLearning, I decided to collect the list of machine learning libraries into this separate note. Let me know if there’s a library that should be included here.


Update (15 May 2014): thanks to Djalel Benbouzid and Dwayne Campbell for additional suggestions. Sorry it’s taken me so long to add them…


Python

  • Scikit-learn: comprehensive and easy to use. I wrote a whole article on why I like this library.
  • PyBrain: neural networks are one thing that is missing from Scikit-learn, but this module makes up for it.
  • nltk: really useful if you’re doing anything NLP or text mining related.
  • Theano: efficient computation of mathematical expressions using a GPU. Excellent for deep learning.
  • Pylearn2: machine learning toolbox built on top of Theano - in very early stages of development.
  • MDP (Modular toolkit for Data Processing): a framework that is useful when setting up workflows.

Java

  • Spark: Apache’s new upstart, supposedly up to a hundred times faster than Hadoop. It now includes MLlib, which contains a good selection of machine learning algorithms, including classification, clustering and recommendation generation. Currently undergoing rapid development. Development can be in Python as well as JVM languages.
  • Mahout: Apache’s machine learning framework built on top of Hadoop. This looks promising, but comes with all the baggage and overhead of Hadoop.
  • Weka: this is a Java based library with a graphical user interface that allows you to run experiments on small datasets. This is great if you restrict yourself to playing around to get a feel for what is possible with machine learning. However, I would avoid using this in production code at all costs: the API is very poorly designed, the algorithms are not optimised for production use and the documentation is often lacking.
  • Mallet: another Java based library with an emphasis on document classification. I’m not so familiar with this one, but if you have to use Java this is bound to be better than Weka.
  • JSAT: stands for “Java Statistical Analysis Tool”. It was created by Edward Raff and was born out of his frustration with Weka (I know the feeling). Looks pretty cool.

.NET

  • Accord.NET: this seems to be pretty comprehensive, and comes recommended by primaryobjects on Reddit. There is perhaps a slight slant towards image processing and computer vision, as it builds on the popular library AForge.NET for this purpose.
  • Another option is to use one of the Java libraries compiled to .NET using IKVM - I have used this approach with success in production.

C++

  • Vowpal Wabbit: designed for very fast learning and released under a BSD license, this comes recommended by terath on Reddit.
  • MultiBoost: a fast C++ framework implementing some boosting algorithms as well as some cascades (like the Viola-Jones cascades). It’s mainly focused on AdaBoost.MH so it is multi-class/multi-label.
  • Shogun: large machine learning library with a focus on kernel methods and support vector machines. Bindings to Matlab, R, Octave and Python.

General

  • LibSVM and LibLinear: these are C libraries for support vector machines; there are also bindings or implementations for many other languages. These are the libraries used for support vector machine learning in Scikit-learn.
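
As a rough illustration of that last point, here is a minimal sketch that assumes a recent Scikit-learn is installed. In Scikit-learn, `SVC` is the estimator backed by LibSVM and `LinearSVC` is the one backed by LibLinear. The toy data below is made up purely for the example.

```python
from sklearn.svm import SVC, LinearSVC

# Made-up, clearly separable toy data: class 0 near (0, 0), class 1 near (1, 1).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [0.9, 0.8], [0.8, 1.1]]
y = [0, 0, 0, 1, 1, 1]

# SVC wraps LibSVM and supports kernels such as RBF.
kernel_svm = SVC(kernel="rbf", C=1.0).fit(X, y)

# LinearSVC wraps LibLinear and is restricted to linear decision boundaries.
linear_svm = LinearSVC(C=1.0).fit(X, y)

new_points = [[0.1, 0.1], [0.95, 0.9]]
kernel_pred = kernel_svm.predict(new_points).tolist()
linear_pred = linear_svm.predict(new_points).tolist()
print(kernel_pred, linear_pred)
```

Both estimators share the same `fit`/`predict` interface, so switching between the two underlying libraries is mostly a matter of changing the class name.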

Conclusion

This article is a work in progress, so please send me your comments or criticisms!


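For the past couple of days I have been experimenting with open-source ML libraries. There are many of them, such as Torch, MLC, Weka (Java-based), Waffles, Shark, scikit, opencv-ml, and so on. After comparing their strengths and weaknesses, I decided to work with the following:
1. Shark, C++-based
2. scikit, Python-based
3. weka, Java-based
4. opencv-ml, C++-based; widely used in image processing, and I have worked with it before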

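It took me an afternoon to get Shark installed and configured. The library feels quite powerful: it covers most of the commonly used ML algorithms, and because it is C++-based it is comfortable to work with.
Environment: win32, vs10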

There are very few articles online about installing Shark. What follows is largely based on these posts (thanks for sharing):
http://www.cnblogs.com/xiangwengao/archive/2013/05/04/3059632.html
http://www.cnblogs.com/xiangwengao/archive/2013/05/01/3052821.html
http://www.cnblogs.com/xiangwengao/archive/2013/05/01/3052827.html

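I. Shark: Getting the Right Download
Two articles describe incorrect ways of installing Shark. The download locations they give are both problematic: they are either unusable or cannot be fetched at all. (I have verified this myself.)
Incorrect article 1: http://www.iteye.com/news/27669
This one is badly wrong, because the SVN checkout is the development version, which sometimes lacks files, so the Visual Studio build fails and the library ends up unusable. When I installed from SVN, the LinAlg files were missing and nothing worked. I strongly recommend against this approach.
Incorrect article 2: http://shark-project.sourceforge.net/ . The files cannot be found there; the address stopped working long ago. The installation and usage described later in that article are passable.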

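Correct download location: https://sourceforge.net/projects/shark-project/files/Shark%20Core/ . Download the zip file and install from it.
Version: 2.3.4

Shark is built with CMake and requires the C++ Boost libraries. Details follow.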

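II. Shark: Installation

The home page of the Shark Machine Learning Library is http://shark-project.sourceforge.net/ . Shark was developed at Ruhr University Bochum in Germany and won the gold prize at a world open-source competition in 2011. Shark is built on C++ generic programming and makes heavy use of templates, so it is well encapsulated and easy to extend through inheritance. Because it is C++-based, its functions are also reasonably efficient.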

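The Shark library is divided into four main parts:

  1. ReClaM     regression and classification module, covering linear methods, neural networks, SVMs, kernels, etc.
  2. EALib      evolutionary computation module
  3. MOO-EAlib  multi-objective evolutionary computation
  4. Fuzzy      fuzzy computation module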

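OK, let's begin the installation. Shark can be installed on Microsoft Windows, Linux and Mac operating systems; this article covers installation on Microsoft Windows. Note that the downloaded Shark package contains installation instructions for the various platforms under Shark/doc/TutorialsOld/, but they are rather old.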

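Step 1: prepare the tools and generate the build files. You need the cross-platform build tool CMake v2.8 and Microsoft Visual Studio 2005 or later. My Shark package is in D:/shark, and the build output goes to D:/build_shark.

[Screenshot: CMake settings]

Click the Configure button, select the compiler we need (VS2005), then click Generate.

[Screenshot: CMake output after generation completes]

Now look in D:/build_shark: CMake has generated the build files that VS2005 needs.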
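
For readers who prefer the command line to the CMake GUI, the same configure step can be sketched in Python as below. This is only a sketch under assumptions: the generator name "Visual Studio 8 2005" is what CMake 2.8 called the VS2005 generator as far as I know (run `cmake --help` to see the names your version accepts), and the helper names `cmake_configure_command` and `configure` are hypothetical.

```python
import subprocess
from pathlib import Path


def cmake_configure_command(source_dir, generator):
    """Build the argument list for an out-of-source CMake configure run."""
    return ["cmake", "-G", generator, str(source_dir)]


def configure(source_dir, build_dir, generator):
    """Run CMake from inside build_dir so the generated files land there."""
    Path(build_dir).mkdir(parents=True, exist_ok=True)
    command = cmake_configure_command(source_dir, generator)
    # check=True raises CalledProcessError if CMake reports a failure.
    return subprocess.run(command, cwd=build_dir, check=True)


# Assumed generator name for VS2005; adjust to match your installation.
command = cmake_configure_command("D:/shark", "Visual Studio 8 2005")
print(" ".join(command))

# To actually run it (requires CMake and Visual Studio on the machine):
# configure("D:/shark", "D:/build_shark", "Visual Studio 8 2005")
```

Running CMake with the build directory as the working directory keeps the generated solution and project files out of the source tree, which matches the D:/shark and D:/build_shark layout used here.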

Step 2: compile and link with VS2005 to obtain the static library we need, shark.lib.

Double-click shark.sln in the build_shark folder to load the solution into VS2005.

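Here you can see all of Shark's bundled example projects as well as the shark.lib project. Choose "Build" -> "Rebuild Solution", and VS2005 will build all of the example programs. Because there are many examples, the whole process may take several minutes, so go and have a cup of tea and be patient. I chose to rebuild everything in order to demonstrate the example programs; you can pick specific projects according to your needs. For example, if you open shark.vcproj and build it, you get shark.lib.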

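Once again, praise for German rigor: 70 projects, and for an open-source library everything compiled on the first attempt without a single error. Fine workmanship.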

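OK, once the build is finished, the build_shark folder contains several new folders. The examples folder holds all of the example programs; they have not been debugged yet, so work through whichever ones you need. The important one is the debug folder, which finally contains what we are after: shark.lib

(You can do the same for a Release build.)

In the next part I explain how to import the resulting shark.lib into your own project and run an example.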

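III. Shark: Running an Example

In the previous part we ended up with shark.lib, the static library of the Shark Machine Learning Library. This part continues from there and uses that library to run one of Shark's bundled examples in VS2005. The example is called "TSP_GA"; as the name suggests, it uses a genetic algorithm to solve the TSP (travelling salesman problem).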
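
To give a feel for what a genetic algorithm for the TSP does, here is a minimal, self-contained Python sketch. It is not Shark's TSP_GA.cpp and does not use Shark at all. The operators (order crossover, swap mutation, keeping the best quarter of the population), the function names and all parameter values are my own illustrative assumptions.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def order_crossover(parent_a, parent_b, rng):
    """Copy a slice from parent_a, then fill the gaps in parent_b's order."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j] = parent_a[i:j]
    kept = set(parent_a[i:j])
    fill = [city for city in parent_b if city not in kept]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def swap_mutation(tour, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen cities in a copy."""
    tour = tour[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(tour)), 2)
        tour[a], tour[b] = tour[b], tour[a]
    return tour


def solve_tsp_ga(cities, population_size=60, generations=200, seed=0):
    """Evolve permutations of city indices and return the shortest tour found."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, cities))
        elite = population[: population_size // 4]  # survivors, kept unchanged
        children = []
        while len(elite) + len(children) < population_size:
            a, b = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(a, b, rng), rng))
        population = elite + children
    return min(population, key=lambda t: tour_length(t, cities))


# Four corners of a unit square, deliberately listed out of order.
cities = [(0, 0), (1, 1), (0, 1), (1, 0)]
best = solve_tsp_ga(cities)
print(best, tour_length(best, cities))
```

Because the best tours are carried over unchanged into each new generation, the shortest tour in the population never gets worse from one generation to the next.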

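OK, let's begin.

Step 1: go to Shark\examples\EALib and find the source file used in this article, TSP_GA.cpp. Create a new project, and in the project directory create two folders, one called include and one called lib, to hold Shark's header files and library respectively.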

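Step 2: add the static library and the header include path to the project. Click "Project" -> "Properties", select "C/C++" -> "General", and add the path of the header files (Additional Include Directories).

[Screenshot: Additional Include Directories setting]

Then click "Linker" -> "General" and add the directory containing shark.lib under Additional Library Directories.

[Screenshot: Additional Library Directories setting]

Next, click "Linker" -> "Input" and enter the library name.

[Screenshot: Linker input setting with the library name]

OK, at this point the project's library and header settings are complete.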

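Step 3: run the TSP_GA project. Success! Congratulations, you have successfully installed the Shark library!

[Screenshot: TSP_GA console output]

A note: because this is a console application, the window may flash and disappear as soon as the program finishes. A small trick is to add getchar(); at the end of the program so that it exits only after you press Enter.

Summary: the installation went fairly smoothly. Installation on Linux is to be continued...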
