
DeepForest

Prologue

I first heard of Zhou Zhihua's paper right when it came out. The initial reaction on Zhihu was a chorus of applause, since nearly everyone there has learned from him at some point (the "watermelon book"). But then the wind seemed to shift: some experts who actually read the paper first were less enthusiastic, rating the innovation as middling, while people with only half-baked knowledge decided the idea was nothing special (though none of them had ever implemented it) and wrote the paper off. I was swayed too; those who understand things deeply are always the minority, and most of us just parrot others and drift with the tide. As it happened I was quite interested in LightRNN at the time, so my attention went to implementing MSRA's LightRNN instead.

At the last group meeting Prof. Zeng suddenly announced that we would start presenting papers from the LAMDA group. That left me a bit dazed: a month into my work, right at the critical stage, and suddenly a change of direction. Nothing to be done about it, though, shrug. Still, LAMDA is a good deal stronger than some of the filler papers out there, so fine, let's read it. And that is how this survey of DeepForest came about.

The survey begins: abroad

Googling gcForest, the second hit was a discussion thread under LightGBM on GitHub, and it made for worrying reading. DeepForest doesn't seem hard to implement; quite a few people had already written code, but the results weren't good, much like LightRNN :). The foreign experts' reviews were relatively objective, especially those from the few hands-on types who had actually implemented it. Their feedback was fair: they pointed out the paper's weaknesses and some hidden tricks, and grumbled that it is merely an extension of stacking, not "an alternative to deep learning" as the title claims. Two key pieces of the paper came up again and again, the cascade forest (Cascade Forest) and multi-grained scanning (Multi-grained Scanning), so those deserve close attention when reading the paper (a small sketch of both follows the lists below). They also listed flaws the paper never mentions, such as the heavy memory and CPU usage, and the fact that the reported benchmarks are not the best achievable: some existing boosting methods beat DeepForest out of the box.

pros:

  • It is a meta-model (like the ones used in Kaggle competitions by top competitors)
  • It is a stacking ensemble (therefore, you are free to use the models you want - authors decided RF + CRTF, I would say it's OK and for users they should add xgboost/LightGBM on top of that, because it helps a lot - see at the end of this post)
  • It is an ensemble of diverse models (so diverse that even poor Random Forest + CRTF are outperforming a single boosted model - but at a massive CPU cost)
  • It is not a single model like LightGBM/xgboost (the synergy of two models is usually better than a strong single model)

cons:

  • I'm getting better and faster results with LeNet compared to Deep Forest on a portion of MNIST (2k).

  • @chrisorm and I are getting better results out of the box with a Random Forest on the Adult dataset than their Deep Forest, so something is wrong/fishy (even worse: we are using two different Random Forest implementations).

  • I'm getting better results just by using a gradient boosted model instead of a Cascade Forest (why do we need the complexity of a Cascade Forest when we already know boosting methods are typically better? - not talking about diversity here).

  • Just putting Intel MKL on MXNet made LeNet training very fast on CPU (cf. issue 1), faster than Cascade Forest, with boosting "only" twice as fast despite a 10x speedup (fast histogram method). Even if we assume a log scale for gradient boosting and a linear scale for the neural network, this doesn't fit what they report.

  • The authors do not want to publish their code (what is research for, then? contributing or not contributing? reproducible or not reproducible? why can't we reproduce their results?).

  • The authors are technically too vague in their model description, which leads to many different implementations which differ in their way of working (reproducibility issue).

All in all: no source code, and not much cause for optimism.
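
Since the thread keeps insisting that gcForest is "just stacking", here is a minimal sketch of the two mechanisms named above. It assumes scikit-learn's random and extremely-randomized forests as stand-ins for the paper's random and completely-random tree forests; the names cascade_layer and multi_grained_scan are my own illustration, not the authors' unreleased code.

```python
# Minimal sketch, not the authors' implementation: scikit-learn forests
# stand in for the paper's random / completely-random tree forests.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_layer(X, y, n_estimators=100, cv=3, seed=0):
    """One cascade level: concatenate the input features with each
    forest's out-of-fold class-probability vector; the result feeds
    the next level -- in effect, k-fold stacking."""
    forests = [
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed),
    ]
    augmented = [X]
    for f in forests:
        # Out-of-fold predictions keep label information from leaking
        # into the features handed to the next level.
        augmented.append(cross_val_predict(f, X, y, cv=cv, method="predict_proba"))
        f.fit(X, y)  # refit on all data for use at prediction time
    return np.hstack(augmented), forests

def multi_grained_scan(X, y, window):
    """Crude 1-D multi-grained scanning: cut every length-`window`
    slice out of each row; each slice inherits its row's label. A
    forest trained on the slices emits per-window class vectors that
    get concatenated back into one transformed feature vector."""
    slices = [X[:, i:i + window] for i in range(X.shape[1] - window + 1)]
    return np.vstack(slices), np.tile(y, len(slices))
```

Chain cascade_layer level after level, stopping when held-out accuracy no longer improves, and you have the gist of the Cascade Forest; seen this way, the "stacking in disguise" complaint above is easy to understand, and so is the CPU cost.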


At home

After reading that issue I was tempted to switch to another paper, ideally something closer to NLP. But first I turned back and searched domestic sources for the authors' own views.

Second author Ji Feng's answer on Zhihu was quite something: he seemed to slap down the dissenting voices one after another, talked the paper up to the heavens, and promised, hand on chest, to release the Python source code before Children's Day as a gift to everyone. He then promptly got slapped down himself by the first author, his advisor, who solemnly declared on Weibo: he never asked my opinion, and this does not represent my views... I have no words for this master-and-apprentice pair. Browsing Zhou's Weibo, though, I found a few posts about the paper. Responding to the Keras author's slightly dismissive tweets, which called it nothing new yet in the same breath expressed interest in borrowing its forest ideas, Prof. Zhou replied:

If people find some of DeepForest's ideas interesting and borrow them to design better DNNs, isn't that a good thing? A paper nobody pays attention to would be the real tragedy. Academic research is not a martial-arts contest. Incidentally, resource constraints kept us from doing certain things; now that a backer is willing to provide resources, we can get to them. The code will definitely be released later.

He reminds me of Hinton and company a decade or two ago, quietly holding to their convictions in an old field far from the noise.

Summary

Next to the new darling, deep learning, stacking is an old tree putting out new shoots. The LAMDA group is surely sitting on plenty more, ready to squeeze it out paper by paper like toothpaste.

We shall see.
