
DeepForest

Prologue

I first heard of Zhou Zhihua's paper right when it came out. The initial reaction on Zhihu was a chorus of applause, since nearly everyone there has learned from him at some point (the "watermelon book"). But then the wind seemed to shift: some experts who actually read the paper first were less enthusiastic, rating the innovation as middling, while people with only half-baked knowledge decided the idea was nothing special (though none of them had ever implemented it) and wrote the paper off. I was swayed too; those who understand things deeply are always the minority, and most of us just parrot others and drift with the tide. As it happened I was quite interested in LightRNN at the time, so my attention went to implementing MSRA's LightRNN instead.

At the last group meeting Prof. Zeng suddenly announced that we would start presenting papers from the LAMDA group. That left me a bit dazed: a month into my work, right at the critical stage, and suddenly a change of direction. Nothing to be done about it, though, shrug. Still, LAMDA is a good deal stronger than some of the filler papers out there, so fine, let's read it. And that is how this survey of DeepForest came about.

The survey begins: abroad

Googling gcForest, the second hit was a discussion thread under LightGBM on GitHub, and it made for worrying reading. DeepForest doesn't seem hard to implement; quite a few people had already written code, but the results weren't good, much like LightRNN :). The foreign experts' reviews were relatively objective, especially those from the few hands-on types who had actually implemented it. Their feedback was fair: they pointed out the paper's weaknesses and some hidden tricks, and grumbled that it is merely an extension of stacking, not "an alternative to deep learning" as the title claims. Two key pieces of the paper came up again and again, the cascade forest (Cascade Forest) and multi-grained scanning (Multi-grained Scanning), so those deserve close attention when reading the paper (a small sketch of both follows the lists below). They also listed flaws the paper never mentions, such as the heavy memory and CPU usage, and the fact that the reported benchmarks are not the best achievable: some existing boosting methods beat DeepForest out of the box.

pros:

  • It is a meta-model (like the ones used in Kaggle competitions by top competitors)
  • It is a stacking ensemble (therefore, you are free to use the models you want - authors decided RF + CRTF, I would say it's OK and for users they should add xgboost/LightGBM on top of that, because it helps a lot - see at the end of this post)
  • It is an ensemble of diverse models (so diverse that even poor Random Forest + CRTF are outperforming a single boosted model - but at a massive CPU cost)
  • It is not a single model like LightGBM/xgboost (the synergy of two models is usually better than a strong single model)

cons:

  • I'm getting better and faster results with LeNet compared to Deep Forest on a portion of MNIST (2k).

  • @chrisorm and I are getting better results out of the box with a Random Forest on the Adult dataset than their Deep Forest, so something is wrong/fishy (even worse: we are using two different Random Forest implementations).

  • I'm getting better results just by using a gradient boosted model instead of a Cascade Forest (why do we need the complexity of a Cascade Forest when we already know boosting methods are typically better? - not talking about diversity here).

  • Just putting Intel MKL on MXNet made LeNet training very fast on CPU (cf. issue 1), faster than Cascade Forest, with boosting "only" twice as fast despite a 10x speedup (fast histogram method). Even if we assume a log scale for gradient boosting and a linear scale for the neural network, this doesn't fit what they report.

  • The authors do not want to publish their code (what is research for, then? contributing or not contributing? reproducible or not reproducible? why can't we reproduce their results?).

  • The authors are technically too vague in their model description, which leads to many different implementations which differ in their way of working (reproducibility issue).

All in all: no source code, and not much cause for optimism.
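
Since the thread keeps insisting that gcForest is "just stacking", here is a minimal sketch of the two mechanisms named above. It assumes scikit-learn's random and extremely-randomized forests as stand-ins for the paper's random and completely-random tree forests; the names cascade_layer and multi_grained_scan are my own illustration, not the authors' unreleased code.

```python
# Minimal sketch, not the authors' implementation: scikit-learn forests
# stand in for the paper's random / completely-random tree forests.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_layer(X, y, n_estimators=100, cv=3, seed=0):
    """One cascade level: concatenate the input features with each
    forest's out-of-fold class-probability vector; the result feeds
    the next level -- in effect, k-fold stacking."""
    forests = [
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed),
    ]
    augmented = [X]
    for f in forests:
        # Out-of-fold predictions keep label information from leaking
        # into the features handed to the next level.
        augmented.append(cross_val_predict(f, X, y, cv=cv, method="predict_proba"))
        f.fit(X, y)  # refit on all data for use at prediction time
    return np.hstack(augmented), forests

def multi_grained_scan(X, y, window):
    """Crude 1-D multi-grained scanning: cut every length-`window`
    slice out of each row; each slice inherits its row's label. A
    forest trained on the slices emits per-window class vectors that
    get concatenated back into one transformed feature vector."""
    slices = [X[:, i:i + window] for i in range(X.shape[1] - window + 1)]
    return np.vstack(slices), np.tile(y, len(slices))
```

Chain cascade_layer level after level, stopping when held-out accuracy no longer improves, and you have the gist of the Cascade Forest; seen this way, the "stacking in disguise" complaint above is easy to understand, and so is the CPU cost.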


At home

After reading that issue I was tempted to switch to another paper, ideally something closer to NLP. But first I turned back and searched domestic sources for the authors' own views.

Second author Ji Feng's answer on Zhihu was quite something: he seemed to slap down the dissenting voices one after another, talked the paper up to the heavens, and promised, hand on chest, to release the Python source code before Children's Day as a gift to everyone. He then promptly got slapped down himself by the first author, his advisor, who solemnly declared on Weibo: he never asked my opinion, and this does not represent my views... I have no words for this master-and-apprentice pair. Browsing Zhou's Weibo, though, I found a few posts about the paper. Responding to the Keras author's slightly dismissive tweets, which called it nothing new yet in the same breath expressed interest in borrowing its forest ideas, Prof. Zhou replied:

If people find some of DeepForest's ideas interesting and borrow them to design better DNNs, isn't that a good thing? A paper nobody pays attention to would be the real tragedy. Academic research is not a martial-arts contest. Incidentally, resource constraints kept us from doing certain things; now that a backer is willing to provide resources, we can get to them. The code will definitely be released later.

He reminds me of Hinton and company a decade or two ago, quietly holding to their convictions in an old field far from the noise.

Summary

Next to the new darling, deep learning, stacking is an old tree putting out new shoots. The LAMDA group is surely sitting on plenty more, ready to squeeze it out paper by paper like toothpaste.

We shall see.
