Neural Network Conversion of Machine Learning Pipelines

2026-03-26 · Machine Learning

Machine Learning · Artificial Intelligence
AI summary

The authors studied a way to teach a small neural network to learn from a non-neural system called a random forest, which is usually good at solving tasks but works differently from neural networks. They tested this teaching method on 100 different tasks and found that the small neural network could often copy the random forest’s performance if the network’s settings were chosen carefully. They also looked at using the random forest to help pick the best settings for the neural network. This approach could help combine different machine learning tools into one system.

transfer learning, knowledge distillation, student-teacher learning, random forest, neural networks, hyper-parameters, OpenML, machine learning pipeline, classification, model optimization
Authors
Man-Ling Sung, Jan Silovsky, Man-Hung Siu, Herbert Gish, Chinnu Pittapally
Abstract
Transfer learning and knowledge distillation have recently gained a lot of attention in the deep learning community. One transfer approach, student-teacher learning, has been shown to successfully create "small" student neural networks that mimic the performance of much bigger and more complex "teacher" networks. In this paper, we investigate an extension to this approach and transfer from a non-neural machine learning pipeline as teacher to a neural network (NN) student, which allows for joint optimization of the various pipeline components and a single unified inference engine for multiple ML tasks. In particular, we explore replacing the random forest classifier by transfer learning to a student NN. We experimented with various NN topologies on 100 OpenML tasks in which random forest has been one of the best solutions. Our results show that for the majority of the tasks, the student NN can indeed mimic the teacher if the right NN hyper-parameters are selected. We also investigated the use of random forest for selecting the right NN hyper-parameters.
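The core idea of the abstract can be sketched as follows: train a random forest teacher, then fit a small NN student to the teacher's soft class probabilities rather than the hard labels. This is a minimal illustration, not the paper's actual setup; the synthetic dataset, the single-hidden-layer topology, and all hyper-parameters here are assumptions for demonstration only.

```python
# Sketch of random-forest-to-NN knowledge distillation (student-teacher
# learning). The dataset and hyper-parameters are illustrative assumptions,
# not those used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for one classification task.
X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Teacher: a random forest trained on the hard labels.
teacher = RandomForestClassifier(n_estimators=200, random_state=0)
teacher.fit(X_tr, y_tr)
soft_labels = teacher.predict_proba(X_tr)  # teacher's soft targets

# Student: a small NN that regresses the teacher's class probabilities;
# argmax over its outputs yields class predictions.
student = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000,
                       random_state=0)
student.fit(X_tr, soft_labels)
pred = np.argmax(student.predict(X_te), axis=1)

teacher_acc = teacher.score(X_te, y_te)
student_acc = (pred == y_te).mean()
print(f"teacher acc: {teacher_acc:.3f}  student acc: {student_acc:.3f}")
```

In practice the student's ability to match the teacher depends heavily on the NN topology and hyper-parameters, which is the selection problem the paper investigates.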