Novel GPU Boruta algorithms for feature selection from high-dimensional data

2026-05-11 · Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors noticed that common feature selection methods are slow on conventional CPU-based computers, which makes them hard to use with big datasets. They created two faster versions of the Boruta feature selection method that run on GPUs, which can perform many calculations in parallel. Their tests showed that these new versions pick important features as well as the original but much faster. They also found that the version based on impurity reduction can sometimes overestimate the importance of certain features. Overall, the authors suggest that using GPUs can make feature selection more efficient for large data.

feature selection, wrapper methods, Boruta algorithm, GPU acceleration, permutation importance, impurity reduction, computational efficiency, large-scale datasets
Authors
Xurui Li, Zhiguo Gan, Jiaming Zhang, Zheng Liu, Diannan Lu
Abstract
Most feature selection algorithms, especially wrapper methods, run inefficiently on CPU-based platforms because of their high computational complexity, which makes them unsuitable for processing large-scale datasets. To address this challenge, this study proposes two GPU-accelerated versions of the Boruta feature selection procedure: Boruta-Permut, which relies on permutation-based feature importance, and Boruta-TreeImp, which uses importance based on impurity reduction. To evaluate these methods, we conducted experiments on a self-constructed dataset and several publicly available datasets. The experimental results show that the proposed GPU-accelerated algorithms greatly improve computational efficiency while preserving feature selection accuracy comparable to that of the original Boruta algorithm. Our analysis also shows that the impurity-reduction-based version can overestimate the importance of some features. Overall, these findings suggest that performing Boruta feature selection on GPUs offers an effective and cost-efficient solution for large-scale data analysis.
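The abstract does not spell out the Boruta loop itself. As a rough illustration of the idea behind the permutation-importance variant, here is a minimal CPU sketch in Python/NumPy of the classic Boruta shadow-feature test, under several stated assumptions. It swaps the random forest that Boruta normally uses for an ordinary least-squares model to stay dependency-free. It runs a fixed number of rounds and makes a single Bonferroni-corrected binomial decision at the end, instead of iteratively dropping rejected features. All function names (`permutation_importance`, `boruta_sketch`, `decide`) are hypothetical. This is not the authors' Boruta-Permut code and contains no GPU logic. It only shows where the repeated model fits and column permutations occur, since these independent computations are the kind of work a GPU implementation could parallelize.

```python
from math import comb

import numpy as np


def permutation_importance(X, y, coef, rng):
    """Increase in mean squared error when each column of X is shuffled."""
    base = np.mean((X @ coef - y) ** 2)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between column j and y
        scores[j] = np.mean((Xp @ coef - y) ** 2) - base
    return scores


def boruta_sketch(X, y, n_iter=20, seed=0):
    """Count, per real feature, how often it beats the best shadow feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros(p, dtype=int)
    for _ in range(n_iter):
        # Shadow features: each real column shuffled independently
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        Xa = np.hstack([X, shadows])
        # ASSUMPTION: least squares stands in for Boruta's random forest
        coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
        imp = permutation_importance(Xa, y, coef, rng)
        # A "hit" is recorded when a real feature beats the best shadow feature
        hits += (imp[:p] > imp[p:].max()).astype(int)
    return hits


def decide(hits, n_iter, alpha=0.05):
    """One-shot binomial test (p = 0.5) with a Bonferroni correction."""
    thresh = alpha / len(hits)
    total = 2 ** n_iter
    out = []
    for h in map(int, hits):
        p_upper = sum(comb(n_iter, i) for i in range(h, n_iter + 1)) / total
        p_lower = sum(comb(n_iter, i) for i in range(0, h + 1)) / total
        if p_upper < thresh:
            out.append("confirmed")   # beats shadows more often than chance
        elif p_lower < thresh:
            out.append("rejected")    # beats shadows less often than chance
        else:
            out.append("tentative")   # not enough evidence either way
    return out


# Toy data: only features 0 and 1 influence y
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)
hits = boruta_sketch(X, y, n_iter=20, seed=1)
decisions = decide(hits, n_iter=20)
print(hits, decisions)
```

On this toy data, the two informative features beat every shadow feature in all 20 rounds and are confirmed. The four noise features are rejected or left tentative, depending on the random draws. An impurity-based variant would replace `permutation_importance` with importances read from fitted trees, which avoids the extra permutation passes but, as the abstract notes, can inflate some features' scores.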