Vincent (2023-05-31 13:56):
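#paper doi: https://doi.org/10.1111/j.1467-9868.2008.00674.x Journal of the Royal Statistical Society, 2008, Sure independence screening for ultrahigh dimensional feature space. High-dimensional data typically pose two major problems: the accuracy of parameter estimation and the computational burden. Earlier methods such as the Dantzig selector are still not effective enough on ultrahigh-dimensional data (where p can grow exponentially with n). This paper proposes a feature screening method based on correlation learning that reduces the data from ultrahigh dimension to a moderate dimension (below n). It shows that, under a fairly general asymptotic framework, correlation learning has the sure screening property. As an extension of the method, the paper also proposes an iterative version of the screening that improves screening accuracy at finite sample sizes. Once the method has reduced the dimension, other variable selection methods such as lasso can be applied, giving faster and more accurate variable selection.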
Sure Independence Screening for Ultrahigh Dimensional Feature Space
Abstract:
Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log(p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log(p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
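To make the correlation-learning step in the abstract concrete, here is a minimal NumPy sketch of the screening idea: rank the features by the absolute marginal correlation with the response and keep the top d, with d below the sample size. This is not the authors' code. The function name `sis`, the default d = floor(n / log n), and the synthetic data are assumptions made for illustration only, and the iterative variant is not shown.

```python
import numpy as np


def sis(X, y, d=None):
    """Keep the d features with the largest absolute marginal correlation with y.

    Hypothetical helper for illustration, not the paper's reference code.
    """
    n, p = X.shape
    if d is None:
        # Assumed default for this sketch: floor(n / log n), which is below n.
        d = int(n / np.log(n))
    # Standardize each column (assumes no column is constant).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    # Componentwise regression: one marginal score per feature.
    omega = Xs.T @ yc
    # Indices of the d largest |omega|, returned in ascending order.
    keep = np.argsort(-np.abs(omega))[:d]
    return np.sort(keep)


# Synthetic example: n = 200 samples, p = 5000 features, 3 truly active ones.
rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 5.0
y = X @ beta + rng.standard_normal(n)

selected = sis(X, y)
X_reduced = X[:, selected]  # n x d matrix with d < n
print(X_reduced.shape, selected[:5])
```

The reduced matrix `X_reduced` has fewer columns than rows, so a penalized method such as lasso can then be fitted on it at much lower cost than on the full feature set.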