A Concave Self-Representation Based Feature Selection Method for High-dimensional Data
ZHU Guorong, FENG Hao, YE Lingjie
Abstract:
In the big data era, feature selection plays a vital role in reducing complexity, compressing storage, and improving the generalization ability of data analysis. For the flood of unlabeled high-dimensional samples, unsupervised feature selection is particularly useful for alleviating the curse of dimensionality and has been widely applied. To this end, a concave-regularized self-representation model is proposed, in which each feature is represented as a linear combination of the other features and the l_(2,p) norm serves as the regularizer for unsupervised feature selection. Compared with conventional convex regularizers, the concave constraint yields sparser coefficient solutions, which makes the proposed method more effective at selecting a salient feature subspace. To solve for the target coefficients, an efficient iterative reweighted least squares algorithm is further devised, which guarantees convergence of the model to a stationary point. Experiments on nine publicly available data sets show that the proposed method outperforms competing algorithms in both classification accuracy and clustering performance.
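As a rough illustration of the approach described in the abstract (an independent sketch, not the authors' code), the self-representation objective min ||X - XW||_F^2 + λ·||W||_{2,p}^p can be minimized by iteratively reweighted least squares: each pass solves a ridge-like linear system whose per-feature weights are derived from the current row norms of W. The function name, default parameters, and smoothing constant below are all hypothetical choices for the sketch.

```python
import numpy as np

def irls_l2p_feature_selection(X, lam=1.0, p=0.5, n_iter=50, eps=1e-8):
    """Illustrative IRLS for min ||X - XW||_F^2 + lam * ||W||_{2,p}^p.

    X is an (n_samples, n_features) data matrix. Rows of W describe how
    each feature is linearly reconstructed from the others; larger row
    norms indicate more salient features. Returns per-feature scores.
    """
    n, d = X.shape
    G = X.T @ X                                   # d x d Gram matrix
    W = np.linalg.solve(G + lam * np.eye(d), G)   # ridge-style initialization
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1)
        # Reweighting derived from the smoothed (p/2)-power surrogate of
        # the concave l2,p penalty; eps keeps the weights finite.
        dvec = (p / 2.0) * (row_norms ** 2 + eps) ** (p / 2.0 - 1.0)
        # Weighted least-squares update: (G + lam * D) W = G
        W = np.linalg.solve(G + lam * np.diag(dvec), G)
    return np.linalg.norm(W, axis=1)              # feature saliency scores

# Usage: keep the k highest-scoring features.
# scores = irls_l2p_feature_selection(X)
# top_k = np.argsort(scores)[::-1][:k]
```

As p shrinks toward 0 the penalty becomes more concave, so the row norms of W are driven toward a sparser pattern than the convex l_{2,1} case (p = 1) would give, which is the intuition behind selecting a compact feature subset.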
Keywords:
big data; high-dimensional data; self-representation; feature selection
Foundation: Science and Technology Project of State Grid Zhejiang Electric Power Co., Ltd. (5211JY15001V)
DOI: 10.19585/j.zjdl.201712004
References:
- [1]TANG J,LIU H.Unsupervised feature selection for linked social media data[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,ACM,2012:904-912.
- [2]CAI D,ZHANG C,HE X.Unsupervised feature selection for multi-cluster data[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,ACM,2010:333-342.
- [3]DY J G,BRODLEY C E.Feature selection for unsupervised learning[J].J Mach Learn Res,2004(5):845-889.
- [4]ZHAO Z,WANG L,LIU H,et al.Efficient spectral feature selection with minimum redundancy[C]//Proceedings of the 24th AAAI Conference on Artificial Intelligence(AAAI),2010:673-678.
- [5]HE X,CAI D,NIYOGI P.Laplacian score for feature selection[C]//Advances in Neural Information Processing Systems,2005:507-514.
- [6]NIE F,HUANG H,CAI X,et al.Efficient and robust feature selection via joint l2,1-norms minimization[C]//Advances in Neural Information Processing Systems,2010:1813-1821.
- [7]YANG Y,SHEN H T,MA Z,et al.l2,1-norm regularized discriminative feature selection for unsupervised learning[C]//IJCAI Proceedings-International Joint Conference on Artificial Intelligence,2011:1589.
- [8]CONG Y,WANG S,FAN B,et al.UDSFS:Unsupervised deep sparse feature selection[J].Neurocomputing,2016(196):150-158.
- [9]KOHAVI R,JOHN G H.Wrappers for feature subset selection[J].Artif Intell,1997(1):273-324.
- [10]GUYON I,ELISSEEFF A.An introduction to variable and feature selection[J].J Mach Learn Res,2003(3):1157-1182.
- [11]HOU C,NIE F,YI D,et al.Feature selection via joint embedding learning and sparse regression[C]//IJCAI Proceedings-International Joint Conference on Artificial Intelligence,2011:1324.
- [12]NIE F,XIANG S,JIA Y,et al.Trace ratio criterion for feature selection[C]//AAAI Conference on Artificial Intelligence,2008:671-676.
- [13]MITRA P,MURTHY C,PAL S K.Unsupervised feature selection using feature similarity[J].IEEE Trans Pattern Anal Mach Intell,2002,24(3):301-312.
- [14]ZHAO Z,LIU H.Spectral feature selection for supervised and unsupervised learning[C]//Proceedings of the 24th International Conference on Machine Learning,ACM,2007:1151-1157.
- [15]LI Z,YANG Y,LIU J,et al.Unsupervised feature selection using nonnegative spectral analysis[C]//AAAI Conference on Artificial Intelligence,2012:1026-1032.
- [16]ZHOU N,XU Y,CHENG H,et al.Global and local structure preserving sparse subspace learning:an iterative approach to unsupervised feature selection[J].Pattern Recogn,2015(53):87-101.
- [17]ZHU P,ZUO W,ZHANG L,et al.Unsupervised feature selection by regularized self-representation[J].Pattern Recogn,2015,48(2):438-446.
- [18]ZHU J,ROSSET S,HASTIE T,et al.1-Norm support vector machines[J].Adv Neural Inf Proces Syst,2004,16(1):49-56.
- [19]HE R,TAN T,WANG L,et al.l2,1 regularized correntropy for robust feature selection[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition(CVPR),IEEE,2012:2504-2511.
- [20]YAO J,MAO Q,GOODISON S,et al.Feature selection for unsupervised learning through local learning[J].Pattern Recogn Lett,2014,53:100-107.
- [21]BERTSEKAS D P.Constrained Optimization and Lagrange Multiplier Methods[M].New York:Academic Press,2014.
- [22]WANG W,ZHANG H,ZHU P,et al.Non-convex regularized self-representation for unsupervised feature selection[C]//Intelligence Science and Big Data Engineering:Big Data and Machine Learning Techniques,Springer,2015:55-65.
- [23]LIN Z,CHEN M,MA Y.The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices[J].arXiv preprint arXiv:1009.5055,2010.
- [24]CAI X,NIE F,HUANG H.Exact top-k feature selection via l2,0-norm constraint[C]//International Joint Conference on Artificial Intelligence,2013.
- [25]WANG Y,YANG J,YIN W,et al.A new alternating minimization algorithm for total variation image reconstruction[J].SIAM J Imag Sci,2008,1(3):248-272.
- [26]LUO D,DING C,HUANG H.Toward structural sparsity:an explicit l2/l0 approach[J].Knowl Inf Syst,2013,36(2):411-438.
- [27]XU L,ZHENG S,JIA J.Unnatural L0 sparse representation for natural image deblurring[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2013:1107-1114.
- [28]SU A I,WELSH J B,SAPINOSO L M,et al.Molecular classification of human carcinomas by use of gene expression signatures[J].Cancer Res,2001,61(20):7388-7393.
- [29]BHATTACHARJEE A,RICHARDS W G,STAUNTON J,et al.Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses[J].Proc Natl Acad Sci,2001,98(24):13790-13795.
- [30]NUTT C L,MANI D,BETENSKY R A,et al.Gene expression-based classification of malignant gliomas correlates better with survival than histological classification[J].Cancer Res,2003,63(7):1602-1607.