Abstract- Feature selection has been an active area of research for decades. In 1977, Thomas M. Cover and Jan M. Van Campenhout showed that only exhaustive search can guarantee finding the best combination of features, but exhaustive search is costly in computational resources and time. This work proposes the use of two selection criteria within a stepwise search method, namely the sequential forward floating selection (SFFS) algorithm wrapped around support vector regression, and compares the results obtained with each criterion on two high-dimensional linear regression problems. Adjusted R² and mean squared error are used as the optimality (selection) criteria. One of the many areas that make heavy use of feature selection techniques is bioinformatics. Genome-wide association studies in bioinformatics aim to determine whether a genetic variant is associated with a certain phenotype, and the single nucleotide polymorphism (SNP) is the most popular marker used to identify genetic polymorphisms. The proposed method was tested for variable selection in high-dimensional linear regression using two simulated SNP datasets generated with the 'scrime' package, and in low-dimensional linear regression using a dataset from the 'UsingR' package in R. Our results show that taking the intersection of the two subsets selected by the two criteria can reduce the number of false positives.
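The selection scheme summarized above can be illustrated with a minimal, dependency-free sketch: SFFS is run once with adjusted R², computed as 1 - (1 - R²)(n - 1)/(n - p - 1), and once with mean squared error, and the two selected subsets are then intersected. Several assumptions are made here and are not taken from the work itself: the wrapped model is ordinary least squares rather than support vector regression (swapped in only to avoid external libraries), MSE is measured on a held-out validation split, each criterion's final subset is the best-scoring subset over sizes 1..max_k, and all function names (`sffs`, `fit_ols`, `adjusted_r2`, `val_neg_mse`) and the synthetic data are hypothetical stand-ins, not the authors' code or the 'scrime' datasets.

```python
import random

# NOTE: illustrative sketch only. OLS stands in for the SVR model used in the work.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y, cols):
    """Least-squares fit (with intercept) on the feature indices in `cols`."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, X, cols):
    return [beta[0] + sum(b * x[j] for b, j in zip(beta[1:], cols)) for x in X]

def adjusted_r2(X, y, cols):
    """Adjusted R^2 of the training fit: 1 - (SS_res/SS_tot) * (n-1)/(n-p-1)."""
    pred = predict(fit_ols(X, y, cols), X, cols)
    n, p = len(y), len(cols)
    mean = sum(y) / n
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)

def val_neg_mse(Xtr, ytr, Xva, yva, cols):
    """Negative validation MSE, so that larger is better like adjusted R^2."""
    pred = predict(fit_ols(Xtr, ytr, cols), Xva, cols)
    return -sum((t - q) ** 2 for t, q in zip(yva, pred)) / len(yva)

def sffs(score, n_features, max_k):
    """Sequential forward floating selection maximizing `score(cols)`.
    Returns the best subset recorded over subset sizes 1..max_k."""
    best = {}  # subset size -> (score, subset)
    current = []
    while len(current) < max_k:
        # Inclusion: add the feature whose addition maximizes the criterion.
        cand = [f for f in range(n_features) if f not in current]
        f = max(cand, key=lambda j: score(current + [j]))
        current = current + [f]
        s = score(current)
        if len(current) not in best or s > best[len(current)][0]:
            best[len(current)] = (s, current)
        # Conditional exclusion: drop a feature only while the reduced subset
        # beats the best subset previously recorded at that smaller size.
        while len(current) > 2:
            g = max(current, key=lambda j: score([c for c in current if c != j]))
            reduced = [c for c in current if c != g]
            s = score(reduced)
            if s > best[len(reduced)][0]:
                current = reduced
                best[len(current)] = (s, current)
            else:
                break
    return max(best.values(), key=lambda t: t[0])[1]

# Hypothetical synthetic data: only features 0, 2 and 5 carry signal.
rng = random.Random(0)
n, d = 200, 8
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [3 * x[0] - 2 * x[2] + 1.5 * x[5] + rng.gauss(0, 0.5) for x in X]
Xtr, ytr, Xva, yva = X[:150], y[:150], X[150:], y[150:]

sel_r2 = sffs(lambda c: adjusted_r2(Xtr, ytr, c), d, max_k=6)
sel_mse = sffs(lambda c: val_neg_mse(Xtr, ytr, Xva, yva, c), d, max_k=6)
common = sorted(set(sel_r2) & set(sel_mse))  # features kept by both criteria
print(sorted(sel_r2), sorted(sel_mse), common)
```

The intersection step reflects the idea stated in the abstract: a spurious feature that slips into one criterion's subset is dropped unless the other criterion also keeps it, which trades some recall for fewer false positives.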