Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/14706
Title: | Ensemble of Penalized Logistic Models for High-Dimensional Data |
Authors: | Shaheen, Musarrat |
Keywords: | Statistics |
Issue Date: | 2020 |
Publisher: | Quaid i Azam University |
Abstract: | In the modern era of science and technology, data are abundant, and the challenge is to make sense of them, particularly in prediction problems. A number of approaches have been suggested to deal with the huge flow of data and the associated classification issues. Aggregating classifiers, known as ensemble methods, has been shown to improve classification accuracy in a broad range of applications. These methods can also be applied to gain accuracy in the classification of high-dimensional data. In this study, we propose an Ensemble of Penalized Logistic models (EPL), which uses Ridge regression, Lasso and Elastic net as base learners for ensemble generation. Prediction through ordinary logistic regression is not possible in this setting because it requires the observations to outnumber the features. Therefore, we implemented penalized logistic models for prediction, where different penalty terms lead to different predictions. To construct the ensemble, we use logistic models with three penalties as base learners. In the EPL, ensemble integration is done in two steps. In the first step, the predictions of the three learners are fused through majority voting to classify the response in each bootstrap sample. This procedure is repeated m times. In the second step, the final class prediction is obtained by majority vote among the m models. The EPL is assessed on both simulated data and microarray data sets. The simulated data are generated in five different scenarios in order to assess the performance of the EPL on various problems. Its classification accuracy is compared with that of state-of-the-art classifiers, i.e., K-Nearest Neighbours (KNN), Support Vector Machines (SVM) and Random Forest (RF). The experimental comparisons show that the EPL achieves an accuracy gain in the classification of high-dimensional data. In particular, it enhances classification accuracy in the presence of 11 irrelevant variables and high correlation in the data.
Results reveal that the EPL outperforms the other methods considered and also provides better accuracy than the base learners used to construct it. In addition, the EPL is applied to the estimation of class membership probabilities. The results of the EPL on simulated data and microarray data sets show that its performance in terms of Brier score is better than that of KNN, SVM, RF and the baseline classifiers. Another contribution of this thesis is a variable selection procedure known as the ensemble of penalized logistic models (EPLM). The EPLM employs logistic models with the Lasso, Adaptive Lasso and Elastic net penalties as baseline classifiers. This method uses Adaptive Lasso instead of Ridge because Ridge cannot perform variable selection. It selects features according to an importance score calculated for each variable on the basis of each penalty. The method is assessed on simulated data sets as well as microarray data sets. Experimental comparisons report that the EPLM improves classification accuracy for all classifiers compared with the full feature set. |
URI: | http://hdl.handle.net/123456789/14706 |
Appears in Collections: | Ph.D |
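
The two-step majority voting described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The function names (`majority`, `epl_predict`, `threshold_fitter`), the default `m=25`, the seed, and the tie-breaking rule are all assumptions made for illustration. To keep the example dependency-free, the three base learners are hypothetical one-feature threshold rules standing in for the Ridge, Lasso and Elastic net logistic models.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], int]
Fitter = Callable[[List[Row], List[int]], Predictor]


def majority(votes: Sequence[int]) -> int:
    """Most common label; ties go to the smallest label (an assumption)."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def epl_predict(fitters: Sequence[Fitter], X: List[Row], y: List[int],
                X_new: List[Row], m: int = 25, seed: int = 0) -> List[int]:
    """Two-step voting: fuse base learners per bootstrap sample, then vote over m."""
    rng = random.Random(seed)
    n = len(X)
    fused_per_sample = []
    for _ in range(m):
        # Draw a bootstrap sample (n rows, with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        models = [fit(Xb, yb) for fit in fitters]
        # Step 1: majority vote among the base learners for each new row.
        fused_per_sample.append(
            [majority([mdl(row) for mdl in models]) for row in X_new])
    # Step 2: majority vote among the m fused predictions.
    return [majority([fused[j] for fused in fused_per_sample])
            for j in range(len(X_new))]


def threshold_fitter(feature: int) -> Fitter:
    """Hypothetical stand-in learner: midpoint threshold on a single feature."""
    def fit(Xb: List[Row], yb: List[int]) -> Predictor:
        pos = [r[feature] for r, label in zip(Xb, yb) if label == 1]
        neg = [r[feature] for r, label in zip(Xb, yb) if label == 0]
        if not pos or not neg:
            # Only one class in the bootstrap sample: predict that class.
            constant = 1 if pos else 0
            return lambda row: constant
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        cut = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos >= mean_neg else -1
        return lambda row: int(sign * (row[feature] - cut) > 0)
    return fit


# Toy data: class 0 clusters near 0, class 1 clusters near 1.
X = [[0.02 * i] * 3 for i in range(6)] + [[1.0 + 0.02 * i] * 3 for i in range(6)]
y = [0] * 6 + [1] * 6
fitters = [threshold_fitter(k) for k in range(3)]
predictions = epl_predict(fitters, X, y, [[0.05] * 3, [1.05] * 3])
```

In a real setting, each fitter would instead train a penalized logistic model on the bootstrap sample, for instance scikit-learn's `LogisticRegression` with an L2, L1 or elastic-net penalty. The abstract does not say how ties are broken; with three binary base learners, step 1 cannot tie, and choosing an odd `m` avoids ties in step 2.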
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
STAT 373.pdf | STAT 373 | 7.5 MB | Adobe PDF |