Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/27486
Title: | Comparison of Different Classification Algorithm Based on Significant Water Parameters |
Authors: | Tayyaba Aurangzeb |
Keywords: | Statistics |
Issue Date: | 2018 |
Publisher: | Quaid-i-Azam University, Islamabad |
Abstract: | In recent years, water issues have come to hold a firm position among the top challenges facing the world. Through the Sustainable Development Goals, water has become a global concern, with one goal dedicated specifically to water and sanitation. The World Economic Forum has stated that water will become a major issue in the coming years. Like many other developing countries, Pakistan faces a great public health challenge due to unhygienic and polluted water. People in our country are forced to buy bottled water because of poor water quality. The intake of bottled water has increased steadily in recent years, even in countries where tap water quality is reported to be excellent. However, many mineral water companies have been found to sell contaminated water. The present study aimed to monitor the quality of 538 commercially available brands of bottled mineral water from local markets in 15 cities of Pakistan, from January 2011 to September 2015. All brands were analyzed using the following physico-chemical parameters: pH, Electrical Conductivity (EC), Total Dissolved Solids (TDS), Calcium (Ca), Magnesium (Mg), Hardness, Bicarbonate (HCO3), Chloride (Cl), Sodium (Na) and Sulfate (SO4). To communicate water quality, all parameters need to be condensed into a standard format so that quality can be interpreted precisely. Three water quality indices were used to evaluate the bottled water: the Weighted Arithmetic Water Quality Index, the National Sanitation Foundation Water Quality Index and the Water Quality Index (WQI). The bottled water WQI classes fall into three groups: Excellent, Good and Poor. Out of 538 brands, 56 were found to supply water of very poor quality. 
Moreover, the objective of this research is twofold. First, the bottled water data are analyzed using supervised machine learning algorithms: artificial neural networks, support vector machines with different kernel functions, and tree-based models, namely random forest and the decision tree algorithms C4.5 and C5.0. Each algorithm is trained on 80% of the data, and the remaining 20% is used for testing. Comparisons are made within and between algorithms to identify the most useful method for classifying bottled water quality with the smallest error rate. The results reveal that although the random forest methods showed the highest accuracy, C5.0 is the most useful: it classifies the test data with high accuracy in minimal time and requires less memory than the remaining algorithms. The support vector machine with a complex polynomial kernel also gave significant results. The second objective is to reduce the dimension of the data in such a way that the classification of bottled water quality is preserved. To accomplish this, Principal Component Analysis (PCA) and t-SNE are used to determine a lower-dimensional representation of the data. PCA determines the optimal linear combinations that appropriately explain the data; it is concluded that four components explain 80.4% of the variation in the data. t-SNE is another, more efficient dimension reduction method that preserves the similarity of classes well in the lower dimension. Graphs of both methods are displayed, and it is observed that t-SNE is the more stable algorithm, showing the clustering of the different classes accurately. The Pakistan Council of Research in Water Resources (PCRWR) should review the quality of bottled water consistently, and it is also suggested that PCRWR take steps to ban brands found to be selling contaminated water. |
URI: | http://hdl.handle.net/123456789/27486 |
Appears in Collections: | M.Phil |
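The abstract names the Weighted Arithmetic Water Quality Index but does not give its formula. The sketch below uses the commonly cited form of that index, in which each parameter's weight is inversely proportional to its permissible limit and its quality rating is 100 * (measured - ideal) / (limit - ideal). It may differ from the exact variant used in the thesis. The permissible limits, the sample measurements and the class cut-offs are hypothetical placeholders chosen for illustration; none of them come from the thesis.

```python
def weighted_arithmetic_wqi(measured, limits, ideal=None):
    """Weighted Arithmetic WQI in its commonly cited form (an assumption here).

    measured: parameter name -> observed value
    limits:   parameter name -> permissible limit (hypothetical values below)
    ideal:    parameter name -> ideal value (0 if omitted; 7.0 is usual for pH)
    """
    ideal = ideal or {}
    # Proportionality constant chosen so that the unit weights sum to 1.
    k = 1.0 / sum(1.0 / limit for limit in limits.values())
    wqi = 0.0
    for name, limit in limits.items():
        v_ideal = ideal.get(name, 0.0)
        weight = k / limit
        rating = 100.0 * (measured[name] - v_ideal) / (limit - v_ideal)
        wqi += weight * rating
    return wqi


def classify(wqi, cutoffs=(25.0, 50.0)):
    """Map an index value to the three classes named in the abstract.

    The cut-offs are hypothetical; the abstract does not state them.
    """
    if wqi <= cutoffs[0]:
        return "Excellent"
    if wqi <= cutoffs[1]:
        return "Good"
    return "Poor"


# Illustrative inputs only: these are not measurements or standards from the study.
limits = {"pH": 8.5, "TDS": 500.0, "Cl": 250.0}
ideal = {"pH": 7.0}
sample = {"pH": 7.4, "TDS": 310.0, "Cl": 45.0}

score = weighted_arithmetic_wqi(sample, limits, ideal)
print(round(score, 2), classify(score))
```

Because the weights are the reciprocals of the limits, a parameter with a tight limit (pH here) dominates the index. That is a property of this form of the index, not a finding of the study.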
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
STAT 369.pdf | STAT 369 | 5.73 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.