Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/29247
Title: Prediction of Consanguinity Marriages in various regions of Pakistan using Machine Learning Algorithms
Authors: Muhammad Akhlaq
Keywords: Statistics
Issue Date: 2023
Publisher: Quaid-i-Azam University, Islamabad
Abstract: Consanguinity is the genetic relationship between individuals who descend from a common ancestor. Consanguineous unions can increase the risk of inherited genetic disorders because the partners share a higher proportion of genes inherited from their common ancestors. This study aims to identify the factors influencing consanguineous marriages and to develop and validate predictive models for them using machine learning algorithms. The data, sourced from a prior study of consanguinity in marriages in the Bhimber district (Jabeen and Malik, 2014a), comprise 16 variables, including age, language, caste, and education for both wives and husbands. External validation uses a separate dataset from Mardan, Pakistan (Tufail et al., 2017), with 13 variables, including age, caste, income, and education, to assess the generalizability of the findings on consanguinity. The risk factors are first investigated using association analysis and odds ratios for each category of the covariates. The Boruta algorithm is also employed to estimate the relative importance of each factor for consanguineous marriage. Consanguineous marriages are then predicted with several machine learning algorithms (logistic regression, decision trees, random forests, stacked ensemble meta-models, and support vector machines) built on the most important factors selected. The generalizability, reliability, and robustness of the models are assessed by external validation on the Mardan district data. The study identified several factors significantly associated with consanguineous marriage, including the age, caste, language, and marriage year of wives, as well as the age, caste, and language of husbands. Marriage type, family type, and specific tehsils within the Bhimber district were also significantly associated with consanguineous unions.
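The odds ratio for one category of a covariate, compared with a reference category, can be computed from a 2×2 table of counts. The sketch below is a minimal illustration, assuming a Woolf (log-scale) 95% confidence interval; the thesis does not state which interval method it used, and the function name `odds_ratio` and the counts in the usage line are hypothetical, not data from the study.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-scale) confidence interval.

    a: category of interest, consanguineous marriages
    b: category of interest, non-consanguineous marriages
    c: reference category, consanguineous marriages
    d: reference category, non-consanguineous marriages
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) == 0:
        # The log-scale standard error is undefined with an empty cell.
        raise ValueError("zero cell: apply a continuity correction first")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


# Hypothetical counts for illustration only (not from the thesis).
or_value, lower, upper = odds_ratio(a=60, b=40, c=45, d=55)
print(f"OR = {or_value:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests the category's odds of consanguineous marriage differ from those of the reference category.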
The bagging and stacked-ensemble meta-model approach performed best, with AUC values of 0.59 for the Bhimber district, 0.67 for the Mardan district, and 0.58 when the Bhimber district served as the training dataset and the Mardan district as the test dataset; logistic regression was the second-best model. The study concluded that consanguineous marriage is significantly influenced by factors such as the age, caste, language, family type, and marriage type of both spouses. The findings highlight the complex interplay of diverse factors shaping consanguinity trends and the effectiveness of ensemble methods in predictive modeling of such phenomena. The study provides insights that may help improve healthcare policies and implement targeted measures.
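The AUC values reported above summarize how well a model ranks consanguineous above non-consanguineous marriages. A minimal sketch, assuming binary labels (1 = consanguineous) and predicted probabilities from a model trained on one district and scored on the other, computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count as one half). The function name `auc` and the example data are hypothetical; this is not the thesis's code.

```python
def auc(labels, scores):
    """ROC AUC via pairwise comparison (the Mann-Whitney formulation).

    labels: iterable of 0/1 outcomes (1 = consanguineous marriage)
    scores: predicted probabilities of the positive class
    Runs in O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and model scores (illustration only).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.7, 0.4, 0.5, 0.6, 0.3, 0.2]
print(f"AUC = {auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a reference point for reading the values reported for the two districts.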
URI: http://hdl.handle.net/123456789/29247
Appears in Collections: M.Phil

Files in This Item:
File: STAT 537.pdf
Description: STAT 537
Size: 1.18 MB
Format: Adobe PDF

