Prediction of factors influencing rats tuberculosis detection performance using data mining techniques
Date
2019
Publisher
Uppsala Universitet
Abstract
This thesis aimed to predict the factors that influence the TB detection performance of rats using
data mining techniques. A dataset of rats' TB detection performance was provided by the APOPO TB
training and research center in Morogoro, Tanzania. After data preprocessing, the dataset comprised
471,133 observations of rats' TB detection performance, with a sample size of 4 female rats.
However, only 200,000 of these observations were used in the analysis. Following the CRISP-DM
methodology, the thesis used the R language as the data mining tool to analyze the data.
To build the predictive models, classification was used to predict the influencing factors and to
classify the rats, using decision tree, random forest, and naive Bayes algorithms. All models were
validated on the same test data to compare their classification accuracy and identify the best one.
The results indicate that random forest is the most accurate model, with an accuracy of 78.82%,
although the differences in accuracy between the models are negligible. Considering both accuracy
(78.78%) and building time (3 seconds), the decision tree is the best predictive model overall,
since it is much faster to build than the random forest (154 seconds). The results also show that
age is the most significant influencing factor, and rats aged between 3.1 and 6 years showed high
potential in detection performance. The other influencing factors identified are
Session_Completion_Time, Session_Start_Time, and Av_Weight_Per_Year. These results can serve as a
reference for trainers of TB-detection rats and for researchers working on rat-based TB detection
and on information systems. Further research using other data mining techniques and tools would be
valuable.
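The evaluation protocol in the abstract (each classifier built on one training split, scored on the same held-out test data, with accuracy and building time recorded) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's R code: the row format, the 70/30 split, the seed, and the helper names (holdout_split, fit_majority, fit_naive_bayes, compare) are hypothetical, the data are synthetic, and a majority-class baseline stands in for the decision tree and random forest, which are omitted for brevity.

```python
import math
import random
import time
from collections import Counter, defaultdict

# Hypothetical row format: (features, label), where features is a tuple of
# categorical values. This is NOT the APOPO dataset schema.


def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle once with a fixed seed and cut into train/test splits,
    so every model is scored on exactly the same test rows."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[: len(rows) - n_test], rows[len(rows) - n_test:]


def fit_majority(train):
    """Baseline: always predict the most common label in the training split."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_naive_bayes(train, alpha=1.0):
    """Categorical naive Bayes with add-alpha (Laplace) smoothing."""
    class_counts = Counter(y for _, y in train)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    values_seen = defaultdict(set)       # feature index -> distinct values in train
    for x, y in train:
        for j, v in enumerate(x):
            value_counts[(j, y)][v] += 1
            values_seen[j].add(v)
    n = len(train)

    def predict(x):
        best_label, best_score = None, -math.inf
        for label, count in class_counts.items():
            # Log prior plus summed log likelihoods (features assumed independent).
            score = math.log(count / n)
            for j, v in enumerate(x):
                k = len(values_seen[j])
                score += math.log((value_counts[(j, label)][v] + alpha) / (count + alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def accuracy(predict, test):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


def compare(models, train, test):
    """Fit each model on the same training split, time the fit, and score it
    on the same test split. Returns {name: (accuracy, build_seconds)}."""
    results = {}
    for name, fit in models.items():
        start = time.perf_counter()
        predict = fit(train)
        build_seconds = time.perf_counter() - start
        results[name] = (accuracy(predict, test), build_seconds)
    return results


if __name__ == "__main__":
    # Synthetic toy data: feature 0 is a made-up group that shifts the hit rate,
    # feature 1 is pure noise. Illustrative only.
    rng = random.Random(0)
    rows = []
    for _ in range(1000):
        group = rng.choice(["g1", "g2", "g3"])
        noise = rng.choice(["a", "b"])
        p_hit = 0.8 if group == "g2" else 0.3
        rows.append(((group, noise), "hit" if rng.random() < p_hit else "miss"))
    train, test = holdout_split(rows)
    models = {"majority": fit_majority, "naive_bayes": fit_naive_bayes}
    for name, (acc, secs) in compare(models, train, test).items():
        print(f"{name}: accuracy={acc:.3f}, build time={secs:.4f}s")
```

Fixing the shuffle seed and reusing one test split is what makes the accuracies directly comparable; timing only the fit step mirrors the abstract's comparison of model building times.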
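The accuracy-versus-speed argument for the decision tree can be made explicit with the figures reported in the abstract (the symbols are introduced here only for this calculation):

```latex
\Delta_{\text{acc}} = 78.82\% - 78.78\% = 0.04 \ \text{percentage points},
\qquad
\frac{t_{\text{RF}}}{t_{\text{DT}}} = \frac{154\ \text{s}}{3\ \text{s}} \approx 51.3
```

In other words, the random forest's accuracy advantage over the decision tree is 0.04 percentage points, obtained at roughly 51 times the building time.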
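The abstract does not state how the influencing factors were ranked. In R, rpart splits classification trees on the Gini index by default, and randomForest reports mean decrease in accuracy and mean decrease in Gini as variable importance. As a hedged illustration of the general idea only, the Python sketch below ranks features by information gain (entropy reduction), the criterion used by ID3/C4.5-style decision trees; the feature names and toy rows are hypothetical and do not come from the thesis data.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature_index):
    """Label entropy minus the size-weighted entropy of the subsets
    obtained by splitting rows on one categorical feature."""
    groups = defaultdict(list)
    for x, y in rows:
        groups[x[feature_index]].append(y)
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([y for _, y in rows]) - remainder


def rank_features(rows, names):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = [(name, information_gain(rows, j)) for j, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy rows: (age_group, session_slot) -> outcome.
rows = [
    (("g1", "am"), "miss"),
    (("g1", "pm"), "miss"),
    (("g2", "am"), "hit"),
    (("g2", "pm"), "hit"),
]
print(rank_features(rows, ["age_group", "session_slot"]))
# → [('age_group', 1.0), ('session_slot', 0.0)]
```

Here the first feature separates the outcomes perfectly (gain of 1 bit) while the second carries no information (gain of 0); real data would give graded scores rather than these extremes.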
Description
PhD
Keywords
Giant African Tuberculosis-human sputum, Pouched rats Tuberculosis-human sputum, Trained African giant pouched rats-human sputum, Data Mining-healthcare, Data Mining, Classification Technique-Diagnosis-Tuberculosis, Classification Technique.