Chagas Disease Vectors Identification using Data Mining and Deep Learning Techniques

UDM Libraries / IDS Digital Repository


Ghasemi, Zeinab 2021-04-23T18:17:50Z 2021-04-23T18:17:50Z 2021-04-23
dc.description Department of Electrical and Computer Engineering and Computer Science Thesis en_US
dc.description.abstract Chagas Disease (CD) is a vector-borne infectious disease transmitted from animals to humans and vice versa. It is caused by the parasite Trypanosoma cruzi (abbreviated T. cruzi). It imposes an enormous social burden on public health and counts among the major threats to human health. According to WHO statistics from 2019, CD affects about 7 million people and is responsible for nearly 50,000 deaths per year worldwide. In addition, an average of 80 million people live in areas at risk of infection in different parts of the world. The disease has two phases, acute and chronic, and it can be diagnosed in either phase. Diagnosis involves analyzing clinical, epidemiological, and laboratory data. Since controlling and treating CD is easier in the early stages, detecting it in the acute phase plays an essential role in overcoming and controlling it. There are many clinical trials dedicated to this problem, but progress in computational research (automatic identification) has been limited. Therefore, this work presents four automated CD vector identification approaches that classify several different vectors of kissing bugs with acceptable accuracy rates. Classification of different CD vectors is important because carriers of CD belong to different species that are unevenly distributed across different parts of the world. Differentiating all species of CD vectors therefore plays an important role in designing a robust global system for automatic identification. Three of our proposed methods are composed of preprocessing, feature extraction, feature selection, data balancing, and classification phases. The preprocessing steps are background removal, gray-scaling, and down-sizing. The principal component analysis (PCA) algorithm is used for feature extraction. A correlation-based subset selection is used for feature selection. The classes are balanced by oversampling the minority classes.
Finally, the classification techniques employed are Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM). These three methods are named "PCA+DT", "PCA+RF", and "PCA+SVM". In the fourth approach, we applied two deep convolutional neural networks (CNNs) to our preprocessed dataset and omitted the feature extraction and feature selection steps. The two networks, VGG16 and a 7-layer CNN, are trained on the same oversampled image dataset. Using the 150-feature dataset, the average accuracy for Brazilian vectors is 100% for the PCA+DT and PCA+RF methods; 98.20% for PCA+SVM; 88.60% for VGG16; and 97.57% for the 7-layer CNN. Brazilian vectors belong to 39 species of kissing bugs, represented by 1620 images in the dataset used. Using the 150-feature dataset, the average accuracy for Mexican vectors is 100% for PCA+DT and PCA+RF; 98.40% for PCA+SVM; 89.20% for VGG16; and 96.48% for the 7-layer CNN. Mexican vectors belong to 12 species of kissing bugs, represented by 410 images in the dataset used. Our results are promising and outperform previously developed systems. Given that we have a small dataset, the tree-based algorithms (DT and RF) perform better than SVM and the convolutional neural networks. With larger datasets of kissing bugs, the results of SVM and CNN are likely to improve. en_US
dc.language.iso en_US en_US
dc.subject Engineering, Computer, Chagas, Disease, Data Mining, Deep Learning en_US
dc.title Chagas Disease Vectors Identification using Data Mining and Deep Learning Techniques en_US
dc.type Image en_US
dc.type Map en_US
dc.type Thesis en_US
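
The abstract states that classes are balanced by oversampling the minority classes but does not say which scheme was used. The following is a minimal sketch of one common option, random oversampling by duplication, assuming that each class is grown to the size of the largest class. The function name oversample_minority, the toy feature vectors, and the species labels are all hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest one.

    Assumption: plain random duplication with replacement. The thesis
    abstract does not specify the oversampling scheme actually used.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Keep every original sample, then draw extras with replacement.
        extras = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extras:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: hypothetical 2-D feature vectors for three made-up classes.
samples = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.9],
           [0.4, 0.8], [0.6, 0.7], [0.9, 0.3]]
labels = ["species_a", "species_a", "species_b",
          "species_b", "species_b", "species_c"]

balanced_samples, balanced_labels = oversample_minority(samples, labels)
print(Counter(balanced_labels))
```

After the call, every class in the toy data has as many samples as the largest original class, and no new feature vectors are invented: each added item is a copy of an existing one.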
