dc.description.abstract |
Chagas disease (CD) is a vector-borne infectious disease that can be transmitted from
animals to humans and vice versa. It is caused by the parasite Trypanosoma cruzi
(abbreviated T. cruzi). It imposes an enormous social burden on public health and counts
as one of the major threats to human health. According to a WHO statistical analysis
from 2019, CD affects about 7 million people and is responsible for nearly 50,000 deaths
per year worldwide. In addition, about 80 million people live in areas where they are
at risk of infection.
The disease has two phases, acute and chronic. CD can be diagnosed in either phase.
Diagnosis involves analyzing clinical, epidemiological, and laboratory data. Since CD
is easier to control and treat in its early stages, detecting it in the acute phase
plays an essential role in overcoming it.
Many clinical trials are dedicated to this problem, but progress in computational
research (automatic identification) has been limited. This work therefore presents four
automated CD vector identification approaches that classify several species of kissing
bugs, the vectors of CD, with acceptable accuracy rates. Classifying the different CD
vectors is important because the carriers of CD belong to different species that are
unevenly distributed across the world. Differentiating all species of CD vectors
therefore plays an important role in designing a robust global system for automatic
identification.
Three of our proposed methods consist of preprocessing, feature extraction, feature
selection, data balancing, and classification phases. The preprocessing steps are
background removal, gray-scaling, and down-sizing. Principal component analysis (PCA)
is used for feature extraction, and correlation-based subset selection is used for
feature selection. The classes are balanced by oversampling the minority classes.
Finally, the classification techniques employed are Decision Tree (DT), Random Forest
(RF), and Support Vector Machine (SVM). These three methods
are named “PCA+DT”, “PCA+RF”, and “PCA+SVM”. In the fourth approach, we
applied two deep convolutional neural networks (CNNs) to our preprocessed dataset
and omitted the feature extraction and feature selection steps. The two networks,
VGG16 and a 7-layer CNN, are trained on the same oversampled image dataset.
Using the 150-feature dataset, the average accuracy for Brazilian vectors is 100% for
PCA+DT and PCA+RF, 98.20% for PCA+SVM, 88.60% for VGG16, and 97.57% for the
7-layer CNN. The Brazilian vectors belong to 39 species of kissing bugs, represented by
1620 images in the dataset. Using the 150-feature dataset, the average accuracy for
Mexican vectors is 100% for PCA+DT and PCA+RF, 98.40% for PCA+SVM, 89.20% for
VGG16, and 96.48% for the 7-layer CNN. The Mexican vectors belong to 12 species of
kissing bugs, represented by 410 images in the dataset.
Our results are promising and outperform previously developed systems. Because our
dataset is small, the tree-based algorithms (DT and RF) perform better than the SVM
and the convolutional neural networks (CNNs). With larger datasets of kissing bugs,
the results of the SVM and the CNNs are likely to improve. |
en_US |