Malaria parasite detection in red blood cells

Short description

Capstone project for the MIT Applied Data Science Program

Background

According to the World Health Organization (WHO), there were approximately 241 million malaria cases and about 627,000 malaria deaths worldwide in 2020. The traditional laboratory procedure for detecting malaria requires careful examination by a specialist who can distinguish infected red blood cells from healthy ones. Unfortunately, the process is time-consuming, and its accuracy varies with the experience of the professional inspecting the cells.

Objective

The project aimed to create an efficient and accurate computer vision model that uses deep learning to differentiate between infected and healthy red blood cells. The automated system should help with the early, rapid, and accurate detection of Plasmodium, the parasite that causes malaria, in red blood cells.

Client Name

MIT Applied Data Science Program

Release Date

April 22, 2022

Project Types

Computer Vision, Deep Learning, Data Science

Skills

Exploratory Data Analysis, Convolutional Neural Networks, Transfer Learning, Data Augmentation, Feature Engineering, Data Visualization

Tools

Python 3, TensorFlow 2.0, Google Colab, Jupyter Notebooks, JetBrains DataSpell, Anaconda

Data set

  • Data type: color images
  • Train data: 24,958 images
  • Test data: 2,600 images

Results

A Convolutional Neural Network (CNN) built from several different layer types proved effective. The model I created reached a test accuracy of 98.31%, in line with its validation accuracy, which indicates that it generalizes well. Precision and recall were both between 98% and 99% for the infected and healthy classes. Of the 2,600 test images, 30 were false positives and 14 were false negatives.

  • Test accuracy: 98.31%
  • Precision range: 98% to 99%
  • Recall range: 98% to 99%
  • False positives: 30 of 2,600 (1.15%)
  • False negatives: 14 of 2,600 (0.54%)
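
The figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the project's evaluation code. It assumes that "infected" is the positive class and that the 2,600 test images are split evenly between the two classes; neither assumption is confirmed by the figures listed here. The function name `summarize` is made up for this example. The accuracy and the error percentages depend only on the error counts and the test set size, so they hold regardless of the class split.

```python
# Error counts and test set size taken from the results listed above.
FALSE_POSITIVES = 30   # healthy cells predicted as infected (assumed meaning)
FALSE_NEGATIVES = 14   # infected cells predicted as healthy (assumed meaning)
TEST_IMAGES = 2600

# ASSUMPTION: a balanced test set. The class split is not confirmed above.
INFECTED_IMAGES = TEST_IMAGES // 2
HEALTHY_IMAGES = TEST_IMAGES - INFECTED_IMAGES


def summarize(fp: int, fn: int, n_infected: int, n_healthy: int) -> dict:
    """Derive accuracy, precision, and recall from raw error counts."""
    tp = n_infected - fn   # infected cells correctly detected
    tn = n_healthy - fp    # healthy cells correctly detected
    total = n_infected + n_healthy
    return {
        "accuracy": (tp + tn) / total,
        "precision_infected": tp / (tp + fp),
        "recall_infected": tp / n_infected,
        "precision_healthy": tn / (tn + fn),
        "recall_healthy": tn / n_healthy,
        "false_positive_share": fp / total,
        "false_negative_share": fn / total,
    }


metrics = summarize(FALSE_POSITIVES, FALSE_NEGATIVES,
                    INFECTED_IMAGES, HEALTHY_IMAGES)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under the balanced-split assumption, every precision and recall value rounds to either 98% or 99%, which matches the ranges reported above.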

Healthy and infected red blood cells from the data set, arranged in a 6x6 grid
Red blood cells converted from RGB to HSV using OpenCV
Red blood cells with Gaussian blur using OpenCV
Accuracy plot for the final computer vision model
Confusion matrix for the final computer vision model