Partnership for Research and Education in Materials between NMHU and OSU in Electronic, Optical, and Magnetic Materials

NSF DMR-2031548

RAPID: Machine Learning Methods to Understand, Predict and Reduce the Spread of COVID-19 in Small Communities

Drawing on their experience applying machine learning to materials science, NMHU PREM participants developed a proposal to study the spread of the COVID-19 pandemic in the Navajo Nation, which is among the five most infected areas in the U.S. The project was funded in May 2020.

Dr. Gil Gallegos’ group is analyzing COVID-19 data using machine learning and artificial intelligence techniques. The team includes an anthropologist, Dr. Orit Tamir, who has considerable experience in Navajo studies, and an NMHU graduate student who is a representative of the tribe (Fernando Sarracino). The specific goal of the project is to predict and mitigate the spread of the virus in the Navajo Nation. The group plans to identify the most important socio-economic factors influencing incidence, which will help develop measures to mitigate the current COVID-19 spread. Results of the study will be conveyed to the Navajo people.

Research Group Members

Dr. Gil Gallegos
PI, Computer Science

Dr. Tatiana Timofeeva
Co-PI, Chemistry

Dr. Orit Tamir
Co-PI, Anthropology

Viktor Glebov
Graduate Student, Major in Computer Science
BS and MS in Optotechnics, University of Information Technologies, Mechanics and Optics (ITMO University), Saint Petersburg, Russia
Performing data analysis and cloud computing system configuration

Jesse Ibarra
Graduate Student, Major in Computer Science
BS in Computer Science, New Mexico Highlands University
Maintaining, upgrading, and configuring the GPU cluster through the use of cloud computing

Fernando Sarracino
Graduate Student, Major in Computer Science
BS in Computer Science, New Mexico Highlands University

Svetlana Riabova
Graduate Student, Major in Computer Science
BS in Applied Mathematics in Economics, Saint Petersburg State University of Economics, Saint Petersburg, Russia
Developing predictive models with Python/Wolfram Mathematica, curve clustering, AI/ANN for longitudinal data

Christopher Torres
Graduate Student, Major in Computer Science
BA in Mathematics, New Mexico Highlands University
Helping identify the factors and labels that may be key to predicting COVID-19 infection in the Navajo Nation

Research Methods

Map showing COVID-19 cases in the Navajo Nation

Time series clustering will help explain why COVID-19 developed differently in different communities
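
As a rough illustration of what time series clustering could look like, the sketch below groups case curves by shape with a small k-means routine written in plain Python. It is not the group's implementation: the community names and weekly counts are invented, and scaling each curve by its peak, the Euclidean distance, and the farthest-first initialization are all assumptions made for the example.

```python
def scale_by_peak(curve):
    """Divide a curve by its maximum so clustering compares shape, not size."""
    peak = max(curve)
    return [v / peak for v in curve] if peak > 0 else list(curve)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(curves, k, iterations=50):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [curves[0]]
    while len(centers) < k:
        # Next center: the curve farthest from all centers chosen so far.
        centers.append(
            max(curves, key=lambda c: min(sq_dist(c, m) for m in centers))
        )
    labels = [0] * len(curves)
    for _ in range(iterations):
        # Assignment step: each curve goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in curves
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical weekly case counts (invented for illustration only).
weekly_cases = {
    "community_a": [1, 5, 20, 40, 20, 8, 3, 1],     # sharp early peak
    "community_b": [2, 12, 45, 90, 50, 15, 6, 2],   # sharp early peak, larger
    "community_c": [1, 2, 3, 5, 8, 12, 17, 22],     # slow, still growing
    "community_d": [3, 5, 9, 14, 24, 35, 50, 66],   # slow, still growing
}

names = list(weekly_cases)
scaled = [scale_by_peak(weekly_cases[n]) for n in names]
labels, centers = kmeans(scaled, k=2)
clusters = dict(zip(names, labels))
```

In this toy data the two communities with a sharp early peak end up in one cluster and the two with slow, ongoing growth in the other, regardless of their absolute case counts. Real curves would likely need smoothing and alignment in time before a comparison of this kind is meaningful.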

  1. Gather multiple publicly available COVID-19 data sets
  2. Construct both small training sets and large data sets from the data gathered in step 1
  3. Preprocess the data using machine learning methodology (normalization, handling of missing data, encoding of character/string data…)
  4. Analyze the small training sets using principal component analysis (PCA) and autoencoder (AE) models
  5. Compare results
  6. Analyze the full sets using PCA and AE models
  7. Compare results
  8. Iterate the AE model over steps 4-7 until the results are valid based on the currently available literature
  9. Distinguish different scenarios in the spread of the virus in countries that have successfully passed the peak, and find the reasons behind those differences
  10. Run the data sets from steps 4 and 6 through traditional machine learning models, such as regression, nonlinear regression, logistic regression and k-nearest means, to explain the different characteristics of the curves for different countries. The methods will be applied to the whole data sets as well as to data sets that include only the countries whose trends show that they have passed the peak of the pandemic spread
  11. Run more complex deep learning methods on the data sets from steps 4 and 6: AE (unsupervised), multilayer perceptron (MLP) artificial neural networks (supervised), convolutional neural networks (CNN, supervised/semisupervised), recurrent neural networks with long short-term memory (RNN-LSTM, supervised/semisupervised) and generative adversarial networks (GAN, supervised)
  12. Compare results from steps 10 and 11
  13. Compare with current pandemic data and make short-term and long-term predictions from the tuned models
  14. Relying on the results of steps 9, 10 and 11, propose possible measures to effectively curb the spread of similar viral infections in the future
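
Steps 3 and 4 of the list above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the project's pipeline: the records and field names (`household_size`, `distance_to_clinic_km`, `running_water`) are hypothetical, and mean imputation, min-max scaling, one-hot encoding, and power iteration for the leading principal component are choices made for the example. In practice a library PCA and an autoencoder would be compared on the same preprocessed matrix.

```python
import math


def preprocess(rows, numeric_keys, categorical_keys):
    """Impute missing numeric values with the column mean, min-max scale
    them to [0, 1], and one-hot encode the categorical columns."""
    names, columns = [], []
    for key in numeric_keys:
        present = [r[key] for r in rows if r.get(key) is not None]
        mean = sum(present) / len(present)
        filled = [r[key] if r.get(key) is not None else mean for r in rows]
        lo, hi = min(filled), max(filled)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        names.append(key)
        columns.append([(v - lo) / span for v in filled])
    for key in categorical_keys:
        for level in sorted({r[key] for r in rows}):
            names.append(f"{key}={level}")
            columns.append([1.0 if r[key] == level else 0.0 for r in rows])
    matrix = [list(row) for row in zip(*columns)]  # one row per record
    return names, matrix


def first_principal_component(matrix, iterations=200):
    """Leading principal component via power iteration on the covariance.
    Assumes the start vector is not orthogonal to that component."""
    n, d = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in matrix]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance explained by the component (Rayleigh quotient of a unit vector).
    variance = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, variance, scores


# Hypothetical community records (invented for illustration only).
rows = [
    {"household_size": 3.0, "distance_to_clinic_km": 10.0, "running_water": "yes"},
    {"household_size": 5.0, "distance_to_clinic_km": None, "running_water": "no"},
    {"household_size": 4.0, "distance_to_clinic_km": 30.0, "running_water": "no"},
    {"household_size": 7.0, "distance_to_clinic_km": 50.0, "running_water": "no"},
]

feature_names, matrix = preprocess(
    rows, ["household_size", "distance_to_clinic_km"], ["running_water"]
)
component, variance, scores = first_principal_component(matrix)
```

The entries of `component` show how strongly each preprocessed feature loads on the direction of greatest variance, which is one simple way to start ranking candidate factors before moving on to the models in steps 10 and 11.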