COVID-19 Modeling, Forecasting, and Projections
We proposed the SIkJalpha model at the beginning of the COVID-19 pandemic. Over the years, as the pandemic evolved, more complexities were added to capture crucial factors and variables that can assist with projecting desired future scenarios.
Throughout the pandemic, multi-model collaborative efforts have been organized to predict short-term outcomes (cases, deaths, and hospitalizations) of COVID-19 and to produce long-term scenario projections. We have been participating in five such efforts: US Scenario Modeling Hub, US Forecast Hub, Europe Scenario Modeling Hub, Europe Forecast Hub, and Germany/Poland Forecast Hub.
[Paper on an early version] [Paper on evolution of the model] [Scenario Modeling at CDC MMWR] [PNAS on US Forecast Hub]
Shape-based Representation and Forecasting
Infectious disease forecasting for ongoing epidemics has traditionally been performed, communicated, and evaluated as numerical targets: 1-, 2-, 3-, and 4-week-ahead cases, deaths, and hospitalizations. While there is great value in predicting these numerical targets to assess the burden of the disease, we argue that there is also value in communicating the future trend (a description of the shape) of the epidemic, for instance, whether cases will remain flat or a surge is expected. To ensure that what is being communicated is useful, we need to be able to evaluate how well the predicted shape matches the ground-truth shape.
Instead of treating this as a classification problem (one out of n shapes), we define a transformation of the numerical forecasts into a shapelet-space representation. In this representation, each dimension corresponds to the similarity of the shape with one of the shapes of interest (a shapelet). We prove that this representation satisfies the property that two shapes that one would consider similar are mapped close to each other, and vice versa. We demonstrate that our representation is able to reasonably capture the trends in COVID-19 case and death time series. With this representation, we define an evaluation measure and a measure of agreement among multiple models. We also define the shapelet-space ensemble of multiple models, which is the mean of the shapelet-space representations of all the models. We show that this ensemble is able to accurately predict the shape of the future trend for COVID-19 cases and deaths. We also show that the agreement between models can provide a good indicator of the reliability of the forecast.
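To make the idea concrete, here is a minimal sketch of a shapelet-space representation and its ensemble. It is an illustration under stated assumptions, not our published definition: the four length-4 shapelets, the use of Pearson correlation as the similarity, the convention that a constant window has similarity 0, and all function names are hypothetical choices made for this example.

```python
from math import sqrt

# Hypothetical shapelets of length 4 (illustrative only, not the published set).
SHAPELETS = {
    "increase": [1.0, 2.0, 3.0, 4.0],
    "decrease": [4.0, 3.0, 2.0, 1.0],
    "peak": [1.0, 2.0, 2.0, 1.0],
    "trough": [2.0, 1.0, 1.0, 2.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumption for this sketch: a constant sequence (zero variance)
    is given similarity 0 with every shapelet.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def shapelet_representation(window):
    """Map a numerical forecast window to one similarity per shapelet."""
    return [pearson(window, s) for s in SHAPELETS.values()]


def shapelet_ensemble(forecasts):
    """Mean of the shapelet-space representations of several models."""
    reps = [shapelet_representation(f) for f in forecasts]
    return [sum(col) / len(reps) for col in zip(*reps)]


if __name__ == "__main__":
    model_a = [100, 120, 150, 190]  # a rising forecast
    model_b = [100, 110, 115, 118]  # a rising but flattening forecast
    print(dict(zip(SHAPELETS, shapelet_ensemble([model_a, model_b]))))
```

Because the representation depends on the shape rather than the magnitude, two rising forecasts with very different case counts land close to each other in this space, which is the property the evaluation and agreement measures rely on.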
Influenza Modeling and Forecasting
The lack of influenza case tracking makes it difficult to use traditional epidemiological models for influenza hospitalization forecasting. However, hospitalization data from multiple past seasons provide an opportunity for machine learning.
We hypothesize that we can improve forecasting by using multiple mechanistic models to produce potential trajectories and use machine learning to learn how to combine those trajectories into an improved forecast. We propose a Tree Ensemble model design that utilizes the individual predictors of our baseline model SIkJalpha to improve its performance. Each predictor is generated by changing a set of hyper-parameters. We compare our prospective forecasts deployed for the FluSight challenge (2022) to all the other submitted approaches. Our approach is fully automated and does not require any manual tuning. We demonstrate that our Random Forest-based approach is able to improve upon the forecasts of the individual predictors in terms of mean absolute error, coverage, and weighted interval score. Our method outperforms all other models in terms of the mean absolute error and the weighted interval score based on the mean across all weekly submissions in the current season (2022). Explainability of the Random Forest (through analysis of the trees) enables us to gain insights into how it improves upon the individual predictors.
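For readers unfamiliar with the weighted interval score mentioned above, the following sketch computes it for a single forecast using the standard formulation from the forecast-evaluation literature: the absolute error of the median plus alpha-weighted interval scores, normalized by the number of intervals plus one half. The function names and the example numbers are ours for illustration and are not taken from the FluSight evaluation code.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score of a central (1 - alpha) prediction interval.

    Width of the interval plus a penalty, scaled by 2 / alpha, for the
    distance by which the observation falls outside the interval.
    """
    score = upper - lower
    if observed < lower:
        score += (2.0 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2.0 / alpha) * (observed - upper)
    return score


def weighted_interval_score(median, intervals, observed):
    """Weighted interval score for one forecast.

    `intervals` maps alpha -> (lower, upper), e.g. {0.2: (8, 12)} for an
    80% central prediction interval.
    """
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


if __name__ == "__main__":
    # Hypothetical weekly hospitalization forecast (illustrative numbers).
    forecast_intervals = {0.5: (90, 110), 0.05: (70, 140)}
    print(weighted_interval_score(100, forecast_intervals, observed=125))
```

Lower values are better: the score rewards narrow intervals but penalizes them heavily when the observation falls outside, so it captures both sharpness and calibration in one number.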
Graph Neural Networks
Training and inference on deep GNNs on large graphs are difficult due to computational complexity and lack of accuracy improvements with deeper layers. Subgraph-based methods to address training on large graphs exist, but they do not apply during inference, making inference the bottleneck. Such methods also do not address poor accuracy for deep networks due to "oversmoothing". We address the following challenges: (i) Developing subgraph-based schemes that apply to training and inference. (ii) Identifying good subgraph-sampling strategies. (iii) Pruning weights to reduce computations during inference.
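As a rough illustration of a subgraph-based scheme that can be used identically at training and inference time, the sketch below extracts a bounded neighborhood around a target node. It is a generic hop-limited neighbor sampler written for this page, not the specific sampling strategy we study; the function name, the `fanout` cap, and the adjacency-dictionary format are assumptions of the example.

```python
import random


def sample_subgraph(adj, target, num_hops, fanout, rng=None):
    """Return the induced subgraph on a sampled neighborhood of `target`.

    adj      : dict mapping node -> list of neighbor nodes
    num_hops : number of expansion rounds from the target
    fanout   : maximum number of neighbors expanded per node per round
    rng      : optional random.Random for reproducibility
    """
    rng = rng or random.Random(0)
    nodes = {target}
    frontier = [target]
    for _ in range(num_hops):
        next_frontier = []
        for u in frontier:
            neighbors = adj.get(u, [])
            # Sample only when the neighborhood exceeds the cap.
            if len(neighbors) > fanout:
                neighbors = rng.sample(neighbors, fanout)
            for v in neighbors:
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    # Keep only edges whose endpoints were both selected.
    return {u: [v for v in adj.get(u, []) if v in nodes] for u in nodes}


if __name__ == "__main__":
    # A small path graph 0 - 1 - 2 - 3 - 4 (illustrative data).
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(sample_subgraph(path, target=2, num_hops=1, fanout=10))
```

Since the subgraph size is bounded by the number of hops and the fanout rather than by the full graph, a GNN of any depth can be run on it for a single target node, which keeps the cost of inference independent of graph size.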
Prior to my faculty position, I worked on a range of problems, from theoretical to experimental to real-world deployments, that involved a mix of Algorithms, Network Science, and Data Mining. The figure summarizes my past research. Please see my Publications page or contact me to learn more about my contributions to these problems.