Fourth Midwest Healthcare Conference Causal Diagram Challenge

Estimating Causal Effects of Glucocorticoids on COVID-19 Survival 

As part of the 4th Midwest Healthcare Conference, we are hosting a Mini Data Challenge, which offers a valuable opportunity for participants to apply cutting-edge causal inference methodologies to real-world data. By focusing on the causal effects of glucocorticoids on COVID-19 survival, participants contribute to a critical area of research with significant clinical implications. The challenge fosters innovation and emphasizes the importance of rigorous, transparent, and ethical research practices in healthcare.

Structural Causal Models

Structural causal models (SCMs) provide a powerful framework for understanding and analyzing cause-and-effect relationships within complex systems. By encoding the causal structure among variables as a directed acyclic graph (DAG), an SCM represents these relationships both visually and mathematically, allowing correlation to be distinguished from causation. This approach enables the identification of causal pathways, the prediction of outcomes under hypothetical interventions, and the assessment of how changes propagate through a system. In short, an SCM describes the relationships that produce an outcome.

Variables + Interventions -> Outcomes

This will be depicted as an interactive diagram in the software used during the competition.
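
For concreteness, the sketch below shows how such a causal structure could be written down as a small DAG in Python using networkx. The variable names (age, severity, treatment, survival_28d) are illustrative assumptions rather than the challenge's required vocabulary, and official entries are built interactively in cStructure, not submitted as code.

    # Toy DAG for a hypothetical SCM: age and severity confound the
    # treatment -> survival relationship.
    import networkx as nx

    dag = nx.DiGraph()
    dag.add_edges_from([
        ("age", "treatment"),           # confounder: influences who is treated
        ("age", "survival_28d"),        # ...and also influences the outcome
        ("severity", "treatment"),
        ("severity", "survival_28d"),
        ("treatment", "survival_28d"),  # causal path of interest
    ])

    assert nx.is_directed_acyclic_graph(dag)  # an SCM's graph must be acyclic
    print(sorted(dag.predecessors("survival_28d")))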

Challenge Overview

Identifying the causal effects of an intervention is a central task in the health sciences. This Causal Diagram Challenge aims to use SCMs to estimate the causal effects of glucocorticoids on in-hospital survival rates among COVID-19 patients using real-world data.

Variables + Interventions (hydroxychloroquine vs. steroids) -> Outcome (28-day survival)

Challenge Aim

The primary goal is to assess how participants can develop SCMs to estimate the causal effects of glucocorticoids and hydroxychloroquine on 28-day survival rates among COVID-19 patients using a large de-identified COVID-19 dataset. This will help evaluate the robustness and transportability of SCMs in replicating results from randomized controlled trials (RCTs) in a real-world setting. 

Challenge Question

Estimate the causal effects of glucocorticoids and hydroxychloroquine on the 28-day survival of COVID-19 patients, stratified by COVID-19 disease severity (low, moderate, severe), using a structural causal model and the provided real-world dataset.

Task Description

Participants will submit causal diagrams (formally, SCMs) that precisely describe the causal factors of all-cause 28-day survival in the context of COVID-19 infection. Each SCM will define the variables that the participant believes introduce confounding bias, such as age, race, and gender, which must be handled through adjustment.

Each variable should be labeled as adjusted or unadjusted. Adjusted variables must be associated with variables from the provided dataset. The challenge allows for creating diagrams for analyses using inverse probability of treatment weights or doubly robust models. Participants are encouraged to describe the rationale for each causal relationship in their SCM; it is crucial to avoid specifying implausible causal relationships, such as treatment causing age. Full details on the process for model creation and submission in the cStructure platform will be provided to registered teams on the day of the challenge launch.
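
To give a flavor of what an analysis using inverse probability of treatment weights could look like once an adjustment set is chosen, here is a minimal Python sketch. The column names ("treated", "age", "sex", "severity", "survived_28d") and the file name covid_cohort.csv are assumptions made for illustration; in the challenge, the analysis itself is configured through the causal diagram in cStructure rather than submitted as code.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("covid_cohort.csv")                      # hypothetical file name
    X = pd.get_dummies(df[["age", "sex", "severity"]], drop_first=True)

    # Propensity score: probability of receiving glucocorticoids given confounders
    ps = LogisticRegression(max_iter=1000).fit(X, df["treated"]).predict_proba(X)[:, 1]

    # Stabilized inverse probability of treatment weights
    p = df["treated"].mean()
    w = np.where(df["treated"] == 1, p / ps, (1 - p) / (1 - ps))

    # Weighted 28-day survival in each arm and the resulting risk ratio
    treated = df["treated"] == 1
    rr = (np.average(df.loc[treated, "survived_28d"], weights=w[treated]) /
          np.average(df.loc[~treated, "survived_28d"], weights=w[~treated]))
    print("IPTW risk ratio for 28-day survival:", round(rr, 3))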

Data Source

An observational dataset collected at a large U.S. health system during the early waves of the global pandemic will be provided to the participants once they sign up for the competition. Participants are allowed to use only data provided by organizers.

Output

Submitted causal diagrams will be used to assess the impact of glucocorticoids and hydroxychloroquine on 28-day all-cause survival rates in COVID-19 patients. These estimates will be expressed as relative risks, include bootstrapped confidence intervals, and be stratified by COVID-19 disease severity.
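
As a rough illustration of how risk ratios with bootstrapped confidence intervals, stratified by disease severity, might be computed, consider the sketch below. It reuses the assumed column and file names from the previous sketch and uses an unadjusted risk ratio purely to keep the example short; the organizers' actual estimation pipeline may differ.

    import numpy as np
    import pandas as pd

    def risk_ratio(d):
        # 28-day survival risk ratio, treated vs. untreated (unadjusted for brevity)
        return (d.loc[d["treated"] == 1, "survived_28d"].mean() /
                d.loc[d["treated"] == 0, "survived_28d"].mean())

    def bootstrap_ci(d, n_boot=2000, alpha=0.05, seed=0):
        # Percentile bootstrap: resample rows with replacement and recompute the ratio
        rng = np.random.default_rng(seed)
        stats = [risk_ratio(d.sample(frac=1, replace=True,
                                     random_state=int(rng.integers(1_000_000_000))))
                 for _ in range(n_boot)]
        return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

    df = pd.read_csv("covid_cohort.csv")               # hypothetical file name
    for severity, group in df.groupby("severity"):     # low / moderate / severe
        lo, hi = bootstrap_ci(group)
        print(severity, round(risk_ratio(group), 2), (round(lo, 2), round(hi, 2)))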

Participants must also submit documentation (max 3 pages) describing their modeling strategy that would allow the challenge organizers to reproduce their model if needed.

How to Participate

The competition will be held on the cStructure platform, which allows participants to create and refine their models in a user-friendly and intuitive way. Each data variable will have a limited number of transformations available in the causal diagrams. The platform enables participants without any coding experience to create a causal diagram in a matter of minutes.

  • Fill out this form to register your team by July 24 (team size 1-4 members). One form per team.
  • Upon filling out the form, you and your team members will be given access to the cStructure platform.
  • Sign in to the cStructure platform and accept the competition rules and data use agreement.
  • Analyze the data, build and refine your model.
  • Submit your final entry by the deadline.

Contact Information: ahsen@illinois.edu

Ethical Considerations and Data Privacy

  • Data Confidentiality: All data are de-identified to ensure patient privacy. Participation in the challenge requires acceptance of the User Agreement for the NIAID Immunology Database and Analysis Portal (ImmPort), which can be reviewed at https://docs.immport.org/home/agreement/.
  • Model Transparency: While the structural causal models developed by participants will not be disclosed, the causal effect estimates and confidence intervals will be shared to maintain transparency.
  • Ethical Compliance: The challenge adheres to ethical standards in data handling and analysis, ensuring the integrity of the research process.

Evaluation Metrics

Primary Metric (Causal Estimate Coverage)

  • Alignment between the causal effect estimates generated by the team’s SCM and those reported by high-quality RCTs. In other words, the SCM models generated by participating teams in the cStructure platform will be compared to pre-existing RCT results (an illustrative check is sketched after this list).
  • Focus: Patients who would have met the RCT emulation eligibility criteria, stratified by COVID-19 disease severity at randomization.
  • Importance: Measures the model’s ability to replicate findings from rigorous clinical trials.
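
The organizers' exact scoring formula is not described here; purely as an illustration, one simple way to check alignment is to ask whether a team's interval overlaps the corresponding RCT interval, as in the sketch below. All numbers shown are placeholders, not real trial results or actual SCM output.

    def overlaps(scm_ci, rct_ci):
        # True if the team's interval and the RCT interval share any values
        return scm_ci[0] <= rct_ci[1] and rct_ci[0] <= scm_ci[1]

    # Placeholder values only -- not actual SCM output or published RCT findings
    scm_rr, scm_ci = 0.80, (0.65, 0.98)   # team estimate for one severity stratum
    rct_rr, rct_ci = 0.83, (0.75, 0.93)   # reference RCT estimate for the same stratum

    print("interval overlap:", overlaps(scm_ci, rct_ci),
          "| SCM RR:", scm_rr, "| RCT RR:", rct_rr)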

Secondary Metrics

Participants will also be evaluated based on:

  • The rigor of the analysis, with a particular focus on plausible relationships between variables and outcomes that reflect real-life applications. For example, age contributing to disease severity is a plausible relationship, whereas treatment causing age is not.
  • Creativity and innovation in participants’ approach to reducing causal estimate bias and increasing causal estimate precision.
  • Clear and concise documentation of participants’ work, including explanations of their models and assumptions, which is crucial for reproducibility and understanding.

Timeline

  1. Start Date: Monday, July 18
  2. Q&A Webinar: Monday, July 29
  3. Submission Deadline: Wednesday, August 15
  4. Announcement of Winner: Monday, August 19
  5. Healthcare Workshop: Friday, August 23 (Top three teams might be asked to give a brief presentation at the workshop)

Further Competition Information

Upon the launch of the challenge on July 18, we will provide participants who filled out the signup form with access to the cStructure platform. Through the cStructure platform, participants will have access to real-life training data, which they can use to build their initial model.

Leaderboard Posting

Halfway through the challenge, around July 29, we will hold a leaderboard stage in which we display the performance of all teams and rank them using a separate leaderboard dataset. This fosters a competitive yet collaborative environment in which teams can track progress and refine their models. The leaderboard results will be announced during the Q&A webinar.

Final Validation of Models

To be eligible for prizes, participants must submit their final model by 11:59 PM CT on August 14. Upon completion of the competition, we will use a separate validation set (distinct from the training and leaderboard data) to evaluate the performance of the final models. We will also apply the additional criteria described above to rank the models.