Reproducibility in Medical Research:
A Case Study

Quarto + tidymodels + GitHub Actions

2024 R/Medicine Conference
Marly Gotti

About Me

Data Scientist at Apple

Senior Data Scientist, Biogen

Posit intern (tidymodels)

Executive member of the R Validation Hub

R, open source,
{ ML } \(\cap\) { cancer research }

Status Quo in Medical Research



  • Reproducibility
  • Adhering to journal formatting

Introduction

Quarto Manuscripts + tidymodels + GitHub Actions


Case Study: Thyroid Cancer Prediction Using Quarto, tidymodels, and GitHub Actions

About the Research Paper

  • Affiliation: Part of the MIT PRIMES program
  • MIT PRIMES: A research program for high school students
    • Offers research opportunities in mathematics, computer science, and computational biology
    • Students work on individual and group projects with MIT researchers and affiliates
  • Students: Anay Aggarwal, Ekam Kaur, and Susie Lu

https://math.mit.edu/research/highschool/primes/program/

Quarto Manuscript - Project Structure

quarto-manuscript/
├── quarto-manuscript.Rproj
├── _quarto.yml
├── references.bib
├── index.qmd
├── notebooks/
│   └── notebook1.qmd
├── .github/ (recommended)
│   └── workflows/
│       └── publish.yml
├── renv.lock (recommended)
└── renv/
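
As a rough illustration of what the `_quarto.yml` in the structure above might contain, here is a minimal sketch of a Quarto manuscript project file. The choice of output formats is an assumption for illustration, not the configuration used in the case study.

```yaml
# _quarto.yml (sketch; the formats listed are assumptions)
project:
  type: manuscript      # treat the project as a Quarto manuscript

manuscript:
  article: index.qmd    # the main article source from the tree above

format:
  html: default         # manuscript website
  docx: default         # Word version for journal submission
```

The bibliography file is typically referenced from the front matter of `index.qmd` (for example `bibliography: references.bib`).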

Summary: Tools for Reproducibility

Quarto Manuscripts

  • Help make research reproducible: narrative, code, and results are rendered from one source
  • Render to multiple output formats to meet journal formatting requirements
  • Support code in R, Python, and other languages

renv R Package

  • Manages R package versions
  • Ensures consistency across environments

GitHub Actions

  • Continuous integration and deployment
  • Automates testing and building processes
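
A `publish.yml` workflow along the following lines could render the manuscript and publish it on every push. This is a sketch, not the workflow from the case study: the branch name `main` and the `gh-pages` target are assumptions, and the renv step presumes a `renv.lock` file is committed to the repository.

```yaml
# .github/workflows/publish.yml (sketch; branch and target are assumptions)
on:
  workflow_dispatch:
  push:
    branches: main

name: Quarto Publish

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Install R
        uses: r-lib/actions/setup-r@v2

      # Restores the package versions recorded in renv.lock
      - name: Install R dependencies
        uses: r-lib/actions/setup-renv@v2

      - name: Render and publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Because the workflow installs packages from `renv.lock`, the CI build uses the same package versions as the authors' machines.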

Summary: ML Framework

tidymodels Ecosystem

  • Streamlines the model-building process
  • Enhances accuracy and interpretability
  • Packages: rsample, recipes, workflows, tune, yardstick, etc.

Algorithms Used

  • ANN, KNN, SVM, Logistic Regression, XGBoost, Random Forest
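
To show how the packages listed above fit together, here is a minimal sketch of a tidymodels pipeline. It is not the authors' analysis: the thyroid data are not shown in these slides, so the example substitutes `two_class_dat` from the modeldata package, and it fits only logistic regression, one of the algorithms named above. The seed, split proportion, and fold count are arbitrary assumptions.

```r
library(tidymodels)

set.seed(2024)  # arbitrary seed so the resampling is repeatable

# Stand-in data (assumption): a two-class example set from modeldata,
# with numeric predictors A and B and a factor outcome Class.
data(two_class_dat, package = "modeldata")

# rsample: hold out a test set, stratified by the outcome
split <- initial_split(two_class_dat, prop = 0.8, strata = Class)
train <- training(split)

# recipes: preprocessing learned from the training data only
rec <- recipe(Class ~ ., data = train) |>
  step_normalize(all_numeric_predictors())

# parsnip: model specification
spec <- logistic_reg() |>
  set_engine("glm")

# workflows: bundle preprocessing and model
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

# tune + yardstick: cross-validated performance on the training set
folds <- vfold_cv(train, v = 5, strata = Class)
cv_results <- fit_resamples(
  wf,
  resamples = folds,
  metrics = metric_set(roc_auc, accuracy)
)
collect_metrics(cv_results)

# Fit on the full training set and evaluate once on the held-out test set
final_fit <- last_fit(wf, split)
collect_metrics(final_fit)
```

Swapping in another algorithm from the list would mainly mean replacing the model specification (for example `rand_forest()` or `boost_tree()` in classification mode) while the recipe, resampling, and metrics stay the same.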

Resources

Attributions