MIT PRIMES 2024 Update

Logistic Regression
Random Forest
Cancer Research
Predictive Models
Student Research Progress

Marly Gotti


June 21, 2024


MIT PRIMES (Program for Research In Mathematics, Engineering, and Science) is a prestigious program designed for high school students interested in conducting original research in mathematics, computer science, and related fields. Hosted by the Massachusetts Institute of Technology (MIT), this program offers students the unique opportunity to work closely with MIT faculty, postdoctoral researchers, and graduate students on cutting-edge research projects.

The program is highly competitive, with a rigorous selection process. Applicants typically need to demonstrate exceptional mathematical ability, a strong academic record, and a keen interest in research. It is open primarily to high school students from the Greater Boston area, and related programs such as PRIMES-USA (for students elsewhere in the United States) and PRIMES Circle broaden access to other students.

This year, I have the privilege of guiding a talented group of three students, Anay Aggarwal, Ekam Kaur, and Susie Lu, as part of the MIT PRIMES program. Our team has delved into advanced mathematical and computational research, and their dedication and enthusiasm have been truly inspiring.

Research Focus

Our project focuses on utilizing machine learning techniques to predict the recurrence of Differentiated Thyroid Cancer (DTC). The students are exploring various machine learning models, including Artificial Neural Networks (ANN), Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and ensemble methods like Random Forest and Extreme Gradient Boosting (XGBoost). The aim is to identify the most effective model for accurately predicting cancer recurrence, which could significantly enhance personalized treatment strategies and improve clinical outcomes for patients.

Current Progress

The students have made impressive strides in their research. They have learned the intricacies of several predictive models and applied them to the data. Additionally, they have implemented and evaluated each model using key performance metrics such as accuracy, precision, recall, and specificity.
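
For readers less familiar with these four metrics, here is a minimal Python sketch of how each one is computed from the counts in a confusion matrix. The function name and the counts below are hypothetical, chosen only for illustration. They are not the students' code or the project's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute four common metrics from confusion-matrix counts.

    tp: positives predicted positive    fp: negatives predicted positive
    tn: negatives predicted negative    fn: positives predicted negative
    """
    total = tp + fp + tn + fn
    return {
        # Share of all cases classified correctly
        "accuracy": (tp + tn) / total,
        # Share of predicted positives that are truly positive
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of true negatives that the model found
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical counts for illustration only (not the project's results)
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a recurrence setting, a "positive" would presumably be a patient whose cancer recurs, so recall measures how many recurrences the model catches and specificity measures how many non-recurrences it correctly clears.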

Key Findings

  • Random Forest: Emerged as the top performer in overall accuracy and specificity, achieving 94% on both. Its high specificity shows that it reliably identifies negative cases.
  • Precision: The Artificial Neural Network, Logistic Regression, and Random Forest models all achieved 85% precision, meaning that 85% of the cases each model flagged as positive were truly positive.
  • SVM: Stood out in recall with a perfect score of 100%, meaning it captured every positive case in the evaluation data.

These results highlight the Random Forest model as the most balanced and reliable for this classification task, combining high accuracy and specificity with competitive precision. Meanwhile, the SVM model excels in recall, making it particularly suitable for ensuring that all positive cases are identified.
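
The tension between recall and specificity described above can be illustrated with a small sketch. Many classifiers output a score and label a case positive when the score crosses a threshold, and lowering that threshold tends to raise recall at the cost of specificity. The scores, labels, and function name below are made up for illustration and do not come from the project.

```python
def recall_and_specificity(scores, labels, threshold):
    """Return (recall, specificity) when scores >= threshold are labeled positive.

    Assumes labels contains at least one positive (1) and one negative (0).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model scores and true labels (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    recall, specificity = recall_and_specificity(scores, labels, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, specificity={specificity:.2f}")
```

On this toy data, dropping the threshold from 0.5 to 0.3 catches the one positive case that was previously missed, but it also flags an extra negative case. That is why a perfect recall score is best read alongside specificity and precision rather than on its own.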

For more detailed updates and insights into their work, you can follow their progress on GitHub and in the corresponding live report.


Guiding these students through the MIT PRIMES program has been an incredibly rewarding experience. Their commitment to learning and their innovative approach to research promise to yield significant contributions to the field of cancer prediction and treatment. Stay tuned for more updates as we continue this exciting journey!