Strengths, Limitations Found in Penicillin Allergy Machine Learning Prediction Model

Data presented at the AAAAI annual meeting showed that machine learning models had a high accuracy rate in penicillin allergy prediction, though researchers concluded that more data is needed before adoption.

A machine learning model interpreted with the SHapley Additive exPlanations (SHAP) framework was found to be effective at identifying risk drivers for penicillin allergy but, because of certain limitations, it was not endorsed for adoption.1

The data were presented at the American Academy of Allergy, Asthma & Immunology (AAAAI) 2023 Annual Meeting in San Antonio, TX.

This study, authored by Alexei Gonzalez-Estrada, MD, from the Mayo Clinic, aimed to develop machine learning models to predict penicillin allergy using multi-site US data.

The rationale behind the team’s research was that earlier logistic regression and machine learning models, which used reaction history to identify penicillin allergy predictors, were based on data from outside of the US.

The investigators utilized retrospective data from Mayo Clinic (Rochester, Arizona, Florida) and Massachusetts General Hospital. They grouped the data into 4 sets:

  • Enriched testing
  • Enriched training
  • Non-enriched external testing
  • Non-enriched internal testing

The team developed the models using machine learning algorithms and assessed their accuracy using the area under the curve (AUC).

The study included data from 4,777 participants, who had a mean age of 57 years and 68% of whom were female. The team noted that 11% had a confirmed allergy to penicillin.

The investigators noted that the gradient boosted model was the strongest, with an AUC of 0.67.
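For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen patient with a confirmed allergy receives a higher predicted risk than a randomly chosen patient without one; 0.5 corresponds to chance and 1.0 to perfect ranking. The following is a minimal Python sketch of that definition. The function name and the toy labels and scores are illustrative assumptions and are not data from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = confirmed allergy, 0 = no allergy.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]

toy_auc = auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

In this toy example, 7 of the 9 positive-negative pairs are ranked correctly. By the same reading, an AUC of 0.67 means the model ranks such a pair correctly about two times in three.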

They added that the main SHAP features for allergy confirmation were reactions in the past year that required medical care, hives/urticaria as a reaction, and female sex.
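SHAP ranks features by their Shapley values, which average each feature's marginal contribution to a prediction over all possible combinations of the other features. The sketch below computes exact Shapley values for a made-up three-feature risk score in Python. The scoring function, feature names, and weights are hypothetical and are not taken from the study, and in practice the SHAP library uses faster model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight for a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score; all numbers are invented for illustration."""
    score = 0.05  # baseline risk with no features present
    if "recent_reaction_needing_care" in present:
        score += 0.20
    if "hives" in present:
        score += 0.10
    if "female" in present:
        score += 0.03
    if "recent_reaction_needing_care" in present and "hives" in present:
        score += 0.04  # interaction term, split between the two features
    return score


toy_features = ["recent_reaction_needing_care", "hives", "female"]
phi = shapley_values(toy_risk, toy_features)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

The values sum to the difference between the score with all features present and the baseline score, which is the property that lets SHAP attribute a single prediction to individual features.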

The biggest “unknowns” identified by the investigators were the timing (71%), treatment (31%), and symptoms (13%) of those with the allergy.

The model demonstrated neither strong discrimination nor strong calibration in the non-enriched internal and external testing datasets. Additionally, the investigators found that the model’s predictive performance was limited by “unknowns,” and as a result they did not consider it strong enough to endorse its adoption.
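Discrimination (measured by AUC) describes how well a model ranks patients, whereas calibration describes whether predicted risks match observed rates. One common way to inspect calibration is to group predictions into probability bins and compare the mean predicted risk with the observed rate in each bin. The Python sketch below illustrates that idea under stated assumptions: the function name, bin count, and toy data are hypothetical, and the source does not say which calibration method the investigators used.

```python
def calibration_table(labels, probs, n_bins=5):
    """Compare mean predicted risk with observed event rate in
    equal-width probability bins. Empty bins are skipped."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((i, len(members), mean_pred, observed))
    return rows


# Hypothetical toy data: 1 = confirmed allergy, 0 = no allergy.
toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.1, 0.9, 0.9]

toy_rows = calibration_table(toy_labels, toy_probs, n_bins=2)
for bin_index, count, mean_pred, observed in toy_rows:
    print(f"bin {bin_index}: n={count}, predicted={mean_pred:.2f}, observed={observed:.2f}")
```

A well-calibrated model shows predicted and observed values that are close in every bin; large gaps indicate that the predicted risks cannot be taken at face value even if the ranking is reasonable.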

The team also noted that large-scale, US-based prospective data may be necessary to improve predictive modeling of penicillin allergy.


  1. Gonzalez-Estrada A, Radojicic C, Chow S, et al. Predicting Penicillin Allergy: A US Multicenter Retrospective Study. J Allergy Clin Immunol Pract. 2020;8(7):2295-2302.e2. doi:10.1016/j.jaip.2020.03.016.