Algorithm from Nonstandardized Internet Images of Melanocytic Lesions Shown to Be Effective

This new synthetic data set could serve as a resource for training AI models and could help facilitate consensus among clinicians.

A new non–hospital-based, larger-scale public data set could help improve the diversity of current melanocytic lesion data sets and further improve artificial intelligence (AI) models, according to new findings.1

In this diagnostic study, the investigators generated a synthetic data set using several AI technologies and images found on the internet, and a neural network trained on this data set outperformed a network trained on preexisting data.

The new research is valuable to dermatologists because, although algorithms that employ convolutional neural networks (CNNs) can identify lesion images with strong accuracy, a deep learning algorithm suitable for clinical use would require an extensive data set with a wide range of diverse images.2 The research was led by Soo Ick Cho, MD, PhD, of Lunit Inc in Seoul, South Korea.

“Herein, we created data sets for melanoma and melanocytic nevus semi-automatically by crawling the internet and annotating the photographs,” Cho and colleagues wrote. “A CNN was trained using the created synthetic training data set. Its performance was externally validated using the pathologically confirmed public data sets.”

Background and Findings

The research team annotated 5619 images from the CAN5600 data set and 2006 images from the CAN2000 data set, a manually revised subset of CAN5600. The images, which showed melanoma or nevus lesions, were semi-automatically gathered from approximately 500,000 photographs found online.

The team used CNNs, region-based CNNs, and large-mask inpainting methods in their annotation process. They also created a data set for unsupervised pretraining, titled LESION130k, consisting of 132,673 potential lesion images drawn from sources around the world.

Furthermore, a set of 5000 synthetic images, termed GAN5000, was generated with the generative adversarial network StyleGAN2-ADA, which was pretrained on the LESION130k data set and trained on the CAN2000 data set. The investigators' main focus was to assess the diagnostic performance of their models.
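The pretrain-then-train recipe described above can be sketched with NVlabs' publicly released stylegan2-ada-pytorch tooling. This is a hypothetical invocation, not the authors' actual commands; the archive names, output directories, and snapshot path are placeholders.

```shell
# Hypothetical sketch using NVlabs' stylegan2-ada-pytorch scripts; all paths are placeholders.
# 1) Pretrain the GAN on the large uncurated lesion set (a LESION130k analogue).
python train.py --outdir=runs/pretrain --data=lesion130k.zip --gpus=1

# 2) Train on the curated set (a CAN2000 analogue), resuming from the pretrained snapshot.
python train.py --outdir=runs/finetune --data=can2000.zip --gpus=1 \
  --resume=runs/pretrain/network-snapshot.pkl

# 3) Sample 5000 synthetic images (a GAN5000 analogue) from the resulting generator.
python generate.py --outdir=gan5000 --seeds=0-4999 \
  --network=runs/finetune/network-snapshot.pkl
```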

The research team used the area under the receiver operating characteristic curve (AUROC) for distinguishing malignant neoplasms as their main metric, with each of 7 publicly available data sets, collectively containing 2312 images, serving in turn as the test set.

These public data sets included Edinburgh, Waterloo, a subset from SNU, Asan test, PAD-UFES-20, 7-point criteria evaluation, and MED-NODE. The investigators then compared the performance of the EfficientNet Lite0 CNN trained on their proposed data sets with the network's performance after training on the remaining 6 preexisting data sets.
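As a rough sketch of how such an AUROC comparison might be computed: the labels and scores below are invented for illustration, and scikit-learn is an assumed tool, not one named in the study.

```python
# Minimal sketch of an AUROC comparison between two classifiers on one held-out
# test set. All labels and scores are made up; they are not the study's data.
from sklearn.metrics import roc_auc_score

# 1 = melanoma (malignant), 0 = nevus (benign) on a hypothetical test set
y_true = [1, 1, 1, 0, 0, 0, 1, 0]

# Hypothetical predicted melanoma probabilities from two networks
scores_synthetic = [0.91, 0.80, 0.65, 0.30, 0.22, 0.40, 0.75, 0.15]  # trained on created data
scores_baseline = [0.70, 0.55, 0.45, 0.50, 0.35, 0.60, 0.52, 0.25]   # trained on public data

auroc_synthetic = roc_auc_score(y_true, scores_synthetic)
auroc_baseline = roc_auc_score(y_true, scores_baseline)
print(f"AUROC (synthetic-trained): {auroc_synthetic:.3f}")
print(f"AUROC (baseline-trained):  {auroc_baseline:.3f}")
```

In the study this comparison was repeated with each of the 7 public data sets as the test set, yielding one AUROC per data set per training regime.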

Overall, the research team found that the EfficientNet Lite0 CNN trained on the annotated or synthetic images achieved superior or equivalent mean (SD) AUROCs compared with the EfficientNet Lite0 trained on the pathologically confirmed public data sets.

The investigators noted that these results included CAN5600 (0.874 [0.042]; P = 0.02), CAN2000 (0.848 [0.027]; P = 0.08), and GAN5000 (0.838 [0.040]; P = 0.31 [Wilcoxon signed-rank test]), compared with the combined preexisting data sets (0.809 [0.063]). They added that the improvements can be attributed in part to the larger training data set size.
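The Wilcoxon signed-rank test cited above pairs the two training regimes' AUROCs on each external test set and asks whether the paired differences are centered at zero. A minimal sketch, with invented AUROC values and SciPy as an assumed tool:

```python
# Illustrative Wilcoxon signed-rank test on paired AUROCs, one pair per
# external test set. The seven AUROC values are made up, not the study's.
from scipy.stats import wilcoxon

# Hypothetical per-test-set AUROCs for the two training regimes
auroc_created = [0.90, 0.85, 0.88, 0.83, 0.91, 0.86, 0.89]
auroc_existing = [0.82, 0.795, 0.85, 0.79, 0.86, 0.84, 0.83]

# Two-sided test on the 7 paired differences
stat, p_value = wilcoxon(auroc_created, auroc_existing)
print(f"statistic = {stat}, p = {p_value:.4f}")
```

Because the sample here is every test-set pair (n = 7 in this sketch, matching the study's 7 public data sets), the test has limited power, which is consistent with some of the reported P values not reaching significance despite higher mean AUROCs.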

The research team acknowledged that the algorithms would need to be evaluated in real-world settings with consecutive cases, rather than relying solely on curated sets that target specific diseases.

Additionally, the team noted a shortage of images representing Fitzpatrick skin types V and VI, and certain ethnic groups, such as Hispanic individuals, were not adequately represented in the scraped images.

“Therefore, the generated images lacked examples of dark Fitzpatrick skin types,” they wrote. “This underscores the importance of having diverse real images available for synthetic image and algorithmic development.”


  1. Cho SI, Navarrete-Dechent C, Daneshjou R, et al. Generation of a Melanoma and Nevus Data Set From Unstandardized Clinical Photographs on the Internet. JAMA Dermatol. Published online October 4, 2023. doi:10.1001/jamadermatol.2023.3521.
  2. Petrie T, Samatham R, Witkowski AM, Esteva A, Leachman SA. Melanoma early detection: big data, bigger picture. J Invest Dermatol. 2019;139(1):25-30. doi:10.1016/j.jid.2018.06.187.