International Journal of Retina (IJRETINA) 2025, Volume 8, Number 2

THE USE OF ARTIFICIAL INTELLIGENCE FOR DIAGNOSING RETINOPATHY OF PREMATURITY: A SYSTEMATIC REVIEW

Adinda Mulya Pertiwi1, Yeni Dwi Lestari2, Dian Estu Yulia3*

1 Residency Program in Ophthalmology, Department of Ophthalmology, Faculty of Medicine Universitas Indonesia – Cipto Mangunkusumo General Hospital, Jakarta, Indonesia
2 Community Ophthalmology Division, Department of Ophthalmology, Faculty of Medicine Universitas Indonesia – Cipto Mangunkusumo General Hospital, Jakarta, Indonesia
3 Pediatric Ophthalmology Division, Department of Ophthalmology, Faculty of Medicine Universitas Indonesia – Cipto Mangunkusumo General Hospital, Jakarta, Indonesia

Abstract

Introduction: Retinopathy of prematurity (ROP) is a major but preventable cause of childhood blindness. Screening in developing countries is challenging because of shortages of skilled staff. Recent advances in artificial intelligence (AI) offer promising results. This study evaluates the diagnostic performance of AI models for ROP screening.

Methods: This systematic review followed the PRISMA guidelines and included studies from Cochrane, MEDLINE, and ScienceDirect. Eligible studies were cross-sectional or cohort designs that compared the diagnostic accuracy of AI for ROP against a gold standard and reported relevant metrics. Studies were graded using the Oxford CEBM levels of evidence.

Results: Of 608 studies, 12 were included. i-ROP DL showed high sensitivity and specificity (AUC 0.…), with ResNet-152 and EfficientNet-B0 also performing well. Despite variations in specificity and PPV, AI shows promise for ROP screening. i-ROP DL and ResNet-152 may need adaptation to different demographic populations. Although cost-effectiveness data are lacking, AI could reduce workload and improve diagnostic consistency.
Conclusion: AI shows high sensitivity, but variable specificity highlights the need for further algorithm refinement to reduce false positives. The review also underscores the importance of validation across diverse populations to ensure generalizability. Integrating AI into clinical practice can enhance early detection, standardize diagnoses, and alleviate the burden on healthcare professionals, particularly in low-resource settings.

Keywords: Artificial Intelligence; Retinopathy of Prematurity

Cite This Article: PERTIWI, Adinda Mulya; YULIA, Dian Estu; LESTARI, Yeni Dwi. The use of Artificial Intelligence for Diagnosing Retinopathy of Prematurity – A Systematic Review. International Journal of Retina, [S.l.], v. 8, n. 2, p. 123, oct. ISSN 2614-8536. Available at: …. Date accessed: 01 oct. doi: https://doi.org/10.35479/ijretina…

Correspondence to: Adinda Mulya Pertiwi, Universitas Indonesia – Cipto Mangunkusumo General Hospital, Jakarta, Indonesia; mulya@yahoo.…

Published by: INAVRS https://w….org/ | International Journal of Retina https://ijretina…

INTRODUCTION

Retinopathy of prematurity (ROP) is a vasoproliferative retinal disease associated with prematurity and a leading cause of childhood blindness worldwide. The Early Treatment for Retinopathy of Prematurity (ET-ROP) study showed that 68% of premature infants with a birth weight below 1250 g will develop at least mild ROP. In a multicenter study in Indonesia, the incidence of all-stage ROP was 18%, and at Cipto Mangunkusumo Hospital it was 4.8% in 2014. It is estimated that more than 10% of premature infants with ROP will develop severe visual impairment or blindness. A global burden of disease analysis estimated that in 2010 there were around 257,000 years lived with disability due to visual impairment associated with ROP. Prematurity is linked to ROP because the nasal and temporal portions of the retina complete their development late in pregnancy (at approximately 32 and 40 weeks, respectively), so preterm infants are born with a less developed retina. Birth weight is also strongly associated with ROP. Guidelines from the American Association for Pediatric Ophthalmology and Strabismus, the American Academy of Pediatrics, and the American Academy of Ophthalmology state that infants born at ≤30 weeks' gestational age or with a birth weight of ≤1500 g are candidates for screening.6 Screening for ROP requires bedside or telemedicine examination of the fundus.

In Indonesia, ROP screening is also performed in some hospitals, especially those in large cities. The Indonesian screening criteria follow the recommendations of the 2014 national ROP Pokja (working group) and Premature Infant Working Group workshop, which in turn draw on criteria from the United States. Screening is carried out in babies with a birth weight of <1500 g, a gestational age of <34 weeks, or other risk factors. India applies the same screening criteria as Indonesia.

Several ROP screening programs have been conducted, such as a multicenter study at Harapan Kita Women and Children's Health Centre and Cipto Mangunkusumo Hospital.3 Jakarta-ROP (JakROP) is one of Cipto Mangunkusumo Hospital's flagship programs, providing mobile ROP screening in several selected vertical hospitals in Jakarta. In the general population, only 5–10% of screened babies will develop visual impairment secondary to ROP. However, this screening faces a number of challenges. Regular, population-wide screening is difficult, especially in low- and middle-income countries, because of inadequate training, remote locations, and shortages of skilled staff.8 This lack of skilled staff, especially physicians able to recognize and diagnose ROP from fundus images, is the core problem faced by many rural and developing regions and countries. Another challenge is that the clinical diagnosis of ROP is subjective, with high rates of interobserver variability, and real-world treatment is inconsistent.

The use of fundus photography to document ROP and in telemedicine initiatives has paved the way for the adoption of artificial intelligence in ROP management. Artificial intelligence (AI) refers to machine algorithms designed to mimic human problem-solving skills. The foundations of AI date back to 1950, when Alan Turing published his paper "Computing Machinery and Intelligence".9 AI is now widely used in medicine, especially to aid the identification, classification, and diagnosis of various diseases. AI has already been developed to aid the early diagnosis of diabetic retinopathy,10 which highlights its potential for ROP. The increasing use of fundus photography in ROP programs has facilitated the implementation of AI models for diagnosis. AI models have an advantage over human graders in ROP screening programs, especially because computers are not susceptible to the fatigue and bias that may affect assessment results, and AI is low risk …. In terms of healthcare economics, AI has been shown to reduce the overall diagnostic burden of …, enable early detection with minimal equipment, and …. Integrating wide-field imaging and automated diagnosis within a teleophthalmology system offers a potential solution to these problems. This approach could facilitate rapid screening and prioritization of infants, even in areas with limited resources. Given the prevalence of ROP and the increasing demand for efficient screening solutions, this systematic review aims to update the current state of AI technologies for ROP diagnosis and screening, considering the types of AI that best align with the specific needs and workloads of ROP screening programs.

AI has emerged as a promising tool for addressing the challenges of timely and accurate diagnosis of ROP, particularly in resource-limited settings where access to trained specialists may be limited. Its ability to process large volumes of retinal images rapidly could increase screening coverage and reduce missed diagnoses. However, despite growing interest, current AI models vary significantly in design, dataset diversity, and validation methods. This systematic review aims to critically evaluate the latest AI models developed for ROP screening, highlight their diagnostic performance, and identify existing limitations in their clinical validation and generalizability.

METHODS

This systematic review was conducted by systematically searching for relevant studies in several online databases: Cochrane, MEDLINE, and ScienceDirect. The search was conducted on 20 April 2024. The study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The PICO of this study is defined as follows: premature infants (Patient), diagnosed by artificial intelligence (AI) models (Intervention), compared with a gold standard examination (Comparison), with the outcome being diagnostic performance measured by AUC, sensitivity, and specificity (Outcome). The keywords used in each database are presented in Table 1.

Table 1. Search terms in each database

Cochrane
#1 ("artificial intelligence"):ti,ab,kw OR ("deep learning"):ti,ab,kw OR ("machine learning"):ti,ab,kw
#2 ("retinopathy of prematurity"):ti,ab,kw OR (ROP):ti,ab,kw
#3 ("diagnosis"):ti,ab,kw OR (…):ti,ab,kw OR ("sensitivity analysis"):ti,ab,kw OR ("specificity"):ti,ab,kw OR ("area under the curve"):ti,ab,kw
#4 #1 AND #2 AND #3

MEDLINE
((((("artificial intelligence"[All Fields]) OR ("machine learning"[All Fields])) OR ("ai"[All Fields])) OR ("convolutional neural network"[All Fields])) AND (((((("diagnosis"[All Fields]) OR ("prediction"[All Fields])) OR ("sensitivity"[All Fields])) OR ("area under the curve"[All Fields])) OR ("screening"[All Fields])) OR ("specificity"[All Fields]))) AND (("retinopathy of prematurity"[All Fields]) OR ("rop"[All Fields]))

ScienceDirect
("Artificial intelligence" OR "Machine learning" OR "Deep learning") AND ("Retinopathy of prematurity") AND (Diagnosis OR prediction OR sensitivity OR area under the curve)

This systematic review included studies assessing the ability of an AI model to predict and diagnose ROP using relevant clinical parameters. The primary outcome of this review is the diagnostic performance of AI models in screening for ROP. Key performance indicators are the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. These metrics are critical for evaluating how well AI models distinguish between diseased and non-diseased infants. High sensitivity is particularly important in ROP screening to minimize the risk of missing sight-threatening disease, while high specificity reduces false positives that may lead to unnecessary follow-up.

Secondary outcomes include additional diagnostic metrics: inter-rater agreement, to assess consistency between the AI and human graders; negative predictive value (NPV); positive predictive value (PPV); and overall diagnostic accuracy. These outcomes reflect how AI tools might perform in real-world clinical settings, particularly under varying disease prevalence and image quality, and help determine the reliability and applicability of AI-assisted screening in different healthcare contexts.
Each study had to include a validation test against a prespecified gold standard and report the capability of the algorithm to detect ROP relative to that gold standard. The inclusion criteria were cross-sectional analytical diagnostic or cohort studies that compared the diagnostic capability of an AI model for ROP against a gold standard examination, clearly described both the gold standard and the AI model used, and reported the primary outcomes of interest. The exclusion criteria were studies without full-text availability, non-English studies, studies limited to AI model generation, and publications in the form of case reports, case series, case-control studies, reviews, editorials, or commentaries.

Risk of bias was assessed using the QUADAS-2 tool, which is designed to evaluate the quality of primary diagnostic accuracy studies. Each study was independently assessed across four domains: (1) patient selection, (2) index test, (3) reference standard, and (4) flow and timing. We evaluated the risk of bias in each domain, and applicability concerns in the first three domains, using the signaling questions provided in the QUADAS-2 framework. Discrepancies between reviewers were resolved through discussion and consensus.

In addition to the risk of bias assessment, the Oxford Centre for Evidence-Based Medicine (CEBM) 2011 Levels of Evidence were used to classify the overall strength of the included studies (Table 2).13 A critical appraisal of diagnostic accuracy was also performed using the CEBM checklist to support our interpretation of each study's methodological rigor.

Table 2. Oxford Centre for Evidence-Based Medicine 2011 Levels of Evidence
LOE | Studies
1 | Systematic reviews (with homogeneity) of RCTs
2 | RCTs or observational studies with dramatic effect
3 | Non-randomized controlled cohort / follow-up studies
4 | Case series, case-control, or historically controlled studies
5 | Mechanism-based reasoning
LOE: level of evidence

We extracted information from each study that fulfilled the inclusion and exclusion criteria: the author's name, year of publication, study design, type of artificial intelligence used, training data, and outcomes.

RESULTS

Figure 1 presents the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart detailing the study selection process. In the identification phase, a total of 608 studies were found across several databases, including Cochrane, PubMed, and ScienceDirect. A total of 560 studies were excluded during the screening phase. In the eligibility phase, the full texts of 16 studies were examined in detail. During this process, 4 studies were excluded: two because of their literature review design and two because they lacked an AI model validation test, leaving 12 included studies. The critical appraisal of each included study is presented in Table 4.

Figure 1. PRISMA flowchart

Table 4. Critical appraisal of included studies
Validity: (Q1) Was the diagnostic test evaluated in a representative spectrum of patients (like those in whom it would be used in practice)? (Q2) Was the reference standard applied regardless of the index test result? (Q3) Was there an independent, blind comparison between the index test and an appropriate reference ("gold") standard of diagnosis?
Applicability: (Q4) Were the methods for performing the test described in sufficient detail to permit replication?

Study ID | Q1 | Q2 | Q3 | Reference standard (details) | Q4
Brown et al. | Yes | Yes | Yes | Consensus of image-based grading by three experts | Yes
Greenwald et al. | Yes | Yes | Yes | Graded by an ophthalmologist using ICROP | Yes
Campbell et al. | Yes | Yes | Yes | Three clinicians using … | Yes
Chen et al. | Yes | Yes | Yes | Consensus diagnosis by three expert graders | Yes
Campbell et al. | Yes | Yes | Yes | Determined by 34 ROP … | Yes
Cole et al. | Yes | Yes | Yes | Manual diagnosis using ICROP | Yes
Coyner et al. | Yes | Yes | Yes | Manual diagnosis using ICROP | Yes
Li et al. | Yes | Yes | Yes | Consensus diagnosis from three ROP experts | Yes
Bai et al. | Yes | Yes | Yes | … five expert pediatric … | Yes
Liu et al. | Yes | Yes | Yes | Determined by senior … | Yes
Rao et al. | Yes | Yes | Yes | Grading by trained ROP … | Yes
Siegfried et al. | Yes | Yes | Yes | Majority vote of three senior pediatric ophthalmologists | Yes

The study characteristics table (Table 5) summarizes the studies included in this systematic review on the use of AI for diagnosing ROP. It covers a wide range of study designs, geographical locations, and AI models, offering a comprehensive overview of the current state of research in this area. Together, these studies illustrate the global effort to apply AI to ROP diagnosis, using various AI models and training datasets to improve diagnostic accuracy and early detection in premature infants.

Table 5. Study characteristics
Study ID | Study design | Country | Mean PMA/GA (weeks) | ROP type | Number of samples | Training data | Model name
Brown et al. | Cross-sectional | USA | N/A | No plus, pre-plus, and plus disease ROP | 5511 images | Retinal images | i-ROP DL
Greenwald et al. | Cross-sectional | USA | … | Type 1 and type 2 ROP | 2 with ROP; 79 without ROP | Retinal images | i-ROP DL
Campbell et al. | Cross-sectional | India | … | No plus, pre-plus, and plus disease ROP | 4175 images from 1253 eyes | Retinal images | i-ROP DL
Chen et al. | Cross-sectional | North America and Nepal | North America: 26.6 ± 2.…; Nepal: 32.6 ± 2.… | Stage 1–3 ROP | 10,894 images | Retinal images | ResNet-152
Campbell et al. | Cross-sectional | USA | Unspecified | Stage 1–5; no plus, pre-plus, and plus ROP | Unspecified | Retinal images | i-ROP DL
Cole et al. | Cross-sectional | Nepal and Mongolia | Nepal: 33.3 ± 2.…; Mongolia: 30.4 ± 2.… | No plus, pre-plus, and plus ROP | Nepal: 391 eyes; Mongolia: 467 eyes | Retinal images | i-ROP DL
Coyner et al. | Cross-sectional | India, Nepal, and Mongolia | Not treated: 33.5 ± 2.…; treated: 29.7 ± 2.… | Unspecified | Not treated: 3633 patients; treated: 127 patients | Retinal images | No name
Li et al. | Cross-sectional | China | 31 ± 5.… | Stage 1–3 ROP | Training set: 14,626 images; test set: 3680 images; comparison set: 521 images | Retinal images | DenseNet
Bai et al. | Retrospective cohort | Australia | … | ROP | 8052 images | Retinal images | ROP.AI
Liu et al. | Retrospective cohort | China | N/A | Treatment-indicated ROP | 24,495 images from 1075 eyes | Retinal images | ResNet-18 and DenseNet121
Rao et al. | Cross-sectional | India | N/A | ROP | 7489 images | Retinal images | EfficientNet-B0
Siegfried et al. | Retrospective cohort | UK | <32 weeks or birth weight <1501 g | No plus, pre-plus, and plus ROP | Training set: 6141 images; test set: 200 images | Retinal images | Bespoke and CFDL models
UK: United Kingdom; PMA: post-menstrual age; GA: gestational age; ROP: retinopathy of prematurity
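Several included studies report inter-rater agreement between the AI output and expert graders, one of this review's secondary outcomes. As an illustration only, the sketch below computes unweighted Cohen's kappa for categorical grades such as no plus, pre-plus, and plus disease. This is an assumption: the individual studies may have used weighted kappa or another agreement statistic, and the function name and the example grades are hypothetical, not data from any included study.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two raters (e.g. an AI model and an expert).

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of images given the same grade by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' label frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every image: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical grades for eight images (not data from any included study).
    ai = ["plus", "plus", "pre-plus", "no plus", "no plus", "pre-plus", "no plus", "plus"]
    expert = ["plus", "pre-plus", "pre-plus", "no plus", "no plus", "no plus", "no plus", "plus"]
    print(f"kappa = {cohens_kappa(ai, expert):.3f}")
```

Run as a script, this prints kappa = 0.619: the two raters agree on 6 of 8 images (observed agreement 0.75), but the agreement expected by chance from their label frequencies is about 0.34, so kappa is noticeably lower than raw percentage agreement.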
The risk of bias of the diagnostic studies included in this systematic review was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool (Table 6). QUADAS-2 has four main domains (patient selection, index test, reference standard, and flow and timing), each evaluated for risk of bias; the first three are also evaluated for concerns regarding applicability to the research question. Each domain includes a set of signalling questions to aid the assessment. A potential risk of bias remained owing to (1) inappropriate exclusions during patient selection, (2) not all subjects being included in the analysis, and (3) unspecified examiner qualifications.

[Table 6. Risk of bias assessment of cohort studies using QUADAS-2. Risk of bias: patient selection, index test, reference standard, flow and timing. Concerns regarding applicability: patient selection, index test, reference standard. Each rated low, high, or unclear for each included study (Brown et al., Greenwald et al., Campbell et al., …).]

Brown et al., Greenwald et al., both Campbell et al. studies, and Cole et al. used the i-ROP Deep Learning (DL) model for ROP diagnosis (Table 5). The earliest study to use this model was that of Brown et al. The model relies on two primary neural networks: a vessel segmentation network and a classification network. The vessel segmentation network is based on the U-Net architecture, which is highly specialized for biomedical image segmentation.17 Chen et al. aimed to develop a deep learning model for diagnosing ROP, focusing specifically on identifying stages 1–3 in retinal images of preterm infants. Coyner et al. developed and validated a deep learning-based model to screen for ROP in infants from low- and middle-income countries (LMICs). Li et al. developed an automatic deep convolutional neural network (DCNN)-based system for the early diagnosis and quantitative analysis of ROP: using 18,827 retinal images from preterm infants, two modified Retina U-Nets were employed to segment blood vessels and demarcation lines.25

Bai et al. aimed to test the ROP.AI model in an Australian population. ROP.AI was developed using retinal images collected from a single center in New Zealand; the algorithm employs convolutional neural networks (CNNs) to analyze retinal images and detect features indicative of plus disease.26 Liu et al. aimed to develop an AI system for recommending treatment modalities for ROP. The system's tasks included ROP identification, severe ROP identification, and treatment modality identification (retinal laser photocoagulation or intravitreal injection).27 Rao et al. aimed to develop and validate an AI-based screening tool for detecting ROP in South Indian infants. They employed CNNs, specifically the EfficientNet-B0 architecture, to train a deep learning algorithm capable of binary classification (ROP present vs. ROP absent).28 Siegfried et al. aimed to develop and validate deep learning models for detecting plus disease in ROP. Two types of models were developed: bespoke models and code-free deep learning (CFDL) models. The CFDL model was developed using the Google Cloud AutoML Vision API, which does not require advanced coding skills, making it accessible for use in low-resource settings.

Table 7 provides an in-depth analysis of the diagnostic performance of the AI models used to detect ROP across the included studies. It shows how effectively these models identify different stages and severities of ROP in terms of sensitivity, specificity, AUC, and additional diagnostic metrics. Brown et al. and Greenwald et al. used the i-ROP DL model, achieving high sensitivity and specificity for plus and pre-plus disease. Greenwald et al. reported perfect sensitivity (100%) and high specificity (…%), with an AUC of 0.99 for referral-requiring ROP, underscoring the model's potential for early detection. Campbell et al. also employed the i-ROP DL model, focusing on treatment-requiring ROP, with 100% sensitivity supporting its use in telemedicine, despite a lower PPV of 12%. Chen et al. used the ResNet-152 model on datasets from Nepal and North America; for Nepal, the model achieved 98% sensitivity and 96% specificity (AUC …). Campbell et al., continuing with i-ROP DL, evaluated disease severity classifications but did not provide specific metrics, emphasizing high inter-rater agreement. Cole et al. evaluated i-ROP DL in Nepal and Mongolia. Coyner et al. developed a model tested in India, Nepal, and Mongolia, achieving 100% sensitivity but varying specificity (…3% for India, 77.8% for Nepal, and …8% for Mongolia), which suggests high sensitivity but a need for improved specificity. Rao et al. used EfficientNet-B0, achieving 91.5% sensitivity and …2% specificity (AUC 0.…).

Overall, AI models showed high sensitivity across the included studies, indicating strong potential for the early detection of ROP. However, variability in specificity and in other metrics such as PPV and NPV suggests that, although these models are effective at identifying true positives, further refinement is needed to reduce false positives. This is particularly important in clinical settings to avoid unnecessary treatments and interventions. The high performance of models such as i-ROP DL and ResNet across several studies and populations supports their reliability. The integration of AI models into telemedicine and clinical workflows, as suggested by Campbell et al. and Greenwald et al., can enhance screening efficiency and improve the management of ROP. The use of AI in low-resource settings, as explored by Coyner et al. and Rao et al., demonstrates its potential to bridge gaps in healthcare access and quality.

Table 7. Diagnostic performance of AI models
Study ID | AI model | Outcome | Sn (%) / Sp (%) / AUC | Additional metrics | Comments
Brown et al. | i-ROP DL | Plus disease; pre-plus disease | … | Inter-rater agreement 0.… | i-ROP DL shows high diagnostic accuracy, with higher agreement than 6 of 8 experts
Greenwald et al. | i-ROP DL | Referral-requiring ROP | … | – | AI can be effectively integrated into telemedicine programs to enhance screening efficiency and monitor disease …
Campbell et al. | i-ROP DL | Treatment-requiring ROP | … | PPV 12% (with plus disease) | A severity score above 3 is highly predictive of ROP …
Chen et al. | ResNet-152 (Nepal data set) | Stage 1–3 ROP; type 1 and 2 ROP detection | … | – | The study highlights domain shift: AI model performance dropped significantly when tested in a different population or with a different camera
Chen et al. | ResNet-152 (North America data set) | Stage 1–3 ROP; type 1 and 2 ROP detection | … | – |
Campbell et al. | i-ROP DL | ROP | … | Inter-rater agreement 0.…; correlation coefficient 0.… | The deep learning-derived vascular severity score showed strong consistency with expert classifications of disease severity
Cole et al. | i-ROP DL (Nepal data set) | Plus ROP; ROP | … | – | The i-ROP DL algorithm performed well across different camera systems and …
Cole et al. | i-ROP DL (Mongolia data set) | Plus ROP; ROP | … | – |
Coyner et al. | No name (India, Nepal, and Mongolia data sets) | ROP | … | – | Varying specificity indicates room for improvement to reduce false positives
Li et al. | DenseNet | Normal images; stage I, II, and III ROP; plus ROP | … | Inter-rater agreement 0.… | The system's ability to quantitatively analyze features such as demarcation line width and vascular bifurcation ratios provides an objective basis for diagnosis
Bai et al. | ROP.AI | … | … | NPV 96% | The relatively low specificity indicates a higher rate of false positives; misclassifications often occurred in images with a darker fundus or slight blurring
Liu et al. | ResNet-18 and DenseNet121 | ROP; severe ROP; treatment modality | … | Accuracy 88.…% (ROP), 84.…% (severe ROP), 86.…% (treatment modality) | The AI system outperformed experienced ophthalmologists in accuracy, especially in determining the need for treatment and selecting the appropriate treatment modality
Rao et al. | EfficientNet-B0 | ROP | … | PPV 81.…%; NPV 96.…% | False negatives in the test set were mainly stage 1 and stage 2 ROP, which are harder to detect because of subtle …; ensuring high-quality images is crucial for model performance
Siegfried et al. | Bespoke | Healthy; pre-plus ROP; plus ROP | … | Inter-rater agreement 0.… | The study found high inter-observer variability, especially among less experienced clinicians, which underscores the challenge of establishing a consistent reference standard for training and validating AI models
Siegfried et al. | CFDL | Healthy; pre-plus ROP; plus ROP | … | Inter-rater agreement 0.… |

DISCUSSION

This systematic review aimed to evaluate the diagnostic performance of various AI models in detecting ROP. Across multiple studies, AI models showed high sensitivity, highlighting their strong potential for the early detection of ROP. Prioritizing sensitivity and comparing results with the accuracy of the reference standard within the same population aligns with the standard approach in diagnostic studies. The sensitivity values of 100% reported by Brown et al. and Greenwald et al. confirm the high sensitivity potential of AI models, consistent with earlier findings on AI use. In the broader field of eye disease, AI is mainly used for diagnosis, including in glaucoma and diabetic retinopathy.

While the i-ROP DL model showed excellent sensitivity in several reports, its specificity varied. The ResNet-152 model demonstrated a marked drop in sensitivity when applied to a different population, as seen in Chen et al. Such variability indicates that although AI models are effective at identifying true positives, calibration is needed to reduce false positives. High AUC values reported by Li et al. and Siegfried et al. indicate generally good discriminatory ability, with accuracy categories ranging from excellent (90–100%) to very poor (50–60%). Nonetheless, the lower PPV in some studies, such as Campbell et al., suggests that false-positive results remain a concern. Furthermore, many of the included studies were real-world …. The lack of standardized performance reporting across studies also complicates direct model comparisons. Although AI is often assumed to enable cost-efficient large-scale screening, particularly in developing countries, none of the included studies reported cost-effectiveness data.

This review identified clear evidence of high sensitivity; performance, however, varied across studies. Variability in performance across clinical environments reflects differences in patient populations, imaging systems, and …, which makes consistent performance difficult to achieve, particularly in contexts where low specificity could lead to unnecessary diagnostic procedures.

While AI holds promise for ROP screening, several barriers remain. Image quality is a critical factor influencing diagnostic accuracy, especially in low-resource settings where imaging equipment may be suboptimal, so training personnel to produce high-quality images before implementation is essential. Another barrier is domain shift, where models trained on one dataset or population perform less accurately when applied to another. The drop in sensitivity of the ResNet-152 model in Chen et al. exemplifies this problem. Domain shift occurs when external samples differ in their features from the original training dataset, leading to reduced performance. Such findings underscore the need for domain adaptation, which may involve fine-tuning models with local data or training on large, multicenter datasets. Differences in demographic composition, disease spectrum, and imaging equipment affect generalizability, reinforcing the importance of validating AI models in the specific populations and clinical settings where they will be used.

When effectively deployed, AI-assisted ROP screening offers substantial clinical benefits. High sensitivity ensures that most true cases are detected, enabling timely interventions such as anti-VEGF injection, cryotherapy, or surgery. AI applications in telemedicine, as demonstrated by Campbell et al. and Greenwald et al., can improve accessibility in remote and low-resource areas. AI can also standardize grading, particularly in differentiating plus from non-plus disease, reducing inter-observer variability. Automating initial screening can reduce ophthalmologists' workload, allowing specialists to focus on complex cases. However, these benefits depend on maintaining high image quality and ensuring sufficient specificity to avoid unnecessary follow-up procedures.

Future research should aim to improve AI specificity without compromising sensitivity, thereby reducing false positives. Large, prospective, multicenter trials involving diverse demographics and multiple imaging systems will be essential for validating performance and improving robustness. Advancing domain adaptation techniques will help mitigate population and equipment differences. Integrating AI outputs with other diagnostic tools and clinical data may enable comprehensive ROP assessments and identification of the infants at highest risk. Further research should also explicitly evaluate cost-effectiveness, particularly in low- and middle-income countries. Standardizing performance metrics, maintaining high image quality, and training healthcare workers in AI-assisted workflows will be crucial to maximize clinical impact. By addressing these factors, AI can evolve from a promising diagnostic tool into an integral component of ROP management.

CONCLUSION

This systematic review, conducted to evaluate the diagnostic performance of AI models for ROP, found that these models show high sensitivity and potential as effective early screening tools. However, variability in specificity and positive predictive value across studies highlights the need to refine algorithms to reduce false positives and improve clinical applicability. Evidence of domain shift underscores that AI models must be validated and, where necessary, adapted to the target populations and settings.

REFERENCES

Brown AC, Nwanyanwu K. Retinopathy of Prematurity. In: … Treasure Island (FL): ….
Blencowe H, Lawn JE, Vazquez T, Fielder A, Gilbert C. Preterm-associated … impairment and estimates of retinopathy of prematurity at regional and global levels for …. Pediatr Res. 2013 Dec;74 Suppl 1(Suppl 1):35–49.
… Michaud C, Ezzati M, et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet (London, England). … Dec;…:2163–96.
Good WV. Final results of the Early Treatment for Retinopathy of Prematurity (ETROP) randomized trial. Trans Am Ophthalmol Soc. …;102:233–50.
Siswanto JE, Bos AF, Dijk PH, Rohsiswatmo R, Irawan G, Sari …, Sulistijono E, et al. Multicentre survey of retinopathy of prematurity in Indonesia. BMJ Paediatr Open. …;5:e000761.
Padhy SK, Takkar B, Chawla R, Kumar A. Artificial intelligence in diabetic retinopathy: A …. Indian J Ophthalmol. 2019 Jul;…:1004–9.
Gensure RH, Chiang MF, Campbell JP. Artificial intelligence for retinopathy of prematurity. Curr Opin Ophthalmol. 2020 Sep;…:312–7.
Rockwell …. The History of Artificial Intelligence [Internet]. Harvard University. Available from: https://sitn.….edu/flash/2017/history-artificial-intelligence/
Scruggs BA, Chan RVP, Kalpathy-Cramer J, ….