Abstract:
Objective To evaluate three mainstream large language models for diagnosis, differential diagnosis, and treatment decision support in the primary diseases that lead to pediatric liver transplantation.
Methods Seventy-nine cases of pediatric liver transplantation-related diseases, with diagnoses confirmed by pathology or clinical follow-up, were collected from Renji Hospital, Shanghai Jiao Tong University School of Medicine, or from published high-quality case reports. The cases covered 25 primary diseases, including cholestatic liver disease, metabolic diseases, and tumors. Case information was entered into the DeepSeek-R1, ChatGPT-4o, and Grok-3 models using standardized prompts, and the accuracy of each model's preliminary diagnosis and differential diagnosis based on basic clinical data was evaluated. Final diagnosis accuracy and response time after supplementary examination results were added were also assessed, along with the completeness and rationality of each model's analysis of treatment principles.
Results In the initial diagnosis and differential diagnosis stage, DeepSeek-R1 had the highest comprehensive accuracy, 72.1% (95% confidence interval [CI] 61.4% - 80.8%), and the comprehensive accuracy of initial diagnosis differed significantly among the three models (P = 0.008). After further examination information was added, final diagnosis accuracy increased for all three models: DeepSeek-R1 88.6% (95% CI 79.7% - 93.9%), ChatGPT-4o 87.3% (95% CI 78.2% - 93.0%), and Grok-3 78.5% (95% CI 68.2% - 86.1%). The difference among the three models was not statistically significant (P = 0.05). Expert scores for the treatment principles showed good consistency (Kappa = 0.769). In addition, ChatGPT-4o had a shorter response time than the other two models, (24 ± 7) s.
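The abstract does not say how the confidence intervals were computed. As a minimal sketch, assuming a Wilson score interval for a binomial proportion with n = 79, the calculation could look like the following. The function name `wilson_interval` and the count of 70 correct cases (inferred from the reported 88.6% of 79) are illustrative assumptions, not details given by the authors.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not name its CI method; Wilson is
    used here only as one plausible choice.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * ((p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5) / denom
    return center - half_width, center + half_width


# Hypothetical input: 70 of 79 correct, inferred from the reported 88.6%.
low, high = wilson_interval(70, 79)
print(f"{70 / 79:.1%} (95% CI {low:.1%} - {high:.1%})")
```

Under these assumptions the interval agrees with the one reported for DeepSeek-R1's final diagnosis accuracy to within rounding, which suggests, but does not confirm, that a Wilson-type interval was used.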
Conclusions Large language models performed well in diagnosis and treatment decision-making across a range of pediatric liver diseases. They show promise for auxiliary diagnosis and decision support and may help improve the accuracy and efficiency of clinical diagnosis and treatment of the primary diseases related to pediatric liver transplantation.