In a recent study evaluating the multimodal capabilities of ChatGPT-4o, researchers found that artificial intelligence (AI) may assist with estimating carbohydrates in simple foods but remains unreliable when assessing more complex meals. The findings, published in the Journal of Diabetes Science and Technology, raise important considerations as adolescents with type 1 diabetes (T1D) increasingly turn to smartphone-based tools to assist with disease management.
"The use of electronics and apps among children and teenagers has increased significantly, making young people more susceptible to using and adapting to emerging technologies, such as generative [AI]," wrote Asta Risak Johansen, of Aalborg University, Gistrup, Denmark, and colleagues.
Accurate carbohydrate counting is considered the gold standard approach for determining mealtime insulin doses. Although several mobile applications have been developed to assist patients with these calculations, users must still measure food portions manually. For adolescents, this process can be difficult to maintain consistently, particularly in social settings where weighing food is impractical. These challenges have prompted interest in digital tools capable of estimating nutritional content from images of meals captured with smartphones.
ChatGPT-4o can identify foods in images and combine this information with nutritional data to estimate macronutrient composition. Researchers therefore sought to evaluate whether this technology could accurately estimate carbohydrate content in meal images and potentially serve as a supportive tool for adolescents managing T1D.
In the study, investigators compared carbohydrate estimates generated by ChatGPT-4o with manual carbohydrate calculations considered the reference standard. The analysis included 60 fruits and vegetables and 60 composite meals representing Nordic cuisine. Researchers captured images of each food item using smartphone cameras under standardized lighting conditions and with a uniform background. For composite meals, each ingredient was weighed individually using digital scales, and carbohydrate values were calculated using nutrition labels or the Danish National Food Institute’s FRIDA food composition database.
Each food item was photographed and uploaded to ChatGPT-4o using a standardized prompt instructing the model to estimate carbohydrate content. Images were evaluated both with and without a size reference card comparable to a credit card to determine whether including a scale would improve accuracy. In total, 240 images were analyzed.
Researchers evaluated model performance using mean absolute error, root mean squared error, and Bland–Altman analysis, along with the percentage of estimates that agreed with the manual reference value within a tolerance of ±10 grams of carbohydrates. This threshold was selected because it approximates the smallest insulin dose adjustment typically used in injection therapy.
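For readers who want to see how such measures are computed, the following is a minimal Python sketch of percentage agreement within a tolerance, mean absolute error, and root mean squared error. The function name `agreement_metrics`, the inclusive treatment of the ±10 gram boundary, and the four-item sample data are illustrative assumptions, not details taken from the study. Only the zucchini patties pair (132.5 grams manual, 52 grams estimated) comes from the article.

```python
import math


def agreement_metrics(reference, estimates, tolerance=10.0):
    """Compare estimated carbohydrate values (grams) against reference values.

    Assumption: an estimate counts as "in agreement" when its absolute
    error is less than or equal to the tolerance (inclusive boundary).
    """
    if len(reference) != len(estimates) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed error: positive means the estimate is too high.
    errors = [est - ref for ref, est in zip(reference, estimates)]
    n = len(errors)

    within = sum(1 for d in errors if abs(d) <= tolerance)
    return {
        "agreement_pct": 100.0 * within / n,
        "mae": sum(abs(d) for d in errors) / n,
        "rmse": math.sqrt(sum(d * d for d in errors) / n),
    }


# Hypothetical values for illustration only (not study data).
sample_reference = [10.0, 20.0, 30.0, 40.0]
sample_estimates = [12.0, 20.0, 45.0, 30.0]
sample_result = agreement_metrics(sample_reference, sample_estimates)

# The one reference/estimate pair reported in the article.
zucchini_result = agreement_metrics([132.5], [52.0])
```

Because root mean squared error squares each error before averaging, a single large miss such as the zucchini patties meal raises it far more than it raises mean absolute error.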
The results showed that ChatGPT-4o performed relatively well when estimating carbohydrate content for fruits and vegetables. The model achieved an agreement of 93.3% when no size reference was included and 95% when a reference object was present in the image. Mean absolute error was 4.8 grams without a size reference and 3.3 grams when a reference was provided. Bland–Altman analysis demonstrated relatively narrow limits of agreement, suggesting that the estimates generally fell within a clinically acceptable range for insulin dosing.
In contrast, the model performed substantially worse when evaluating composite meals containing multiple ingredients. Agreement with manual carbohydrate counting occurred in fewer than half of cases. The percentage of agreement was 46.7% without a size reference and decreased slightly to 43.3% when a reference card was included. Mean absolute error increased to 13.9 grams without a reference and 17.5 grams with a reference. Root mean squared error also increased substantially, reaching 26.3 grams in some analyses.
The Bland–Altman analysis revealed wide limits of agreement for composite meals, indicating large variability in the model’s estimates. In several instances, carbohydrate estimates differed from manual calculations by more than 40 grams. One extreme example involved a meal consisting of zucchini patties with tzatziki and potato slices. Manual carbohydrate counting determined the meal contained 132.5 grams of carbohydrates, whereas ChatGPT-4o estimated only 52 grams, a shortfall of 80.5 grams.
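As background on what "limits of agreement" means, a Bland–Altman analysis is conventionally summarized by the mean difference between two methods (the bias) and an interval of the bias plus or minus 1.96 standard deviations of the differences. The sketch below follows that conventional definition. Whether the study computed its limits in exactly this way is not stated in this article, and the function name and sample values are hypothetical.

```python
import statistics


def bland_altman_limits(reference, estimates):
    """Return (bias, lower_limit, upper_limit) in grams.

    Uses the conventional 95% limits of agreement: bias +/- 1.96 * SD,
    where SD is the sample standard deviation of the paired differences.
    """
    if len(reference) != len(estimates) or len(reference) < 2:
        raise ValueError("need at least two paired observations")

    # Difference per item: estimate minus reference.
    diffs = [est - ref for ref, est in zip(reference, estimates)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical values for illustration only (not study data).
ba_reference = [10.0, 20.0, 30.0, 40.0]
ba_estimates = [12.0, 18.0, 33.0, 39.0]
bias, lower, upper = bland_altman_limits(ba_reference, ba_estimates)
```

Narrow limits mean most estimates sit close to the reference values, while wide limits, as reported for composite meals, mean an individual estimate can land far from the true value even if the average bias is small.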
Researchers observed systematic patterns in the model’s estimation errors. When carbohydrate content was relatively low, ChatGPT-4o tended to underestimate values. Conversely, when meals contained moderate carbohydrate levels between approximately 20 and 40 grams, the model often overestimated carbohydrate content. The addition of a size reference object produced only modest improvements for fruits and vegetables and did not improve accuracy for composite meals.
The investigators suggest that the inaccuracies may be explained by limitations in visual recognition when ingredients are mixed together or not clearly distinguishable. Dishes such as stews, patties, or ravioli present particular challenges because individual components cannot easily be identified in a single image. Furthermore, the model relies on general knowledge of food composition rather than a dedicated nutritional database optimized for carbohydrate counting.
Despite these limitations, the findings highlight potential applications for AI-assisted dietary estimation. For simple foods such as fruits and vegetables, carbohydrate estimates generated by ChatGPT-4o were often close enough to manual calculations to avoid clinically significant insulin dosing errors. However, the variability observed with mixed meals suggests that reliance on such tools could increase the risk of both hypoglycemia and postprandial hyperglycemia.
The authors noted that future improvements may be possible through model fine-tuning using large food image datasets, optimization of prompt design, and the use of multiple images captured from different angles to improve volume estimation. Additional studies examining diverse cuisines and real-world patient use are also needed.
"The findings of this study indicate that individuals living with T1D should use ChatGPT-4o to estimate carbohydrates with caution," the investigators concluded.
The researchers disclosed having no financial conflicts of interest.