
A Comprehensive Evaluation of Data Quality in Nutrient Databases


Zhaoping Li

Shavawn Forester

Emily Jennings-Dobbs

David Heber

Publication date

May 2023


Advances in Nutrition


Nutrient databases are a critical component of nutrition science and the basis of exciting new research in precision nutrition (PN). To identify the components most in need of improvement in nutrient databases, food composition data were analyzed for quality, with completeness being the most important measure, and for FAIRness, that is, how well the data conformed with the data science criteria of findable, accessible, interoperable, and reusable (FAIR). Databases were judged complete if they provided data for all 15 nutrition fact panel (NFP) nutrient measures and all 40 National Academies of Sciences, Engineering, and Medicine (NASEM) essential nutrient measures for each food listed. Using the USDA Standard Reference (SR) Legacy database, regarded as the gold standard, as a surrogate, SR Legacy data were found to be incomplete for both NFP and NASEM nutrient measures. In addition, phytonutrient measures in the 4 USDA Special Interest Databases were incomplete. To evaluate data FAIRness, a set of 175 food and nutrient data sources was collected from around the world. Many opportunities were identified for improving data FAIRness, including creating persistent URLs, prioritizing usable data storage formats, providing Globally Unique Identifiers for all foods and nutrients, and implementing citation standards. This review demonstrates that despite important contributions from the USDA and others, food and nutrient databases in their current forms do not yet provide truly comprehensive food composition data. We propose that to enhance the quality and usage of food and nutrient composition data for research scientists and those fashioning various PN tools, the field of nutrition science must step out of its historical comfort zone and improve the foundational nutrient databases used in research by incorporating data science principles, the most central being data quality and data FAIRness.
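The completeness criterion described above (a database is complete only if every listed food has a value for every required nutrient measure) can be illustrated with a minimal sketch. This is not the authors' analysis code. The nutrient names, the three-item `REQUIRED_MEASURES` tuple, and the record layout are hypothetical placeholders; the review's actual lists of 15 NFP and 40 NASEM measures are not reproduced here.

```python
from typing import Dict, Optional, Sequence

# Hypothetical stand-in for a required-measure list. The review uses
# 15 NFP measures and 40 NASEM measures; these three names are placeholders.
REQUIRED_MEASURES: Sequence[str] = ("energy_kcal", "protein_g", "sodium_mg")

# Assumed record layout: nutrient name -> value, or None when not reported.
FoodRecord = Dict[str, Optional[float]]


def food_is_complete(record: FoodRecord,
                     required: Sequence[str] = REQUIRED_MEASURES) -> bool:
    """True if the food has a reported value for every required measure.

    A reported value of 0.0 counts as present; only a missing key or
    None counts as a gap.
    """
    return all(record.get(name) is not None for name in required)


def fraction_complete(foods: Dict[str, FoodRecord],
                      required: Sequence[str] = REQUIRED_MEASURES) -> float:
    """Share of foods with values for all required measures."""
    if not foods:
        return 0.0
    complete = sum(1 for rec in foods.values()
                   if food_is_complete(rec, required))
    return complete / len(foods)


def database_is_complete(foods: Dict[str, FoodRecord],
                         required: Sequence[str] = REQUIRED_MEASURES) -> bool:
    """Complete only if the database is non-empty and every food is complete."""
    return bool(foods) and all(food_is_complete(rec, required)
                               for rec in foods.values())


# Illustrative data only; the values are made up for the example.
example_foods: Dict[str, FoodRecord] = {
    "food_a": {"energy_kcal": 52.0, "protein_g": 0.3, "sodium_mg": 1.0},
    "food_b": {"energy_kcal": 100.0, "protein_g": None},  # two gaps
}
```

With this layout, `fraction_complete(example_foods)` reports how much of a database meets the criterion, while `database_is_complete(example_foods)` applies the stricter all-foods test stated in the abstract. The same functions could be run once with the NFP list and once with the NASEM list as `required`.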
