The Life of a GDD Survey

As we near our next model release later this year, we're finalizing all our individual-level dietary intake data, an extremely time-consuming task. Here's a look at the steps we take to ensure our inputs are of the highest possible quality.

Step 1: Data Collection

We conduct systematic searches and communicate with global experts to identify individual-level dietary data from around the world. Surveys are systematically screened and included if they cover one or more of the 57 GDD dietary factors. We prioritize nationally representative surveys, followed by subnational surveys, and turn to community-level surveys only if no other datasets are identified. Depending on availability, we either download the data directly or invite the data owner to contribute them. The impact and success of GDD rely heavily on this step, as a majority of our data are privately held.

Step 2: Preliminary Checks

Data quality and characteristics are verified and recorded, and are eventually used to inform our model. Datasets are manually evaluated to extract all relevant dietary variables, including individual foods, nutrients, and food groups. For each dietary variable, definitions and units are confirmed based on survey documentation or direct contact with Corresponding Members, then converted to standardized definitions and units. Missing or unclear information is resolved through correspondence with Corresponding Members or, for public surveys, using published documentation and, if necessary, direct contact with survey directors or other survey personnel.
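
To make the unit conversion concrete, here is a minimal sketch in Python. The conversion table and the function name `standardize_mass` are hypothetical and chosen only for illustration; they are not the GDD team's actual coding scheme. The idea is simply that every reported value is mapped to one common unit, and anything unrecognized is sent back for manual checking rather than guessed.

```python
# Hypothetical conversion factors to grams (illustrative, not GDD's actual table).
TO_GRAMS = {"g": 1.0, "mg": 1e-3, "ug": 1e-6, "kg": 1e3}


def standardize_mass(value, unit):
    """Convert a reported mass to grams, failing loudly on unknown units."""
    key = unit.strip().lower()
    if key not in TO_GRAMS:
        # Unknown units are not guessed; they need a look at the survey documentation.
        raise ValueError(f"Unrecognized unit {unit!r}; check survey documentation")
    return value * TO_GRAMS[key]


if __name__ == "__main__":
    print(standardize_mass(2, "kg"))
    print(standardize_mass(500, "mg"))
```

Raising an error on an unknown unit mirrors the manual follow-up described above: an unclear unit is a question for the data owner, not something to infer.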

Step 3: Coding and Standardizations

This step can be extremely time-consuming, as the format in which data are shared varies widely between surveys. Once all relevant data are identified, GDD team members code and standardize them using predetermined GDD coding schemes. This allows us to accurately compare data from around the world, even though they were often gathered in different languages or with different instruments. Other team members then conduct quality checks to ensure all data are properly captured and classified.

It's during this step that the assessment method is accounted for when estimating individual-level intakes: intakes from surveys based on multiple days of recall are averaged per individual, frequencies of intake from food-frequency questionnaires are converted into daily portion sizes, and data from household-level surveys are converted to individual-level intakes.
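
Two of these conversions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GDD pipeline: the function names, the frequency categories, and their times-per-day values are all hypothetical, and the household-to-individual conversion is not shown because the text does not describe how it is done.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical questionnaire categories mapped to times per day (illustrative only).
TIMES_PER_DAY = {"never": 0.0, "monthly": 1 / 30, "weekly": 1 / 7, "daily": 1.0}


def mean_intake_per_person(recalls):
    """Average several recall days per individual.

    recalls: iterable of (person_id, grams_on_that_day) pairs.
    """
    by_person = defaultdict(list)
    for person_id, grams in recalls:
        by_person[person_id].append(grams)
    return {person_id: mean(days) for person_id, days in by_person.items()}


def ffq_daily_grams(category, portion_grams):
    """Turn a reported frequency and portion size into grams per day."""
    return TIMES_PER_DAY[category] * portion_grams


if __name__ == "__main__":
    print(mean_intake_per_person([("a", 100), ("a", 200), ("b", 50)]))
    print(ffq_daily_grams("weekly", 70))
```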

Step 4: Aggregation

Survey microdata are aggregated by our characteristics of interest: age, sex, residence, education level, and pregnancy status (for women). Means and standard deviations are then calculated for each level of aggregation. In addition to absolute intakes, we also calculate energy-adjusted intakes to maximize the comparability of results across these different groups.
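
The aggregation can be sketched as a group-by over strata. In the Python example below, the record fields (`age_group`, `sex`, `intake`) and function names are hypothetical. The energy adjustment shown is a simple scaling to a 2000 kcal reference, which is an assumption made for illustration; the text does not say which adjustment method is used.

```python
from collections import defaultdict
from statistics import mean, stdev


def energy_adjust(intake, kcal, reference_kcal=2000.0):
    """Scale an intake to a reference energy level (assumed method, for illustration)."""
    return intake * reference_kcal / kcal


def aggregate(records, keys=("age_group", "sex")):
    """Compute n, mean, and sample SD of intake for each stratum."""
    groups = defaultdict(list)
    for record in records:
        stratum = tuple(record[k] for k in keys)
        groups[stratum].append(record["intake"])
    summary = {}
    for stratum, values in groups.items():
        # A sample SD needs at least two observations.
        sd = stdev(values) if len(values) > 1 else None
        summary[stratum] = {"n": len(values), "mean": mean(values), "sd": sd}
    return summary


if __name__ == "__main__":
    records = [
        {"age_group": "20-24", "sex": "F", "intake": 100.0},
        {"age_group": "20-24", "sex": "F", "intake": 200.0},
        {"age_group": "20-24", "sex": "M", "intake": 120.0},
    ]
    for stratum, stats in aggregate(records).items():
        print(stratum, stats)
```

Adding residence, education level, or pregnancy status is a matter of extending `keys`, provided each record carries those fields.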

Step 5: Plausibility Checks

Individual data and group-level means are then checked against our plausibility guidelines to identify and rule out any impossible outliers. These guidelines are specific to the individual or group level and to each of our 57 dietary factors, and are based on dietary reference intakes, tolerable upper limits, toxicity ranges for foods and nutrients, and existing regional data on average usual intakes in populations globally. Implausible values are further reviewed with data owners or survey directors before final decisions on inclusion or exclusion.
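
A range check of this kind might look like the following Python sketch. The function name and the bounds in the usage example are placeholders invented for illustration; the actual GDD guidelines are not given here. Note that the function only flags values for review and never drops them, since the final decision rests with the data owners or survey directors.

```python
def flag_for_review(values, lower, upper):
    """Return (index, value) pairs that fall outside the plausible range.

    lower and upper are placeholder bounds supplied by the caller;
    flagged values are kept in the data and reviewed, not deleted.
    """
    return [(i, v) for i, v in enumerate(values) if not (lower <= v <= upper)]


if __name__ == "__main__":
    # Placeholder bounds of 0 to 2000 for a hypothetical dietary factor.
    print(flag_for_review([10.0, -5.0, 5000.0], lower=0.0, upper=2000.0))
```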

Step 6: Model Incorporation

Once all data points are checked, aggregated, and verified, they are incorporated into our final dataset to be used as input for the model.