Dietary patterns and non-communicable disease risk in Indian adults: secondary analysis of Indian Migration Study data

Objective Undernutrition and non-communicable disease (NCD) are important public health issues in India, yet their relationship with dietary patterns is poorly understood. The current study identified distinct dietary patterns and their association with micronutrient undernutrition (Ca, Fe, Zn) and NCD risk factors (underweight, obesity, waist:hip ratio, hypertension, total:HDL cholesterol, diabetes). Design Data were from the cross-sectional Indian Migration Study, including semi-quantitative FFQ. Distinct dietary patterns were identified using finite mixture modelling; associations with NCD risk factors were assessed using mixed-effects logistic regression models. Setting India. Subjects Migrant factory workers, their rural-dwelling siblings and urban non-migrants. Participants (7067 adults) resided mainly in Karnataka, Andhra Pradesh, Maharashtra and Uttar Pradesh. Results Five distinct, regionally distributed, dietary patterns were identified, with rice-based patterns in the south and wheat-based patterns in the north-west. A rice-based pattern characterised by low energy consumption and dietary diversity (‘Rice & low diversity’) was consumed predominantly by adults with little formal education in rural settings, while a rice-based pattern with high fruit consumption (‘Rice & fruit’) was consumed by more educated adults in urban settings. Dietary patterns met WHO macronutrient recommendations, but some had low micronutrient contents. Dietary pattern membership was associated with several NCD risk factors. Conclusions Five distinct dietary patterns were identified, supporting sub-national assessments of the implications of dietary patterns for various health, food system or environment outcomes.

analysis methods (8) and have typically focused either on undernutrition or NCD risks but not both.
In the present study we examined dietary patterns and associated health outcomes among Indian adults based on a large multi-state survey of urban migrants and their rural-dwelling siblings. The primary aim of the study was to identify distinct dietary patterns among the study population using finite mixture modelling. The secondary aim was to examine the association of the identified dietary patterns with macro-and micronutrient intakes and five key NCD risk factors: BMI, waist:hip ratio (WHR), systolic or diastolic blood pressure, serum cholesterol (total:HDL) and fasting blood glucose.

Participants and setting
The Indian Migration Study (IMS) was a cross-sectional, sibling-pair comparison study conducted around four factories situated in northern (Lucknow), central (Nagpur, Hyderabad) and southern (Bangalore) India during 2005-2007 (9) . Factory workers and their co-resident spouses were surveyed to establish their migration status. Rural-tourban migrants, their non-migrant sibling still residing in the place of origin and a 25 % random sample of urban non-migrants were recruited to the study. Siblings were preferably of the same sex and closest in age; cousins or close friends were recruited if siblings were unavailable. A total of 7067 individuals were included in the final sample (10,11) .
Dietary intake and food composition data Dietary intake was assessed using an intervieweradministered semi-quantitative FFQ (12) . Participants reported the portion size and frequency of consumption of up to 184 meals or food items over the past year (13,14) . For fruit and vegetable items, participants were asked whether their consumption was seasonal and how much they consumed when the item was in season. To quantify average consumption over the year, this value was multiplied by the proportion of the year for which the item was in season, determined through a survey of local market vendors. The FFQ included commonly consumed dishes for which weighed recipes were generated for rural and urban areas of the four study sites. These recipe sheets were used to calculate individual intake of 201 distinct food items; these food items were aggregated into thirty-six food groups based on similarity in nutritional content (see online supplementary material, Supplemental Table 1).
The FFQ was repeated 1-2 months and 12 months after initial collection in a sub-sample of participants to check reliability, while the FFQ was validated by administering three 24 h recalls in a sub-sample of 530 participants. Reported energy consumption was on average 1711 kJ/d greater in the FFQ than the 24 h recall but the FFQ data were deemed valid for comparison between groups (12) . Data on sociodemographic factors, anthropometry and biochemical risk factors were also collected (10,15) .
The nutrient composition of the 201 distinct food items was derived from Indian food composition tables (16) and US composition tables (17) where data from India were unavailable. Average nutrient composition of the thirty-six food groups was calculated as the average composition of constituent items weighted by the mean consumption of items across the study population. The SFA and PUFA composition of meals was specific to the type of cooking oil used by each household.

Defining dietary patterns
The FFQ method estimated mean energy intake as 12 129 (SD 4192) kJ/capita per d. The FFQ method is liable to misreporting (18) and we removed individuals with extreme daily energy intake defined as mean ± 2SD (19) . Individuals consuming >20 510 kJ/d (n 272) or < 3749 kJ/d (n 20) were excluded, leaving 6775 observations in the data set.
For each individual and food group, consumption was categorised into four levels, i.e. zero consumption and tertiles of energy from that food group as a proportion of total dietary energy consumption. The thirty-six categorical variables representing proportional consumption of food groups were entered into a mixture model using latent class analysis (20) , to identify distinct patterns of food consumption. Solutions containing one to ten distinct dietary patterns were specified; we used a combination of diagnostic criteria (Bayesian information criterion, minimum proportion per class and entropy of model) (21) to select the solution that fitted the data best. Individuals were assigned to the dietary patterns based on probability.
Dietary patterns, nutrition and health Dietary supply of nutrients was calculated for each individual as the product of daily food group consumption and mean food group composition. We described the nutritional profile of each dietary pattern and compared these with WHO guidelines for macronutrients (22) and micronutrients (23) .
We constructed mixed-effects multiple regression models to investigate whether dietary patterns were associated with five key NCD risk factors: BMI, WHR, systolic blood pressure or diastolic blood pressure, serum total:HDL cholesterol and fasting blood glucose. Analytical procedures were reported previously (10,15) . We analysed binary outcomes as appropriate for South Asian populations. Thus: BMI < 18·5 kg/m 2 and BMI ≥ 25·0 kg/m 2 were classed as 'underweight' and 'obese', respectively (6,24) ; WHR > 0·9 for men and WHR > 0·8 for women were classed as detrimental to health (25) ; systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥90 mmHg was classed as hypertensive (26) ; serum total: HDL cholesterol ≥4·5 was classed as detrimental to health (27)(28)(29) ; and fasting blood glucose ≥7·0 mmol/l indicated diabetes (30) . Participants may have taken medication or adapted dietary or lifestyle choices in response to a known condition, so those self-reporting diabetes and hypertension were included as cases.
Our models took account of the sib-pair clustered study design by including sib-pair as a family-specific random effect. A causal path diagram helped to visualise the relationships between dietary patterns and risk factors and assisted in construction of the models. Dietary pattern membership was treated as a predictor of each risk factor in turn, while adjusting for the potential confounding factors of age, sex, education level and rural/urban residency. The following factors were also considered as potential confounders for specific outcomes: energy consumption by quartile for underweight, obesity and WHR; obesity for hypertension, total:HDL cholesterol and diabetes; and current smoking status for hypertension. An age-squared term was included for underweight and obesity. Models were tested for multicollinearity using a correlation matrix of bivariate relationships between explanatory variables and the variance inflation factor for each regression model. Statistical analyses were conducted in MPlus version 7.3 (Muthén & Muthén, Los Angeles, CA, USA) for the mixture models and in Stata version 14 and R version 3.2.2 for descriptive statistics and regression models.

Participant and dietary pattern characteristics
Over half of study participants were male (57 %), while the majority were married (88 %) and identifying as Hindu (91 %; Table 1). If states are allocated to 'major regions' (31) , almost half of the respondents (49 %) were from the South, 30 % from the North and 20 % from the West. Only 2 % of respondents were from the East. The mean age of respondents was 41 (SD 10) years (range 17-76 years).
The results of the mixture modelling showed that a fivepattern solution best described the data according to model fit. The probability of correct pattern assignment was >90 % for all individuals. The five-pattern solution also appeared to describe the data well with clear distinctions in consumption of different food groups and this was apparent when visualising consumption of aggregated food groups ( Fig. 1  Region was the strongest predictor of pattern membership, with participants from the South region tending to consume rice-based patterns while participants from the North and West regions tended to consume wheat-based patterns (Table 1). Religion was also important, with a greater proportion of non-Hindus in the 'Rice & meat' pattern. Among consumers of the 'Rice & low diversity' pattern, 57 % lived in rural areas compared with 37 % of the sample population. The 'Rice & low diversity' pattern is probably the closest representation of a 'pre-nutrition transition' diet and we therefore used it as the reference pattern for the epidemiological analysis. Consumers of the mixed 'Wheat, rice & oils' pattern had a younger age distribution, i.e. mean of 30 years compared with 44 years for other patterns.

Characteristics of dietary patterns
The 'Rice & low diversity' pattern had the lowest energy supply, i.e. 9912 (SD 3180) kJ/capita per d compared with the overall mean of 12 062 (SD 3732) kJ/capita per d. Consumption of other food groups including fruits and vegetables was low ( Table 2, Fig. 1). Mean consumption of fruit in the 'Rice & fruit' pattern was 761 (SD 423) kJ/capita per d (or 186 (SD 106) g/capita per d) compared with the overall mean of 561 (SD 402) kJ/capita per d (or 149 (SD 103) g/capita per d). The mean consumption of pulses and legumes in the 'Wheat & pulses' pattern was 1619 (SD 707) kJ/capita per d compared with an overall mean of 1050 (SD 686) kJ/capita per d. Vegetable consumption was greater and fruit consumption lower than in the rice-based patterns. The mixed 'Wheat, rice & oils' pattern had the greatest mean total energy consumption, i.e. 13 991 (SD 3632) kJ/capita per d. A distinct feature of this pattern was the high consumption of both rice and wheat, with mean consumption of 2326 (SD 1079) kJ/capita per d and 4427 (SD 1607) kJ/capita per d, respectively. The mean consumption of fats and oils in the 'Wheat, rice & oils' pattern was 2703 (SD 992) kJ/capita per d compared with the overall mean of 2000 (SD 933) kJ/capita per d. Finally, meat and fish consumption was greatest in the 'Rice & meat' pattern, i.e. mean of 427 (SD 368) kJ/capita per d compared with an overall mean of 146 (SD 226) kJ/capita per d.

Dietary patterns, nutrition and health
The mean balance of macronutrients was within the WHO guidelines for all five dietary patterns (Table 2); however, the proportion of individuals outside the recommended ranges varied between patterns, as did the proportion of individuals with micronutrient intake below the Estimated Average Requirement. For example, in the 'Rice & fruit' pattern 30 % of participants derived >30 % of energy from fat and 13 % of females had inadequate Ca intake; whereas in the 'Rice & low diversity' pattern 11 % of participants derived >30 % of energy from fat and 42 % of females had inadequate Ca intake. In addition, 49 % of individuals did not meet the recommended 400 g/d consumption of fruits and vegetables, rising to 84 % among the 'Rice & low diversity' pattern. Mean levels of dietary Na and cholesterol also varied between dietary patterns ( Table 2). All dietary patterns exceeded the recommended Na intake level and the intake of saturated fat was greater than recommended for 20 % of consumers of the 'Rice & fruit' pattern.
Mean (unadjusted) levels of NCD risk factors differed between the dietary patterns ( Table 3) and suggested that despite its high total energy content, consumers of the mixed 'Wheat, rice & oils' pattern had the most favourable health profile. However, consumers of this pattern were typically younger and associations were explored further in mixed-effects logistic regression models (Table 4). In fully adjusted models, compared with the reference 'Rice & low diversity' dietary pattern, consumers of the 'Wheat, rice & oils' pattern had the greatest odds of being underweight (OR = 3·48; 95 % CI 2·46, 4·92) and the lowest odds of being obese (OR = 0·43; 95 % CI 0·33, 0·56). Consumers of the 'Wheat & pulses' pattern had raised odds of a high WHR (OR = 1·23; 95 % CI 1·01, 1·51).

Principal findings
The present study aimed to define distinct, typical dietary patterns among adults participating in the IMS and investigate associations with NCD risk factors and micronutrient intakes. Using finite mixture models, we identified five distinct dietary patterns that represented different subpopulations and had diverse nutrient content. Our analysis identified three rice-based patterns, one wheat-based pattern and one pattern that contained both rice and wheat as staple foods. The patterns reflect well-established culturally and geographically relevant dietary preferences in India (i.e. rice-based diets in the South; wheat-based dies in the North and West) but also credible socioeconomic differences in dietary habits such as a lowdiversity rice-based diet consumed by poorer, rural adults and a mixed rice and wheat diet with oils consumed by younger, urban adults. In some cases, the dietary patterns identified were significantly associated with several NCD risk factors after controlling for sociodemographic variables, demonstrating the utility of dietary pattern analysis in identifying diet-related health risks. Potential mechanisms through which dietary patterns influenced NCD risk factors can be proposed, although further studies are required to validate these. For example, consumers of the 'Wheat & pulses' pattern may have had lower odds of hypertension due to greater consumption of pulses (32) , but greater odds of diabetes due to greater consumption of sugar (33) . Consumers of the 'Wheat & pulses' pattern had raised odds of a high WHR but lower odds of obesity. This may be an important finding considering the particular risk factors associated with diabetes among Asian Indians (34) , and consumers of this pattern had greater odds of diabetes, although this was not significant at the 95 % level. Similarly, consumers of the 'Rice & meat' pattern had greater odds of a high WHR, which was significant at the 95 % level if energy intake was not controlled for (see online supplementary material, Supplemental Table 3), and greater odds of diabetes, although this was not significant at the 95 % level. Greater dietary diversity including fruit and vegetable consumption may have reduced risks of hypertension among consumers of the 'Rice & fruit', 'Wheat & pulses' and 'Wheat, rice & oils' patterns (35) .
Consumers of the 'Wheat, rice & oils' pattern had the greatest odds of being underweight and the lowest odds of obesity, despite having the greatest energy consumption    -28  13  16  13  3  15  97  63  18  3  52  30  47  30  All  -24  13  14  19  2  4  96  49  12  3  62  24  45  37 %E, percentage of total energy; CHO, carbohydrate; F&V, fruit and vegetables. *For the proportion of individuals missing the target intake, targets were set as the maximum of range for fat, SFA and CHO; minimum of range for PUFA and protein; and on an individual level on the basis of age and sex for mineral micronutrients.
(although this was controlled for in the analyses). This apparently anomalous result may be due to the distinctively younger age profile of consumers of this pattern, perhaps indicating that the models were under-adjusted for age or physical activity levels. There was also evidence of an interaction effect between age, BMI and dietary pattern as indicated by the different trajectories of BMI v. age by dietary pattern (see online supplementary material, Supplemental Fig. 1). Energy intake was controlled for to distinguish the effects of dietary patterns on the outcomes underweight, obesity and WHR, independent of energy intakes. However, energy intakes are connected to body size and physical activity level, so this may be an over-adjustment. As a sensitivity analysis, we re-ran the regression analyses without controlling for energy intakes. For all dietary patterns in comparison to the reference, the odds ratios were similar but slightly smaller for underweight, and similar but slightly larger for obesity and WHR (Supplemental Table 3). For consumers of the 'Rice & meat' pattern, the lower bound of the 95 % CI of WHR was now >1.
Longitudinal data would help establish whether the association between age and BMI was a life course, period or cohort effect.

Study limitations
There are two primary limitations in our work. First, the IMS was designed to study the influence of rural-to-urban migration on the health of adults and the population is not nationally representative, e.g. 37 % of the IMS population living in rural areas compared with 70 % in the whole of India in 2006 (36) . Furthermore, participants were recruited from four factory settings which led to geographic clustering, e.g. <2 % of participants were drawn from the East region. While this limits the ability to interpret our findings at a national level, we have met our main aims to identify typical dietary patterns and their association with health within this large data set. In addition, the IMS data were collected in 2005-2007 and dietary patterns may have changed subsequently.
The second main limitation relates to the reliability of FFQ data. Estimated energy intakes were 19 % greater in the IMS FFQ compared with 24 h recalls (12) and misreporting may have affected some food items more than others. Notably, the mean of reported energy consumption in the 'Wheat, rice & oils' pattern was 13 991 kJ/capita per d, which is greater than the upper 2·5th percentile of energy consumption reported via dietary recall in the UK (37) . Thus, the proportion of individuals with inadequate intakes of micronutrients may be underestimated and intakes of fat and other nutrients overestimated. However, 24 h recalls are also susceptible to misreporting, particularly underreporting (38) . Furthermore, our methods to identify dietary patterns relied on relative consumption patterns rather than on absolute consumption quantities and this may have reduced the errors associated with misreporting in dietary surveys.
Other limitations were as follows. First, food composition data were derived from a study published in 1971. The accuracy and relevance of these data could be improved through spatially refined analysis of modern crop varieties using the latest analytical procedures (39,40) . Second, NCD risk factors were measured at one point in time only; repeat measures would be preferable. Third, participants may have adapted their dietary choices for a known condition, raising the possibility of reverse causality. Fourth, there was potential residual confounding in the regression model, e.g. due to the binary categorisation of some confounding variables. Fifth, the number of individuals with diabetes was relatively small and results of the regression models should be interpreted with caution. Sixth, physical activity was not controlled for, partly because this was not reliably captured in the IMS and partly because it is likely to be strongly correlated with age, rural/urban residency, education level (and therefore occupation) and energy consumption; however, this might have led to under-adjustment for physical activity. Finally, we did not control for alcohol consumption which is a known risk factor for obesity, hypertension and diabetes. The quantity and frequency of alcohol consumption are important but alcohol consumption was recorded in the IMS only as 'never', 'current (consumed within last 6 months)' and 'previous (stopped >6 months ago)'. In addition, alcohol consumption was associated with 'upstream' variables included in the model such as age, sex WHR, waist:hip ratio.
and education level as well as dietary pattern membership, and was therefore excluded from the analysis.

Comparison with other studies
Most dietary pattern analyses for India have used principal component analysis, finding two to six distinct dietary patterns in various large data sets (8) . The majority of patterns were defined by vegetarian food groups, e.g. Satija et al. (41) identified three dietary patterns in the IMS data with one 'animal food' pattern. The importance of fish in defining dietary patterns in eastern India has been reported in previous studies (8) , yet the East region of India was under-represented in the IMS. Mixture modelling such as LCA provides a number of benefits over principal component analysis, namely the ability to calculate the mean consumption of each food group in each pattern and to allocate each individual to a single dietary pattern based on probability (42) . Thus, we were able to quantify the nutritional content of typical dietary patterns and compare with WHO dietary guidelines. A further strength of the current study is the quantification of aspects relating to both undernutrition and diet-related NCD. This approach was taken specifically to reflect the existence of the double burden of malnutrition in the Indian context. Similar to previous studies, we found evidence of an association between dietary pattern and body size (41,(43)(44)(45) , including prevalence of underweight and obesity. Unlike previous studies (46) , we found no evidence that vegetarian diets incur a lower risk of diabetes.

Policy relevance and research needs
Although the present study is cross-sectional, comparing patterns may provide an insight into the dietary changes. For example, the 'Rice & low diversity' and 'Rice & fruit' patterns represent predominantly rural and urban populations, respectively, from the South region. Moving from the rural to the urban pattern there is an increase in overall energy and salt consumption, a decline in the proportion of dietary energy derived from rice and an increase in the proportion of dietary energy from fats and oils, fruit, pulses and legumes. Consumers of the urban pattern had greater odds of obesity, but lower odds of underweight, hypertension or dietary Ca, Fe and Zn deficiencies (Tables 2 and 4). Thus, there are likely to be beneficial and detrimental impacts on health due to dietary changes in India. An improved understanding of the links between dietary patterns and health may help to guide policies such as public education about diets and nutrition, strategies to improve access to healthy foods and investments in health care in preparation for changing disease profiles in the population. The results of the current study are indicative of the links between typical dietary patterns and health in India, but complementary studies are required including with more recent and nationally representative dietary data. The reference diet is the 'Rice & low diversity' pattern. All models controlled for age, sex, rural/urban residency and education level of participants and included sib-pair as a family-specific random effect. Additional potential confounders included in the models are listed in the footnotes below.
** ≥ 95 % confidence that the OR is not equal to 1.
Previous studies have reported that India is undergoing a nutrition transition (4) . In many other countries where nutrition transitions have occurred or are underway, consumption of meat has increased greatly, e.g. from 68 to 169 g/capita per d between 1990 and 2013 in China (47) . Mean meat and fish consumption was low in the present study, i.e. 27 (SD 42) g/capita per d, but was greater in the 'Rice & meat' pattern, i.e. 78 (SD 69) g/capita per d. Compared with the overall population, consumers of the 'Rice & meat' pattern had a similar profile of occupations and were more likely to be living in rural areas. Thus, whereas meat consumption rose steeply with urbanisation and greater incomes in China, the association in India is likely to be mediated by cultural factors, in particular the preference for vegetarian or lacto-ovo-vegetarian diets among many Hindus.
Determining associations between dietary patterns and risks of disease may help in the development or targeting of public nutrition and health strategies. Thus, future work could apply the methodology to other large sample populations in India including those with more recent dietary data.