Know Your Heart: Rationale, design and conduct of a cross-sectional study of cardiovascular structure, function and risk factors in 4500 men and women aged 35-69 years from two Russian cities, 2015-18

Russia has one of the highest rates of cardiovascular disease in the world. The International Project on Cardiovascular Disease in Russia (IPCDR) was set up to understand the reasons for this. A substantial component of this study was the Know Your Heart Study, devoted to characterising the nature and causes of cardiovascular disease in Russia by conducting large cross-sectional surveys in two Russian cities, Novosibirsk and Arkhangelsk. The study population was 4542 men and women aged 35-69 years recruited from the general population. Fieldwork took place between 2015 and 2018. There were two study components: 1) a baseline interview to collect information on socio-demographic characteristics and cardiovascular risk factors, usually conducted at home, and 2) a comprehensive health check at a primary care clinic, which included detailed examination of the cardiovascular system. In this paper we describe in detail the rationale for, and the design and conduct of, these studies.


Introduction
Russia has one of the highest rates of mortality from cardiovascular disease (CVD) in the world (see non-communicable disease mortality data from the World Health Organisation (WHO)), despite an ongoing pattern of decline that began in 2005. In 2015 the CVD mortality rate was four times higher in Russia than in England and Wales or Norway (see Human Cause-of-Death Database and WHO mortality database). These exceptional CVD mortality rates are an important reason for the lower life expectancy in Russia compared to other industrial countries (70.9 years in 2014; see The Demographic Yearbook of Russia 2015).
CVD mortality in Russia has a number of specific features that pose a challenge to our understanding. In most countries, the risk of death from CVD correlates well with levels of established risk factors such as smoking, serum cholesterol, blood pressure and obesity 1 . However in Russia, while some of the risk of CVD death is explained by conventional risk factors such as smoking (in men) and a high prevalence of uncontrolled hypertension, some aspects of the cardiovascular risk profile of the population do not appear to be high risk 1,2 . Lipid profiles appear to be particularly surprising. Previous studies dating from 1975-2000 have tended to find relatively low risk lipid profiles in Russia compared to Western countries, with unexceptional low density lipoprotein (LDL) cholesterol, higher levels of high density lipoprotein (HDL) 3 cholesterol and more favourable ratios of ApoB/A1 4 or HDL/total cholesterol 2,5 .
One specific and highly distinctive feature of CVD mortality in Russia, which it shares with several other countries that were previously part of the Soviet Union, is that it has shown dramatic fluctuations over the past 30 years. Remarkably, these fluctuations parallel those in rates of mortality from acute alcohol poisoning 6 . This suggests that hazardous alcohol drinking in Russia over this period has been one of the main drivers of fluctuations in CVD deaths 7 . However, the mechanisms underlying this association have not been identified, and they contrast with the dominant literature on alcohol and CVD, which has in the past tended to be preoccupied with the apparent cardio-protective effects of moderate drinking 8 .
The International Project on Cardiovascular Disease in Russia (IPCDR) was set up to throw new light on the high rates of premature mortality from cardiovascular disease in Russia. The project has four separate but inter-related themes. These are: 1) investigating the extent to which the differences between Russia and other countries in CVD mortality rates may be biased because of differences in the way in which deaths are certified and coded; 2) generating improved overviews of trends and differences in CVD mortality and established risk factors in Russia by bringing together and synthesising already collected data; 3) examining the potential role of the health-care system and treatment in contributing to the trends in CVD rates within Russia and to differences with other countries; 4) characterising the nature and causes of cardiovascular disease in Russia by conducting large cross-sectional surveys in two Russian cities, Novosibirsk and Arkhangelsk. This paper describes in detail the rationale, objectives, design and conduct of these cross-sectional studies, which are collectively known in Russia as "Узнай своё сердце" (Know Your Heart).

Rationale
To help uncover the nature and causes of the higher CVD mortality in Russia today compared to other countries, it is desirable to be able to compare the cardiovascular health of a random cross-sectional sample of the Russian population with data from an equivalent sample from a country with much lower CVD mortality (such as Norway). In this context, cardiovascular health refers to objectively measured aspects of the structure and function of the cardiovascular system (such as echocardiography-derived indices), blood- and urine-derived biomarkers and behavioural risk factors. This detailed information may be thought of as the cardiovascular phenotypic profile of a population. The assumption underlying this approach is that the future CVD event rates in the surveyed populations in Russia will be appreciably higher than the event rates found in the population surveyed in the lower mortality country. If this is true, then these future differences should be prefigured in differences in the cardiovascular phenotypic profile. Identifying the principal differences in the phenotype between Russia and a lower mortality country will throw light on the drivers of these differences. Aside from the international comparisons, information on the cardiovascular phenotype of a sample of the Russian population today will also be valuable for understanding the distribution and determinants of CVD within Russia, including socio-economic differences, use of health systems, treatment and the potential role of particular risk factors including alcohol.

Amendments from Version 2
In response to the third reviewer's comments we have made the following changes:
• Map showing the location of the cities has been added (Figure 2).
• We have added cIMT and plaque as areas of special interest alongside left ventricular ejection fraction.
• Discussion on the possibilities for follow up has been added to the discussion section.

Objectives
The objectives of the cross-sectional studies conducted as part of the fourth component of the IPCDR study were as follows:
1) To characterise the CVD phenotypes of the Russian population samples, including in-depth objective measures of cardiac and vascular structure and function, laboratory-derived biomarkers from biological samples and behaviours including risk factors as well as health service use;
2) To determine the extent to which the CVD phenotypes in two Russian cities, Arkhangelsk and Novosibirsk, differ from those seen in other countries, and to identify whether any such differences may plausibly explain the excess of CVD mortality seen in Russia. In particular, comparisons will be made with the 7th wave of the Tromsø Study in Norway conducted in 2015-16. Key aspects of the protocol of the medical examination were aligned in order to be able to make direct comparisons. These comparisons are being taken forward under the "Heart to Heart" initiative established jointly with UiT, The Arctic University of Norway;
3) To investigate the associations of these CVD phenotypes with socio-demographic factors, health behaviours including alcohol use and known cardiovascular risk factors within Russia in order to improve understanding of the determinants of these phenotypes;
4) To undertake exploratory studies of the association of gut microbiota with behaviours (especially heavy drinking) and the CVD phenotypes.
The key associations and comparisons of interest are shown in Figure 1. Examples of the types of data collected on cardiovascular phenotype are shown in Table 1.

Sample size calculation
The original target sample size was determined by the power needed both to make comparisons with other population-based studies and to investigate associations of interest within the Know Your Heart Study. For example, if we wished to compare the prevalence of a binary ECHO phenotype in Know Your Heart (N=4500) with that in a smaller study with data available on this phenotype for N=1500 (e.g. the UK 1946 National Birth Cohort study), we would have 80% power to detect an odds ratio (OR) of 1.4 at an alpha of 0.01, assuming a prevalence in the smaller study of 10%. Comparisons with the larger Tromsø 7 study will be even more powerful. Within the Know Your Heart Study we estimate that we will have 80% power to detect an OR of 1.6 or more between the top and bottom 20% of a continuous exposure variable (e.g. levels of a particular lipoprotein entity) for a binary CVD phenotype with a prevalence of 10% in the lowest group, at an alpha of 0.01. We are aware that applying many statistical tests can lead to false-positive correlations / associations, and we propose to apply stringent significance cut-offs (less than the nominal 0.05) to be determined through data simulation (e.g. permutation), complemented by a false discovery

Target population and study setting
We undertook identical cross-sectional studies of clinical and lifestyle factors in two Russian cities (Arkhangelsk and Novosibirsk) in the period 2015-18 with a target sample size of 4500 adults. These cities were chosen as they had a previous track record of conducting large population-based epidemiological surveys and thus could be expected to conduct complex research to a high standard 2,9-12 . The target population was men and women aged 35-69 years. This is the age group in which, in relative terms, mortality from cardiovascular disease and many other conditions is much higher than in Western countries.
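The power statements in the sample size calculation can be roughly checked with a normal approximation for comparing two independent proportions. The sketch below is an illustration only: the function name `power_two_proportions` is hypothetical, the approximation is a standard textbook one rather than the method the authors report using, and treating the top and bottom 20% of 4500 participants as two groups of 900 is an assumption. The result therefore need not match the quoted 80% exactly.

```python
from statistics import NormalDist


def power_two_proportions(p_ref, odds_ratio, n_ref, n_cmp, alpha=0.01):
    """Approximate two-sided power for comparing two independent proportions.

    p_ref is the prevalence in the reference group; the prevalence in the
    comparison group is derived from the odds ratio. This is a normal
    approximation chosen for illustration, not the authors' stated method.
    """
    # Convert the reference prevalence and odds ratio to a comparison prevalence.
    odds_cmp = odds_ratio * p_ref / (1 - p_ref)
    p_cmp = odds_cmp / (1 + odds_cmp)

    # Standard error of the difference under the null (pooled) and the alternative.
    p_pool = (n_ref * p_ref + n_cmp * p_cmp) / (n_ref + n_cmp)
    se_null = (p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_cmp)) ** 0.5
    se_alt = (p_ref * (1 - p_ref) / n_ref + p_cmp * (1 - p_cmp) / n_cmp) ** 0.5

    # Power of the two-sided test (the negligible opposite tail is ignored).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p_cmp - p_ref) - z_crit * se_null) / se_alt)


# Between-study comparison: N=1500 (prevalence 10%) versus N=4500, OR 1.4.
between_studies = power_two_proportions(0.10, 1.4, n_ref=1500, n_cmp=4500)

# Within-study comparison: bottom versus top 20% (assumed 900 each), OR 1.6.
within_study = power_two_proportions(0.10, 1.6, n_ref=900, n_cmp=900)

print(f"between studies: {between_studies:.2f}, within study: {within_study:.2f}")
```

Under these assumptions both scenarios give power in the broad region of the figure quoted in the text; an exact or simulation-based method could give a somewhat different value.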
The location of the cities is shown in Figure 2. The city of Novosibirsk, in Western Siberia, has a population of more than 1,500,000 people and is the third largest city in Russia, after Moscow and Saint Petersburg. Arkhangelsk, located in the North of European Russia, is a smaller city with a population of about 350,000 people. Levels of cardiovascular mortality vary across Russia. In the period 2012-16, mortality from total circulatory disease at ages 35-69 years among the urban population of Novosibirsk region was slightly lower than the national average, while in the urban population of Arkhangelsk region it was above the national average (Table 2). Mortality from ischaemic heart disease was above the national average in both cities. Mortality rates from total circulatory disease and ischaemic heart were considerably higher in Russia and in both of the Russian cities compared to Tromsø and Norway overall.
The age and education distribution of the populations of Novosibirsk and Arkhangelsk compared to the total Russian urban population, according to 2010 census data, are shown in Figure 3 and Figure 4. The age distribution was similar to the national average in both cities but the proportion of people with higher education was higher in Novosibirsk.

Study design
The study had two components: 1) a baseline interview to collect information on socio-demographic characteristics and cardiovascular risk factors, usually conducted at home, and 2) a subsequent comprehensive health check at a primary care clinic (polyclinic) which included examination of the cardiovascular system. An overview of the study design is shown in Figure 5.

Recruitment of participants from the general population.
Within each city four districts were selected for the recruitment of participants. In Arkhangelsk these were Lomonosovsky, Maymaksansky, Mayskaya Gorka and Oktyabrsky. In Novosibirsk these were Dzerzinsky, Kirovsky, Leninsky and Oktyabrsky. The districts were selected purposefully (not through random sampling) to represent a range of socio-demographic and mortality levels in each city. A sampling frame of people within each district using information on age and sex of occupants at individual addresses was provided by the regional health insurance funds. Because of data protection regulations, the study team was not provided with individual names. From the sampling frames we selected at random addresses to visit stratified by age (in 5-year bands), sex and district. The aim was to recruit equal numbers of participants in each sex and 5-year age group in the city as a whole.
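The address selection described above, random sampling stratified by 5-year age band, sex and district, can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`district`, `sex`, `age`), the function names and the equal allocation per stratum are hypothetical and are not taken from the study's actual sampling software.

```python
import random
from collections import defaultdict


def age_band(age, width=5, lower=35):
    """Return the lower bound of the 5-year band containing `age` (e.g. 37 -> 35)."""
    return lower + ((age - lower) // width) * width


def stratified_sample(frame, per_stratum, seed=0):
    """Draw up to `per_stratum` addresses at random from each stratum.

    A stratum is one combination of district, sex and 5-year age band.
    `frame` is a list of dicts with hypothetical keys district, sex and age.
    """
    strata = defaultdict(list)
    for record in frame:
        key = (record["district"], record["sex"], age_band(record["age"]))
        strata[key].append(record)

    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    selected = []
    for key in sorted(strata):
        members = strata[key]
        selected.extend(rng.sample(members, min(per_stratum, len(members))))
    return selected
```

Sampling the same number of addresses from every stratum is one simple way to aim for equal numbers in each sex and 5-year age group; in practice the allocation would also need to allow for differing response by stratum.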
Participants were recruited to the study by home visits carried out by trained and experienced interviewers from a local commercial survey company. They attempted to identify a person of the correct age and sex who, according to the sampling frame, should be living at the selected address. If the participant was not available on the first visit, addresses were visited a minimum of two more times at varying times of day and at weekends. At the end of a successful interview, participants were invited to attend the health check at a polyclinic and, if they agreed, an appointment was made for them straight away using an online calendar.
To maximize the probability of participants agreeing to take part in the study, information campaigns were conducted in both cities. The campaigns were implemented on the assumption that if people had previously heard through the media that the study was legitimate and important, they would be more likely to participate. Special consideration was given to the name of the study used for participants, "Know your heart" (in Russian "Узнай своё сердце"), the study logo and the visual style of study materials. We used focus groups with the general public to guide the final design. The information campaign included production of two short films about the study (one for each city), which were shown regularly on TV throughout the period of the study (Supplementary File S1). In addition, news items about the study progress and the experience of participants who had taken part in the study were periodically disseminated on TV, on radio and in print media. Large billboard advertisements about the study were also placed on rotation throughout the city at bus stops, supermarkets and areas where advertisements were concentrated (Supplementary File S2). These activities were more intensive and consistent in Arkhangelsk than in Novosibirsk. Recruitment of participants from the general population started in November 2015 and finished in December 2017. Recruitment paused at Christmas and over the summer (July and August), in keeping with Russian holidays when participants were likely to leave the city.
Table footnotes: ** ICD 10 codes I00-I99 (Diseases of the circulatory system). *** ICD 10 codes I20-I25 (Ischemic Heart Diseases).

Recruitment of participants receiving treatment for alcohol problems.
Given the potential importance of hazardous alcohol use as a risk factor for cardiovascular disease in Russia, an additional 275 participants aged 35-69 years with a primary diagnosis of alcohol problems were recruited from Arkhangelsk Regional Psychiatric hospital. Where possible, participants were recruited from the same four districts of the city as the general population sample. By using a clinical facility as a source of participants we were aware that we would be recruiting a highly selected group of heavy drinkers. However, our aim was to be able to characterize the cardiovascular phenotype in a group of heavy drinkers per se.
Participants were recruited by clinical staff at the hospital at least one week after admission in order to ensure that the acute detoxification stage of treatment was complete and they were not suffering from alcohol withdrawal. The same interviewers involved in the general population study visited participants at the hospital and administered a shortened version of the baseline questionnaire with some supplementary questions on alcohol use included to obtain more detailed characterization of drinking behaviour in this sub-group (Supplementary File 3). The day after their interview, participants were provided with free transport to attend the health check. The health check took place in the same polyclinic as for the general population survey, but to avoid placing an excessive burden on the participants, the health check itself was shortened by dropping a few of the more onerous aspects of the examination: pulse wave velocity, physical function tests, spirometry, and use of the Actiheart devices to measure physical activity continuously over a period of days.
Recruitment of and examination of participants for this sub-study began in January 2017 and ended in October 2017.

Repeatability study.
In each city approximately 200 participants from the general population sample (397 participants in total) were re-interviewed and had a repeated health check one year after their initial health check. The main aim was to estimate correction factors that can be used to correct for measurement error during the analysis stage, specifically when regressing an outcome on a single continuous predictor variable that is measured with error (i.e. to correct for regression dilution bias). The time period of one year was chosen to minimize the effects of seasonal variation on within-person variability. A secondary aim of the repeatability study was to investigate reproducibility of those characteristics that by definition should not change, such as educational level, ever-smoker and ever-drinker status, and so on.
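One common way to use repeat measurements for this purpose is to estimate a regression dilution ratio, the slope from regressing the repeat measurement on the baseline measurement, and to divide the naive regression coefficient by it. The sketch below illustrates that idea only; it assumes classical (random, non-differential) measurement error in a single predictor, the function names are hypothetical, and the paper does not state that this exact estimator will be used.

```python
def regression_dilution_ratio(baseline, repeat):
    """Slope from regressing the repeat measurement on the baseline measurement.

    Under classical measurement error this slope estimates the fraction of the
    observed variance in the predictor that reflects true between-person
    differences.
    """
    n = len(baseline)
    mean_b = sum(baseline) / n
    mean_r = sum(repeat) / n
    cov = sum((b - mean_b) * (r - mean_r) for b, r in zip(baseline, repeat))
    var = sum((b - mean_b) ** 2 for b in baseline)
    return cov / var


def correct_coefficient(naive_coefficient, ratio):
    """Divide a naive regression coefficient by the regression dilution ratio."""
    return naive_coefficient / ratio


# Toy numbers, invented purely for illustration (not study data).
baseline = [1.0, 2.0, 3.0, 4.0]
repeat = [1.5, 2.0, 2.5, 3.0]
ratio = regression_dilution_ratio(baseline, repeat)  # 0.5 for these toy values
corrected = correct_coefficient(0.2, ratio)          # naive 0.2 becomes 0.4
```

This simple division applies to a single error-prone predictor; with several correlated predictors measured with error, a multivariable correction would be needed.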

Fieldwork outcomes
General population sample. The study recruited 5089 participants for the baseline interview of whom 4542 participants went on to attend a health check. Of these 4542 participants, 2381 were from Arkhangelsk (41.5% male) and 2161 were from Novosibirsk (42.0% male). The median age of participants from Arkhangelsk was 54 years (IQR 45-62) and from Novosibirsk 56 years (IQR 47-64) with a higher percentage of participants in the older age categories in Novosibirsk than Arkhangelsk.
Response percentages were calculated from individual level data on the outcome of every visit made to each address. A list of the codes used to classify the outcome of the visits is provided in Supplementary Table S1 (Supplementary File 4). Three types of response percentages were defined based on the denominator used:
Type 1: The denominator was all households in the sampling frame where an attempt was made to contact a participant. This is the most conservative estimation of response percentage.
Type 2: The denominator excluded addresses which were found to be invalid or where no participant of the correct age or sex was living. These exclusions are largely accounted for by the original sampling frame being out of date or inaccurate.
Type 3: The denominator was restricted to addresses where it was determined that an eligible participant of the correct age and sex lived. This response percentage reflected the willingness and ability of households to engage and the skill of the interviewer in motivating them to do so. The main reason for non-response here was refusal to take part.
The response percentages with respect to obtaining a baseline interview for each city by age and sex are shown in Table 3. The overall response percentages for both cities were: Type 1, 28.1%; Type 2, 35.1%; and Type 3, 51.0%. For all types, percentages were higher in Arkhangelsk than Novosibirsk, in women compared to men, and among older compared to younger participants.
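The three response percentages differ only in their denominators, which the following sketch makes explicit. The function and argument names are hypothetical, and the counts in the example are round numbers invented for illustration; they are not the study's fieldwork counts.

```python
def response_percentages(n_attempted, n_invalid, n_confirmed_eligible, n_interviewed):
    """Return the Type 1, 2 and 3 response percentages as a dict.

    n_attempted:          addresses where a contact attempt was made (Type 1 denominator)
    n_invalid:            addresses found to be invalid or with no person of the
                          expected age and sex (removed for Type 2)
    n_confirmed_eligible: addresses where an eligible person was confirmed to
                          live (Type 3 denominator)
    n_interviewed:        completed baseline interviews (numerator for all types)
    """
    return {
        "type1": 100.0 * n_interviewed / n_attempted,
        "type2": 100.0 * n_interviewed / (n_attempted - n_invalid),
        "type3": 100.0 * n_interviewed / n_confirmed_eligible,
    }


# Invented example counts, not study data.
example = response_percentages(
    n_attempted=1000, n_invalid=200, n_confirmed_eligible=500, n_interviewed=250
)
print(example)  # {'type1': 25.0, 'type2': 31.25, 'type3': 50.0}
```

Because each successive denominator is a subset of the previous one, the percentages necessarily increase from Type 1 to Type 3, as in the overall figures reported above.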
One way of judging the extent of sampling bias introduced by non-response is to compare the educational distribution of those with a baseline interview and health check with the educational distribution for each city as determined at the 2010 Russian Census. Table 4 shows the observed distribution against the expected distribution from the Census, using indirect standardisation for age and sex, for both completing the baseline interview and attending the health check. For Arkhangelsk the overall ratio for completing the baseline questionnaire was 0.98 and that for attending the health check was 0.99. However, younger participants were more likely than expected to have higher education and older participants were less likely than expected to have higher education. For Novosibirsk the ratio of observed to expected education was above 1 for both completion of the baseline interview (1.14) and attending the health check (1.26).
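The observed-to-expected ratio from indirect standardisation can be sketched as below. This is an illustration under assumptions: the outcome is taken to be the number of participants with higher education, strata are age-sex groups, and the function name, stratum labels and numbers are all hypothetical rather than drawn from Table 4.

```python
def observed_expected_ratio(sample_size, observed_higher_edu, census_proportion):
    """Indirectly standardised ratio of observed to expected counts.

    sample_size:         participants in each age-sex stratum
    observed_higher_edu: participants with higher education in each stratum
    census_proportion:   census proportion with higher education in each stratum
    All three arguments are dicts keyed by the same stratum labels.
    """
    # Expected count: apply census stratum-specific proportions to the sample.
    expected = sum(sample_size[s] * census_proportion[s] for s in sample_size)
    observed = sum(observed_higher_edu[s] for s in sample_size)
    return observed / expected


# Invented two-stratum example, not study data.
sample_size = {("M", "35-39"): 100, ("F", "35-39"): 100}
observed_higher_edu = {("M", "35-39"): 35, ("F", "35-39"): 42}
census_proportion = {("M", "35-39"): 0.30, ("F", "35-39"): 0.40}
ratio_oe = observed_expected_ratio(sample_size, observed_higher_edu, census_proportion)
```

A ratio above 1 indicates that the sample contains more people with higher education than the census would predict for its age and sex composition, and a ratio below 1 indicates fewer.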
Not everybody who had a baseline interview had a subsequent health check. Some people elected not to have one, while others were unable to arrange a suitable time or failed to attend at an arranged time. These proportions varied by city, with 96% attending in Arkhangelsk, but only 83% in Novosibirsk.
The proportions of interviewed participants by age and sex for each city are shown in Table 5. The response percentages with respect to health check attendance using the three types of response are shown in Supplementary Table S2 by age, sex and city (Supplementary File 4). As with response percentages for the baseline interview these were higher in Arkhangelsk and among women and older people.
There is evidence that those who did not attend the health check were different to those who did. The associations of characteristics measured at baseline with not subsequently having a health check are shown in the form of odds ratios in Supplementary Table S3 (Supplementary File 4). Adjusting simultaneously for city, age, sex, education and distance from the clinic, those who did not have the health check were more likely to be younger, male, with a lower educational level, not in regular paid employment, in a worse financial situation, problem drinkers and smokers, and to report symptoms of major depression. Those who self-reported a history of hypertension, high cholesterol, myocardial infarction, heart failure or angina were more likely to have a health check, but those with a self-reported previous stroke were less likely to do so. Participants living further away from the clinic were also less likely to attend the health check.

Patients receiving treatment for alcohol problems.
In total 275 participants receiving treatment for alcohol problems were recruited from Arkhangelsk out of 322 patients invited to take part (85.4%). It should be noted that although clinicians were instructed to invite all eligible participants, they were allowed to use their clinical judgement as to who should be approached, as the patient's well-being was considered paramount. However, this sample was not intended to be representative of all patients receiving treatment for alcohol problems in Arkhangelsk but to obtain a sample of people who drank extremely heavily. The sample was predominantly male (76.4% men) and the age distribution was skewed toward younger participants (median age 47 years, IQR 41-55).
* Response type 1 denominator is the total number of potential participants whose address was issued to interviewers. Type 2 denominator excluded addresses that could not be found or where no one of the expected age and sex was found. Type 3 denominator was restricted to those addresses where it was established that a person of the expected age and sex was resident. Further details can be found in Supplementary Table S1.
** Age self-reported at baseline interview or, where the participant was not interviewed, age defined using the expected age of the participant at the address from the sampling frame.
Signed informed consent was obtained both at baseline interview and at the health check. At baseline interview the consent was obtained for passing on name, address, and telephone number to the polyclinic medical team for those deciding to have a health check. Agreement for interview per se was obtained verbally. At the health check written informed consent was obtained for participation in the study. Participants were also given the option to consent to be re-contacted by the study team in the future.
Baseline Interview. At the baseline interview, a questionnaire was administered by a trained interviewer using a computer assisted personal interviewing device (CAPI) implemented on a tablet computer. For quality assurance purposes these devices were programmed so both location of the interview (using GPS) and the time taken for each question were recorded automatically. The topics covered at the interview are summarised in Table 6. Where appropriate we used established and validated questions or question sets, as indicated in     36 . Baseline interview questions on smoking were repeated.
A summary of the components of the physical examination including the devices used for measurement is shown in Table 7.
Briefly, the physical examination included measurements of blood pressure, pulse oximetry, anthropometry (height, waist and hip circumference, weight and body composition), digital ECGs, pulse wave velocity and pulse wave analysis. Physical function was assessed through measurement of grip strength using the Southampton protocol 38 , the time taken to stand up and sit down from a chair ten times in line with the MRC National Survey of Health and Development Protocols 35 , and standing balance on one leg with eyes open and eyes closed using the protocol from the National Health and Aging Trends Study (funded by the National Institute on Aging (U01AG032947); 2011).
The clinics were requested to offer 50% of the participants lung function tests and the option to wear a combined heart rate and movement sensor (Actiheart, CamNtech Ltd, UK) on the chest for 5 days in order to provide an objective measure of physical activity 41 . Those wearing the monitor were asked to complete a 200m self-paced walk test for the purposes of individual calibration of the heart rate response 42 . This approach was recently found to be valid for estimating freeliving activity energy expenditure 43 . For practical reasons, the selection of participants to be offered these additional components was done on the basis of offering them to all participants on days when medical personnel trained in these procedures were working in the clinic. The days of the week these procedures were offered on varied throughout the course of the fieldwork and included weekends.
Vascular ultrasound and echocardiography examinations were done in accordance with a very detailed protocol. Participants underwent transthoracic echocardiography (ECHO) in the left lateral decubitus position using a commercially available system equipped with a 1.0-5.0 MHz matrix sector transducer (Vingmed Vivid q or E9, GE Healthcare, Horten, Norway). A common standard operational procedure (SOP) for ECHO was developed for the study by an international team of nine leading experts (including AR, SM, HS, AH, DL), which was used in the Know Your Heart (Russia) study and in the Tromsø 7 (Norway) study. ECG-gated M-mode and two-dimensional grey-scale images as well as pulsed, continuous and colour Doppler data were acquired in the parasternal and apical views with breath hold to ensure image quality. Grey-scale images were obtained with only one focal zone to ensure a frame rate of at least 50 frames per second.
Images were recorded digitally in cine-loop format or as still images as appropriate and analysed off-line with commercial software EchoPAC (v.113, GE-Vingmed AS, Horten, Norway). Off-line ECHO analysis was performed by 1 investigator (MS) for images obtained in Norway (Tromsø 7 study) and by 2 investigators (AR, SM) at the central reading laboratory in Novosibirsk for images obtained in Russia. Left ventricular (LV) and atrial volumes were measured from the apical 2- and 4-chamber views and LV ejection fraction (LVEF) was calculated using the biplane Simpson's technique 44 . LV mass and relative wall thickness (RWT) were estimated from M-mode recordings according to current recommendations 44 . Chamber volumes and LV mass were indexed to body surface area. Doppler measurements of aortic, mitral, pulmonary and tricuspid valve flow were obtained according to current guidelines, and the recommended grading of any detected valvular heart disease was followed 45,46 . We evaluated global longitudinal strain and strain rate of the LV by the 2D speckle tracking technique. PW Doppler tissue velocities of the mitral annulus were traced for additional quantification of systolic and diastolic ventricular function 45 . Intra- and inter-reader variability was regularly assessed within both reading laboratories and between the Russian and Norwegian reading teams.
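The derived indices mentioned above follow simple arithmetic once the underlying volumes and dimensions have been measured. The sketch below uses the widely taught textbook definitions and is an assumption about what the cited recommendations specify; it does not reproduce the EchoPAC analysis. The function names are hypothetical, and body surface area is passed in as an input because the text does not say which formula was used to compute it.

```python
def lvef_percent(end_diastolic_volume_ml, end_systolic_volume_ml):
    """LV ejection fraction (%) from end-diastolic and end-systolic volumes.

    The volumes themselves would come from the biplane Simpson's method;
    only the final ratio is computed here.
    """
    stroke_volume = end_diastolic_volume_ml - end_systolic_volume_ml
    return 100.0 * stroke_volume / end_diastolic_volume_ml


def relative_wall_thickness(posterior_wall_cm, lv_internal_diameter_cm):
    """RWT as twice the diastolic posterior wall thickness over the LV
    internal diameter in diastole (assumed textbook definition)."""
    return 2.0 * posterior_wall_cm / lv_internal_diameter_cm


def index_to_bsa(value, body_surface_area_m2):
    """Index a chamber volume (ml) or LV mass (g) to body surface area (m^2)."""
    return value / body_surface_area_m2


# Invented illustrative values, not study data.
ef = lvef_percent(120.0, 48.0)             # 60.0 (%)
rwt = relative_wall_thickness(1.0, 5.0)    # 0.4
mass_index = index_to_bsa(150.0, 2.0)      # 75.0 (g/m^2)
```

Keeping the body surface area as an explicit input makes it easy to apply whichever formula a given analysis adopts without changing the indexing step.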
Vascular ultrasound (VUS) of the carotid arteries was conducted in accordance with the study SOP for VUS with the participant in a supine position using a commercially available system equipped with a 3-13 MHz linear transducer (Vingmed Vivid q or E9, GE Healthcare, Horten, Norway). ECG-gated high-resolution two-dimensional grey-scale images were obtained in longitudinal and transverse views. The highest probe frequency was applied with only 1 focal zone and the highest frame rate (at least 40 frames per second). VUS images were recorded digitally in cine-loop format or as still images and analysed off-line with software EchoPAC (v.113, GE-Vingmed AS, Horten, Norway).
Off-line vascular analysis was performed by 2 investigators (AR, SM) in the reading laboratory in Novosibirsk, Russia.
Computer-assisted measurement of the intima-media thickness of both common carotid arteries and assessment of carotid plaques (Mannheim Consensus; 2004-2006) and patterns of artery wall structure were conducted. Intra- and inter-reader variability was regularly assessed.
All participants were asked to give a blood sample. Since the health checks took place throughout the day it was not considered feasible to ask participants to fast for 12 hours but participants were asked to fast for 4 hours prior to attending the health check. Questions about time of last meal and drinks consumed in the past four hours including caffeine and consumption of alcohol in the past 24 hours were asked by the receptionist on arrival and these data were recorded.
Blood samples were collected in 4 SST II vacutainers (8.5ml) and 2 EDTA vacutainers (10ml and 6ml) BD® (Becton, Dickinson and Company, Preanalytical Systems, US). Serum vacutainers were left at room temperature for 30 minutes and then stored at 4°C, while EDTA vacutainers were stored immediately at 4°C. The 10ml EDTA tube and the 4 SST tubes were centrifuged in cooled centrifuges at 4°C at 2100-2200g for 15 minutes. Samples were aliquoted into 1.8 ml Nunc® cryotubes® (10 cryovials of serum, 3 cryovials of plasma and 4 cryovials of whole blood). We aimed for processing, aliquoting and freezing of blood samples within a target of 2 hours after sample collection (using time stamps from modules used within the laboratories at the time of sample processing, we confirmed that this was achieved for 84% of samples: 100% of samples in Arkhangelsk and 63% of samples in Novosibirsk). Vacutainers and cryovials were uniquely identified using bar-coded labels.
Participants were asked to volunteer a spot urine sample and faecal samples for analysis of the gut microbiome. Those who agreed were provided with appropriate collection kits and instructions and requested to provide samples while they were in the clinic, or to return samples to the clinic later. The proportion of participants providing both types of optional sample was considerably higher in Arkhangelsk (urine 59%, faecal 43%) than in Novosibirsk (urine 26%; faecal 9%) and was particularly high for the participants recruited from alcohol services (urine 99.6%; faecal 89%). If providing the sample at home participants were instructed to store samples at 4°C and return to the clinic within 18 hours in order to meet target of freezing samples within 24 hours. Blood, urine and faecal samples were initially processed and stored at -20°C for a maximum of three weeks and then transferred to -80°C freezers. Periodically throughout the study biological samples were shipped to Moscow and stored at -80°C. Analysis was performed in one period at the end of the study with samples from both sites analysed in parallel and not sequentially. The core set of biochemistries analysed using the blood and urine samples are listed in Table 8. The target and achieved number of cryovials per participant of each biological sample type is shown in Table 9.
Electronic data capture for all aspects of the health check was used to reduce data entry errors that occur when using paper forms. This included all interview responses as well as output from all measurement devices. In some cases the data files created by the devices were also automatically captured. Exceptions were measurement of height, waist and hip circumference, pulse oximetry and the physical function tests (grip strength, chair stands, standing balance) in which the values were entered via the keyboard. The processing of laboratory samples was also done using a bespoke sample handling application. Data capture software was created using SURVANT (Netelixis IT Solutions. SURVANT survey authoring and data collection software: version 1.0).
Detailed reports on data quality were created each month on key areas to be monitored such as characteristics of participants, GPS location of interviews, and inter-operator variability. These reports were reviewed by the central study team in a monthly meeting with immediate feedback provided to the fieldwork sites.
The questionnaires and data collection tools used in the study are shown in Supplementary File 3. These files show paper versions of the questions used; however, all data collection was done electronically (CAPI for the baseline questionnaire and SURVANT for the health check).
Gut microbiome sub-study
The gut microbiome refers to digestive-tract associated microbes, and more than 1,000 microbial species-level phylotypes can be accessed by sequencing the 16S ribosomal RNA gene region of faecal DNA samples. An imbalance of the normal gut microbiome has been linked with gastrointestinal conditions (e.g. inflammatory bowel disease, irritable bowel syndrome), systemic metabolic diseases (e.g. obesity, type 2 diabetes), and atopy, but the underlying studies tend to have small sample sizes. Whilst the microbiome is affected by factors such as age, antibiotic use, and diet, the composition of the microbiota is thought to be relatively stable within healthy adult individuals over time 47 . Within this study it is proposed to establish the relative abundance of microbes by 16S (V3-V4 region) sequencing of faecal samples (n=1000) collected in Arkhangelsk.
The resulting characterization of the gut microbiome will be correlated with CVD outcomes, using regression-based association tests that account for the rich set of confounders collected in the study. Analysis of the repeat samples collected a year later as part of the repeatability sub-study will facilitate an assessment of within-person microbiota stability over time.
The presence of phylotypes that may be linked to cardiovascular outcomes will be confirmed using a metagenomic approach, where whole genomes, rather than targeted 16S, are characterized.
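As a minimal illustration of what relative abundance means in this context, the sketch below converts the read counts for one sample into proportions. The taxon labels and counts are hypothetical, and the sketch assumes that reads have already been assigned to phylotypes; it is not the sequencing or analysis pipeline used in the study.

```python
def relative_abundance(read_counts):
    """Convert {phylotype: read count} for one sample into proportions."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: count / total for taxon, count in read_counts.items()}


# Hypothetical counts for a single faecal sample.
sample = {"Bacteroides": 600, "Faecalibacterium": 300, "Prevotella": 100}
for taxon, share in relative_abundance(sample).items():
    print(f"{taxon}: {share:.1%}")
```

Expressing counts as proportions makes samples with different sequencing depths comparable before any association analysis.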
Comparison with Tromsø 7 "Heart to Heart Study"
The central objective of IPCDR is to understand why Russia has such high cardiovascular mortality compared to other countries. From the outset we planned to make comparisons of the cardiovascular phenotypes observed in Arkhangelsk and Novosibirsk with those observed in the 7th wave of the Tromsø Study. To facilitate this comparison, UiT The Arctic University of Norway, which runs the Tromsø Study, has created the Heart to Heart initiative, which provides an umbrella under which this comparative scientific work can be developed.
The Tromsø Study is a population-based survey of residents of the municipality of Tromsø in Northern Norway 48 . To date there have been 7 waves of data collection, with the first wave starting in 1974. Tromsø, the largest city in Northern Norway and situated ~400 km north of the Arctic Circle, is geographically close to Arkhangelsk and as such is a particularly interesting study with which to make international comparisons.
Data collection for the Tromsø 7 Study took place between 2015 and 2016, with a total of 21,000 men and women aged 40 and above living in Tromsø examined. During the development phases of both Tromsø 7 and the IPCDR Know Your Heart Study we aligned aspects of the medical examination and laboratory analyses to make the data as comparable as possible. In particular, an identical protocol was used for the ECHO measurements in both studies. A validation study has been carried out by the University Hospital of North Norway laboratories, whereby split samples were analysed in Norway and in the Moscow laboratories used for conventional biochemistries. Some of the key areas for comparison between the Tromsø Study and the IPCDR studies are shown in Supplementary Table S4. The University Hospital in Tromsø has also undertaken a validation study to compare measurement of body composition using the two types of device used in each study. A similar validation study was done for the measurement of physical activity.
Data and statistical analysis plan
Data analyses will be focused around examining the associations and comparisons of interest shown in Figure 1. The analysis plan consists broadly of two parts: 1) analysis of associations between key exposures of interest and cardiovascular phenotypes within Russia and 2) comparisons between Russia and other studies, particularly Tromsø 7. Examples of proposed analyses include:
1) Comparisons between Know Your Heart and Tromsø 7 on cardiovascular risk factors such as blood pressure and body mass index will be carried out by calculating means for continuous variables and proportions for categorical variables for all participants aged 40-69, stratified by sex and age-standardized to the 2013 European Standard Population.
2) One of our exposures of interest will be alcohol use. The data on self-reported consumption from the questionnaires and biomarker data (gamma glutamyl transferase, and carbohydrate deficient transferrin for a sub-set) will be used to divide participants into appropriate groups: non-drinker, light drinker, moderate drinker, hazardous drinker (population sample), and hazardous drinker (diagnosed alcohol use disorder). Associations with cardiovascular disease phenotypes will be analysed using logistic and linear regression with adjustment for pre-specified confounders.
3) Alongside well-established, clinical measures of cardiovascular phenotype such as left ventricular ejection fraction, carotid intima media thickness (cIMT) and plaque, we are proposing to investigate whether there are underlying latent dimensions of cardiovascular phenotypes which can be obtained from the data using factor analysis.
A range of statistical analysis programs will be used, including Stata (StataCorp LP), IBM SPSS Statistics (IBM Corporation) and R.
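The age standardization described in example 1) can be illustrated with a minimal sketch of direct standardization in Python. It assumes five-year age bands and the 2013 European Standard Population weights for ages 40-69 as listed in the code, which should be checked against the Eurostat publication before any real use. The function names and the toy records are hypothetical and are not part of the study's analysis code.

```python
from collections import defaultdict

# 2013 European Standard Population weights (per 100,000) for the
# five-year bands covering ages 40-69. Assumption: verify against the
# Eurostat publication before any real analysis.
ESP2013_WEIGHTS = {
    "40-44": 7000,
    "45-49": 7000,
    "50-54": 7000,
    "55-59": 6500,
    "60-64": 6000,
    "65-69": 5500,
}


def age_band(age):
    """Map an age in completed years (40-69) to its five-year band label."""
    years = int(age)
    if not 40 <= years <= 69:
        raise ValueError(f"age {age} is outside the 40-69 comparison range")
    lower = 40 + 5 * ((years - 40) // 5)
    return f"{lower}-{lower + 4}"


def standardized_mean(records, weights=ESP2013_WEIGHTS):
    """Directly age-standardized mean of (age, value) pairs.

    For a binary outcome coded 0/1 this is the standardized proportion.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, value in records:
        band = age_band(age)
        totals[band] += value
        counts[band] += 1
    # Every standard-population band needs at least one observation.
    empty = [band for band in weights if counts[band] == 0]
    if empty:
        raise ValueError(f"no observations in age bands: {empty}")
    weighted = sum(weights[b] * totals[b] / counts[b] for b in weights)
    return weighted / sum(weights.values())


# Hypothetical toy data: (age, systolic blood pressure in mmHg).
toy_men = [(41, 100.0), (47, 130.0), (52, 130.0),
           (58, 130.0), (63, 130.0), (68, 130.0)]
print(round(standardized_mean(toy_men), 2))
```

Calling the function separately on the records for men and for women in each study would give sex-stratified, age-standardized figures that can be compared directly, because both are weighted to the same standard age structure.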

Strengths and limitations
This study has collected very detailed data on cardiovascular profile and risk factors for cardiovascular disease from the general population of two geographically distinct cities within Russia. The close connection with the Tromsø 7 study allows for comparisons between Russia and Norway in the same calendar years with considerably less chance that differences are due to study methodology than many comparisons between population-based surveys. To our knowledge the inclusion of participants receiving treatment for alcohol problems in an in-depth study of cardiovascular phenotypes alongside a general population sample is unique. The use of the same tools and measurement procedures in both populations is an important strength of the study.
One of the potential limitations of the study is the low response rate for Novosibirsk. This creates uncertainty about the generalizability of study findings particularly around estimation of prevalences and mean values of parameters that will affect comparisons that can be made with other countries. In both sites response rates were higher in women and older people. However, non-response patterns were complex, and the extent to which they may limit inferences with the Tromsø 7 study will depend upon the direction and magnitude of the differences found. For example, the fact that in Novosibirsk the educational profile of participants was weighted more towards those with higher education than the population of the city as a whole, might be expected in many cases to minimize differences with Tromsø.
There are several possible reasons for the poorer response rates in Novosibirsk than Arkhangelsk. Novosibirsk is an appreciably larger city than Arkhangelsk, with citizens being potentially more suspicious of approaches to take part in research. The smaller size of Arkhangelsk was one of the factors that facilitated good links with the city government who provided extensive support for an intensive public information campaign about the study in the city that was on a larger and more sustained basis than in Novosibirsk. Finally, the smaller size of Arkhangelsk and the location of the research polyclinic in the center of the city made it easier for participants to attend than may have been the case in Novosibirsk.
Despite these limitations the richness of the data collected means there is an unparalleled opportunity for in-depth analysis of cardiovascular phenotypes and much greater understanding of how these are associated with a wide range of biological, psychological and socio-economic risk factors within Russia and with the Tromsø 7 study (Figure 1).
While this study was designed as a cross-sectional study, consent for recontacting participants and accessing health records was obtained; there is therefore potential to obtain follow-up data in the future.

Dissemination of information
Bona fide researchers will be able to apply for subsets of the data from this study for research purposes. In addition, we are establishing a biobank of the biological samples collected in Russia. Researchers will be able to apply to analyse these samples within Russia. Further information about the study including details of how to access data and samples will be available at https://knowyourheart.science/ [active from June 2018]. This website will be updated from time to time with summaries of findings and links to a separate

Competing Interests: No competing interests were disclosed.
The authors should recognise that the study may not find associations of risk factors to phenotypes as the sample size is too small.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
We thank Dr Filip Zemrak for his comments. Our response is as follows:
1. One of the major objectives of this study is to make comparisons between the Know Your Heart Study in Russia and the Tromsø 7 Study, Norway. Dr Zemrak may wish to note that, as shown in Table 2 of our paper, there are 8-10 fold differences in risk of CVD and IHD death between the Russian and Norwegian populations. This is the background against which we are attempting to identify differences in CVD phenotypes and risk factors. It is the basis for our confidence that, even with our relatively small study (compared to, for example, UK Biobank or China Kadoorie Biobank), we will find large differences in some key factors as a result. This has been supported by early findings on interim data where we compared cardiovascular risk factors between the two studies. We found strong evidence (i.e. very low p-values) for a difference between the study populations on several of the key parameters of interest (e.g. blood pressure). These results were presented at the Regional Health Research Conference, Norway (2016) and the Russian National Congress of Cardiology (2017). A full paper reporting these contrasts is in preparation now that all field work is completed.
Given the nature of the study (breadth of measures collected) it is not feasible to provide sample size calculations for every possible research question of interest and the sample size calculations given within the paper are designed to cover a range of plausible relevant scenarios. Our paper is a study protocol describing the methodology of the study and providing key information about its conduct and recruitment of subjects. To this extent the imprecision of any effect estimates we subsequently report will be reflected in the confidence intervals. We agree with Dr Zemrak that for some things we will be underpowered, whether this is in comparisons with other studies such as the Tromsø study, or comparisons within the Russian study populations. However, for other things we will be more than adequately powered. We have acknowledged this limitation in the revised version of the paper.
2. We acknowledge measurement error is a limitation, as in any multi-site epidemiological study. As outlined in the paper, several steps were taken to minimize variability, including the production of detailed standard operating procedures, review of reports on data quality every month throughout the duration of the fieldwork and regular assessment of intra- and inter-reader variability in offline reading of ECHO and vascular ultrasound data. A strength of the study design (as described within the paper) is that in each city a further 200 participants were invited back one year after their initial health check and completed all study procedures for a second time. From these data we have been able to calculate correction factors which assess the extent of within-person variability in measurements (which includes measurement error due to factors such as variation in observers and devices). From a further paper (currently under review) we can demonstrate that our correction factors for a variety of measures are comparable to (and generally lower than) those from a very large "gold standard" UK study, providing additional reassurance on data validity. Given that this paper is a protocol paper it would not be appropriate to include these more substantive results concerning repeatability of measures. These will be published alongside results of specific analyses as they appear.
3. Ejection fraction is not the main or only functional parameter of interest in this study. Others are described in the protocol under details of the health check examination. We mentioned ejection fraction under our illustrative analysis plan as an example of a commonly used parameter within clinical practice which may be used alongside more agnostic methods such as factor analysis, but it was not our intention to suggest this is the only parameter to be investigated. As we describe in the protocol paper, we are undertaking intra- and inter-site reliability studies of offline reading. Unfortunately we did not have the resources to undertake cardiac MRI on 4500 Russian participants and an equivalent number of Norwegian study subjects.
This is a 'model' example of how a protocol paper should be written. The protocol is exceptionally clearly written and contains all the necessary details of the various study aspects including the study purpose, sampling, recruitment and the various dimensions of data collection, through to an outline of the analysis plan.
It requires no revisions and can be published as it stands.
Is the rationale for, and objectives of, the study clearly described? Yes

Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.
Referee Expertise: Epidemiology
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.