New trends of HCV infection in China revealed by genetic analysis of viral sequences determined from first-time volunteer blood donors

Recently, we studied hepatitis C virus (HCV) seroprevalence among 559 890 first-time volunteer blood donors in China. Among 450 randomly selected anti-HCV-positive donors, we detected HCV RNA in 270. In this study, we amplified HCV E1 and/or NS5B sequences from 236 of these donors, followed by DNA sequencing and phylogenetic analysis. The results indicate new trends of HCV infection in China. The HCV genotype distribution differed according to the donors’ region of origin. Among donors from Guangdong province, we detected subtypes 6a, 1b, 3a, 3b, 2a, and 1a at frequencies of 49.7%, 31.0%, 7.6%, 5.5%, 4.1%, and 2.1%, respectively. Among donors from outside Guangdong, we detected 1b, 2a, 6a, 3b, 3a, 6e, and 6n at frequencies of 57.1%, 13.2%, 11.0%, 9.9%, 4.4%, 2.2%, and 2.2%, respectively. Although we found no significant regional differences in age or gender, subtype 6a was more common (P < 0.001) in donors from Guangdong than in those from elsewhere, whilst subtypes 1b (P < 0.02) and 2a (P < 0.001) were more frequent outside Guangdong. Regardless of region of origin, the male/female ratio was higher for subtype 6a-infected donors (P < 0.05) than for subtype 1b-infected donors, whilst the mean age of subtype 2a-infected donors was 8–10 years older (P < 0.05) than that for all other subtypes. Detailed phylogenetic analysis of our sequence data provides further insight into the transmission of HCV within China, and between China and other countries. The predominance of HCV 6a among blood donors in Guangdong is striking and warrants studies into risk factors for its acquisition.


INTRODUCTION
Hepatitis C virus (HCV) is a blood-borne pathogen that presents a major threat to global public health. Worldwide, about 170 million people are infected with HCV [1], and the prevalence varies among countries [1-6]. HCV can cause chronic liver disease in 75-85% of infected individuals, with outcomes that include liver cirrhosis and hepatocellular carcinoma [7,8]. The rapid global spread of HCV resulted mainly from transmission through blood transfusion [9]. In countries where donors are now screened, new cases are often associated with injection drug use (IDU) and unsafe medical procedures, and other routes have also been implicated [10].
Analysis of viral sequences has led to the classification of HCV into six genotypes and >80 subtypes [11]. HCV genotypes vary in geographical distribution and therapeutic response. Subtypes 1a, 1b, 2a, 2b, and 3a are ‘global epidemic’ subtypes, whereas other genotypes are restricted to particular regions [12-14]. However, the geographical and genetic diversity of HCV is constantly evolving as a result of modern transmission routes and increasing global travel.
Hepatitis C virus classification is usually consistent throughout the genome, and recombination appears to be rare. Hence, HCV genotypes and subtypes can be provisionally assigned from partial genomic regions. Sequences from the 5′ UTR are optimal for sensitive diagnosis but are not sufficient to differentiate subtypes [15,16]. Sequences from the E1 and NS5B regions vary among strains and are suitable for genotyping [12]. Using sequence data from these two regions, we previously determined the HCV genotype distribution in different areas of China. Overall, 1b was the most prevalent subtype, followed by 2a. However, in Guangdong Province, 6a had replaced 2a as the second most common subtype [17]. The emergence of 6a is probably due to its close association with IDU, a route of transmission that may have intensified in recent years. In our previous report, however, samples were obtained only from patients, who may not adequately represent the general population [17]. In this study, specimens were collected from first-time volunteer blood donors [18]. We performed detailed phylogenetic analysis of E1 and NS5B sequences to provide insights into HCV transmission within China, and between China and other countries.
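The idea that a partial region can support a provisional subtype call can be illustrated with a much simpler stand-in for phylogenetic analysis: compare a query fragment with aligned reference fragments and take the nearest one by uncorrected p-distance. This is a minimal sketch, assuming pre-aligned, equal-length sequences; the function names and the short reference fragments are hypothetical placeholders rather than real HCV sequences, and this nearest-reference rule is not the method used in this study, which relied on maximum-likelihood phylogenies with many references per subtype.

```python
# Toy nearest-reference subtype assignment by uncorrected p-distance.
# The reference fragments are hypothetical placeholders, not real HCV sequences.

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differences / compared


def assign_subtype(query: str, references: dict) -> tuple:
    """Return (label, distance) for the reference closest to `query`."""
    return min(
        ((label, p_distance(query, seq)) for label, seq in references.items()),
        key=lambda item: item[1],
    )


# Hypothetical aligned fragments standing in for subtype reference sequences.
REFERENCES = {
    "1b": "ATGCGTACGTTAGC",
    "2a": "ATGCATACCTTGGC",
    "6a": "TTGAGTACGATAGT",
}

label, dist = assign_subtype("ATGCGTACGTTAGT", REFERENCES)
print(label, round(dist, 3))  # -> 1b 0.071
```

A nearest-reference call like this carries no measure of support, which is one reason tree-based assignment with bootstrap values is preferred in practice.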

MATERIALS AND METHODS
From January 2004 to December 2007, a total of 559 890 first-time volunteer blood donors were recruited. Routine screening detected anti-HCV in 1877 donors. Of these, 450 were selected for RT-PCR, and 270 were found to be HCV RNA+ [18]. cDNA from these 270 donors was retained and used in this study.
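The screening cascade above reduces to a few proportions. The sketch below simply recomputes them from the counts given in the text (the 236 sequenced donors are taken from the Abstract); the constant and function names are ours, chosen for illustration.

```python
# Screening-cascade proportions recomputed from the counts reported in the text.

SCREENED = 559_890          # first-time volunteer donors, January 2004 to December 2007
ANTI_HCV_REACTIVE = 1_877   # anti-HCV detected on routine screening
TESTED_BY_RT_PCR = 450      # anti-HCV-positive donors selected for RT-PCR
HCV_RNA_POSITIVE = 270      # HCV RNA detected
SEQUENCED = 236             # E1 and/or NS5B amplified and sequenced (Abstract)


def percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


anti_hcv_rate = percent(ANTI_HCV_REACTIVE, SCREENED)
rna_positive_rate = percent(HCV_RNA_POSITIVE, TESTED_BY_RT_PCR)
sequenced_rate = percent(SEQUENCED, HCV_RNA_POSITIVE)

print(f"anti-HCV reactive: {anti_hcv_rate:.3f}% of screened donors")              # -> 0.335%
print(f"HCV RNA positive:  {rna_positive_rate:.1f}% of donors tested by RT-PCR")  # -> 60.0%
print(f"sequenced:         {sequenced_rate:.1f}% of RNA-positive donors")         # -> 87.4%
```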
Sequencing methods were as previously described [17]. Briefly, HCV fragments were amplified using the PrimeSTAR kit (Takara, Dalian, China). Amplicons were purified with the QIAquick PCR purification kit (QIAgen, Valencia, CA, USA). DNA was sequenced in both directions on an ABI Prism 3100 genetic analyzer (PE Applied Biosystems, Foster City, CA, USA). Sequences were aligned using the CLUSTAL_X program (www.geneious.com). Phylogenies were estimated using the maximum-likelihood method under the HKY+I+G 6 substitution model, implemented in the PHYML program (http://atgc.lirmm.fr/phyml/). Bootstrap resampling was performed using 500 replicates. A variety of reference sequences were retrieved from GenBank and included in the analyses (see .
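Bootstrap resampling of the kind used here draws alignment columns with replacement to build each pseudo-alignment, re-estimates a tree from it, and reports support for a cluster as the percentage of replicates that recover it. The sketch below shows only that resampling loop, on a hypothetical four-taxon alignment. To stay short and dependency-free, it replaces maximum-likelihood tree inference with a crude stand-in (the single closest pair by Hamming distance); it does not reproduce PHYML or the HKY+I+G model, and all names are illustrative.

```python
import random


def resample_columns(alignment, rng):
    """Build one pseudo-alignment by drawing alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(alignment):
    """Stand-in for tree inference: the pair of taxa with the smallest Hamming distance.

    Ties are broken alphabetically by taxon name.
    """
    names = sorted(alignment)
    pairs = [
        (hamming(alignment[a], alignment[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    _, first, second = min(pairs)
    return frozenset((first, second))


def bootstrap_support(alignment, clade, replicates=500, seed=1):
    """Percentage of pseudo-alignments in which `clade` is recovered."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == clade
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical 20-column alignment: A and B differ at one site, C and D at three.
ALIGNMENT = {
    "isolate_A": "ACGTACGTACGTACGTACGT",
    "isolate_B": "ACGTACGTACGTACGTACGA",
    "isolate_C": "TCGAACCTAGGTACTTACGA",
    "isolate_D": "TGGAACCTAGGAACTTTCGA",
}

support = bootstrap_support(ALIGNMENT, frozenset({"isolate_A", "isolate_B"}))
print(f"support for (A, B): {support:.0f}%")
```

Because the A/B grouping rests on a single informative column, replicates that happen to omit or under-sample competing columns can still overturn it, which is why support values below 100% arise even for a true pairing.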
Guidelines set by the Institutional Review Boards of the Guangzhou Blood Center and the University of Utah were strictly followed. Written informed consent was obtained from all participants when they donated blood.

Characteristics of the study group
Of the 270 donors, 204 (75.6%) were men and 66 (24.4%) were women; 178 (65.9%) were from Guangdong, and the remainder were from elsewhere. Ages ranged from 21 to 54 years, with a mean age of 34.4 years (SD = 6.79 years). Two hundred and sixty-eight donors were of Han ethnicity, and one each was from the Zhuang and Tujia ethnic groups [18].

Analysis of subtype 1b sequences
In total, 97 subtype 1b isolates were identified: 84 were represented by both E1 and NS5B sequences, 1 by E1 only, and 12 by NS5B only. Almost all of the E1 sequences grouped into five clusters, labelled A, B, C, D, and E, containing 37, 29, 8, 2, and 2 sequences, respectively. Bootstrap support for clusters A-E was 83%, 44%, 99%, 88%, and 90%, respectively (Fig. 1). Many reference sequences fell within these clusters, and all of them were of Chinese origin [14,17,19-22]. Figure 1 provides an expanded view of clusters A and B (each collapsed into a single branch in the main tree), which were investigated previously and whose spread was found to coincide with the ‘Cultural Revolution’ in China [23]. These two lineages appear no different from other branches but have spread selectively in China. Clusters C, D, and E may have similar epidemiological histories, although fewer isolates were identified.
The NS5B sequences also grouped into clusters A, B, C, D, and E, containing 38, 36, 10, 2, and 2 sequences, respectively, with bootstrap support of 53%, 75%, 73%, 42%, and 24%. Figure 2 shows two further clusters, F and G, with bootstrap scores of 66% and 68%, respectively. All isolates in cluster F were from Yunnan [24], and all isolates in cluster G were from Beijing [25]. A comparison of Figs 1 and 2 shows that isolates from this study occupy similar positions in both trees, consistent with reliable sequencing results and with no evidence of recent recombination or mixed HCV infection. In total, 87.6% (85/97) of the 1b isolates grouped into clusters A, B, or C (Table 2). Consistent with our previous report [17], cluster A is prevalent nationwide, and cluster B is more common in Guangdong. When the E1 and NS5B sequences from the same isolates were concatenated, both clusters received significant bootstrap support (not shown).
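Concatenating E1 and NS5B sequences requires pairing the two regions isolate by isolate; for subtype 1b, only the 84 isolates represented by both regions could contribute. The following is a minimal sketch, assuming each region has already been aligned separately and that isolates lacking either region are dropped (one reasonable choice; the text does not say how such isolates were handled). The function name, isolate names, and sequences are hypothetical.

```python
def concatenate_regions(e1, ns5b):
    """Join E1 and NS5B sequences isolate by isolate.

    Assumes each region was aligned separately (one length per region).
    Isolates missing either region are dropped rather than gap-padded.
    """
    for region_name, region in (("E1", e1), ("NS5B", ns5b)):
        lengths = {len(seq) for seq in region.values()}
        if len(lengths) > 1:
            raise ValueError(f"{region_name} sequences are not aligned: lengths {sorted(lengths)}")
    shared = sorted(set(e1) & set(ns5b))
    return {name: e1[name] + ns5b[name] for name in shared}


# Hypothetical isolates: iso2 lacks NS5B and iso4 lacks E1, so both are dropped.
e1_alignment = {"iso1": "ACGT", "iso2": "ACGA", "iso3": "TCGA"}
ns5b_alignment = {"iso1": "GGC", "iso3": "GGT", "iso4": "GAT"}

combined = concatenate_regions(e1_alignment, ns5b_alignment)
print(combined)  # -> {'iso1': 'ACGTGGC', 'iso3': 'TCGAGGT'}
```

Dropping incomplete isolates keeps the concatenated alignment free of long gap runs, at the cost of fewer taxa; padding with gaps is the usual alternative when every isolate must appear in the tree.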

Analysis of subtype 6a sequences
Subtype 6a sequences were isolated from 82 donors; E1 and NS5B were sequenced from 75 donors, E1 only from 4, and NS5B only from 3. The E1 tree presents three clusters (denoted I, II, and III) containing 36, 9, and 19 isolates, respectively (Fig. 3). When 573 nt long sequences were analysed, the three clusters exhibited bootstrap scores of 75%, 91%, and 78%. When sequences were trimmed to 410 nt, bootstrap scores were correspondingly reduced to 70%, 83%, and 45%. The shorter alignment enabled the inclusion of more reference strains, which revealed two further clusters (denoted IV and V) with bootstrap scores of 90% and 78%, respectively. Clusters IV and V contained isolates from Yunnan, Hubei, and Guangxi [24,26,27]. Sixteen reference sequences were from Hong Kong, but none grouped within clusters I to V. Isolates from IDUs in Guangxi and Hubei appeared throughout the tree [26,27]. Generally, the more derived sequences were from China [17], whereas the more ancestral sequences were from Vietnam or from immigrants from Vietnam.
Fig. 1 Subtype 1b phylogeny estimated from E1 region sequences (H77 positions: 869-1289). Subtype 1a sequence M62321 was used as an outgroup. Green circles label reference sequences from outside China, and red circles label sequences from this study. Sequences without a circle are Chinese isolates reported in other studies. Blue circles and dashed arrows represent the locations of clusters A and B, which are shown expanded in two boxes on the right. Three dashed circles indicate clusters C, D, and E, and bootstrap support values are shown in italics. Scale bar represents 0.10 nucleotide substitutions per site.
Fig. 2 Subtype 1b phylogeny estimated from NS5B region sequences (H77 positions: 8276-8615). Subtype 1a sequence M62321 was used as an outgroup. All labels are the same as those described in the Fig. 1 legend. In addition, yellow circles represent unpublished sequences obtained by us from Chinese patients coinfected with HIV-1 [40], and dashed circles indicate clusters F and G.
Fig. 3 Subtype 6a phylogeny estimated from E1 region sequences (H77 positions: 869-1289). Subtype 6b sequence D84263 was used as an outgroup. All labels are the same as those described in the Fig. 1 legend. In addition, five dashed circles indicate clusters I, II, III, IV, and V. The box on the left contains a smaller phylogeny estimated from longer 510 nt sequences, which shows stronger bootstrap support for clusters I, II, and III.
Fig. 4 Subtype 6a phylogeny estimated from NS5B region sequences (H77 positions: 8276-8615). Subtype 6b sequence D84263 was used as an outgroup. All labels are the same as those described in the Fig. 1 legend. In addition, three dashed circles indicate clusters I, II, and III. The box on the left contains a smaller phylogeny estimated from a longer alignment of concatenated E1 + NS5B sequences (870 nt), which shows very high bootstrap support for clusters I, II, and III.
Among the NS5B sequences, 36, 10, and 16 grouped within clusters I, II, and III, respectively (Fig. 4). Only cluster I had significant bootstrap support (80%). Each isolate was placed in the same cluster in both trees (Figs 3 & 4), and reference sequences of Vietnamese origin tended to lie nearer the base of the tree. Concatenating the E1 and NS5B sequences from the same isolates increased the phylogenetic signal (box in Fig. 4): clusters I, II, and III were again present, with bootstrap scores of 86%, 95%, and 88%, respectively. Only five Vietnamese reference sequences could be concatenated; they were placed near the tree root.
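Whether the same isolates fall into the same clusters in the E1 and NS5B trees can be checked mechanically once each tree's clusters have been read off as an isolate-to-cluster mapping. The sketch below cross-tabulates the two assignments and lists discordant isolates. The function name and the mappings are hypothetical, and extracting cluster membership from a tree file is outside its scope.

```python
from collections import Counter


def compare_cluster_assignments(e1_clusters, ns5b_clusters):
    """Cross-tabulate cluster labels for isolates clustered in both trees.

    Returns (Counter of (E1 label, NS5B label) pairs, sorted list of discordant isolates).
    Isolates present in only one mapping are ignored.
    """
    shared = sorted(set(e1_clusters) & set(ns5b_clusters))
    table = Counter((e1_clusters[name], ns5b_clusters[name]) for name in shared)
    discordant = [name for name in shared if e1_clusters[name] != ns5b_clusters[name]]
    return table, discordant


# Hypothetical assignments; iso4 is placed in different clusters by the two trees.
e1 = {"iso1": "I", "iso2": "I", "iso3": "II", "iso4": "III", "iso5": "III"}
ns5b = {"iso1": "I", "iso2": "I", "iso3": "II", "iso4": "II", "iso6": "III"}

table, discordant = compare_cluster_assignments(e1, ns5b)
print(dict(table))   # -> {('I', 'I'): 2, ('II', 'II'): 1, ('III', 'II'): 1}
print(discordant)    # -> ['iso4']
```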

Analysis of subtype 2a sequences
Subtype 2a sequences were isolated from 18 donors; E1 and NS5B were sequenced from 16 donors, and E1 only and NS5B only from one donor each. In both the E1 and NS5B trees, subtype 2a isolates did not exhibit a clear geographical pattern (Figure S1). In the E1 tree, we observed three clusters composed mainly of Chinese isolates [17]. The NS5B tree showed two notable clusters: one containing isolates from France and the other representing isolates from a single hospital in Beijing [25].

Analysis of subtype 3a sequences
Subtype 3a sequences were isolated from 15 donors; E1 and NS5B were sequenced from nine donors, E1 only from one, and NS5B only from five. In both trees (Figures S2 & S3), most of our isolates formed a single cluster, supported by bootstrap scores of 81% and 60%. In the E1 tree, three well-supported UK clusters were observed. In the NS5B tree, there was substantial geographical mixing: isolates from different continents tended to group together, whilst sequences from the same country were spread throughout the tree. Some geographical clusters were apparent but lacked strong bootstrap support. In contrast, sequences from China formed two distinct groups.

Analysis of subtype 3b sequences
Subtype 3b sequences were isolated from 17 donors; E1 and NS5B were sequenced from 15 donors, and NS5B only from two. Some reference isolates from China were interspersed with the strains from this study, together forming a single Chinese cluster with bootstrap support of 78% in the E1 tree and 44% in the NS5B tree. Other reference sequences were from South and Southeast Asia and were placed nearer the root of the tree (Figure S4). Hence, subtype 3b likely originated in neighbouring countries and is now spreading in China, particularly in Yunnan [24] and Guangxi [26] and among IDUs [27].

Analysis of subtype 1a sequences
Subtype 1a was identified in three donors, from each of whom both E1 and NS5B sequences were amplified. We have previously sequenced a 1a strain from a patient in Shanghai [17]; apart from this, no 1a sequences in GenBank were from China. Phylogenetically, the three 1a isolates did not appear to group with the strain from the Shanghai patient (Figure S5). In the E1 tree, the three isolates formed a cluster, but with a bootstrap score of only 44%. In the NS5B tree, the three isolates were dispersed among strains from other countries. Although six 1a strains from Taiwan formed a small cluster [28], they were not related to the isolates reported here. These results suggest that subtype 1a in China has resulted from sporadic importation events.

Analysis of subtype 6e and 6n sequences
Subtype 6e and 6n sequences were each obtained from two donors. Phylogenetic analysis grouped the two 6e isolates with GX004 and placed one 6n isolate close to KM42 (Figure S6). In the E1 tree, the Chinese 6e and 6n clusters were supported by bootstrap scores of 100% and 98%, respectively. In the NS5B tree, no strong support was obtained. GZ0355, from a donor from Guangxi, grouped with six subtype 6e isolates, all from Guangxi. Similarly, GZ0203, from a donor from Guizhou, grouped closely with six 6n isolates from Yunnan, which borders Guizhou [17,24].

DISCUSSION
In this study, we observed different patterns of HCV genotype distribution between two groups of blood donors. Subtype 6a was predominant in the Guangdong group, whilst 1b and 2a were predominant in the non-Guangdong group. In 2002, we completed a similar study of 139 patients from nine cities in China and found that 1b, 2a, and 6a accounted for 66.2%, 13.7%, and 10.1% of infections, respectively. Importantly, among patients from Guangdong, subtype 6a had replaced 2a as the second most common subtype, accounting for 21.2% (14/66). In contrast, no 6a was detected among patients from other areas [17]. Findings from the current study indicate further spread of 6a infections in China. Among donors from Guangdong, 6a has become the dominant HCV subtype, accounting for 49.7% of infections. Among donors from other areas, subtype 6a was also detected in 11.0%.
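The regional frequencies reported in the Abstract can be reproduced from whole-number counts. The counts below are our back-calculation, not figures stated in the paper: with 145 genotyped donors from Guangdong and 91 from elsewhere they reproduce every reported percentage, sum to the 236 genotyped donors, and match the subtype totals in the Results (for example, 72 + 10 = 82 subtype 6a isolates). The regional comparison uses an uncorrected Pearson chi-square on a 2×2 table; the paper does not state which test produced its P values, so numbers from this sketch need not match them exactly.

```python
import math

# Counts back-calculated from the Abstract percentages (not stated in the paper).
GUANGDONG = {"6a": 72, "1b": 45, "3a": 11, "3b": 8, "2a": 6, "1a": 3}           # n = 145
ELSEWHERE = {"1b": 52, "2a": 12, "6a": 10, "3b": 9, "3a": 4, "6e": 2, "6n": 2}   # n = 91


def percentages(counts):
    """Percentage of each subtype within one region, rounded to one decimal place."""
    total = sum(counts.values())
    return {subtype: round(100 * n / total, 1) for subtype, n in counts.items()}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (statistic, p), using the fact that the upper tail of a
    chi-square distribution with 1 df equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def compare_subtype(subtype):
    """One subtype vs all others, Guangdong vs elsewhere (subtype must occur somewhere)."""
    a = GUANGDONG.get(subtype, 0)
    c = ELSEWHERE.get(subtype, 0)
    return chi_square_2x2(a, sum(GUANGDONG.values()) - a,
                          c, sum(ELSEWHERE.values()) - c)


print(percentages(GUANGDONG))
print(percentages(ELSEWHERE))
for subtype in ("6a", "1b"):
    stat, p = compare_subtype(subtype)
    print(f"{subtype}: chi-square = {stat:.1f}, P = {p:.1e}")
```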
Subtype 6a has been reported to be prevalent in Hong Kong, Macau, Thailand, and Vietnam, and has also been found in Singapore (U908306-U908309) and Taiwan [28]. In Hong Kong, 6a has been detected in 27-30% of HCV-infected donors and 60% of HCV-infected IDUs, and it appears to have spread to the general population mainly through IDUs. We have postulated that subtype 6a in Guangdong was introduced from Hong Kong [17], because of the geographical proximity of the two locations and because the subtype was detected earlier in Hong Kong. Whilst this may be true for some cases, it is not sufficient to explain the recent spread of 6a in mainland China. Phylogenetically, 6a sequences from Guangdong formed three clusters (denoted I, II, and III). Cluster I also contained sequences from IDUs from different regions of China, including cities in Guangxi province bordering Vietnam [26], Liuzhou in Guangxi [29], and Wuhan in Hubei province [27]. Clusters I and II may represent 6a strains that originated in Guangdong and are now starting to seed IDU networks elsewhere. Five subtype 6a isolates from IDUs in Taiwan [28] also grouped in cluster II, indicating that the IDU networks extended from mainland China to Taiwan. Spread of HIV among IDUs via this route has also been reported [30]. Cluster IV was found only among IDUs in Guangxi. Cluster V could be an outcome of interchange between IDU networks in Yunnan and Guangxi [25-27,31]. Phylogenetically, subtype 6a sequences from Hong Kong are distinct from those isolated in mainland China [17], suggesting that 6a circulation in mainland China may not be directly linked to that in Hong Kong.
The causes of 6a emergence in China are unknown. Based on phylogenies, we hypothesize that subtype 6a originated in Vietnam, or perhaps pre-existed in southern China. A possible historical event for the importation of 6a to China was the emigration of 290 000 Vietnamese to China from the late 1970s to early 1980s [32]. Hong Kong also received 100 000 such emigrants, mostly from South Vietnam [33]. Many ethnic Chinese from North Vietnam fled to China and were resettled in Yunnan, Guangxi, and Guangdong provinces [34].
Some 6a strains may have entered blood transfusion networks and been transmitted among recipients, particularly before the discovery of HCV in 1989, or through unsafe medical procedures before the government banned paid blood donation in 1998 [35]. Although no large-scale transfusion-linked infections were recorded, some individuals could have been infected through contaminated plasma or unscreened medical infusions, such as foetal liver cells and therapeutic immunocytes, which were commonly injected in China during the 1990s [36,37].
Guangdong was the first region of mainland China to undergo economic reform because of its proximity to Hong Kong. It was opened to outside investment and has undergone rapid economic growth. This has resulted in greater social exchange with other countries and has attracted millions of internal ‘migrant labourers’ [38]. The numbers of commercial sex workers and IDUs have risen alongside this economic growth, providing more opportunities for 6a strains to spread to the general population [39].
In Guangdong, the relative prevalence of 6a had grown from undetectable levels to >20% of infections [17]. In this study, we demonstrate further growth of 6a, which has become the dominant HCV subtype among Guangdong donors, accounting for 49.7% of their infections. This spread has occurred despite the outlawing of paid blood donation [35] and improved healthcare standards.
Other transmission routes most likely explain the growth of 6a infection. Increases in the numbers of IDUs have recently been reported, and IDUs can transmit 6a strains to distant areas via known drug-trafficking routes [31].
This is the first study of HCV genotype distribution among first-time volunteer blood donors in China. Subtype 1b was the most common, accounting for 41.1% of infections. Most of these isolates were classified into two clusters, A and B. This is consistent with our previous study [17] and indicates that the geographical distribution of the two clusters in China remains largely unchanged. Overall, the proportion of 1b infections fell markedly, from 66.2% (92/139) in our previous report [17] to 41.1% (97/236) in the present study. Possible explanations are the relative increase in 6a infections and/or a larger proportion of donors from Guangdong. However, even when only donors from outside Guangdong are considered, the 1b frequency is still lower (57.1%). Subtype 1b strains are likely more associated with transmission via blood transfusion and medical procedures, and effective measures to reduce these transmissions have led to fewer 1b infections. In contrast, subtype 6a strains may be more linked to IDU and sexual transmission, both of which are increasing risks in Guangdong.
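The fall in 1b frequency, from 92/139 patients in the earlier study to 97/236 donors here, can be expressed as a two-sample comparison of proportions. This is a minimal sketch using a pooled two-proportion z-test, a technique we chose for illustration rather than one the paper reports; because the earlier sample consisted of patients and the current one of donors, such a test shows only that the two proportions differ, not why.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return p1, p2, z, p_value


# 1b frequency: earlier patient study (92/139) vs donors in this study (97/236).
p_before, p_now, z, p_value = two_proportion_z(92, 139, 97, 236)
print(f"{100 * p_before:.1f}% -> {100 * p_now:.1f}%, z = {z:.2f}, P = {p_value:.1e}")
```

The same function can be applied to the non-Guangdong subset (52/91) to see how much of the overall decline survives once regional composition is held fixed.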
We found that subtype 2a-infected donors were significantly older (by 8-10 years) than those infected with other HCV subtypes. Phylogenetically, 2a sequences from China did not form clear geographical clusters. The relative prevalence of 2a has declined in comparison with our previous report [17], whether the Guangdong and non-Guangdong groups are considered separately or together. Collectively, these findings suggest that new HCV 2a infections in China have become less frequent.

CONFLICTS OF INTEREST
None reported.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article:
Figure S1. Subtype 2a phylogeny estimated from (a) E1 region (H77 positions: 869-1289) and (b) NS5B region (H77 positions: 8276-8615). Subtype 2b sequence D10988 was used as an outgroup. Green circles label reference sequences from outside China, red circles label sequences from this study, and yellow circles label Chinese isolates reported in other studies.
Figure S2. Subtype 3a phylogeny estimated from E1 region (H77 positions: 869-1289). Subtype 3b sequence D49374 was used as an outgroup. All labels are the same as those described in Figure S1.
Figure S3. Subtype 3a phylogeny estimated from NS5B region (H77 positions: 8276-8615). Subtype 3b sequence D49374 was used as an outgroup. All labels are the same as those described in Figure S1. The symbol ‘//’ denotes where this large phylogeny has been split into two parts.
Figure S4. Subtype 3b phylogeny estimated from (a) E1 region (H77 positions: 869-1289) and (b) NS5B region (H77 positions: 8276-8615). Subtype 3a sequence D17763 was used as an outgroup. All labels are the same as those described in Figure S1.
Figure S5. Subtype 1a phylogeny estimated from (a) E1 region (H77 positions: 869-1289) and (b) NS5B region (H77 positions: 8276-8615). Subtype 1b sequence M58335 was used as an outgroup. All labels are the same as those described in Figure S1.
Figure S6. Subtype 6e and 6n phylogeny estimated from (a) E1 region (H77 positions: 869-1289) and (b) NS5B region (H77 positions: 8276-8615). Subtype 2a sequence D00944 was used as an outgroup. All labels are the same as those described in Figure S1.
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.