The distribution of HLA-A, -B, and -DRB1 alleles and haplotypes in inhabitants of Guizhou Province of China

The present study aimed to analyze the frequencies of human leukocyte antigen (HLA)-A, -B, and -DRB1 alleles and of A-B-DRB1, A-B, A-DRB1, and B-DRB1 haplotypes in inhabitants of Guizhou province, China. All samples were typed at the HLA-A, -B, and -DRB1 loci using the polymerase chain reaction-reverse sequence-specific oligonucleotide probe (PCR-rSSOP) method, and HLA polymorphisms were analyzed. A total of 18 HLA-A, 31 HLA-B, and 13 HLA-DRB1 alleles were found in the Guizhou population. The two most frequent alleles at the HLA-A, -B, and -DRB1 loci were A*11 (30.72%) and A*02 (30.65%), B*40 (16.27%) and B*46 (16.27%), and DRB1*09 (15.91%) and DRB1*15 (13.51%), respectively. The most common haplotypes were A*02-B*46-DRB1*09 (5.59%) for A-B-DRB1, A*02-B*46 (11.73%) for A-B, B*46-DRB1*09 (7.49%) for B-DRB1, and A*02-DRB1*09 (8.08%) for A-DRB1. Strong linkage disequilibrium (LD) was found not only in common haplotypes, such as A*33-B*58, A*30-DRB1*07, and A*33-DRB1*03, but also in rare haplotypes, such as A*01-B*37, B*37-DRB1*10, and A*01-DRB1*10. Guizhou inhabitants shared some characteristics of the southern Chinese population but also had their own unique features. Overall, HLA polymorphism in the Guizhou population was more consistent with that of the Chengdu population than with that of other populations in China.


INTRODUCTION
Guizhou province is located in the southwest of China. It adjoins Sichuan province and Chongqing municipality to the north, Yunnan province to the west, Guangxi province to the south, and Hunan province to the east. The province is mountainous, particularly in the west, whereas the eastern and southern regions are relatively flat. Guizhou covers an area of over 176,000 square kilometers and has a total population of more than 35,245,000. It is one of the provinces with the greatest number of minority groups: 49 ethnic groups live there, with minorities making up about 38% of the population, ranking third in China after Yunnan Province and Xinjiang Autonomous Region.
Human leukocyte antigen (HLA) genes are located on the short arm of chromosome 6 within a region of a few million base pairs. HLA is an extremely polymorphic genetic system, and its gene products play important roles in the immune response and in unrelated hematopoietic stem cell transplantation [1,2].
HLA haplotype analysis is important for identifying appropriate donors, and the most important clinical application of HLA haplotypes has been the selection of suitable donors in transplantation [3]. HLA matching at the haplotype level may carry a higher likelihood of matching at other loci than matching merely at the allele level [4]. However, accurate and adequate characterization of the distribution of HLA alleles and haplotypes at the population level has lagged behind. Hence, determining the distribution of HLA alleles and haplotypes in different populations is necessary for selecting acceptable unrelated donors for patients. With the development of the Chinese Marrow Donor Program (CMDP), more and more HLA typing data have become available, which provides a good opportunity to analyze HLA polymorphism. In addition, HLA typing technology has advanced rapidly alongside the CMDP, and PCR technology has been applied in DNA-based HLA typing methods. Techniques available for DNA typing include sequence-specific oligonucleotide probes (SSOP), sequence-specific primers (SSP), and sequence-based typing (SBT). However, SBT requires expensive equipment, whereas the first two techniques offer flexibility in the desired level of resolution, depending on the number of oligonucleotide probes or primers used [5,6].
In this paper, we examined the frequencies of HLA-A, -B, and -DRB1 alleles in a total of 2,879 persons residing in Guizhou province, China. Furthermore, we estimated the frequencies of two- and three-locus haplotypes and tested for linkage disequilibrium between pairs of loci.

Subjects
Analysis included 2,879 donors recruited into the CMDP Guizhou Branch from August 2006 to December 2007. All donors, regardless of ethnic group, were included in this study (Han, 85%; Miao, Dong, Buyi, and others, 15%; aged 20-45 years) and were typed for HLA-A, HLA-B, and HLA-DRB1 in our laboratory. The experimental protocol was approved by the Institutional Review Board of the First Affiliated Hospital of Nanjing Medical University, and all subjects provided signed informed consent.

HLA typing
All donors were typed for HLA-A, -B, and -DRB1 by the PCR-reverse SSOP (PCR-rSSOP) method using commercial kits (LABType rSSO Typing Test, lot# A007, B009, DRB0010, OLI, CA, USA). LABType® SSO is a reverse SSO (rSSO) DNA typing method that uses SSOP and color-coded microspheres to identify HLA alleles. First, genomic DNA was isolated from whole blood by the salting-out procedure using commercial kits (DNA Isolation Kit, Dynal Biotech, Brown Deer, Wisconsin, USA). The appropriate DNA concentration was 20-40 ng/μL, and acceptable purity corresponded to an A260/A280 ratio of 1.6-1.8. The sample DNA was then subjected to PCR amplification (PE9700 thermal cycler, Life Technologies, USA) in a 10 μL reaction volume under the following conditions: 96°C for 3 min; 5 cycles of 96°C for 20 s, 60°C for 20 s, and 72°C for 20 s; 30 cycles of 96°C for 10 s, 60°C for 15 s, and 72°C for 20 s; a final step at 72°C for 10 min; and a hold at 4°C. After amplification, the PCR products were denatured and neutralized with acids and bases and then hybridized with the corresponding locus beads at 60°C for 15 min, followed by three washes with washing buffer. Streptavidin-conjugated phycoerythrin (SAPE) was then reacted with the products for 5 min at 60°C. After washing, the products were resuspended in 60 μL washing buffer, and fluorescence signals were read on the Luminex 200 (Luminex, USA). Finally, HLA types were obtained using the software HLAtools.

Statistical analysis
HLA allele frequencies (AF) were determined for each allele in donors using the formula AF (%) = (n/2N) × 100%, where n is the number of copies of a particular allele observed and N is the total number of individuals.
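The allele-counting formula above can be illustrated with a minimal Python sketch. The function name `allele_frequencies` and the four-donor toy sample are hypothetical and are not study data; each donor contributes two allele copies, so a homozygote counts twice toward n.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return AF (%) per allele, AF = (n / 2N) * 100.

    genotypes: list of (allele1, allele2) tuples, one tuple per individual.
    """
    n_individuals = len(genotypes)
    # Count every allele copy; a homozygous individual contributes two copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    return {a: 100.0 * n / (2 * n_individuals) for a, n in counts.items()}


# Hypothetical toy sample of four donors typed at HLA-A (not study data).
toy = [("A*11", "A*02"), ("A*11", "A*11"), ("A*02", "A*24"), ("A*24", "A*11")]
freqs = allele_frequencies(toy)
print(freqs)
```

By construction, the frequencies at one locus sum to 100%.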
The maximum-likelihood haplotype frequencies, the Hardy-Weinberg equilibrium, and the linkage disequilibrium (LD) test were computed with the software Arlequin 3.01 using the expectation-maximization (EM) algorithm. Hardy-Weinberg exact tests were performed on all samples for each of the three HLA loci. The EM algorithm is a very general principle for handling missing data in statistical analysis, and its application to the estimation of multilocus haplotype frequencies has been described in detail elsewhere. EM is an iterative method that alternates between an expectation (E) step, which computes the expectation of the log-likelihood using the current parameter estimates, and a maximization (M) step, which computes the parameters that maximize the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step [7,8].
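To make the E and M steps concrete, the following is a minimal two-locus sketch in Python. It is not Arlequin's implementation: the function name `em_haplotypes`, the uniform starting values, the stopping rule, and the toy genotypes are all assumptions made for illustration. Here the latent variable is the unknown phase of each unphased genotype, and the parameters are the haplotype frequencies.

```python
from collections import defaultdict


def em_haplotypes(genotypes, n_iter=100, tol=1e-9):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of ((a1, a2), (b1, b2)), the allele pairs at two loci.
    """
    # Enumerate the distinct phase resolutions compatible with each genotype.
    resolutions = []
    for (a1, a2), (b1, b2) in genotypes:
        phases = {
            tuple(sorted([(a1, b1), (a2, b2)])),
            tuple(sorted([(a1, b2), (a2, b1)])),
        }
        resolutions.append(sorted(phases))

    haps = sorted({h for phases in resolutions for pair in phases for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point

    for _ in range(n_iter):
        expected = defaultdict(float)
        # E step: weight each phase resolution by its probability under freq.
        for phases in resolutions:
            weights = []
            for h1, h2 in phases:
                w = freq[h1] * freq[h2]
                if h1 != h2:
                    w *= 2.0  # two orderings of a heterozygous haplotype pair
                weights.append(w)
            total = sum(weights)
            for (h1, h2), w in zip(phases, weights):
                expected[h1] += w / total
                expected[h2] += w / total
        # M step: turn expected haplotype counts into new frequencies.
        new_freq = {h: expected[h] / (2 * len(genotypes)) for h in haps}
        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq


# Hypothetical toy data: six phase-unambiguous donors and one double heterozygote.
toy = (
    [(("A*02", "A*02"), ("B*46", "B*46"))] * 3
    + [(("A*11", "A*11"), ("B*40", "B*40"))] * 3
    + [(("A*02", "A*11"), ("B*46", "B*40"))]
)
freq = em_haplotypes(toy)
print(freq)
```

In this toy sample the double heterozygote is resolved almost entirely as A*02-B*46 / A*11-B*40, because those haplotypes are already supported by the homozygous donors.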
The LD parameters D, D', and r² and the chi-square values given by Arlequin are also shown; the mathematical definitions of D, D', and r² are given in detail elsewhere [9].

Table 1 The Hardy-Weinberg equilibrium of HLA-A, -B, and -DRB1 loci in Guizhou population

Hardy-Weinberg equilibrium examination
Hardy-Weinberg exact tests were performed on the three HLA loci. The observed and expected homozygosities and the P values are given in Table 1. The P values at the three loci were all greater than 0.05. The P value measures the magnitude of the deviation from Hardy-Weinberg expectations in a population sample; a P value greater than 0.05 indicates that the population is consistent with Hardy-Weinberg equilibrium [10], which suggested that the population was randomly sampled and that the sample size was adequately large [11,12].
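As an illustration of the observed and expected homozygosities referred to above, the following minimal Python sketch computes both for a single locus. It does not reproduce the exact test performed by Arlequin, and it uses the simple estimate of expected homozygosity (the sum of squared allele frequencies) without any small-sample correction. The function name `homozygosity` and the toy genotypes are hypothetical.

```python
from collections import Counter


def homozygosity(genotypes):
    """Return (observed, expected) homozygosity for one locus.

    Expected homozygosity under Hardy-Weinberg equilibrium is taken as the
    sum of squared allele frequencies (no small-sample correction).
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a == b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    expected = sum((c / (2 * n)) ** 2 for c in counts.values())
    return observed, expected


# Hypothetical toy sample of four donors (not study data).
toy = [("A*11", "A*02"), ("A*11", "A*11"), ("A*02", "A*24"), ("A*24", "A*11")]
obs, exp = homozygosity(toy)
print(obs, exp)
```

A large gap between the two values would point toward a deviation from equilibrium, but a formal conclusion requires a test such as the exact test used in this study.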

Allele frequencies
A total of 18 HLA-A, 31 HLA-B, and 13 HLA-DRB1 alleles were found in the Guizhou population. At the HLA-A locus, A*11 was the most frequent allele, with a frequency of 30.72%, followed by A*02 (30.65%), A*24 (17.07%), and A*33 (7.43%). At the HLA-B locus, B*40 and B*46 were the two most frequent alleles, with the same frequency of 16.27%, followed by B*15 (13.89%), B*13 (9.66%), B*51 (6.34%), and B*58 (6.32%). At the HLA-DRB1 locus, DRB1*09 was the most common allele (15.91%), followed by DRB1*15 (13.51%), DRB1*12 (13.06%), DRB1*04 (10.44%), and DRB1*14 (9.34%). In addition, some HLA alleles were very rare in the Guizhou population, for example, A*25 (0.02%) and A*36 (0.02%) at the HLA-A locus and B*53 (0.02%) and B*59 (0.02%) at the HLA-B locus. Some HLA alleles, such as A*43, B*82, and B*83, were not detected at all. The frequencies of HLA-A, -B, and -DRB1 alleles are described in Table 2. The HLA allele distribution (Table 3) showed that the majority of the Guizhou population harbored the most common alleles. Alleles with frequencies over 10% comprised three alleles at the HLA-A locus with a cumulative frequency of 78.44%, three alleles at the HLA-B locus with a cumulative frequency of 46.43%, and four alleles at the HLA-DRB1 locus with a cumulative frequency of 52.92%. Overall, the alleles with frequencies of more than 1% at the HLA-A, -B, and -DRB1 loci made up 90% of the total population.
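The cumulative frequencies quoted above can be checked directly from the individual allele frequencies reported in this section. The short Python sketch below assumes that the alleles counted at each locus are those listed above with frequencies over 10%; the variable names are illustrative only.

```python
# Allele frequencies (%) over 10%, as reported in the text of this section.
common = {
    "HLA-A": {"A*11": 30.72, "A*02": 30.65, "A*24": 17.07},
    "HLA-B": {"B*40": 16.27, "B*46": 16.27, "B*15": 13.89},
    "HLA-DRB1": {
        "DRB1*09": 15.91,
        "DRB1*15": 13.51,
        "DRB1*12": 13.06,
        "DRB1*04": 10.44,
    },
}

# Sum the frequencies per locus and round to two decimals, as in the text.
cumulative = {locus: round(sum(f.values()), 2) for locus, f in common.items()}
print(cumulative)
# {'HLA-A': 78.44, 'HLA-B': 46.43, 'HLA-DRB1': 52.92}
```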

Linkage disequilibrium
The results of the linkage disequilibrium (LD) tests between pairs of loci are summarized in Tables 5-7, ranked by the LD parameter r². Strong LD was detected between two loci in both common and rare haplotypes. For example, among the A-B haplotypes, the haplotype with the strongest LD was A*33-B*58, with a frequency of 5.1%, while the haplotype with the second strongest LD was a rare haplotype (A*01-B*37), with a frequency of only 0.68%. Among the A-DRB1 haplotypes, the two haplotypes with the strongest LD were common haplotypes, A*30-DRB1*07 (2.30%) and A*33-DRB1*03 (2.85%), whereas the third strongest was a rare haplotype, A*29-DRB1*10, with a frequency of 0.38%. Among the B-DRB1 haplotypes, B*37-DRB1*10, with a frequency of 0.76%, showed the strongest LD, followed by three common haplotypes: B*58-DRB1*03 (3.46%), B*46-DRB1*09 (7.49%), and B*13-DRB1*07 (2.85%).
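As an illustration of the LD parameters used for the ranking above, the following minimal Python sketch computes D, D', and r² for a single allele pair, treating each allele as present or absent. It uses the common textbook definitions; the definitions actually applied by Arlequin are those given in [9] and may differ in detail. The function name `ld_measures` is hypothetical, and the inputs are the A*02, B*46, and A*02-B*46 frequencies reported in this study.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for one allele pair.

    p_ab: frequency of the haplotype carrying both alleles
    p_a, p_b: frequencies of the two alleles at their respective loci
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable magnitude given p_a and p_b;
    # the sign of D is retained in D'.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Frequencies reported in this study: A*02 30.65%, B*46 16.27%, A*02-B*46 11.73%.
d, d_prime, r2 = ld_measures(0.1173, 0.3065, 0.1627)
print(d, d_prime, r2)
```

A rare haplotype can rank high on such measures when its two alleles are themselves rare and almost always occur together, which is consistent with rare haplotypes appearing among the strongest LD pairs above.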

Comparison of the frequent alleles between Guizhou province and other populations
The three most frequent alleles at the HLA-A, -B, and -DRB1 loci in Guizhou province were compared with those reported in previous studies for the provinces of Shanxi [13], Henan [14], Jiangsu [15], Hunan [16], and Hainan [17]. As can be seen from Table 8, A*11, A*02, and A*24 were more frequent in Guizhou than in the other provinces, regardless of whether the population was northern or southern Chinese. A*11 was more frequent in southern Chinese than in northern Chinese, and A*02 was less frequent in Hunan and Hainan provinces than in northern Chinese. At the HLA-B locus, B*40 and B*46 were more frequent in Guizhou than in the other provinces. At the HLA-DRB1 locus, the frequency of DRB1*12 was the highest in Guizhou among the six provinces, and the frequency of DRB1*15 was between those of northern Chinese and southern Chinese.

Table 5 The relatively strongest linkage disequilibrium between HLA-A and -B
Table 6 The relatively strongest linkage disequilibrium between HLA-A and -DRB1
Table 7 The relatively strongest linkage disequilibrium between HLA-B and -DRB1

DISCUSSION
Guizhou province is in the southwest of China, so one would expect it to present some of the characteristics of the southern Chinese population. The common alleles in the Chinese population, such as A*02, A*11, A*24, A*33, B*40, B*58, B*15, B*46, DRB1*09, DRB1*15, DRB1*12, and DRB1*04, are also found frequently in the Guizhou population. Meanwhile, inhabitants of Guizhou province differ from those of other provinces of China in some alleles, and differ especially from some minority groups, such as the Hui, Wa, and Drung [18-20]. For example, at the A locus, A*11 (30.72%) and A*02 (30.65%) were the two most frequent alleles in Guizhou province, with almost no difference between their frequencies. In the Chengdu population [21], A*11 was the most frequent allele with a frequency of 31.50%, and A*02 was the second most frequent with a frequency of 31.03%, nearly the same as reported here for Guizhou province. The frequency of A*02 in Guizhou province is consistent with that in southern Chinese populations, such as Jiangsu (A*02, 29.55%) [15] and Shanghai (A*02, 31.34%) [22]. The four most frequent alleles in Guizhou were, in order, A*11, A*02, A*24, and A*33, the same order as in Chengdu [21]. In contrast, the four most frequent alleles at the A locus in both Jiangsu and Shanghai are, in order, A*02, A*11, A*24, and A*33 [15,22]. The four most frequent alleles in Han Chinese from Yunnan province are, in order, A*24, A*02, A*11, and A*33, an order that differs more from that of Guizhou than do those of Jiangsu and Shanghai, although Yunnan province is adjacent to Guizhou province. At the B locus, B*46 (16.27%) is the most common allele in the Guizhou population, tied with B*40, and is also the most common one in Chengdu (16.3%) [21] and Yunnan (17.9%) [23], whereas in Jiangsu [15] and Shanxi [13], B*15 is the most common one.
The four most frequent alleles at the B locus are, in order, B*40, B*46, B*15, and B*13 in Guizhou; B*46, B*15, B*40, and B*13 in Han Chinese from Yunnan [23]; and B*15, B*40, B*13, and B*46 in Jiangsu [15]. The four alleles are the same in all three populations but differ in order. At the DRB1 locus, DRB1*09 (15.91%) is the most predominant allele in Guizhou, as it is in Jiangsu [15], Shanghai [22], and Chengdu [21], and the four most frequent alleles at the DRB1 locus in Guizhou are in the same order as in Chengdu: DRB1*09, DRB1*12, DRB1*15, and DRB1*04. Comparison of HLA allele frequencies across several provinces shows that Guizhou is more consistent with Chengdu than with the other provinces.
The distribution of the HLA-A, -B, and -DRB1 alleles (Table 2) showed that the majority of the Guizhou population harbors the common alleles, which means that most patients awaiting hematopoietic stem cell transplantation (HSCT) would readily find HLA-A, -B, and -DRB1 matched donors in the CMDP Guizhou registry if they carry those common alleles.
HLA haplotype frequency estimation is a valuable tool in the management of donor registries. It has been used to project how many donors would be needed to achieve a certain probability of finding an HLA-matched donor. Another useful application of haplotype frequencies is to predict the probability that a donor typed at low or intermediate resolution would match a specific patient at high resolution. In addition, HLA haplotypes provide valuable information for tracing the source of historical genetic inputs.
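The registry-size projection mentioned above can be sketched under strong simplifying assumptions that are not part of this study: donors are treated as independent random draws from the same population, and the frequency of the patient's genotype is derived from haplotype frequencies assuming Hardy-Weinberg proportions. The function names `match_probability` and `donors_needed` are hypothetical; the example input is the A*02-B*46-DRB1*09 haplotype frequency (5.59%) reported here.

```python
import math


def match_probability(genotype_freq, n_donors):
    """Probability that at least one of n_donors carries the patient's genotype."""
    return 1.0 - (1.0 - genotype_freq) ** n_donors


def donors_needed(genotype_freq, target_prob):
    """Smallest registry size reaching target_prob of at least one match."""
    if not (0.0 < genotype_freq < 1.0 and 0.0 < target_prob < 1.0):
        raise ValueError("frequencies and probabilities must lie in (0, 1)")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - genotype_freq))


# Assumed example: a patient homozygous for the most common three-locus
# haplotype, with genotype frequency h**2 under Hardy-Weinberg proportions.
h = 0.0559
g = h ** 2
n = donors_needed(g, 0.95)
print(n, match_probability(g, n))
```

Patients carrying rare haplotypes have far smaller genotype frequencies, so the required registry size grows rapidly, which is why population-specific haplotype frequencies matter for registry planning.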
In summary, the present study reported HLA-A, -B, and -DRB1 allele frequencies and haplotype frequencies in the Guizhou population. The results should be useful as baseline data for donor selection in hematopoietic stem cell or solid organ transplantation, anthropological studies, and HLA disease association analysis.