NGS-PrimerPlex: high-throughput primer design for multiplex polymerase chain reactions

Summary Multiplex PCR has multiple applications in molecular biology, including developing new targeted NGS-panels. We present NGS-PrimerPlex, an efficient and versatile command-line application that designs primers for different refined types of amplicon-based genome target enrichment. It supports nested and anchored multiplex PCR, redistribution among multiplex reactions of primers constructed earlier and extension of existing NGS-panels. The primer design process takes into consideration formation of secondary structures, non-target amplicons between all primers of a pool, primers and high-frequent genome SNPs overlapping. Moreover, users of NGS-PrimerPlex are free from manually defining input genome regions, because it can be done automatically from a list of genes or their parts like exon or codon numbers. Using the program, NGS-panel for sequencing the LRRK2 gene coding regions was created, and 354 DNA samples were studied successfully with median coverage of 97.4% of target regions by at least 30 reads. Availability and implementation NGS-PrimerPlex is open-source and freely available at https://github.com/aakechin/NGS-PrimerPlex/. Contact aa_kechin@niboch.nsc.ru Supplementary information Supplementary data

Multiplex PCR is a fundamental approach to get many several fragment copies of single DNA molecule simultaneously. It has a lot of applications in different science areas, from kinship determination (Grover et al., 2017) to NGS library preparation (Cohen et al., 2018;Ermolenko et al., 2015). A transition from monoplex to multiplex reactions requires consideration of many factors that may influence PCR efficiency. It includes secondary structure formation by oligonucleotides and non-target hybridization, elongation and amplification with primers from different pairs. All of this should be taken into account during manual primer design.
To facilitate primer design process, many tools have been developed (Supplementary Table 1).
However, none of them can completely simplify the process of developing new amplicon-based NGS panels. This library preparation approach has become widespread among both researchers and commercial companies due to its simplicity of utilization by end user and highly efficient target enrichment for DNA samples of different quality. This approach has become particularly relevant for the molecular genetic testing of tumors, where many target genes should be studied.
At the same time, targets differ not only from one type of cancer to another, but also between patients with one type of oncology.
Here we describe a new tool for automatic primer design, particularly for creating new amplicon-   filled with brown denotes optional steps that can be applied by user. Blue rectangles denote inp and output files; grey rectangles denote steps of the whole workflow.

Multi-step primer design
By default, NGS-PrimerPlex splits input genome regions onto distinct positions and tries construct primers for each of them using primer3-py that includes primer3 library (Untergasser al., 2012). For each position, the program tries to design three types of primers: so that the rig primer was close to the studied position, the left primer was close to the studied position, a without any of these restrictions (Supplementary Figure 2). This is necessary for subseque successful joining of primers into a set of primer pairs that amplify a single extended genom region (e.g. whole exon). Unlike other programs (e.g. hi-plex) which design primer pairs n overlapping at all, NGS-PrimerPlex allows overlapping but considers it during distribution in  (Oskina et al., 2017).
Another feature of NGS-PrimerPlex that may be useful for complex genome regions is a multistep primer design. Frequently, existing tools suggest user to choose parameters of primer design and at the end of the design process program can give an error that primers could not be constructed with the defined parameters. After choosing less strict parameters, user has to start the process from scratch that significantly slows down parameter optimization. NGS-PrimerPlex saves primers that could be designed with more strict parameters and constructs primers only for other regions. On the other hand, this feature can be used to create personalized NGS-panels and expanding existing ones that are targeted for patient-specific genome regions, when primers for some of these regions have been obtained earlier that can not be done with other programs.

Non-target hybridization in genome and overlapping with SNPs
On-target is one of the main characteristics of targeted NGS-panels (Samorodnitsky et al., 2015) and it depends on whether the primers hybridize to non-target genome regions, including with substitutions/insertions/deletions as well as with other primers from the same multiplex reaction.
For doing it, NGS-PrimerPlex uses BWA program that can readily find all regions to which a nucleotide sequence can be mapped. For the similar regions (but not necessarily identical ones, because the user can search for primer targets with mismatches), the program checks if primer has the same last two nucleotides from 3'-end as non-target region, that is not performed by existing tools (e.g. MPD and hi-plex). This allows identifying regions that will really give nontarget amplicons but not only has homology with primers. Comparing genome coordinates for different primers, NGS-PrimerPlex filters out primer pairs that can lead to non-target amplification in multiplex reaction.
Another significant characteristic of NGS-panels is a uniformity of coverage (Samorodnitsky et al., 2015) which depends, among other things, on the sequence homogeneity of the sequence complementary to primer among different samples. Therefore, NGS-PrimerPlex checks if any of the primers designed overlaps with SNPs from the dbSNP database (Sherry et al., 2001) using pysam Python-module and dbSNP VCF-file. The user has an opportunity to check only part of primer for overlapping with SNP (e.g. only last 10 nucleotides that are especially important, as we showed in this work) and to define minimal frequency of SNP in population for which primers will be checked. Both features are unique for NGS-PrimerPlex.

Nested and anchored multiplex PCR
Nested multiplex PCR is applied in many areas of molecular biology where higher specificity is necessary, particularly during the detection of low-frequent somatic mutations in tumors (Zheng et al., 2014). It increases the specificity by selecting only amplicons which contain both internal and external primers (Supplementary Figure 3). NGS-PrimerPlex allows users to design primers for nested PCR with subsequent distribution of four primers among multiplex reactions considering both secondary structure and non-target product formation for internal and external primers.
Anchored multiplex PCR that applies one primer hybridizing to gene-specific region and one primer been complement to adapter sequence allows amplifying regions with unknown or partially highly variable sequence (Supplementary Figure 3), e.g. to detect gene fusion mutations without prior knowledge of the fusion partners (Schenk et al., 2017;Zheng et al., 2014). And NGS-PrimerPlex can also design primers for this type of multiplex PCR.

Distribution of primers among multiplex reactions
The final step of primer design is a distribution of the constructed primers among user defined number of multiplex reactions. NGS-PrimerPlex supports distribution of all primers into any number of multiplexes (unlike e.g. MPD that is capable of distributing into small multiplexes of about 2-15 primer pairs) as well as distribution of some regions into specific groups of multiplexes (e.g. when it is necessary to separate some genes into different multiplexes). The distribution is performed using networkx Python module which creates graph, edges of which mean two primer pairs not producing any secondary structures, not overlapping by the product, and not producing non-target amplicons. For joining primer pairs into set of primer pairs that amplify single extended genome region (e.g. whole exon), it searches for a shortest path from the start of such region to the end, both of which are also included into the graph. For the subsequent distribution, NGS-PrimerPlex tries to find clique in the constructed graph, i.e. such subset of vertices that every two distinct vertices in the clique are adjacent. Searching for clique is performed until all primer pairs are distributed.

NGS library preparation and sequencing
To confirm efficiency of NGS-panels designed with NGS-PrimerPlex tool, we applied it to sequence coding exons of LRRK2 gene, associated with Parkinson's disease (Di Maio et al., 2018). The LRRK2 gene includes 51 exons, all of which are protein-coding (7784 bp including two intron nucleotides near exon-intron junctions). DNA was extracted from the peripheral blood leukocytes as it was described earlier (Kechin et al., 2018). NGS libraries were prepared with inhouse amplicon-based approach using two-step amplification: (1) enrichment of target regions; (2) inclusion of adaptors. The libraries were sequenced with MiniSeq High Output kit (300 cycles). NGS-reads were analyzed with workflow that is similar to BRCA-analyzer's one (Kechin et al., 2018). NGS-reads were trimmed with Trimmomatic (Bolger et al., 2014)  3 R e s u l t s To evaluate throughput of NGS-PrimerPlex, we applied it to design primers for coding sequences of several sets of genes that are studied in clinics with existing commercial NGSpanels (Supplementary Table 3). For genes from Tumor15 panel, we used whole coding sequences of these genes regardless of which parts of them are clinically significant. We showed that NGS-PrimerPlex can design NGS-panels for a sizable number of genes and genome positions in reasonable time on computers with different performances. For other samples, median coverage of target regions was less than 30, likely due to low amoun or low quality of DNA used. One amplicon was absolutely uncovered, one amplicon w covered by at least 30 reads only for five samples, three amplicons had a high variation coverage between samples due to overlapping with SNPs (rs73097447 located at 2 nd nucleoti of primer from 3'-end: 68% of reference allele vs 32% of alternative allele, rs14294784 located at 4 th nucleotide of primer: 61% of reference allele vs 39% of alternative allele for hg1

Wet-lab validation
and rs711176013 located at 8 th nucleotide of primer: 41% of reference allele vs 59% alternative allele for hg19) (Figure 3). Thus, primers covering an SNP by -8 to -10 nucleoti from the 3'-end demonstrate a high variation in the amplification efficiency. Commonly, we should avoid primers covering SNPs. And NGS-PrimerPlex certainly has opportunity to turn it off by skipping -nucs argument. And then the program will not allow a SNPs under primers. However, this is not always possible, e.g. for regions that contain ma SNPs and should be studied in the whole. In this case, we can limit checking for SN overlapping to about 10 nucleotides from 3'-end and to SNPs with frequency more than 5%, th can be defined in NGS-PrimerPlex and is absent in other tools (e.g. MPD).
After this experiment, primers covering SNPs by at least one of 10 nucleotides from 3'-end primers were removed and redesigned with NGS-PrimerPlex using -draft and -snps paramete and by increasing -primernum1 parameter, leaving other primers the same.