In the last few years, we have developed several innovative high-throughput genotyping methods, including NanoGBS and HD-GBS. We have also developed approaches that use predictive models to maximize the efficacy of NGS-based genotyping methods.

NanoGBS: A Miniaturized Procedure for GBS Library Preparation

High-throughput reduced-representation sequencing (RRS)-based genotyping methods, such as genotyping-by-sequencing (GBS), have provided attractive genotyping solutions in numerous species. Here, we present NanoGBS, a miniaturized and eco-friendly method for GBS library construction. Using acoustic droplet ejection (ADE) technology, NanoGBS libraries were constructed in tenfold smaller volumes than with standard methods (StdGBS), reducing the use of plastics by up to 90%. A high-quality DNA library and SNP catalogue were obtained, with extensive overlap (96%) in SNP loci and 100% agreement in genotype calls compared to the StdGBS dataset, and with a high level of accuracy (98.5%). A highly multiplexed pool of GBS libraries (768-plex) was sequenced on a single Ion Proton PI chip and yielded enough SNPs (~4K SNPs; 1.5 SNPs per cM, on average) for many high-volume applications. Combining NanoGBS library preparation with increased multiplexing can dramatically reduce the genotyping cost per sample (by 72%). We believe that this approach will greatly facilitate the adoption of marker applications where extremely high throughput is required and cost is currently the limiting factor.
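The comparison between two SNP catalogues described above rests on two quantities: how many loci are shared, and how often the genotype calls agree at those shared loci. The following is a minimal sketch of such a comparison, not the pipeline used in the study. It assumes that each call set is held as a dictionary keyed by (chromosome, position), that genotypes are VCF-style strings with "./." marking a missing call, and that overlap is measured relative to the reference (StdGBS-like) set. The function name, the data layout, and the toy values are hypothetical.

```python
from typing import Dict, Tuple

Locus = Tuple[str, int]                 # (chromosome, position)
Calls = Dict[Locus, Dict[str, str]]     # locus -> {sample: genotype}

MISSING = "./."  # assumption: VCF-style missing genotype


def compare_call_sets(test: Calls, ref: Calls) -> Tuple[float, float]:
    """Return (locus overlap, genotype agreement) of `test` against `ref`.

    Overlap is the fraction of reference loci also present in the test set.
    Agreement is computed only over shared loci, for samples called in both.
    """
    shared = set(test) & set(ref)
    locus_overlap = len(shared) / len(ref) if ref else 0.0

    compared = 0
    agree = 0
    for locus in shared:
        for sample, ref_gt in ref[locus].items():
            test_gt = test[locus].get(sample)
            if test_gt is None or MISSING in (ref_gt, test_gt):
                continue  # skip samples not called in both sets
            compared += 1
            agree += test_gt == ref_gt

    agreement = agree / compared if compared else 0.0
    return locus_overlap, agreement


# Toy data only; these values are illustrative, not results from the study.
std_calls = {
    ("Gm01", 100): {"s1": "0/0", "s2": "1/1"},
    ("Gm01", 200): {"s1": "0/1", "s2": "./."},
    ("Gm02", 50): {"s1": "1/1", "s2": "0/0"},
    ("Gm02", 900): {"s1": "0/0", "s2": "0/0"},
}
nano_calls = {
    ("Gm01", 100): {"s1": "0/0", "s2": "1/1"},
    ("Gm01", 200): {"s1": "0/1", "s2": "0/1"},
    ("Gm02", 50): {"s1": "1/1", "s2": "0/1"},
}

overlap, agreement = compare_call_sets(nano_calls, std_calls)
print(f"locus overlap: {overlap:.2f}, genotype agreement: {agreement:.2f}")
```

In practice, the choice of denominator (reference loci, test loci, or their union) changes the reported overlap, so it should be stated alongside the figure.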

High-Density Genotyping-by-Sequencing (HD-GBS)

The aim of this study was to develop and establish an improved GBS approach (high-density GBS, or HD-GBS) that significantly increases the density of detectable markers, using soybean as a test case. Because it is a complexity-reduction method, GBS relies on sequencing the optimal number of fragments falling within a specific size range. Our objective in designing HD-GBS was to select the combination of restriction enzymes that would generate ~1 million fragments of 100–800 bp.
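Choosing an enzyme combination by the number of fragments in a target size range can be explored with an in silico digest. The sketch below is only an illustration under stated assumptions: the enzyme panel (PstI, MspI, ApeKI) is a hypothetical example using commonly cited recognition sequences and is not the set evaluated for HD-GBS, the function names are invented here, and every fragment in the size window is counted regardless of which enzyme produced its ends. A real design would also account for adapter compatibility, methylation sensitivity, and both ends of each chromosome.

```python
import re
from itertools import combinations

# Hypothetical panel: name -> (recognition site as a regex, cut offset in site).
# ApeKI recognizes GCWGC, where W is A or T.
ENZYMES = {
    "PstI": ("CTGCAG", 5),
    "MspI": ("CCGG", 1),
    "ApeKI": ("GC[AT]GC", 1),
}


def cut_positions(seq, enzyme_names):
    """Return sorted cut coordinates for the given enzymes on one strand."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        pattern, offset = ENZYMES[name]
        # A lookahead lets overlapping recognition sites be found as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def count_fragments_in_range(seq, enzyme_names, min_len=100, max_len=800):
    """Count digest fragments whose length falls within [min_len, max_len]."""
    bounds = [0] + cut_positions(seq, enzyme_names) + [len(seq)]
    lengths = [end - start for start, end in zip(bounds, bounds[1:])]
    return sum(min_len <= n <= max_len for n in lengths)


def rank_enzyme_pairs(seq, target, min_len=100, max_len=800):
    """Rank enzyme pairs by how close their fragment count is to `target`."""
    scored = []
    for pair in combinations(sorted(ENZYMES), 2):
        n = count_fragments_in_range(seq, pair, min_len, max_len)
        scored.append((abs(n - target), pair, n))
    return sorted(scored)


# Toy sequence with one PstI site and one MspI site (illustrative only).
toy = "A" * 150 + "CTGCAG" + "T" * 200 + "CCGG" + "A" * 50
print(cut_positions(toy, ["PstI", "MspI"]))
print(count_fragments_in_range(toy, ["PstI", "MspI"]))
```

For a whole genome, the same functions would be applied per chromosome and the counts summed before comparing pairs against the target of roughly one million fragments.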

Missing Data Imputation

The quality, statistical power, and resolution of genome-wide association studies (GWAS) are largely dependent on the comprehensiveness of genotypic data. Despite the steady decrease in the price of sequencing over the last few years, whole-genome sequencing (WGS) of association panels comprising a large number of samples remains cost-prohibitive. Therefore, most GWAS populations are still genotyped using low-coverage genotyping methods, resulting in incomplete datasets. Imputation of untyped variants is a powerful method to maximize the number of SNPs identified in study samples; it increases the power and resolution of GWAS and allows genotyping datasets obtained from various sources to be integrated. Here, we describe the key concepts underlying imputation of untyped variants, including the architecture of reference panels, and review some of the associated challenges and how these can be addressed. We also discuss the need for, and the available methods for, rigorously assessing the accuracy of imputed data prior to their use in any genetic study.
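One common way to assess imputation accuracy is to hide a subset of known genotypes, impute them, and measure how often the imputed calls match the hidden truth. The sketch below illustrates only that masking-and-concordance idea. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) in a samples-by-markers matrix with None for missing calls. The major-genotype imputer is a deliberately naive placeholder standing in for a real imputation tool, and all function names are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # assumption: missing genotypes are stored as None


def mask_genotypes(matrix, fraction, seed=0):
    """Hide a random fraction of the known genotypes.

    Returns the masked copy and the list of (sample, marker) cells hidden.
    """
    rng = random.Random(seed)
    known = [(i, j) for i, row in enumerate(matrix)
             for j, g in enumerate(row) if g is not MISSING]
    hidden = rng.sample(known, int(len(known) * fraction))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def impute_major_genotype(matrix):
    """Placeholder imputer: fill each marker with its most common genotype.

    This stands in for a real imputation method; it is not one.
    """
    imputed = [list(row) for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not MISSING]
        fill = Counter(observed).most_common(1)[0][0] if observed else MISSING
        for row in imputed:
            if row[j] is MISSING:
                row[j] = fill
    return imputed


def concordance(truth, imputed, hidden):
    """Fraction of hidden cells whose imputed genotype equals the truth."""
    if not hidden:
        return float("nan")
    hits = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return hits / len(hidden)


# Toy matrix: 4 samples x 3 markers (illustrative values only).
truth = [
    [0, 2, 0],
    [0, 2, 2],
    [0, 0, 2],
    [0, 2, 2],
]
masked, hidden = mask_genotypes(truth, fraction=0.25, seed=1)
imputed = impute_major_genotype(masked)
print(f"masked cells: {len(hidden)}, concordance: {concordance(truth, imputed, hidden):.2f}")
```

Overall concordance can look high simply because the major genotype is common, so it is usually worth reporting accuracy separately by allele frequency class or genotype class as well.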