Grapevine pangenome facilitates trait genetics and genomic breeding
Main
The cultivated grapevine (Vitis vinifera ssp. vinifera L.) is an economically important perennial fruit crop that is grown widely for winemaking and fresh fruit in ~94 countries. Previous studies have suggested that grapevine originated from a single domestication event in the Black and Caspian Sea regions more than 10,000 years ago and subsequently spread across the northern hemisphere with gene flow from local wild populations. However, other studies have suggested the potential for multiple domestication events. Since domestication, grapevine cultivars have accumulated deleterious genomic variants, including single-nucleotide polymorphisms (SNPs) and structural variants (SVs), in a heterozygous state, resulting in strong inbreeding depression. Recent studies have highlighted the potential contribution of hidden genomic variants, including SVs, to phenotypes, but the quantitative genetic basis of complex agronomic traits in grapevine has rarely been investigated at the genome scale.
Long-read sequencing technologies have revealed the prevalence of SVs in plant genomes. It is increasingly evident that SVs are more likely than SNPs to influence the phenotype of domestication traits. At the population level, SVs tend to occur at low frequencies, consistent with negative selection. Furthermore, the low frequency of SVs may be related to their recent origin. For example, recent transposable element (TE) activity can generate new SVs that are initially present in only one individual or lineage. In part because of their low population frequencies, SVs are typically in low linkage disequilibrium (LD) with SNPs. One practical implication of low LD is that SVs may account for substantial missing heritability for quantitative traits, because SNP markers tag them poorly. Consistent with this viewpoint, the addition of SVs to population and quantitative genetic analyses has yielded new insights into local adaptation and agronomic traits.
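The link between low allele frequency and low LD can be illustrated with a short calculation. The sketch below is not part of the study's pipeline; it simply computes the standard r² statistic from phased 0/1 haplotypes, and the function name and toy data are hypothetical. It shows that a variant carried by 1 of 20 haplotypes can reach an r² of only about 0.05 with a SNP at 50% frequency, even when the rare allele always co-occurs with the SNP allele.

```python
def r_squared(hap_a, hap_b):
    """Return the LD statistic r^2 for two biallelic loci.

    hap_a and hap_b are equal-length lists of 0/1 alleles, one entry per
    phased haplotype (illustrative input format, not the study's).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal in length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                   # undefined for monomorphic loci
    return d * d / denom


# Toy data: a rare SV on 1 of 20 haplotypes, nested within a SNP allele
# carried by 10 of 20 haplotypes.
rare_sv = [1] + [0] * 19
common_snp = [1] * 10 + [0] * 10

ld_rare = r_squared(rare_sv, common_snp)      # 1/19, about 0.053
ld_same = r_squared(common_snp, common_snp)   # identical loci give 1.0
```

Under these assumptions, a SNP-only analysis would tag the rare SV poorly, which is the intuition behind genotyping SVs directly.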
Grapevine genomes are highly heterozygous, partly because of the accumulation of genetic variation during clonal propagation, which has been carried out for thousands of years. For example, the genomes of diploid Chardonnay and Cabernet Sauvignon contain more than 10% heterozygous sites, including SNPs, insertion–deletions (indels) and SVs. Although the commonly used reference genome from PN40024 is highly homozygous after nine generations of selfing, it is missing >10% of genes compared with heterozygous cultivars. Across cultivars, only ~7% of the genes are shared, whereas ~8% are unique to each individual. The high level of variability in grapevine merits the construction of a pangenome reference that incorporates presence–absence variation, improves the detection of genomic variants, including SVs, and reduces reference biases.
Here we assembled 18 haplotype-resolved telomere-to-telomere (T2T) assemblies representing eight diploid grapevine cultivars and one diploid wild grape. We then constructed a graph-based pangenome, which we call Grapepan v.1.0, using these new assemblies and 11 previously published chromosomal assemblies. These genotypes represent the global genetic diversity of grapes. Using Grapepan v.1.0, we built a variation map that includes SNPs, indels (2 bp ≤ indel < 50 bp) and SVs (≥50 bp) across a larger sample of 466 accessions, including 324 that were newly sequenced. We utilized this variation map in a genome-wide association study (GWAS) and the genomic prediction of 29 complex agronomic traits. This exercise identified quantitative trait loci (QTLs) for these agronomic traits, provided unique insights into the contribution of SVs to quantitative genetic variation and demonstrated the feasibility of breeding superior cultivars via genomic selection for multiple traits. The pangenome reference (Grapepan v.1.0), variation map, QTLs and our genomic selection models facilitate genomic breeding of grapevine.
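The size thresholds used for the variation map can be written as a small classifier. This is a minimal sketch under stated assumptions, not the authors' variant-calling code: the function name is hypothetical, variants are given as plain reference and alternate allele strings, event size is taken as the length difference between the two alleles, and anything outside the stated classes (for example, 1-bp indels, which the thresholds above do not cover) is labeled "other".

```python
def classify_variant(ref, alt):
    """Assign a variant to SNP, indel or SV using the thresholds in the text.

    Assumptions (not from the source): alleles are plain strings, and the
    event size is the absolute length difference between ref and alt.
    """
    if not ref or not alt:
        raise ValueError("ref and alt alleles must be non-empty strings")
    size = abs(len(alt) - len(ref))
    if size == 0 and len(ref) == 1 and ref != alt:
        return "SNP"        # single-base substitution
    if 2 <= size < 50:
        return "indel"      # 2 bp <= indel < 50 bp
    if size >= 50:
        return "SV"         # structural variant, >= 50 bp
    return "other"          # e.g. 1-bp indels or multi-base substitutions


examples = {
    "substitution": classify_variant("A", "G"),
    "3-bp insertion": classify_variant("A", "ACGT"),
    "50-bp insertion": classify_variant("A", "A" + "T" * 50),
    "1-bp insertion": classify_variant("A", "AT"),
}
```

A real pipeline would also need to handle balanced rearrangements such as inversions, which a simple length difference cannot capture.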
Discussion
Accelerating the innovation of grape varieties is urgently required to adapt to future planting, rapidly changing market demands and climate change. Grape breeding relies to a considerable degree on older varieties; in particular, clonal reproduction allows the preservation of genotypes over extended periods, some of which are older than 900 years. Advances in grapevine breeding lag far behind those made in annual cereal crops because of the long generation time of grapevine (~3 years on average), a high deleterious burden that leads to inbreeding and/or hybrid depression, high genomic heterozygosity, inefficient genetic transformation and limited knowledge about the genetic basis of complex agronomic traits.
Progress in understanding the complexity of the grapevine over the past two decades, from phenotypic characterization to marker identification and association analysis, has greatly benefited breeding efforts. Early breeding emphasized correlation analysis between phenotypic traits and low-density genetic markers, and selected phenotypic traits through marker-assisted selection. Using association analyses, researchers have linked specific genetic variations to desirable phenotypic traits, providing breeders with valuable tools for the targeted selection of multiple phenotypes, including berry size, color and sugar content. These efforts have led to significant contributions such as the development of seedless grape varieties and the enhancement of disease resistance in grapevines. However, detecting variation against a single reference genome has inherent limitations that hinder the identification of crucial variants associated with breeding traits and a comprehensive analysis of agronomic trait inheritance.
Advanced pangenome-based approaches underscore broader efforts aimed at discovering genetic variants in crop breeding. Recent research has focused on North American wild grapevines and has established a nonreference pangenome that includes nine wild accessions. That work encompasses the diversity of wild grapevine species and aims to integrate resistance variants from wild species for use in rootstock improvement. By contrast, our pangenome (Grapepan v.1.0) focuses on discovering variants associated with agronomic traits in domesticated grapevines. We selected representative cultivated varieties to construct the pangenome. We also included table grape varieties to expand diversity across grape populations with different uses. Therefore, our pangenome may contain more advantageous genotypes related to domesticated traits, thus directly serving breeding programs. We utilized a graph-based approach in which any variant is integrated as a node within the pangenome reference. Indeed, the most significant enhancement of the pangenome lies in the discovery of SVs. The number of newly discovered SNPs differs only slightly from that obtained by alignment to a single reference genome or to previous pangenome versions. Thus, our grape pangenome places greater emphasis on uncovering traits associated with SVs and revealing their inheritance patterns.
In Grapepan v.1.0, SVs were often associated with repetitive sequences and TEs, suggesting that TE-mediated events are an important evolutionary force. The low frequency of SVs in the grapevine genome can be attributed to recent TE activity and the evolutionary constraints imposed by natural selection. This poses challenges in precisely controlling the breeding process when relying solely on SNPs for trait selection. This challenge is exacerbated by the incomplete capture of heritability for multiple traits, particularly from SVs, which might be related to two factors. First, LD decay can influence the resolution of genetic mapping and the identification of causal variants. Second, SVs often explain a greater proportion of phenotypic variation than SNPs in numerous traits. The rarity of SVs also makes it difficult to accurately estimate their frequency and effect size within a population. Consequently, the statistical power to detect associations involving rare SVs is lower than that for SNPs. In addition, SVs are larger than SNPs and can have more immediate functional consequences, such as perturbations in gene dosage or the disruption of critical gene regulatory elements. For example, SVs contribute the largest share of heritability for approximately half of the molecular traits in tomatoes, and the identification of SVs based on the pangenome has greatly increased estimates of the heritability of metabolic traits. In foxtail millet, using both SNP and SV markers increased the precision for 73.9% of traits by between 0.04% and 12.67% compared with SNP-only markers. Fruit color is a key trait in grape breeding and is well known to be determined by SVs. We confirmed that SVs contribute higher heritability for fruit color and emphasized the power and accuracy of SV-based GWAS and genomic selection. We also found that the inheritance of an isoamylase gene associated with a 5.6-kb deletion explained 6.23% of the variance in soluble solids content (SSC). Collectively, a deep understanding of SVs based on the pangenome will greatly improve the efficiency of SV-associated analysis for grapevine breeding.
Results
The graph pangenome reference for grapevine (Grapepan v.1.0)
HiFi reads, Hi-C reads and ultra-long nanopore reads were collected for nine representative diploid samples, including one accession of Vitis retordii, a wild species endemic to Asia, and eight grapevine cultivars (seven table grapes and one wine grape). The nine samples yielded 18 haplotypes that reached T2T-level assembly after gap filling. Genome sizes ranged from 479.15 to 539.30 Mb. The quality of haplotype assembly was confirmed by high contiguity (>99.9%), minimal switching error (<0.05%) and low Hamming error (<2.83%). Benchmarking Universal Single-Copy Orthologs (BUSCO) evaluation indicated an average completeness of 98.4% for these haplotypes (range 98.07% to 98.64%). We used the same pipeline to annotate all haplotypes to ensure consistent results. Across the 18 haplotypes, the number of protein-coding genes ranged from 34,536 to 38,526, and the TE sequence length per haplotype ranged from 263.86 Mb (54.68%) to 312.10 Mb (59.03%). In addition, we identified centromere and telomere sequences in all assemblies. Consistent with previous studies, the predominant repeat unit of the centromere was 107 bp long. Overall, these 18 assembled haplotypes and their annotations represent one of the highest-quality grapevine genomic datasets generated to date.
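As a quick consistency check on the reported TE figures, the assembly size implied by each TE length and percentage can be recomputed and compared with the stated genome size range. The sketch below is only arithmetic on the numbers quoted above; the function name is hypothetical, and it assumes each percentage is the TE length divided by the total length of the same haplotype.

```python
def implied_total_mb(te_length_mb, te_percent):
    """Return the assembly size (Mb) implied by a TE length and TE percentage."""
    if not 0 < te_percent <= 100:
        raise ValueError("te_percent must be in the interval (0, 100]")
    return te_length_mb / (te_percent / 100)


# Values quoted in the text for the haplotypes with the lowest and highest TE content.
size_low_te = implied_total_mb(263.86, 54.68)    # roughly 482.6 Mb
size_high_te = implied_total_mb(312.10, 59.03)   # roughly 528.7 Mb

# Both implied sizes should fall within the reported 479.15 to 539.30 Mb range.
within_range = all(479.15 <= s <= 539.30 for s in (size_low_te, size_high_te))
```

Both implied sizes fall inside the reported range, so the TE lengths and percentages are mutually consistent with the stated genome sizes. The haplotypes with extreme TE content need not be those with extreme assembly size.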