A pangenome analysis pipeline provides insights into functional gene identification in rice
Jian Wang†, Wu Yang†, Shaohong Zhang, Haifei Hu*, Yuxuan Yuan, Jingfang Dong, Luo Chen, Yamei Ma, Tifeng Yang, Lian Zhou, Jiansong Chen, Bin Liu, Chengdao Li*, David Edwards* and Junliang Zhao*
Genome Biology
Abstract
Background: A pangenome
aims to capture the complete genetic diversity within a species and
reduce bias in genetic analysis inherent in using a single reference
genome. However, the current
linear format of most plant pangenomes limits the presentation of
position information for novel sequences. Graph pangenomes have been
developed to overcome this limitation, but bioinformatics analysis
tools for graph-format genomes are lacking.
Results: To overcome this
problem, we develop a novel strategy for pangenome construction and a
downstream pangenome analysis pipeline (PSVCP) that captures genetic
variants’ position information while maintaining a linearized layout.
Using PSVCP, we construct a high-quality rice pangenome using 12
representative rice genomes and analyze an international rice panel with
413 diverse accessions using the pangenome as the reference. We show
that PSVCP successfully identifies causal structural variations for rice
grain weight and plant height. Our results provide insights into rice
population structure and genomic diversity. We characterize a new locus
(qPH8-1) associated with plant height on chromosome 8 that is undetected by the
SNP-based genome-wide association study (GWAS).
Conclusions: Our results
demonstrate that the pangenome constructed by our pipeline combined with
a presence/absence variation (PAV)-based GWAS can provide additional
power for genomic and genetic analysis. The pangenome constructed in
this study and the associated genome sequence and genetic variant data
provide valuable genomic resources for rice genomics research and
improvement in the future.
Keywords: Pangenome, Presence/absence variation, Genomic diversity, PAV-based GWAS