Design of Target Enrichment Probes and Application of Genome Resequencing Data for Rhododendron Phylogenomics (杜鹃花属目标基因捕获探针设计与重测序数据应用)
Author: 莫智琼 (Mo Zhiqiong)
Supervisor: 高连明 (Gao Lianming)
Abstract

Rhododendron L., the largest genus of seed plants in China, is a typical group that underwent rapid adaptive evolution and diversification in the Pliocene. Resolving the evolutionary relationships among members of such groups is always challenging. Several genome-partitioning strategies based on next-generation sequencing (NGS) technologies now provide opportunities to resolve the intricate phylogenetic relationships of rapidly diversifying groups. In this study, a total of 43 species (including 3 varieties) representing different taxonomic categories in Rhododendron were selected for transcriptome sequencing (RNA-seq) and/or genome resequencing (Re-seq). Numerous orthologous genes were assembled and identified from the transcriptome data and used for phylogenetic analysis. Based on the evolutionary rate and phylogenetic usefulness of these orthologous genes, a set of target genes suitable for phylogenetic study of Rhododendron was selected for designing probes to capture target genes from genomic DNA. In addition, the phylogeny of Rhododendron was reconstructed using nuclear and chloroplast genome data obtained by resequencing. Finally, the feasibility and potential of two genome-partitioning strategies, genome resequencing and targeted capture sequencing (Hyb-seq), were compared and assessed. The main findings and conclusions of this study are summarized as follows:

1. Identification of orthologous genes from transcriptome and assessment of phylogenetic inference

A total of 4555 orthologous genes were assembled and identified from the transcriptome data. From these, 1820 data sets were generated by random sampling, with the number of sampled genes increasing in steps of 50 and each gene number replicated 20 times. The main topology of the Rhododendron phylogeny was determined from how frequently each topology occurred among the coalescent-based trees inferred from these data sets.
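The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the thesis's actual script: it assumes subset sizes run from 50 to 4550 in steps of 50 (one reading that reproduces the stated total of 1820 data sets), and the function name, the integer gene indices, and the random seed are all hypothetical.

```python
import random


def sample_gene_subsets(n_genes=4555, step=50, reps=20, seed=0):
    """Yield (subset_size, replicate_index, gene_indices) tuples.

    Assumed scheme: sizes 50, 100, ..., up to the largest multiple of
    `step` that does not exceed `n_genes`, each drawn `reps` times
    without replacement.
    """
    rng = random.Random(seed)      # fixed seed so the draws are repeatable
    gene_ids = range(n_genes)      # stand-in for real orthologous gene IDs
    for size in range(step, n_genes + 1, step):
        for rep in range(reps):
            yield size, rep, rng.sample(gene_ids, size)


subsets = list(sample_gene_subsets())
sizes = sorted({size for size, _, _ in subsets})
print(len(subsets), sizes[0], sizes[-1])  # 1820 50 4550
```

Under this reading there are 91 subset sizes, and 91 × 20 = 1820 matches the number of data sets reported. In practice each subset would be passed to a species-tree inference step, which is outside the scope of this sketch.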
The results showed that two topologies, Topo1 and Topo2, occurred at the highest frequencies and differed only slightly, in the phylogenetic placement of Rhododendron scopulorum and R. lepidotum. In Topo1, R. scopulorum is sister to R. lepidotum, and this pair is sister to (R. primuliflorum-R. trichostomum)-(R. fastigiatum-(R. hippophaeoides-R. telmateium)), whereas in Topo2, R. scopulorum is sister to (R. lepidotum-((R. primuliflorum-R. trichostomum)-(R. fastigiatum-(R. hippophaeoides-R. telmateium)))). Positive selection analysis was performed on the 4555 genes to explore whether including genes under positive selection in phylogeny reconstruction affected the phylogenetic inferences. The results showed that genes under positive selection affected phylogenetic inference at some nodes and that their inclusion led to a high frequency of Topo2. Conversely, the frequency of Topo1 increased significantly after genes under positive selection were removed. PhyParts analysis of concordance and conflict between gene trees and the specified topologies showed that Topo1 was supported by more gene trees. This indicated that Topo1 may represent the true phylogenetic relationship among the Rhododendron species included in this analysis, and that it could serve as a reference phylogeny for selecting genes to design capture probes.

2. Phylogeny reconstruction and target gene selection for probe design

Six data sets, reflecting the trade-off between the number of genes and the number of species included in the matrix, were generated for phylogenetic inference using two tree-building methods, concatenation-based and coalescent-based. Under the concatenation approach, the six data sets yielded trees with the same topology as Topo1 and highly supported branches, except for dataset B.
In the trees based on the coalescent approach, by contrast, Topo1 was supported by data sets that included more genes, with genes missing in more species, while Topo2 was supported by data sets that included fewer genes, with genes missing in fewer species. However, support values were very low at the nodes that conflict between Topo1 and Topo2. Trees based on the original matrix and on the filtered matrix of the six data sets showed the same phylogenetic relationships under both tree-building approaches. In addition, assembling genes from pooled transcriptome data of multiple samples of a species may lead to erroneous phylogenetic inferences, and matrix filtering will result in differences in topology. Based on the evolutionary rate of genes, gene sets were established by two sampling methods and then evaluated for phylogenetic usefulness in order to select the target gene set for probe design. The results indicated that the “sample268+232” method, which combines highly informative genes (informative sites above 5%) with randomly sampled conserved genes of low variation, yielded a robust Topo1 phylogeny with strong support values. Finally, the gene set generating the phylogeny with the highest support values was selected as the target genes for probe design.

3. Application and evaluation of phylogenetic inference on genome Re-seq data

Single-copy nuclear genes identified from the whole-genome and transcriptome data of R. delavayi were used as references for gene assembly from genome resequencing (Re-seq) data. A large number of nuclear orthologous genes were assembled and identified using homologous tree methods. Plastid genome reads from the Re-seq data were de novo assembled into contigs, from which 66 chloroplast protein-coding genes were finally obtained. The phylogenetic relationships of Rhododendron species were reconstructed separately from the nuclear and plastid genome data.
The nuclear and plastid phylogenies inferred from the resequencing data showed the same topology, with the same subgeneric/sectional (subsectional) relationships as Topo1, and the tree based on the nuclear gene data set was strongly supported. In summary, a large amount of genetic data from the nuclear and plastid genomes can be obtained through Re-seq approaches and used for phylogenetic inference, yielding highly resolved trees for groups that have experienced rapid evolutionary radiation. In addition, Re-seq data from herbarium specimens more than 40 years old were evaluated; a large number of orthologous genes were obtained that can be used to resolve the phylogenetic relationships of Rhododendron, indicating that herbarium specimens can be used for genome Re-seq and phylogenetic inference. Therefore, genome resequencing offers promising prospects for future plant phylogenomic studies.
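The gene-selection criterion in part 2 (informative sites above 5%) can be illustrated with a small calculation. The sketch below assumes that "informative sites" means parsimony-informative sites, that is, alignment columns in which at least two different nucleotides each occur in at least two sequences; the abstract does not state the exact definition. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def informative_fraction(alignment):
    """Return the fraction of parsimony-informative columns.

    `alignment` is a list of equal-length aligned sequences. Gaps and
    ambiguity codes are ignored when counting states in a column.
    """
    n_cols = len(alignment[0])
    informative = 0
    for column in zip(*alignment):
        counts = Counter(base.upper() for base in column if base.upper() in "ACGT")
        # Informative: at least two states, each seen in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative / n_cols


toy_alignment = [
    "ACGTACGTAA",
    "ACGTACGTAA",
    "ACCTACGAAT",
    "ACCTACGAAC",
]
fraction = informative_fraction(toy_alignment)
print(fraction)         # 0.2
print(fraction > 0.05)  # True: this toy gene would pass a 5% threshold
```

In the toy alignment, columns 3 and 8 are informative; the last column is variable but not informative, because only one state occurs in two or more sequences.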
2020-05
Document type: Thesis
Item identifier: http://ir.kib.ac.cn/handle/151853/74163
Collection: Kunming Institute of Botany, Master's and Doctoral Theses
Recommended citation
GB/T 7714
莫智琼. 杜鹃花属目标基因捕获探针设计与重测序数据应用, Design of Target Enrichment Probes and Application of Genome Resequencing Data for Rhododendron Phylogenomics[D],2020.
Files in this item
File name/size: 莫智琼-莫智琼-2017E8010661 (8681KB)
Document type: Thesis
Access type: Restricted access
License: CC BY-NC-SA