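• Whole-Genome Sequencing and Analysis of Betula platyphylla (《白桦全基因组测序及分析》)
  • Author: Wang Sui (王遂)
  • Institution: Northeast Forestry University (东北林业大学)
  • Thesis title: Whole-Genome Sequencing and Analysis of Betula platyphylla
    Author: Wang Sui
    Discipline: Forest Tree Genetics and Breeding
    Degree-granting institution: Northeast Forestry University
    Supervisors: Yang Chuanping (杨传平), Qu Guanzheng (曲冠证)
    Year of publication: 2019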
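    Abstract (translated from the Chinese): Betula platyphylla (white birch) is a deciduous broadleaved tree of the genus Betula in the family Betulaceae, growing mainly in the temperate and cold-temperate regions of Asia. Its most distinctive features are greyish-white bark that peels in papery sheets and long, horizontal lenticels. White birch is an important landscaping and timber species in northern China and also has high medicinal value, so it has long attracted research attention. However, complete genomic information for white birch is still lacking, which seriously limits progress in related research. This study therefore uses second- and third-generation sequencing to survey the white birch genome, assemble and annotate its organellar and nuclear genomes separately, and analyze their features, with the aim of obtaining relatively complete whole-genome information as a foundation for molecular and breeding research on white birch. In addition, an assembly and annotation pipeline suited to highly heterozygous forest tree genomes was developed on the high-performance computing cluster of Northeast Forestry University to support the future analysis of other forest tree genomes.

    The genome survey showed a genome size of about 432.9 Mb, a heterozygosity rate of about 1.22% and a repeat content of about 47.9%, which makes this a highly heterozygous genome. Leaves collected in the field were also found to carry bacterial contamination that could affect the sequencing results. Sterile tissue-cultured white birch plantlets were therefore chosen as the test material, and whole-genome sequencing was completed with second- and third-generation sequencing.

    Chloroplast genome analysis showed that the white birch chloroplast genome is 160,518 bp long and consists of a pair of 26,056 bp inverted repeats (IRs) separating an 89,397 bp large single-copy (LSC) region from a 19,009 bp small single-copy (SSC) region. A total of 129 genes were annotated across the chloroplast genome: 84 protein-coding genes, 37 tRNA genes and 8 rRNA genes. Three of the protein-coding genes use a non-ATG start codon. Comparative genomics showed that chloroplast genomes of Fagales species are relatively conserved but contain some variation hotspots that can be used to design molecular markers. RNA editing site identification indicated at least 80 RNA editing events in the white birch chloroplast; most are C-to-U transitions and a minority are not. In particular, three sites on rRNA can be edited into two or more different bases, which has not been reported in previous studies. For the synonymous edits, which leave the amino acid unchanged, the relative synonymous codon usage (RSCU) increased in every case. Phylogenetic analysis showed that B. platyphylla is more closely related to B. pendula (silver birch) than to B. nana (dwarf birch).

    Mitochondrial genome analysis showed that the white birch mitochondrial genome is 581,539 bp long with a GC content of 45.5%. A total of 65 genes were annotated: 40 protein-coding genes, 22 tRNA genes and 3 rRNA genes. Repeat analysis identified 96 long interspersed repeat sequences, comprising 43 forward and 53 palindromic repeats. Genome comparison showed good collinearity between the mitochondrial genomes of white birch and its close relative B. pendula. A total of 475 RNA editing sites were identified in the white birch mitochondrion, far more than in the chloroplast. Collinearity analysis showed that five long blocks in the mitochondrial genome originated from the chloroplast, accounting for 4.2% of its total length.

    Nuclear genome analysis yielded an assembly of 1,540 contigs totalling 430.4 Mb, with a contig N50 of 754.6 kb and a GC content of 35.7%. Using genetic maps of the progeny and both parents, 91.3% of the contigs were anchored onto 14 pseudochromosomes. Repeat sequences make up 50.54% of the nuclear genome and are dominated by transposons. Non-coding RNA annotation identified 512 tRNA genes and 265 rRNA genes. Structural annotation identified 31,578 protein-coding genes in the nuclear genome, with an average gene-region length of 4,229 bp, an average CDS length of 1,089 bp and an average of 4.78 exons per gene; 7,086 genes showed alternative splicing. BUSCO assessment showed that 94.2% of the genes are completely covered in the white birch annotation. Functional annotation assigned functions to 27,965 genes, or 88.6% of the total. Collinearity analysis showed good chromosome-level collinearity between white birch and both grape (Vitis vinifera) and black cottonwood (Populus trichocarpa), and some typical syntenic regions shared by white birch and grape have two corresponding regions on P. trichocarpa chromosomes. Whole-genome duplication analysis further showed that white birch, like grape, has undergone only one whole-genome triplication event since the origin of the angiosperms. Phylogenetic analysis showed that white birch and B. pendula are very closely related and diverged about 2.6 Mya.

    Keywords: Betula platyphylla; chloroplast; mitochondrion; genome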
    Abstract (English): Betula platyphylla is a broadleaved deciduous hardwood tree species of the genus Betula in the family Betulaceae, found in the temperate and subarctic regions of Asia. Its most typical characteristics are hoary bark that separates into thin, papery plates and long, horizontal lenticels. B. platyphylla is an important afforestation and timber species in northern China and also has high medicinal value, so it has long attracted the attention of researchers. However, complete genomic information for B. platyphylla is still lacking, which severely constrains the progress of relevant research. This study therefore used next- and third-generation sequencing to survey the genome of B. platyphylla, to assemble and annotate its organellar and nuclear genomes, and to analyze their characteristics, with the aim of obtaining complete genome information as a foundation for molecular and breeding research on this species. In addition, a pipeline suited to the assembly and annotation of highly heterozygous tree genomes was developed on the high-performance computer cluster of Northeast Forestry University to support the analysis of other forest tree genomes in the future.

    The genome survey showed that the genome size of B. platyphylla was about 432.9 Mb, the heterozygosity rate was about 1.22% and the repeat content was about 47.9%, indicating a highly heterozygous genome. Leaves collected in the wild were also found to carry bacterial contamination, which might affect the sequencing results. I therefore decided to use aseptic tissue-culture seedlings of B. platyphylla as the test material and completed whole-genome sequencing with next- and third-generation sequencing.

    The complete chloroplast genome of B. platyphylla was 160,518 bp in length and included a pair of inverted repeats (IRs) of 26,056 bp that separated a large single-copy (LSC) region of 89,397 bp from a small single-copy (SSC) region of 19,009 bp. The annotation contained a total of 129 genes, including 84 protein-coding genes, 37 tRNA genes and 8 rRNA genes. Three genes used alternative initiation codons. Comparative genomics showed that chloroplast genome sequences of Fagales species were relatively conserved, but that some highly variable regions could be used as molecular markers. RNA editing site identification indicated that at least 80 RNA editing events occurred in the chloroplast genome. Most of the substitutions were C to U, while a small proportion were not. In particular, three editing sites on rRNA could be converted to two or more different bases, which had not been reported before. Most synonymous conversions increased the relative synonymous codon usage (RSCU) value of the codon. Phylogenetic analysis suggested that B. platyphylla is more closely related to B. pendula than to B. nana.

    The complete mitochondrial genome of B. platyphylla was 581,539 bp in length with a GC content of 45.5%. A total of 65 genes were annotated, including 40 protein-coding genes, 22 tRNA genes and 3 rRNA genes. Repeat sequence analysis showed 96 interspersed repeat sequences in the genome, including 43 forward and 53 palindromic repeats. Genome comparison showed good collinearity between the mitochondrial genomes of B. platyphylla and the related species B. pendula. A total of 475 RNA editing sites were identified in the B. platyphylla mitochondrion, far more than in the chloroplast. Collinearity analysis showed that five long fragments of the mitochondrial genome came from the chloroplast genome, accounting for 4.2% of its total length.

    The nuclear genome assembly of B. platyphylla had a total size of 430.4 Mb, comprising 1,540 contigs with a contig N50 of 754.6 kb and a GC content of 35.7%. With the help of the female, male and F₁ population genetic maps, 91.3% of the contigs were anchored onto 14 pseudochromosomes. The identified repeat sequences accounted for 50.54% of the entire genome, and most of them were transposable elements (TEs). Non-coding RNA annotation identified 512 tRNA genes and 265 rRNA genes. I further identified 31,578 protein-coding genes using my pipeline, of which 27,965 (88.6%) were functionally annotated in public databases. The average lengths of the gene region and the CDS region were 4,229 bp and 1,089 bp, respectively. On average, each gene contained 4.78 exons, and 7,086 genes with alternative splicing were identified. BUSCO analysis of the annotated protein sequences identified 94.2% complete genes. Collinearity analysis showed that the genomes of B. platyphylla, Vitis vinifera and Populus trichocarpa had good collinearity at the chromosome level, and some typical syntenic regions shared by B. platyphylla and V. vinifera each matched two regions in P. trichocarpa. Whole-genome duplication analysis confirmed that B. platyphylla experienced one whole-genome triplication (WGT-γ) and no recent whole-genome duplication (WGD). Phylogenetic analysis suggested that B. platyphylla and B. pendula are sister species that separated about 2.6 million years ago (Mya).

    Keywords: Betula platyphylla; chloroplast; mitochondrion; genome
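
    The component figures in the abstract can be cross-checked against the reported totals with simple arithmetic. The Python sketch below is only an illustration and is not part of the thesis pipeline: the function names `quadripartite_length` and `percent` are hypothetical, and the numbers are copied from the abstract. It assumes the usual quadripartite chloroplast layout of one LSC, one SSC and two IR copies.

```python
# Illustrative consistency check of figures quoted in the abstract.
# Function names are hypothetical; the numbers come from the abstract.

def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def percent(part: float, whole: float, digits: int = 1) -> float:
    """Share of `part` in `whole`, as a percentage rounded to `digits` places."""
    return round(100 * part / whole, digits)


checks = {
    # 89,397 (LSC) + 19,009 (SSC) + 2 x 26,056 (IR) vs. the reported 160,518 bp
    "chloroplast length": quadripartite_length(89_397, 19_009, 26_056) == 160_518,
    # protein-coding + tRNA + rRNA genes vs. the reported totals
    "chloroplast genes": 84 + 37 + 8 == 129,
    "mitochondrial genes": 40 + 22 + 3 == 65,
    # forward + palindromic repeats vs. the reported 96 repeats
    "mitochondrial repeats": 43 + 53 == 96,
    # functionally annotated genes as a share of all predicted genes
    "annotated fraction": percent(27_965, 31_578) == 88.6,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

    Each entry compares a sum or ratio of reported components with the reported total, so a mismatch would point to a transcription error in the record rather than to a problem with the underlying analysis.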