| English Abstract |
Betula platyphylla is a broadleaved deciduous hardwood tree species of the genus Betula in the family Betulaceae, found in temperate and subarctic regions of Asia. Its hoary bark is marked with long, horizontal lenticels and often separates into thin, papery plates, the most distinctive characteristic of the species. B. platyphylla is an important afforestation and timber species in northern China and also has high medicinal value, which has attracted the attention of researchers. However, complete genomic information for B. platyphylla is still lacking, which severely constrains progress in related research. Therefore, this study used next- and third-generation sequencing to survey the genome of B. platyphylla, to assemble and annotate its organellar and nuclear genomes, and to analyze their characteristics. The resulting complete genome information lays a foundation for molecular and breeding research on this species. In addition, a pipeline suited to the assembly and annotation of highly heterozygous tree genomes was developed on the high-performance computing cluster of Northeast Forestry University to support future analyses of other forest tree genomes. The genome survey showed that the genome size of B. platyphylla was about 432.9 Mb, the heterozygosity rate was about 1.22%, and the repeat sequence content was about 47.9%, which indicates a highly polymorphic genome. The survey also revealed that leaves collected from the wild carried bacterial contamination, which might affect the sequencing results. Therefore, I used aseptic tissue-culture seedlings of B. platyphylla as the test material and completed whole-genome sequencing with next- and third-generation sequencing. The complete chloroplast genome of B.
platyphylla was 160,518 bp in length and comprised a pair of inverted repeats (IRs) of 26,056 bp each, separating a large single-copy (LSC) region of 89,397 bp from a small single-copy (SSC) region of 19,009 bp. The annotation contained 129 genes in total, including 84 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Three genes used alternative initiation codons. Comparative genomics showed that chloroplast genome sequences of Fagales species were relatively conserved, although some highly variable regions remained that could serve as molecular markers. Identification of RNA editing sites indicated that at least 80 RNA editing events occurred in the chloroplast genome. Most of the substitutions were C to U, with a small proportion of other types. In particular, three editing sites in rRNA genes were converted to more than two other bases, which had not been reported before. Most synonymous edits increased the relative synonymous codon usage (RSCU) value of the affected codons. Phylogenetic analysis suggested that B. platyphylla is more closely related to B. pendula than to B. nana. The complete mitochondrial genome of B. platyphylla was 581,539 bp in length with a GC content of 45.5%. A total of 65 genes were annotated, including 40 protein-coding genes, 22 tRNA genes, and 3 rRNA genes. Repeat sequence analysis identified 96 interspersed repeats in the genome, comprising 43 forward and 53 palindromic repeats. Genome comparison showed that the mitochondrial genome of B. platyphylla had good collinearity with that of the related species B. pendula. A total of 475 RNA editing sites were identified in the B. platyphylla mitochondrion, far more than in the chloroplast. Collinearity analysis showed that five long fragments of the mitochondrial genome originated from the chloroplast genome, accounting for 4.2% of its total length. The nuclear genome of B.
platyphylla had a total size of 430.4 Mb, comprising 1,540 contigs with a contig N50 of 754.6 kb and a GC content of 35.7%. Using the female, male, and F₁ population genetic maps, 91.3% of the contigs were anchored to 14 pseudochromosomes. The identified repeat sequences accounted for 50.54% of the entire genome, and most of them were transposable elements (TEs). Non-coding RNA annotation yielded a total of 512 tRNA genes and 265 rRNA genes. I further identified 31,578 genes using my pipeline, of which 27,965 (88.6%) were functionally annotated in public databases. The average lengths of the gene and CDS regions were 4,229 bp and 1,089 bp, respectively. On average, each gene contained 4.78 exons, and 7,086 genes with alternative splicing were identified. BUSCO analysis of the annotated protein sequences identified 94.2% complete genes. Collinearity analysis showed that the genomes of B. platyphylla, Vitis vinifera, and Populus trichocarpa had good collinearity at the chromosome level; some typical syntenic regions in B. platyphylla and V. vinifera each corresponded to two regions in P. trichocarpa. Whole-genome duplication analysis confirmed that B. platyphylla experienced a whole-genome triplication (WGT-γ) and that no recent whole-genome duplication (WGD) occurred. Phylogenetic analysis suggested that B. platyphylla and B. pendula are sister species that diverged about 2.6 million years ago (Mya). Keywords: Betula platyphylla; chloroplast; mitochondrion; genome
|
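The figures reported in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the study's pipeline: it recomputes the chloroplast length from the quadripartite structure (LSC + SSC + 2 × IR), the gene totals, and the annotated fraction, and it includes a generic N50 helper. The function name `n50` and the toy contig lengths are hypothetical; all other numbers are taken from the abstract.

```python
def n50(lengths):
    """Return the contig length at which the running total of contigs,
    sorted from longest to shortest, first reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Chloroplast quadripartite structure (values from the abstract, in bp).
LSC, SSC, IR = 89_397, 19_009, 26_056
chloroplast_total = LSC + SSC + 2 * IR  # the IR region is present twice

# Gene and repeat counts (values from the abstract).
chloroplast_genes = 84 + 37 + 8   # protein-coding + tRNA + rRNA
mito_genes = 40 + 22 + 3          # protein-coding + tRNA + rRNA
mito_repeats = 43 + 53            # forward + palindromic

# Share of predicted nuclear genes with a functional annotation, in percent.
annotated_pct = round(100 * 27_965 / 31_578, 1)

# Toy contig lengths (hypothetical) to show how the N50 helper behaves.
toy_n50 = n50([10, 8, 5, 3, 2])

print(chloroplast_total, chloroplast_genes, mito_genes, mito_repeats,
      annotated_pct, toy_n50)
```

Each recomputed total agrees with the value stated in the abstract (160,518 bp, 129 genes, 65 genes, 96 repeats, 88.6%). Reproducing the reported contig N50 of 754.6 kb would require the actual contig lengths, which the abstract does not list.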