Abstract

The forest musk deer (Moschus berezovskii, FMD) is an endangered artiodactyl species whose males produce musk. We sequenced the whole genome of FMD, completed the genome assembly and annotation, and performed bioinformatic analyses. Our results showed that microsatellites (SSRs) were nonrandomly distributed across genomic regions, and that SSR abundance was much higher in intronic and intergenic regions than in other genomic regions. Tri- and hexanucleotide perfect (P) SSRs predominated in coding regions (CDSs), whereas tetra- and pentanucleotide P-SSRs were less abundant. Trifold P-SSRs had higher GC content in the 5′-untranslated regions (5′UTRs) and CDSs than in other genomic regions, whereas mononucleotide P-SSRs had the lowest GC content. The repeat copy numbers (RCN) of the same mono- to hexanucleotide P-SSRs were distributed differently in different genomic regions. The RCN of trinucleotide P-SSRs was significantly higher in CDSs than in transposable elements (TEs), introns and intergenic regions. Analysis of the coefficient of variability (CV) of P-SSRs showed that the RCN of mononucleotide P-SSRs had the highest relative variation across genomic regions, followed in order by dinucleotide P-SSRs > trinucleotide P-SSRs > tetranucleotide P-SSRs > pentanucleotide P-SSRs > hexanucleotide P-SSRs. For each of the mono- to hexanucleotide P-SSR types, the CV of RCN was relatively higher in intronic and intergenic regions, intermediate in TEs, and relatively lower in 5′UTRs, CDSs and 3′UTRs. Fifty-eight novel polymorphic SSR loci were detected by genotyping DNA from 36 captive FMD, and 22 SSR markers ultimately showed polymorphism, stability, and repeatability.
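
As an illustration of the CV comparison summarized above, the following minimal Python sketch computes the coefficient of variability of repeat copy numbers for each combination of genomic region and motif length. It is not the pipeline used in this study: the function names `coefficient_of_variation` and `cv_by_group`, the record layout `(region, motif_length, rcn)`, and the use of the population standard deviation are all assumptions made for the example, and the sample values are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return CV as a percentage: 100 * standard deviation / mean.

    Assumption: the population standard deviation is used; the source
    does not state which estimator was applied.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * pstdev(values) / m


def cv_by_group(records):
    """Group RCN values by (region, motif_length) and compute each CV.

    `records` is an iterable of (region, motif_length, rcn) tuples,
    a hypothetical layout chosen for this sketch.
    """
    groups = defaultdict(list)
    for region, motif_length, rcn in records:
        groups[(region, motif_length)].append(rcn)
    return {key: coefficient_of_variation(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    # Invented example values, not data from the FMD genome.
    example = [
        ("intron", 1, 10), ("intron", 1, 30),
        ("CDS", 3, 5), ("CDS", 3, 5),
    ]
    for (region, motif_length), cv in sorted(cv_by_group(example).items()):
        print(f"{region}\tmotif length {motif_length}\tCV = {cv:.1f}%")
```

Because the CV is normalized by the mean, it allows the spread of RCN to be compared between SSR classes whose typical copy numbers differ widely, such as mononucleotide and hexanucleotide repeats.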