Identification of Novel Missense Mutations in a Large Number of Recent SARS-CoV-2 Genome Sequences
Author(s): Hugh Y Cai, Kimberly K Cai, Julang Li
Background: SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December 2019. Significant country-specific variations in infection and mortality rates have been noted. Although country-specific differences in public health response have had a large impact on the control of infection rates, it is currently unclear whether evolution of the virus itself has also contributed to these variations. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~160 SARS-CoV-2 sequences available until mid-February 2020. By mid-April, >550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database.
Methods: We analyzed 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment using Map to a Reference Assembly, followed by Variants/SNP identification. The results were then verified on a larger set of 8,126 hCoV-19 (SARS-CoV-2) sequences from the GISAID database.
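As an illustration of the variant-identification step, the following is a minimal sketch and not the pipeline used in the study. It assumes the isolate sequences have already been aligned to the reference so that every sequence has the same length and coordinates as the reference; the function names `call_snps` and `snp_frequencies` and the toy sequences are hypothetical. Substitutions are reported in the same notation used in this abstract (reference base, 1-based position, observed base).

```python
from collections import Counter


def call_snps(reference, aligned):
    """List substitutions (e.g. 'C241T') in one sequence aligned to the reference.

    Assumes both strings share the same coordinates (same length).
    Ambiguous bases (N) and gaps (-) are skipped rather than called.
    """
    snps = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, aligned), start=1):
        if ref_base in "ACGT" and alt_base in "ACGT" and ref_base != alt_base:
            snps.append(f"{ref_base}{pos}{alt_base}")
    return snps


def snp_frequencies(reference, aligned_seqs):
    """Return the fraction of isolates carrying each substitution."""
    counts = Counter()
    for seq in aligned_seqs.values():
        counts.update(call_snps(reference, seq))
    total = len(aligned_seqs)
    return {snp: n / total for snp, n in counts.items()}


# Toy data only: an 8-base "reference" and four hypothetical isolates.
reference = "ACGTACGT"
isolates = {
    "isolate_a": "ACGTACGT",
    "isolate_b": "ATGTACGT",
    "isolate_c": "ATGTACGA",
    "isolate_d": "ANGTACGT",  # ambiguous base, not counted as a variant
}
frequencies = snp_frequencies(reference, isolates)
```

Skipping ambiguous bases keeps low-quality positions from being counted as mutations, although it also means such isolates still count in the denominator; a real analysis would need to decide how to handle missing data.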
Results: We identified 5 frequent new mutations that have emerged since late February 2020 and are present in many isolates (up to 40%). Three are missense (non-synonymous) mutations, one each in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), leading to amino acid substitutions in the viral protein sequence. The other two are in the 5’UTR (C241T) and in a non-coding region (G29553A). The latter mutation (G29553A) was found to be almost exclusive to the US isolates. Except for C241T, all the novel mutations identified are absent from the isolates from Italy and Spain among the SARS-CoV-2 genomes deposited in GenBank and GISAID by April 13, 2020.
Conclusion: The results of the current study indicate that new mutations are emerging as the COVID-19 pandemic is spreading to diff