bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871; this version posted January 31, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Cases of mild to severe illness, and death from the infection, have been reported from Wuhan. The outbreak has spread rapidly to distant nations, including France, Australia and the USA, among others. The number of cases within and outside China is increasing steeply. Our current understanding is limited to the virus genome sequences and modest epidemiological and clinical data. Comprehensive analysis of the available 2019-nCoV sequences may provide important clues that help advance our current understanding and manage the ongoing outbreak.

The spike glycoprotein (S) of coronavirus is cleaved into two subunits (S1 and S2). The S1 subunit helps in receptor binding and the S2 subunit facilitates membrane fusion (Bosch et al., 2003; Li, 2016). The spike glycoproteins of coronaviruses are important determinants of tissue tropism and host range. In addition, the spike glycoproteins are critical targets for vaccine development (Du et al., 2013). For this reason, the spike proteins are the most extensively studied proteins among coronaviruses. We therefore sought to investigate the spike glycoprotein of the 2019-nCoV to understand its evolution, novel sequence features and structural features using computational tools.
Methodology
Retrieval and alignment of nucleic acid and protein sequences
We retrieved all the available coronavirus sequences (n=55) from the NCBI viral genome database (https://www.ncbi.nlm.nih.gov/), and we used GISAID (Elbe & Buckland-Merrett, 2017) [https://www.gisaid.org/] to retrieve all available full-length sequences (n=28) of 2019-nCoV as of 27 Jan 2020. Multiple sequence alignment of all coronavirus genomes was performed using MUSCLE software (Edgar, 2004) based on the neighbour-joining method. Of the 55 coronavirus genomes, 32 representative genomes covering all categories were used for phylogenetic tree construction using MEGAX software (Kumar et al., 2018). The closest relative was found to be SARS CoV. The glycoprotein regions of SARS CoV and 2019-nCoV were aligned and visualized using Multalin software (Corpet, 1988). The identified amino acid and nucleotide sequences were aligned against the whole viral genome database using BLASTp and BLASTn. The conservation of the nucleotide and amino acid motifs in the 28 clinical variants of the 2019-nCoV genome was presented by performing multiple sequence alignment using MEGAX software. The three-dimensional structure of the 2019-nCoV glycoprotein was generated using the SWISS-MODEL online server (Biasini et al., 2014), and the structure was marked and visualized using PyMol (DeLano, 2002).
Results
Our phylogenetic tree of full-length coronaviruses suggests that 2019-nCoV is closely related to SARS CoV [Fig. 1]. In addition, other recent studies have linked the 2019-nCoV to SARS CoV. We therefore compared the spike glycoprotein sequence of the 2019-nCoV to that of the SARS CoV (NCBI Accession number: AY390556.1). On careful examination of the sequence alignment, we found that the 2019-nCoV spike glycoprotein contains 4 insertions [Fig. 2]. To further investigate whether these inserts are present in any other coronavirus, we performed a multiple sequence alignment.
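The idea of reading insertions off a pairwise alignment can be illustrated with a minimal sketch. It assumes the alignment is already available as two equal-length gapped strings (for example, exported from an alignment tool) and reports every run of columns where the reference carries a gap and the query carries residues. The function name `find_insertions`, the `min_len` parameter, and the toy sequences are hypothetical illustrations; they are not the authors' procedure and not real spike sequences.

```python
from typing import List, Tuple

GAP = "-"


def find_insertions(
    aligned_ref: str, aligned_query: str, min_len: int = 1
) -> List[Tuple[int, str]]:
    """Return (start, residues) for each insertion in the query.

    An insertion is a run of alignment columns where the reference has a
    gap and the query has a residue. `start` is the 1-based position of
    the first inserted residue in the ungapped query sequence.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")

    insertions: List[Tuple[int, str]] = []
    query_pos = 0      # number of query residues consumed so far
    run_start = 0      # query position where the current run began
    run: List[str] = []

    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == GAP and query_char == GAP:
            continue   # column empty in both rows: carries no information
        if query_char != GAP:
            query_pos += 1
        if ref_char == GAP:
            # Reference gap opposite a query residue: extend the run.
            if not run:
                run_start = query_pos
            run.append(query_char)
        else:
            # Reference residue present: close any open run.
            if len(run) >= min_len:
                insertions.append((run_start, "".join(run)))
            run = []

    # Close a run that reaches the end of the alignment.
    if len(run) >= min_len:
        insertions.append((run_start, "".join(run)))
    return insertions


# Toy, made-up alignment (hypothetical; not real viral sequences).
toy_ref = "MKT--LLV-AG"
toy_query = "MKTQRLLVSAG"

print(find_insertions(toy_ref, toy_query))             # [(4, 'QR'), (9, 'S')]
print(find_insertions(toy_ref, toy_query, min_len=2))  # [(4, 'QR')]
```

Raising `min_len` filters out single-residue gaps, which are common alignment noise. Whether a reported run reflects a genuine insertion still depends on the alignment parameters and on comparison against additional related sequences.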
Uncanny similarity of novel inserts in the 2019-nCoV spike protein to HIV-1 gp120 and