bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871; this version posted January 31, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

The insertions were present in all the genomic sequences of 2019-nCoV available from the recent clinical isolates (Supplementary Figure 1). To identify the source of these insertions in 2019-nCoV, a local alignment was performed with BLASTp, using the insertions as queries against all virus genomes. Unexpectedly, all the insertions aligned with human immunodeficiency virus-1 (HIV-1). Further analysis revealed that the HIV-1 sequences aligned with 2019-nCoV were derived from the surface glycoprotein gp120 (amino acid positions 404-409, 462-467, and 136-150) and from the Gag protein (amino acids 366-384) (Table 1). The Gag protein of HIV is involved in host membrane binding, packaging of the virus, and the formation of virus-like particles. Gp120 plays a crucial role in recognizing the host cell by binding to the primary receptor CD4. This binding induces structural rearrangements in gp120, creating a high-affinity binding site for a chemokine co-receptor such as CXCR4 and/or CCR5.

Discussion

The current outbreak of 2019-nCoV warrants a thorough investigation and understanding of its ability to infect human beings. Keeping in mind that there has been a clear change in host preference from previous coronaviruses to this virus, we studied the change in the spike protein between 2019-nCoV and other viruses. We found four new insertions in the S protein of 2019-nCoV when compared to its nearest relative, SARS-CoV. The genome sequences from the 28 recent clinical isolates showed that the sequences coding for these insertions are conserved among all these isolates.
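The conservation check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes conservation can be tested by exact substring matching of an insert against each isolate's protein sequence, whereas a real analysis would more likely use a multiple sequence alignment. The function names (`motif_conservation`, `missing_isolates`) and all sequences are hypothetical placeholders made of arbitrary letters, not real viral data.

```python
from typing import Dict, List


def motif_conservation(motif: str, isolates: Dict[str, str]) -> Dict[str, bool]:
    """For each isolate, report whether the motif occurs as an exact substring."""
    motif = motif.upper()
    return {name: motif in seq.upper() for name, seq in isolates.items()}


def missing_isolates(motif: str, isolates: Dict[str, str]) -> List[str]:
    """List the isolates in which the motif was not found."""
    presence = motif_conservation(motif, isolates)
    return [name for name, found in presence.items() if not found]


# Placeholder data: arbitrary letters, NOT real viral sequences.
toy_isolates = {
    "isolate_A": "MKTAYIAKQRGGSWLVN",
    "isolate_B": "MKTAYIAKQRGGSWLVD",
    "isolate_C": "MKTAYIAKQRAGSWLVN",  # one residue differs inside the motif
}
toy_motif = "QRGGSW"

presence = motif_conservation(toy_motif, toy_isolates)
missing = missing_isolates(toy_motif, toy_isolates)
print(presence)
print(missing)
```

An insert would count as conserved across the set only when `missing` is empty. Exact matching is deliberately strict, so a single substitution, as in the toy `isolate_C`, is reported as absence.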
This indicates that these insertions have been preferentially acquired by 2019-nCoV, providing it with an additional survival and infectivity advantage. Delving deeper, we found that these insertions were similar to HIV-1. Our results highlight an astonishing relation between the gp120 and Gag proteins of HIV and the 2019-nCoV spike glycoprotein. These proteins are critical for the viruses to identify and latch on to their host cells and for viral assembly (Beniac et al., 2006). Since surface proteins are responsible for host tropism, changes in these proteins imply a change in the host specificity of the virus. According to reports from China, there has been a gain of host specificity in the case of 2019-nCoV: the virus was originally known to infect animals and not humans, but after the mutations it has gained tropism to humans as well.

Moving ahead, 3D modelling of the protein structure showed that these insertions are present at the binding site of 2019-nCoV. Because gp120 motifs are present in the binding domain of the 2019-nCoV spike glycoprotein, we propose that these motif insertions could have provided an enhanced affinity towards host cell receptors. Further, this structural change might also have increased the range of host cells that 2019-nCoV can infect. To the best of our knowledge, the function of these motifs in HIV is still not clear and needs to be explored. The exchange of genetic material among viruses is well known, and such a critical exchange highlights the risk and the need to investigate the relations between seemingly unrelated virus families.

Conclusions

Our analysis of the spike glycoprotein of 2019-nCoV revealed several interesting findings. First, we identified 4 unique inserts in the 2019-nCoV spike glycoprotein that are not present in any other coronavirus reported to date. To our surprise, all the 4 inserts in the 2019-nCoV mapped to

Insertions share similarity to HIV