bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871; this version posted January 31, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

value. This analysis involved 5 amino acid sequences. There were a total of 1387 positions in the final dataset. Evolutionary analyses were conducted in MEGA X.

[Figure 2 alignment image: rows 2019-nCoV, SARS-GZ02, and Consensus over alignment positions 1 to 1277, with the insert regions marked.]

Figure 2: Multiple sequence alignment between spike proteins of 2019-nCoV and SARS. The sequences of spike proteins of 2019-nCoV (Wuhan-HU-1, Accession NC_045512) and of SARS CoV (GZ02, Accession AY390556) were aligned using MultiAlin software. The sites of difference are highlighted in boxes.

We then analyzed all available full-length sequences (n=28) of 2019-nCoV in GISAID (Elbe & Buckland-Merrett, 2017) as of January 27, 2020 for the presence of these inserts. As most of these sequences are not annotated, we compared the nucleotide sequences of the spike glycoprotein of all available 2019-nCoV sequences using BLASTp. Interestingly, all four insertions were 100% conserved in all the available 2019-nCoV sequences analyzed [Fig.S2, Fig.S3].
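
The two steps described above (locating inserts in a pairwise alignment, then checking whether each insert is present in every available sequence) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in this study, which relied on MultiAlin and BLAST. The function names `find_insertions` and `conservation` are hypothetical, the sequences are short toy strings rather than real spike sequences, and exact substring matching is assumed as a simplified stand-in for an alignment-based search.

```python
from typing import List, Tuple


def find_insertions(query_aln: str, ref_aln: str, min_len: int = 1) -> List[Tuple[int, str]]:
    """Find runs where the query has residues and the reference has gaps.

    Both inputs are rows of the same pairwise alignment, with '-' as the gap
    character. Returns (start, residues) pairs, where start is the 0-based
    index of the run in the ungapped query sequence.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned rows must have equal length")

    inserts: List[Tuple[int, str]] = []
    q_pos = 0            # index in the ungapped query
    run_start = -1       # ungapped start of the current run, -1 if none
    run: List[str] = []  # residues collected for the current run

    for q, r in zip(query_aln, ref_aln):
        if r == "-" and q != "-":
            if not run:
                run_start = q_pos
            run.append(q)
        elif run:
            # The run has ended; keep it only if it is long enough.
            if len(run) >= min_len:
                inserts.append((run_start, "".join(run)))
            run = []
        if q != "-":
            q_pos += 1

    # Handle a run that extends to the end of the alignment.
    if run and len(run) >= min_len:
        inserts.append((run_start, "".join(run)))
    return inserts


def conservation(insert: str, sequences: List[str]) -> float:
    """Fraction of sequences that contain the insert as an exact substring."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return sum(insert in seq for seq in sequences) / len(sequences)


if __name__ == "__main__":
    # Toy alignment rows (hypothetical, not real spike protein data).
    query_row = "MKTAAGGHWLP--QR"
    ref_row = "MKT--GGH-LPSTQR"
    toy_isolates = ["MKTAAGGHWLPQR", "MKTAAGGHWLPQK"]

    for start, residues in find_insertions(query_row, ref_row):
        frac = conservation(residues, toy_isolates)
        print(f"insert {residues!r} at query position {start}: {frac:.0%} conserved")
```

Exact substring matching will miss an insert that carries even a single substitution, and a very short insert can match elsewhere in a sequence by chance, so a real analysis would compare aligned positions rather than raw strings.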