
Daoyu has been saying that Csabai's Antarctic reads contain C29095T, but there are actually only 3 reads with C29095T, and in all of them the mutated base has a very low quality: quality 6 at base 137 of read 5599415 in SRR13441704_2 (which has 14 bases remaining after it, half of which differ from the reference), quality 5 at base 94 of read 21585669 in SRR13441708_2 (where another base 2 positions earlier also differs from the reference and has a quality of 10), and quality 5 at base 2 of read 23910570 in SRR13441708_2 (where the first base of the read also differs from the reference). A Phred quality of 5 corresponds to an expected error rate of 1/10^0.5, or about 32%. So I think the 3 instances of C29095T in the Antarctic reads are just sequencing errors.
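The error-rate arithmetic above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard Phred definition P = 10^(-Q/10) and the Phred+33 ASCII offset used in current FASTQ files; the helper names are hypothetical and not part of any tool used in this comment.

```python
def phred_error_probability(q):
    """Expected base-call error probability for Phred quality q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred33_char_to_q(c):
    """Decode one FASTQ quality character using the Phred+33 offset."""
    return ord(c) - 33


# Base qualities of the three C29095T calls described above
for q in (6, 5, 5):
    print(f"Q{q}: error rate ~{phred_error_probability(q):.0%}")

# samtools mpileup's default base-quality cutoff (-Q 13), for comparison
print(f"Q13: error rate ~{phred_error_probability(13):.0%}")

# In the FASTQ file itself, Q5 is stored as the character '&' (ASCII 38)
print(phred33_char_to_q("&"))
```

At Q5 roughly one call in three is expected to be wrong, while the default mpileup cutoff of Q13 corresponds to about 5%.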

`samtools mpileup` ignores bases with a quality below 13 by default, so to see the 3 reads with the mutation, you have to disable the base-quality cutoff with `samtools mpileup -Q0`:

brew install parallel sratoolkit bowtie2 samtools

printf %s\\n SRR134417{00..11}|parallel -j12 fastq-dump --split-3 --gzip

curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=MN908947.3' >sars2.fa;bowtie2-build sars2.fa{,}

for x in SRR134417*.fastq.gz;do bowtie2 -p3 -x sars2.fa -U $x --no-unal|samtools sort -@2 ->${x%%.*}.sars.bam;done

for x in {00..11}_{1,2};do samtools mpileup -f sars2.fa -Q0 SRR134417$x.sars.bam|ruby -alne'next unless[29095].include?($F[1].to_i);next if$F[3]=="0"||$F[4]=="*";s=$F[4].gsub(/\^./,"");while s=~/[+-](\d+)/;s[$~.begin(0),$&.size+$1.to_i]="";end;t=s.upcase.gsub(/[.,]/,$F[2].upcase).chars.tally;puts [$F[1],$F[2],t["A"]||0,t["C"]||0,t["G"]||0,t["T"]||0]*" "'|sed s/^/$x\ /;done|sort -nk2|(echo run pos ref A C G T;cat)|column -t
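The Ruby one-liner has to strip pileup markup before it tallies bases. Here is a minimal Python sketch of the same parsing, assuming the read-bases column (column 5) format described in the `samtools mpileup` documentation: `.`/`,` for reference matches, `^` plus one mapping-quality character at a read start, `$` at a read end, and `+N`/`-N` followed by exactly N indel bases. `count_pileup_bases` and the sample pileup line are hypothetical.

```python
import re
from collections import Counter


def count_pileup_bases(ref, bases):
    """Tally A/C/G/T calls from one mpileup read-bases column (column 5).

    '.' and ',' are matches to `ref` on the forward/reverse strand, '^' plus the
    next character marks a read start and its mapping quality, '$' marks a read
    end, and '+N'/'-N' is followed by N inserted/deleted bases that belong to
    the indel, not to this position.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Skip the read-start marker and its mapping-quality character,
            # which can itself be a letter such as 'G' (MAPQ 38).
            i += 2
            continue
        if c in "+-":
            # Skip the sign, the length digits and exactly N indel bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:  # malformed column: skip the stray sign
                i += 1
                continue
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c in ".,":
            counts[ref.upper()] += 1
        elif c.upper() in ("A", "C", "G", "T"):
            counts[c.upper()] += 1
        # '$', '*', '#', '>', '<' and 'N' are not counted.
        i += 1
    return counts


# Hypothetical pileup line: chrom, pos, ref, depth, read bases, base qualities
line = "MN908947.3\t29095\tC\t5\t^G.,,T+2ACt$\tIIII&"
chrom, pos, ref, depth, bases, quals = line.split("\t")[:6]
print(pos, ref, dict(count_pileup_bases(ref, bases)))
```

This prints `29095 C {'C': 3, 'T': 2}`. A greedy pattern like `[+-]\d+[A-Z]+` applied after upcasing would also swallow the `t` that follows the insertion, and leaving `^G` in place would count the mapping-quality character as a G.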

The sample Guangdong/FS-S30-P0052/2020 is also interesting because it has only 5 mutations relative to Wuhan-Hu-1, and all of them are shared with RaTG13: the 3 proCoV2 mutations plus C21711T and C24382T. However, the Antarctic reads don't seem to include either of the 2 additional mutations. In a set of 719,679 GISAID sequences with a collection date in 2020 or earlier, after excluding sequences with 100 or more mutations (which are mostly other SARS2-like viruses), C21711T is found in 128 sequences and C24382T in 1,156 sequences: `curl -s https://cdn.discordapp.com/attachments/1093243194231246934/1129149184939929713/early.comb.xz |gzip -dc|sort -k4>early.comb;awk -F\\t '$11<100' early.comb|cut -f12|grep -o '[A-Z]21711[A-Z]'|sort|uniq -c|sort`.

For some reason, many of these interesting mutations seem to show up in Guangdong before Hubei, even though GISAID has many more early samples from Hubei than from Guangdong. On GISAID, the 7 C29095T samples with the earliest collection dates are also from Guangdong: `grep C29095T early.comb|grep Human|cut -f1,3-7,11,12-|head|tr \\t \|`.

In Jesse Bloom's paper about the recovered deleted sequences, the tree rooted on WA1/proCoV2 has only 3 first-level branches under the root, and one of them is defined by the mutations C865T and C13694T. Both mutations are common in the Antarctic reads, but there are only 4 samples on GISAID with a collection date in 2020 that contain both: 2 samples from Singapore from late January and 2 from Guangdong from early February. All 4 samples have only 5 mutations relative to Wuhan-Hu-1, the other 3 being the proCoV2/WA1 mutations. I came up with the term "Guangdong-Singapore line" for the line with the C865T and C13694T mutations. Neither of the mutations seems to be ancestral, though. But it's still interesting that the mutations are found in early Chinese samples without C18060T, so there seems to be an overlooked sublineage of proCoV2 that is also found in the Antarctic reads.
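Counting GISAID samples that carry both C865T and C13694T comes down to a subset test per sample plus a collection-date filter. A minimal sketch, assuming each sequence has already been parsed into a name, a collection date and a set of mutation labels; the records and `samples_with_all` are hypothetical and not taken from GISAID or from `early.comb`.

```python
from datetime import date

# Hypothetical records: (sample name, collection date, set of mutations vs Wuhan-Hu-1)
records = [
    ("sample_1", date(2020, 1, 28), {"C865T", "C13694T", "C8782T", "T28144C"}),
    ("sample_2", date(2020, 2, 4), {"C865T", "C8782T", "T28144C"}),
    ("sample_3", date(2021, 3, 1), {"C865T", "C13694T"}),
]


def samples_with_all(records, mutations, latest):
    """Names of samples collected on or before `latest` that carry every mutation in `mutations`."""
    wanted = set(mutations)
    return [name for name, collected, muts in records
            if collected <= latest and wanted <= muts]


print(samples_with_all(records, ["C865T", "C13694T"], date(2020, 12, 31)))
```

The same subset test, with a different cutoff date or mutation list, could be reused for queries like the A16156G and C23525T counts below.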

When I did variant calling on the pooled reads from all Antarctic runs, I got a high MAPQ value for the mutation A16156G. On GISAID it is found in only one sample with a collection date in March 2020 or earlier: a sample from Guangdong with a total of 4 mutations relative to Wuhan-Hu-1. Two of its other three mutations are the lineage A mutations, and the fourth is T9013A, a rare mutation that isn't found in any other GISAID sample until a sample from Brazil with a collection date in May 2020.

C23525T is another rare mutation in the Antarctic reads that got a high MAPQ in my pooled reads. It's found in 6 GISAID samples with a collection date in February 2020 or earlier, and one of them is from Guangdong; that sample's only other mutation relative to Wuhan-Hu-1 is C13694T, one of the two "Guangdong-Singapore" mutations.

Out of the Antarctic mutations with a fairly high MAPQ, the only ones that appear to be ancestral are the 3 proCoV2 mutations; the others appear not to be ancestral: T9440A, C13694T, A16156G, A17039G, A18082G, A21975C, C23525T, G23525T, C25498T, C26895T, and G29449T. So I think it's possible that the Antarctic reads don't actually contain strains that are more ancestral than WA1.

When I ran `samtools mpileup` on the Antarctic reads with the default base-quality cutoff of 13, 20 reads covered position 3037, but none of them had C3037T.


Nice!

A few pointers.

I actually made a Python script to identify ancestral mutations, with a threshold you can adjust: https://github.com/mass-blank/sars2oolkit/blob/main/SRAFunctions/conserved.py

Here is the output:

0: A, 1: C, 2: G, 3: T

{378: 1, 379: 3, 487: 3, 511: 1, 544: 3, 1104: 1, 1288: 3, 1391: 1, 2110: 3, 2731: 0, 2881: 3, 3602: 0, 4442: 0, 5128: 2, 5305: 0, 5348: 3, 5368: 2, 5374: 0, 5648: 1, 5665: 0, 5730: 3, 5839: 2, 6196: 3, 6262: 1, 6446: 0, 6509: 2, 7267: 3, 7479: 1, 8266: 3, 8623: 1, 8782: 3, 9220: 0, 9693: 3, 10138: 3, 10156: 3, 10228: 3, 10456: 3, 11077: 1, 11081: 2, 11200: 3, 11227: 1, 12439: 0, 12781: 3, 14032: 2, 14220: 3, 14262: 3, 15258: 1, 15273: 3, 15540: 3, 16563: 0, 16692: 1, 16768: 0, 18408: 1, 18486: 3, 18813: 1, 20229: 1, 20393: 1, 20898: 3, 20940: 0, 21234: 3, 21306: 3, 21501: 3, 21662: 0, 21711: 3, 22147: 0, 22264: 3, 22267: 3, 22676: 0, 22858: 3, 23117: 0, 23191: 3, 23260: 3, 24049: 3, 24130: 3, 24490: 2, 24697: 0, 24995: 1, 24997: 3, 25033: 0, 25413: 3, 25420: 1, 25475: 1, 25645: 1, 25648: 1, 25827: 1, 25854: 3, 26168: 1, 26606: 3, 26714: 1, 27328: 1, 27429: 3, 27687: 0, 28144: 1, 28382: 1, 29072: 1, 29073: 0, 29095: 3, 29212: 0, 29341: 1, 29518: 3}

18060T and 3037T are not listed above because the threshold is set too high.
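In the output above, each key is a genome position and each value is a base code from the legend. A minimal sketch of decoding it into the `positionBase` notation used in this thread, assuming the legend applies as shown; `decode_sites` is a hypothetical helper, not part of `conserved.py`, and treating 8782T, 18060T and 28144C as the proCoV2 sites is an assumption for illustration.

```python
# Base codes from the legend above: 0: A, 1: C, 2: G, 3: T
BASES = "ACGT"


def decode_sites(sites):
    """Convert {position: base_code} into sorted 'positionBase' labels, e.g. {29095: 3} -> ['29095T']."""
    return [f"{pos}{BASES[code]}" for pos, code in sorted(sites.items())]


# A few entries copied from the output above
excerpt = {8782: 3, 21711: 3, 28144: 1, 29095: 3}

# Assumed proCoV2 sites, written in the same notation
procov2_sites = {"8782T", "18060T", "28144C"}

decoded = decode_sites(excerpt)
print(decoded)
print(sorted(procov2_sites - set(decoded)))  # proCoV2 sites missing at this threshold
```

With this excerpt, 18060T is the one assumed proCoV2 site that does not appear, which matches the note about the threshold.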

Among other things, I developed a Python pipeline to measure the proportion of overall ancestral signal in the first 1,300 SRRs from the start of the pandemic, before April 2020. The v lineage (Kumar et al.) seems to contain the most ancestral signal of all the SRRs I was able to download. Doing this, I was able to identify 4 FASTA genomes from GISAID with 8782T, 18060T, 28144C, and 29095T that also contained accessory mutations. These genomes were found in Florida, California, and I think one in Washington state. Looking at the data, there seems to be some irregularity and confusion at the root, enough to raise doubt about whether the pandemic started in China or the US, and whether there were 1 or 2 spillovers.


- Thesis of a toxic substance (VEA) that was present, then removed:

Compare with the curve for the 1981 Spanish Toxic Oil Syndrome, page 7 of this doc:

http://www.euro.who.int/document/e84423.pdf

It does not have the slow curve up-front: a toxic batch has a massive up-front effect, which then fizzles out. The toxic oil was identified years later.

Yes, EVALI was a cover-up for early COVID.

We note who insists that it's not, with a level of scrutiny/exploration/data much different from what is applied to WIV/Wuhan-CDC or even Fauxi/Sadsack/Baryc.

What's at play here? Directed Limited Hang-out?

Who died unexpectedly, when?

When did it spread?

- 2009 "Swine" H1N1 Flu is clearly high & out of sync/very early.

Was that a rehearsal, or also a leak? A first attempt, *also with a pandemic declaration*, that fizzled out too fast, and also with a deadly vakscene withdrawn after maiming a couple hundred people?

The next big ILI curve is the 2017-2018 one (cyan), better synchronized - while it was NOT in Europe.

Page 4 of https://www.ecdc.europa.eu/sites/default/files/documents/AER_for_2019_influenza-seasonal.pdf

2017 first invisible contamination wave.


I think what's quite obvious to me is the myopic view that has been generated. I don't know if it's intentional or accidental. However, we should be investigating where the evidence leads us, not who has had more accidents or is worse as a society. Science shouldn't be defamatory or contain ad hominem attacks. I blame social media; it's both a gift and a curse.


It's not science. It's propaganda (also called: narrative control, or psyop).

Plan B of the cover-up. There's certainly a plan C; you'll spot it when *ALL* mainstream media suddenly switch in a synchronized manner, overnight, just as happened with the lab leak story.

The day is close: Fauxi-Sadsack will be thrown under the bus, judged/condemned - end of story. This will still not be the truth.


The original thread from 4 Dec 2020, with now-dead links to the doc, was a reply to paid Faïtzer shill & once Drastic associate Rolandbakeriii, and is still visible on selected archived pages:

@RolandBakeriii:

The Wuhan Lab had no incinerators. Medium rare autoclaved bats anyone?

http://web.archive.org/web/20201204191054/https://twitter.com/amicocolorido/status/1334935030515306497
