2020-12-26

BNT162b2 Pfizer COVID vaccine, on paper tape!

This is an amazing article on how the vaccine works, and it is so much like computer programming. Well worth reading:-

https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine

So I decided to see just how small the vaccine is, and it is tiny. If you code the bases in binary, e.g. U/Ψ=00, C=01, A=10, G=11, and pack them into 8-bit bytes, it is only 1071 bytes in total, or, put another way, 2.7m of paper tape!
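As a rough sketch of that packing (not the actual code used for the tape), the following assumes the first base of each group of four goes in the top two bits of the byte, and that the tape is punched at the common pitch of 10 rows per inch with one byte per row. The names `pack_bases` and `tape_length_m` are made up for illustration.

```python
# 2-bit codes as given in the post: U/Ψ=00, C=01, A=10, G=11.
CODES = {"U": 0b00, "Ψ": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}


def pack_bases(seq: str) -> bytes:
    """Pack bases four to a byte, first base in the top two bits (assumed order)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so the unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def tape_length_m(n_bytes: int, rows_per_inch: float = 10.0) -> float:
    """Tape length in metres for n_bytes, one byte per row (assumed pitch)."""
    return n_bytes / rows_per_inch * 0.0254


print(pack_bases("AAAA").hex())       # aa
print(round(tape_length_m(1071), 2))  # 2.72
```

Under those assumptions, 1071 rows at 0.1 inch each is 107.1 inches, which is where the 2.7m figure comes from.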

I hope that gives you an idea of how compact this code is, way smaller than any computer anti-virus code!

But do read that article, it is amazing!

P.S. I used to get berated for being able to read hex ASCII (not as good at it these days), but after taking the above pictures I looked at the paper tape and realised "that's not how an mRNA sequence should end!". Basically, the sequence ends with padding, lots of As, and four As pack to 0xAA on the paper tape, which is quite distinctive. I expected to see it but did not. It turns out a change to the ESP-IDF MQTT library moved where the default buffer size is set (which limits MQTT message size), and that truncated my sequence by 73 bytes, hence the missing As on the end (and a bit more). The picture still shows how small it all is, as only a small amount was lost off the end (the text I printed on the ends used more tape than that anyway). But for those of you that cannot read the base code of the universe on paper tape: it was wrong, and would not have worked if you injected that paper tape as shown... So don't try it at home.
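The same check can be done in code rather than by eye. This is only a sketch, assuming the packed bytes are available as a Python `bytes` object; `trailing_poly_a_bytes` is a hypothetical helper, and how many 0xAA bytes to expect depends on the sequence and how it aligns to byte boundaries, so no threshold is suggested here.

```python
def trailing_poly_a_bytes(data: bytes) -> int:
    """Count consecutive 0xAA bytes (four As each) at the end of the data."""
    count = 0
    for b in reversed(data):
        if b != 0xAA:
            break
        count += 1
    return count


# Made-up example data: one non-A byte followed by a run of As.
good = b"\x1b" + b"\xaa" * 5
truncated = good[:-5]  # what a too-small buffer might leave behind

print(trailing_poly_a_bytes(good))       # 5
print(trailing_poly_a_bytes(truncated))  # 0
```

A result of zero on data that should end in a long run of As is a strong hint that something chopped the end off.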

This is how it should end (video) :-

P.P.S. it is also not bad as a QR code (base32 encoded) - only slightly larger than the NHS covid QR codes :-)
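To get a feel for the base32 overhead, here is a back-of-envelope sketch. It assumes the QR code uses alphanumeric mode (which stores two characters in 11 bits, and a final odd character in 6), that the base32 `=` padding is stripped because `=` is not in the alphanumeric set, and it ignores the mode header and error correction. `base32_qr_bits` is a made-up name, and this is not necessarily how the QR code above was generated.

```python
import base64


def base32_qr_bits(payload: bytes):
    """Return (characters, data bits) for base32 text in QR alphanumeric mode."""
    text = base64.b32encode(payload).rstrip(b"=")
    n = len(text)
    # Alphanumeric mode: 11 bits per pair of characters, 6 bits for a leftover one.
    bits = 11 * (n // 2) + 6 * (n % 2)
    return n, bits


chars, bits = base32_qr_bits(bytes(1071))  # placeholder payload of the right size
print(chars, bits)                   # 1714 9427
print(round(bits / (1071 * 8), 2))   # 1.1
```

So under these assumptions the base32 text costs about 10% more data bits than the raw 1071 bytes would.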

6 comments:

  1. Why do things have to be Base32 or Base64 or base anything encoded? What is wrong with binary, i.e. 0 to 255? Can a QR code not encode binary? Why are we always throwing all these bits away just to encode things as text?

    1. I did try binary first. A QR code can, but I looked for an ECI to mark a QR as “unspecified binary data” and there isn’t one, they are all “character coding” of some sort. So when it is done as binary, readers try to interpret the data in the context of a character coding.

    2. Oh, and also, BASE32 does not throw away as much as you think in a QR, as the QR uses a smaller character set coding where it can. So each 5 bits is packed into only slightly more than 5 bits, unlike ASCII, which puts each 5 bits into an 8-bit byte.

    3. Bear in mind the RNA uses a similarly poor coding, with a 6-bit codon coding for only 20 amino acids.

    4. It is a missed opportunity that QR coding does not have a character set coding for the exact 64 characters normally used in base64 which would be 100% efficient but still "text".

    5. The triplet codon scheme used by DNA is actually a fairly good coding scheme if what you're aiming for is resilience against random mutation disrupting the emitted amino acid string too often. About a decade ago they did a search, and in the space of possible three-codon coding schemes, the one we have is among the most resilient. (Why that coding scheme *in particular* was chosen, over the millions of others with similar resilience... the best guess I've seen is that it's often directly connected to the method used to synthesise each amino acid: exaptation, again.)
