The alphabet soup explained: how SARS-CoV-2 variants are named
Accessing information about variant evolution is easier than you may think
Keeping up to date on the ongoing evolution of SARS-CoV-2 variants can be challenging, not because of a lack of information but because of its sheer volume. The fact that this information is often packaged in (apparently) opaque technical jargon adds another barrier. Fortunately, things are not quite so inaccessible as they may seem at first blush. Here, we provide a brief summary of the major naming systems for variants and mutations. Not sure what is meant by technical labels like XBB.1.5 or S:F456L, why everything seems to be called “Omicron”, or where nicknames like “Kraken” come from? Read on!
Alphabet soup: the PANGO nomenclature system
The letter-number combinations you often hear about — like B.1.1.529, BA.2, XBB.1.5, EG.5, and JN.1 — are the labels assigned to each newly identified variant using the PANGO nomenclature system. PANGO stands for Phylogenetic Assignment of Named Global Outbreak lineages, and is particularly useful as a catalogue of every distinct variant that is found. The procedures and criteria used for designating new variants are technical (you can find them here), but the basics of the naming system are fairly straightforward.
Here is how PANGO names work:
Letter prefixes are assigned in alphabetical order. This began with A, B, C … Z, then proceeded through AA, AB, AC … AZ, then BA, BB, BC … BZ, etc.
Numbers indicate ancestry and descent. For example, B.1.1 is the first designated descendant of B.1, and B.1.1.529 is the 529th designated descendant of B.1.1.
A new letter prefix is assigned beyond three levels of descendants. Full PANGO labels retain every additional descendant dot-number, but for simplicity the next available letter prefix is assigned as an alias. For example, B.1.1.529.2 became BA.2 and BA.2.86.1.1 became JN.1. (The full PANGO label for JN.1 is B.1.1.529.2.86.1.1). You can find a list of PANGO aliases here: https://github.com/cov-lineages/pango-designation/blob/master/pango_designation/alias_key.json.
Recombinants are given prefixes beginning with X. If two variants infect the same cell and exchange genetic information with each other, a new recombinant (hybrid) variant may result. PANGO labels indicate recombinant origins for variants by using a prefix beginning with the letter X and proceeding alphabetically. For example, XBB is a recombinant of BJ.1 and BM.1.1.1.
Some letters are not used. To avoid confusion, the letters “I” and “O” are not used because they resemble numbers.
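The alias rule above can be made concrete with a short Python sketch. The ALIASES table is a tiny hand-copied subset covering only examples from this article (an assumption for illustration, not the full alias list), and the function names expand and parent are hypothetical.

```python
from typing import Optional

# Hand-copied subset of PANGO aliases, limited to examples from this article.
# The real alias list is much longer; this table is an illustrative assumption.
ALIASES = {
    "BA": "B.1.1.529",
    "JN": "B.1.1.529.2.86.1",
}


def expand(name: str) -> str:
    """Return the full PANGO label for an aliased name such as JN.1."""
    prefix, _, rest = name.partition(".")
    full_prefix = ALIASES.get(prefix, prefix)  # unknown prefixes are left as-is
    return f"{full_prefix}.{rest}" if rest else full_prefix


def parent(full_name: str) -> Optional[str]:
    """Return the immediate ancestor of a full label by dropping its last number."""
    head, dot, _ = full_name.rpartition(".")
    return head if dot else None


print(expand("BA.2"))       # B.1.1.529.2
print(expand("JN.1"))       # B.1.1.529.2.86.1.1
print(parent("B.1.1.529"))  # B.1.1
```

Expanding first and then dropping the last number gives the true parent even when an alias hides it: the parent of JN.1 is B.1.1.529.2.86.1, which is BA.2.86.1.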
The PANGO system is very useful for researchers because it provides important information about the relationships among variants and the order in which they were discovered. Unfortunately, these formal labels can become very confusing in non-technical discussions, especially as the number of variants and aliases continues to expand. Aliases keep the labels shorter, but this can also obscure relationships because new prefixes are simply assigned in alphabetical order. Consider the following:
Some lineages have now gone through multiple alias prefixes. For example, the label “JL.1” is an alias of B.1.1.529.2.75.3.4.1.1.1.1.17.1.3.2.1 and is part of a lineage that has gone through six prefixes: B, BA, BM, CH, FK, and JL. It can be very difficult to keep track of these all being part of the same lineage.
After three levels of descendants, recombinant variants receive a new prefix that does not begin with X. For example, the variant EG.5 is an alias for XBB.1.9.2.5, which obscures the fact that it is descended from a recombinant ancestor.
Some major variants produce a great many descendant lineages, each with a different alias. For example, XBB.1.5 has spawned lineages with the following alias prefixes: XBB.1.5*, EK, EL, EM, EU, FD, FG, FH, FT, FZ, GB, GC, GF, GG, GK, GN, GR, GU, GV, HA, HC, HD, HJ, HM, HP, HQ, HR, HS, HT, HY, HZ, JB, JD, JK, JZ, KA.
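One way to see through the aliases is to expand both labels to their full form and compare them part by part. The following Python sketch does this under stated assumptions: the ALIASES table is a small hand-copied subset based on examples in this article, and descends_from and has_recombinant_ancestry are hypothetical helper names.

```python
# Illustrative alias subset (assumption: copied by hand from examples in this article).
ALIASES = {
    "BA": "B.1.1.529",
    "EG": "XBB.1.9.2",
    "JN": "B.1.1.529.2.86.1",
}


def expand(name):
    """Replace an alias prefix with the full label it stands for."""
    prefix, _, rest = name.partition(".")
    full_prefix = ALIASES.get(prefix, prefix)
    return f"{full_prefix}.{rest}" if rest else full_prefix


def descends_from(name, ancestor):
    """True if `name` is `ancestor` itself or one of its descendants."""
    child_parts = expand(name).split(".")
    ancestor_parts = expand(ancestor).split(".")
    # Compare whole dot-separated parts so XBB.1.50 is not mistaken for XBB.1.5.
    return child_parts[: len(ancestor_parts)] == ancestor_parts


def has_recombinant_ancestry(name):
    """True if the expanded label starts with a recombinant (X...) prefix."""
    return expand(name).startswith("X")


print(descends_from("EG.5", "XBB"))      # True
print(has_recombinant_ancestry("EG.5"))  # True
print(descends_from("JN.1", "BA.2"))     # True
```

The recombinant check only looks at the expanded prefix, so it will miss any alias that is absent from the table.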
Alpha to Omicron: WHO’s Greek letter system
By spring of 2021, it became apparent that the “alphabet soup” of PANGO labels was making non-technical communications difficult. In addition, reports tended to refer to the places where variants were first identified (though not necessarily where they first evolved), such as “the UK variant” or “the South African variant”. To avoid stigma and to facilitate communication about the most important variants, the World Health Organization (WHO) implemented a system of assigning Greek letters to what it designated as Variants of Interest (VOIs) or Variants of Concern (VOCs).
During a period of ~180 days between May and November 2021, the WHO assigned Greek letters to 13 variants, including eight VOIs (Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda, Mu) and five VOCs (Alpha, Beta, Gamma, Delta, Omicron). Two letters were skipped between Mu and Omicron: Nu because of potential confusion with “new” and Xi because of its similarity to a common name in China.
The WHO has also updated its naming system at various points. This included adding a third-tier category of Variants Under Monitoring (VUMs) in March 2023 and, in August 2023, revising the working definitions for VOIs and VOCs and determining that only the latter would receive new Greek letters. No new Greek letters have been assigned since “Omicron” in November 2021 (i.e., for two years up to the time of this writing), and there are now more than 2,000 variants with PANGO labels within “Omicron”.
Release the Kraken: nicknames and accessible communication
Given that the challenges in communicating about variants have only increased with the evolution of very large numbers of variant lineages with formal PANGO designations, and in light of the fact that Greek letters have not been assigned to any new variants since Omicron in late 2021, a group of volunteer variant trackers and science communicators began using informal nicknames to distinguish among the most relevant variants. These were seen as similar to “common names” in zoology and botany, complementing but not replacing formal taxonomic names.
The first nicknames were based on Greek mythological creatures (and one that is actually Scandinavian, namely “Kraken” for XBB.1.5) but this has since been updated to make use of astronomical names. These informal nicknames have been widely used in media, but not without controversy. In general, there has been very strong overlap between which variants are afforded nicknames and which receive WHO designation as VUMs or VOIs (there are no current VOCs).
Nicknames have several benefits, most notably:
Nicknames are much more memorable and help to identify particularly important variants. This was one of the main functions of the Greek letter system, but since no new Greek letter has been assigned since “Omicron”, the only option has been a return to PANGO labels. Using nicknames alleviates the confusion associated with an ever more complex variant alphabet soup.
Nicknames can be used to refer to “clans” of variants with different prefixes. For example, “Kraken clan” includes all the descendants listed above, “Arcturus clan” includes XBB.1.16*, FU, GY, HF, JF, JM, KJ, and “Eris clan” includes EG.5*, HK, HV, JG, JJ, JR, KB.
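The idea of a “clan” can be sketched in Python as a lookup from any PANGO label to the nickname of the founding lineage it descends from. The ALIASES table is a hand-copied subset (an assumption; the other clan prefixes listed above would need their own alias entries), the CLANS table pairs each nickname with its founding lineage as given in this article, and clan_of is a hypothetical function name.

```python
# Illustrative alias subset; a real lookup would need the full alias list.
ALIASES = {
    "BA": "B.1.1.529",
    "EG": "XBB.1.9.2",
    "JN": "B.1.1.529.2.86.1",
}

# Nickname -> founding lineage, as given in this article.
CLANS = {
    "Kraken": "XBB.1.5",
    "Arcturus": "XBB.1.16",
    "Eris": "EG.5",
}


def expand(name):
    """Replace an alias prefix with the full label it stands for."""
    prefix, _, rest = name.partition(".")
    full_prefix = ALIASES.get(prefix, prefix)
    return f"{full_prefix}.{rest}" if rest else full_prefix


def clan_of(name):
    """Return the nickname of the most specific clan containing `name`, or None."""
    parts = expand(name).split(".")
    best = None  # (nickname, number of matching parts)
    for nickname, founder in CLANS.items():
        founder_parts = expand(founder).split(".")
        if parts[: len(founder_parts)] == founder_parts:
            if best is None or len(founder_parts) > best[1]:
                best = (nickname, len(founder_parts))
    return best[0] if best else None


print(clan_of("EG.5.1"))      # Eris
print(clan_of("XBB.1.5.10"))  # Kraken
print(clan_of("JN.1"))        # None
```

Choosing the longest matching founder means that a nested clan would win over the clan that contains it, although none of the three clans here are nested.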
FLips and SLips: labels and nicknames for specific mutations
Identifying, characterizing, and tracking variants (or, more broadly, evolving lineages of variants) remains important, but so too is paying attention to specific mutations of note. Mutations have their own system of labels that allow researchers to communicate about them. As with PANGO labels, this can be confusing unless the naming conventions are explained. Here is how the labels work for individual mutations.
The SARS-CoV-2 genome contains a number of protein-coding regions (genes). The one we hear about most is the gene for the spike protein (S), because spike is involved in attaching to the receptors on our cells and is what vaccines target. In addition to the spike gene, the genome contains "open reading frames" (ORFs), which are regions between a start codon and a stop codon, that is, sequences that can be read as a gene.
When a new variant is designated with a PANGO label, its distinguishing mutations are listed. So, for example, you may see that "EG.5 = XBB.1.9.2 + S:F456L + ORF1a:A690V + ORF1a:A3143V", and "EG.5.1 = EG.5 + S:Q52H". The letters before the colon in the mutation notation refer to the gene or open reading frame that has undergone a mutation. In the example from EG.5 and EG.5.1, that's "S" (spike) and "ORF1a" (open reading frame 1a).
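Because each definition builds on a parent variant, the full set of changes relative to an older ancestor can be collected by walking up the chain. Here is a minimal Python sketch of that idea. The DEFINITIONS table holds only the two examples quoted above, and mutations_since is a hypothetical function name.

```python
# Defining mutations copied from the two examples quoted in the text:
# variant -> (parent variant, mutations added on top of the parent).
DEFINITIONS = {
    "EG.5": ("XBB.1.9.2", ["S:F456L", "ORF1a:A690V", "ORF1a:A3143V"]),
    "EG.5.1": ("EG.5", ["S:Q52H"]),
}


def mutations_since(variant, base):
    """Collect the mutations added between `base` and `variant`, oldest first."""
    if variant == base:
        return []
    if variant not in DEFINITIONS:
        # Either `base` is not an ancestor or the table is incomplete.
        raise KeyError(f"no definition recorded for {variant}")
    parent, added = DEFINITIONS[variant]
    return mutations_since(parent, base) + added


print(mutations_since("EG.5.1", "XBB.1.9.2"))
# ['S:F456L', 'ORF1a:A690V', 'ORF1a:A3143V', 'S:Q52H']
```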
Proteins are chains of amino acids, and each type of amino acid has its own name and a corresponding abbreviation. The first letter after the colon is the amino acid that was present at that position in the original protein, the number is the position of the changed amino acid in the protein sequence, and the last letter is the new amino acid that is now specified in the mutated sequence.
So, the designation of a mutation as “S:F456L” means that the 456th amino acid in the spike protein (S) changed from phenylalanine (F) to leucine (L). ORF1a:A690V means that the 690th amino acid in the protein encoded by open reading frame 1a (ORF1a) changed from alanine (A) to valine (V).
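The notation is regular enough to be read mechanically. The Python sketch below splits a label into its four parts and spells out the change in words. It is a simplified illustration: the pattern handles only single amino acid substitutions (not deletions or other kinds of change), the AMINO_ACIDS table covers only the letters used in this article, and the function names are hypothetical.

```python
import re

# One-letter amino acid codes for the examples in this article (not the full set).
AMINO_ACIDS = {
    "A": "alanine", "F": "phenylalanine", "H": "histidine", "L": "leucine",
    "Q": "glutamine", "S": "serine", "V": "valine",
}

# gene or ORF name, colon, original amino acid, position, new amino acid
PATTERN = re.compile(r"^(?P<gene>[A-Za-z0-9]+):(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def parse_mutation(label):
    """Split a label like 'S:F456L' into (gene, original, position, new)."""
    match = PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    return match["gene"], match["ref"], int(match["pos"]), match["alt"]


def describe(label):
    """Spell out a substitution label in words."""
    gene, ref, pos, alt = parse_mutation(label)
    ref_name = AMINO_ACIDS.get(ref, ref)  # fall back to the letter if unknown
    alt_name = AMINO_ACIDS.get(alt, alt)
    return f"{gene}: position {pos} changed from {ref_name} ({ref}) to {alt_name} ({alt})"


print(parse_mutation("ORF1a:A690V"))  # ('ORF1a', 'A', 690, 'V')
print(describe("S:F456L"))
# S: position 456 changed from phenylalanine (F) to leucine (L)
```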
Some mutations are especially notable for their effects in allowing a variant to escape existing immunity or to bind effectively to our cells. For example, when two specific mutations occur in adjacent amino acid positions 455 and 456 in the spike protein, they can make a variant especially capable of bypassing prior immunity. These are designated S:L455F and S:F456L, and because they essentially swap phenylalanine (F) and leucine (L) in opposite directions at two adjoining positions (L→F at 455 and F→L at 456), they have been dubbed “FLip mutations”. Thus, you may see references to Kraken + FLip or Eris + FLip for descendants within those clans that have acquired these mutations.
More recently, a notable member of the BA.2.86 (Pirola) lineage has undergone a mutation at the same spike position, 455, but this time replacing leucine (L) with serine (S), that is, S:L455S. Unlike the FLip mutations, which both have to occur, this “SLip” mutation alone seems to make the JN.1 (BA.2.86.1.1) variant successful.
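Checking a variant's mutation list for these named changes is a simple set test. In this Python sketch, the SLip change is written as S:L455S, following the original-then-new convention described earlier. The function name notable_spike_changes is hypothetical.

```python
# Both FLip mutations must be present together; SLip is a single mutation.
FLIP = {"S:L455F", "S:F456L"}
SLIP = "S:L455S"


def notable_spike_changes(mutations):
    """Return the nicknames ('FLip', 'SLip') that apply to a list of mutations."""
    found = set(mutations)
    labels = []
    if FLIP <= found:  # subset test: both FLip mutations are in the list
        labels.append("FLip")
    if SLIP in found:
        labels.append("SLip")
    return labels


print(notable_spike_changes(["S:Q52H", "S:F456L", "S:L455F"]))  # ['FLip']
print(notable_spike_changes(["S:F456L"]))                       # []
print(notable_spike_changes(["S:L455S"]))                       # ['SLip']
```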
Not all mutations result in a change in amino acid sequence. For regions outside protein-coding genes or for “synonymous” mutations (i.e., the same amino acid is still specified), it can be useful to refer to nucleotide rather than amino acid positions and changes. In this case, labels simply reflect the original nucleotide (A, T, G, or C), the nucleotide position in the genome, and the new nucleotide. For example, “T3565C” indicates a change from thymine (T) to cytosine (C) at nucleotide position 3565, while “G11727A” describes a change from guanine (G) to adenine (A) at nucleotide position 11,727.
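Nucleotide labels can be read the same way as amino acid labels, only without the gene prefix. This small Python sketch handles single-nucleotide substitutions only. The function name parse_nucleotide_change is hypothetical.

```python
import re

NUCLEOTIDES = {"A": "adenine", "C": "cytosine", "G": "guanine", "T": "thymine"}

# original nucleotide, genome position, new nucleotide
PATTERN = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_nucleotide_change(label):
    """Split a label like 'T3565C' into (original, position, new)."""
    match = PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a nucleotide substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


ref, pos, alt = parse_nucleotide_change("G11727A")
print(f"{NUCLEOTIDES[ref]} to {NUCLEOTIDES[alt]} at position {pos}")
# guanine to adenine at position 11727
```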
A variant in an evolutionary tree: Nextstrain
If you’ve used Nextstrain to check on the status of variant evolution, you will have noticed that they use their own labelling system for key variants. These don’t get mentioned in the media, but it’s worth quickly noting how those work as well. Nextstrain uses number-letter labels in which the number is the two-digit year and the letter reflects the order in which the label was assigned within that year. So, for example, the variant 21A is the first variant labelled by Nextstrain in 2021, 22D is the fourth variant labelled in 2022, 23B is the second variant labelled in 2023, and so on.
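As a quick illustration, a label of this kind can be decoded in a few lines of Python. The sketch assumes a bare two-digit year followed by a single capital letter, with letters assigned in plain alphabetical order and none skipped, as described above. The function name parse_clade is hypothetical.

```python
import re
import string


def parse_clade(label):
    """Decode a label like '22D' into (year, order of assignment within that year)."""
    match = re.fullmatch(r"(\d{2})([A-Z])", label)
    if match is None:
        raise ValueError(f"not a year-letter label: {label!r}")
    year = 2000 + int(match.group(1))
    # Assumption: A is 1st, B is 2nd, and so on, with no letters skipped.
    order = string.ascii_uppercase.index(match.group(2)) + 1
    return year, order


print(parse_clade("21A"))  # (2021, 1)
print(parse_clade("22D"))  # (2022, 4)
print(parse_clade("23B"))  # (2023, 2)
```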
Be daunted no more!
Anyone could be excused for struggling to make headway in a torrent of information about the ongoing evolution of SARS-CoV-2. Multiple naming systems and aliases that capture an ever-growing menagerie of variants and associated mutations can be a lot to handle. However, as this brief summary has hopefully shown, these are naming systems that can be understood and need not remain inaccessible.