Looking at a family tree can teach us about a particular genetic line. The family tree of a virus, or its phylogeny, is no different.
This sprawling cluster of dots and lines looks like just that to the untrained eye, but virologists can use it to glean insights about coronavirus, including whether lockdown measures and travel bans were effective, how many mutations a virus has, and whether a virus was imported or domestically spread.
Emma Hodcroft, a co-developer of Nextstrain, a platform that uses open-source data to examine the scientific and public health potential of pathogen genome data, talked to Al Arabiya English about what we can – and can’t – learn about coronavirus from the data.
How does Nextstrain work?
We look for small changes in the virus called mutations, which are basically just like typos. As the virus replicates, it has to copy itself so many times that it will make an error, but just like typos in a document, a couple of errors here or there don’t mean that the whole document is totally unreadable or unusable. It’s mostly the same for the virus.
Just like we can follow how a document has been copied by following a trail of typos and by seeing which documents have the same typo to tell which came from the same original copy – that’s what we do with viruses.
So then we can draw a family tree of the viruses and show which ones are more closely related and more distantly related.
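The typo-tracking idea can be illustrated with a small Python sketch. This is only a toy under stated assumptions: the sequences, the sample names, and the helper functions `mutations` and `distance` are invented for illustration, and this is not how Nextstrain itself builds its trees. It simply shows that samples sharing the same typo look more closely related than samples that do not.

```python
from itertools import combinations

# Hypothetical toy genomes (already aligned, equal length); not real viral data.
REFERENCE = "ACGTACGTACGT"
SAMPLES = {
    "sample_A": "ACGTACGTACGT",  # identical to the reference
    "sample_B": "ACGAACGTACGT",  # one typo at position 3
    "sample_C": "ACGAACGTACCT",  # shares B's typo, plus another at position 10
    "sample_D": "TCGTACGTACGT",  # a different typo at position 0
}

def mutations(seq, ref=REFERENCE):
    """Return the set of (position, new_base) differences from the reference."""
    return {(i, b) for i, (a, b) in enumerate(zip(ref, seq)) if a != b}

def distance(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2))

muts = {name: mutations(seq) for name, seq in SAMPLES.items()}

# Compare every pair: few differences and shared typos suggest a common source.
for (n1, s1), (n2, s2) in combinations(SAMPLES.items(), 2):
    shared = sorted(muts[n1] & muts[n2])
    print(n1, n2, "differences:", distance(s1, s2), "shared typos:", shared)
```

In this toy data, sample_B and sample_C share a typo and differ at only one position, so they would sit close together on a family tree, while sample_D carries an unrelated typo and would sit on a separate branch.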
What does a virus’s phylogeny tell us?
The dots are the samples that are taken, and the lines that go back are connections. They tell us how the virus is connected. So when the viruses are really close together in groups, those are transmissions that occurred closer together in time, and they’re closer in the transmission chain to each other, whereas circles that are far apart that are separated by lots of lines are much more distant from each other.
What does this tell us about how COVID-19 spread and containment measures?
The phylogeny is useful particularly at the beginning of an epidemic because it helps us understand when we make this really important switch from imported transmission to local transmission.
This is an important clue for health officials that you now need to switch strategies. Once it’s spreading in your community, you need to introduce social distancing measures.
And this is also really important at the end of the epidemic when case counts are coming down again and we move out of pandemic mode.
Can you tell if intervention measures were effective?
We can see how the virus has moved around the world at different times. What we see, for example, is that there have been a lot of transmissions within the US, and if we line this up with when different travel bans were put in place, we find that those travel bans didn’t do any good.
It was domestic transmission that was really ramping up in the US at the time [travel bans were implemented], and for travel bans to work, they have to be implemented so early that basically no one would ever do it because it would be seen as a totally out-of-scale reaction.
So it shows us how it traveled around the world?
Yes, and it shows the routes they took, so we can figure out what facilitated the virus’s spread, like trains, planes, and coast-to-coast travel. Even if it’s a little backward looking, these are things that will tell us what worked and will give us an indication for how to continue to contain it and potentially lock down specific areas, rather than put in place large-scale countrywide lockdowns.
Contact tracing is especially important for this. As we come out of lockdowns, one of the best ways to contain this virus is test, isolate, and trace.
Because it spreads asymptomatically, if we can widen the circle out and isolate people you came into contact with, you can start stamping out transmissions and bringing it to a level you can control.
What else can we learn from a virus’s family tree?
Another thing this can tell us is just how similar the samples of this virus are to one another. I think there is a really common misconception circulating at the moment that it has all these mutations and that it’s so different.
You say it’s very similar, so can you explain why there are so many headlines saying there are different strains?
Part of the problem is vocabulary. In virology and phylogenetics, we use the term very loosely. It doesn’t have a hard and fast definition. We say strain – and we didn’t know this would be a bit of a public relations issue for us – but essentially when we say strains we mean two things that aren’t identical. They could be separated by one mutation; they could be separated by 300 mutations.
In the context of other viruses, this one is incredibly similar, and this has mostly to do with timing. The mutation rate of the new coronavirus is similar to other coronaviruses. There’s nothing inherently special about it.
The two most different strains still have fewer than 40 mutations between them, and the whole genome is 29,000 bases long. So that’s 40 out of 29,000. If you pick two strains at random, they’d differ by more like 10 to 20 mutations.
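To put those figures in proportion, here is a quick back-of-the-envelope calculation in Python. The numbers come from the interview; the function name `percent_different` is just an illustrative helper.

```python
GENOME_LENGTH = 29_000    # approximate genome length in bases, as quoted
MAX_DIFFERENCES = 40      # upper bound quoted for the two most different samples
TYPICAL_RANGE = (10, 20)  # differences quoted for two randomly chosen samples

def percent_different(n_mutations, genome_length=GENOME_LENGTH):
    """Share of the genome that differs, as a percentage."""
    return 100 * n_mutations / genome_length

print(f"Most different pair: {percent_different(MAX_DIFFERENCES):.2f}%")  # 0.14%
low, high = (percent_different(n) for n in TYPICAL_RANGE)
print(f"Typical pair: {low:.2f}% to {high:.2f}%")  # 0.03% to 0.07%
```

Even the most distant pair of samples would differ at roughly one base in every 700, which is why the samples are described as incredibly similar.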
What can we learn from these typos, or mutations?
Right now, we don’t have any evidence that any of these different samples are functionally different at all. It hasn’t really had the time or the pressure to adapt. All viruses want to do is make more of themselves, and this virus is doing this very well, so it’s unlikely there’s selection pressure on it right now to change or become more dangerous or deadly, because that’s not lined up with its goal of spreading.
Primarily, these mutations are mostly useful for tracking. In the longer term, if it becomes seasonal or we develop anti-viral drugs to work against it, then tracking the mutations will be very important.
So if it mutates enough then we might have to have a seasonal shot?
We’re not sure which category this will fall into. We still hope someday that we’ll have a flu vaccine but we haven’t gotten there yet. We’re just not sure what category this virus will fall in, and we don’t know much about immunity yet or how your body will be able to recognize slightly different viruses, what we call cross reactivity.
What isn't shown in the data?
[The map] can show us overall the picture of how it spreads. It shows us that the virus started in China and it spread and started popping up. But we have to be careful how we read the data. For example, we know there are loads of cases in Iran, but Iran has provided no samples, so we just don’t have data from there. We have to have at least one sample to reconstruct a path.
Each line [on the map] represents a hypothetical ancestor that transmitted from one country to another, but we don’t know if it’s just one or a couple. We’ll only ever reconstruct one.
The width of the line is the number of inferred transmissions, or how many times there was transmission between countries. Places with thicker lines have more transmissions. But again, if you only have two samples, the line will be thin, but there may have been more cases.
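As a rough illustration of how line widths could be derived, the Python sketch below counts the branches of a toy family tree on which the inferred country changes. Everything here is an assumption for illustration: the tree, the node names, the country labels, and the `count_transmissions` helper are invented, and the country of each ancestor is taken as already inferred. It is not Nextstrain's actual method.

```python
from collections import Counter

# Hypothetical toy tree: node -> (parent, inferred country).
# Country labels on ancestors are assumed to have been inferred already.
TREE = {
    "root": (None, "China"),
    "n1": ("root", "China"),
    "n2": ("root", "Italy"),
    "n3": ("n2", "Italy"),
    "n4": ("n2", "USA"),
    "n5": ("root", "USA"),
    "n6": ("root", "Italy"),
}

def count_transmissions(tree):
    """Count branches where the country differs between parent and child."""
    counts = Counter()
    for node, (parent, country) in tree.items():
        if parent is None:
            continue  # the root has no incoming branch
        parent_country = tree[parent][1]
        if parent_country != country:
            counts[(parent_country, country)] += 1
    return counts

line_widths = count_transmissions(TREE)
for (src, dst), n in sorted(line_widths.items()):
    print(f"{src} -> {dst}: {n} inferred transmission(s)")
```

Because the count depends entirely on which samples are in the tree, a country that has shared only a few genomes will show a thin line no matter how many introductions actually happened, and a country with no samples will show no line at all.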