Mutations can reveal how the coronavirus moves—but they’re easy to overinterpret

Tables are empty at St. Mark’s Square in Venice after the Italian government adopted emergency measures to contain the novel coronavirus.

Manuel Silvestri

By Kai Kupferschmidt

Immediately after Christian Drosten published a genetic sequence of the novel coronavirus online on 28 February, he took to Twitter to issue a warning. As the virus has raced around the world, more than 350 genome sequences have been shared on the online platform GISAID. They hold clues to how the new virus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is spreading and evolving. But because the sequences represent a tiny fraction of cases and show few telltale differences, they are easy to overinterpret, as Drosten realized. 

A virologist at the Charité University Hospital in Berlin, he had sequenced the virus from a German patient infected with COVID-19 in Italy. The genome looked similar to that of a virus found in a patient in Munich, the capital of Bavaria, more than 1 month earlier; both shared three mutations not seen in early sequences from China. Drosten realized this could give rise to the idea that the Italian outbreak was “seeded” by the one in Bavaria, which state public health officials said had been quashed by tracing and quarantining all contacts of the 14 confirmed cases. But he thought it was just as likely that a Chinese variant carrying the three mutations had taken independent routes to both countries. The newly sequenced genome “is not sufficient to claim a link between Munich and Italy,” Drosten tweeted.

His warning went unheeded. A few days later, Trevor Bedford of the Fred Hutchinson Cancer Research Center, who analyzes the stream of viral genomes and discusses them in Twitter threads, wrote that the pattern suggested the outbreak in Bavaria had not been contained after all, and appeared to have led to the Italian outbreak. The analysis spread widely. Technology Review asserted that “the Munich event could be linked to a decent part of the overall European outbreak” and Twitter users called on Germany to apologize. (This Science correspondent retweeted Bedford’s thread as well.)

Virologist Eeva Broberg of the European Centre for Disease Prevention and Control agrees with Drosten that there are more plausible scenarios for how the disease reached northern Italy than an undetected spread from Bavaria. Other scientists say Bedford jumped the gun as well. “I have to kick his butt a bit for this,” says Richard Neher, a computational biologist at the University of Basel who works with Bedford. “It’s a cautionary tale,” says Andrew Rambaut, a molecular evolutionary biologist at the University of Edinburgh. “There is no way you can make that claim just from the phylogeny alone.” Bedford later clarified he believed it was equally plausible there had been two separate introductions from China. “I think I should have been more careful with that Twitter thread,” he says.

It was a case study in the power and pitfalls of real-time analysis of viral genomes. “This is an incredibly important disease. We need to understand how it is moving,” says Bette Korber, a biologist at Los Alamos National Laboratory who is also studying the genome of SARS-CoV-2. “With very limited evolution during the outbreak, [these researchers] are doing what they can and they are making suggestions, which I think at this point should be taken as suggestions.”

The sequence data were most informative early on, says Kristian Andersen, a computational biologist at Scripps Research. The very first sequence, in early January, answered the most basic question: What pathogen is causing the disease? The ones that followed were almost identical, strongly suggesting there was a single introduction from an animal into the human population. If the virus had jumped the species barrier multiple times, scientists would see more variety among the first human cases.

Now, more diversity is emerging. Like all viruses, SARS-CoV-2 evolves over time through random mutations, only some of which are caught and corrected by the virus’s error correction machinery. Over the length of its 30,000-base-pair genome, SARS-CoV-2 accumulates an average of about one to two mutations per month, Rambaut says. “It’s about two to four times slower than the flu,” he says. Using these little changes, researchers can draw up phylogenetic trees, much like family trees. They can also make connections between different cases of COVID-19 and gauge whether there might be undetected spread of the virus.
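As a rough illustration of the arithmetic behind these figures, the sketch below converts Rambaut's estimate into a per-site yearly rate and counts differences between two aligned sequences, the raw quantity a phylogenetic tree is built from. The function names and the toy sequences are hypothetical, chosen only for illustration. Real analyses work from full-genome alignments and use dedicated phylogenetics software.

```python
GENOME_LENGTH = 30_000  # approximate SARS-CoV-2 genome length, from the text


def per_site_per_year(mutations_per_month: float,
                      genome_length: int = GENOME_LENGTH) -> float:
    """Convert a whole-genome monthly rate into substitutions per site per year."""
    return mutations_per_month * 12 / genome_length


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    # One to two mutations per month is the range quoted in the text.
    for rate in (1, 2):
        print(rate, "per month ->", per_site_per_year(rate), "per site per year")

    # Toy 8-base fragments (made up), differing at two positions.
    print(hamming_distance("ACGTACGT", "ACGTTCGA"))
```

With only a handful of differences across 30,000 positions, many genomes are equally close to one another, which is why groupings built from them can be ambiguous.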

For instance, when researchers sequenced the second virus genome in Washington (from a teenager diagnosed with COVID-19 on 27 February), it looked like a direct descendant, with three further mutations, of the first genome, which came from a case found 6 weeks earlier. Bedford tweeted that he considered it “highly unlikely” that the two genomes came from separate introductions. “I believe we are facing an already substantial outbreak in Washington State that was not detected until now,” he wrote. That analysis turned out to be correct: Washington has now reported more than 100 cases and 15 deaths, and additional genomes from other patients have bolstered the link. In this case, Bedford’s hypothesis was much stronger because the two patients both came from Snohomish County, Rambaut says: “It’s very unlikely that this highly related virus would travel to exactly the same town in Washington.”
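A back-of-the-envelope check shows why three extra mutations in 6 weeks fits the rate of one to two mutations per month quoted earlier. The sketch below assumes mutations arrive as a Poisson process, which is a simplifying assumption made here for illustration and not a model described by the researchers. The helper names are hypothetical.

```python
import math

DAYS_PER_MONTH = 30.44  # average month length, used only for unit conversion


def expected_mutations(rate_per_month: float, weeks: float) -> float:
    """Expected number of mutations accumulated over the given number of weeks."""
    months = weeks * 7 / DAYS_PER_MONTH
    return rate_per_month * months


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for a Poisson variable X with mean lam."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    for rate in (1, 2):
        lam = expected_mutations(rate, weeks=6)
        print(f"rate {rate}/month: expect {lam:.2f} mutations, "
              f"P(at least 3) = {prob_at_least(3, lam):.2f}")
```

Under these assumptions the expected count over 6 weeks is roughly 1.4 to 2.8, so three mutations is unremarkable. The timing alone does not prove descent; as Rambaut notes, the shared county is what makes the link convincing.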

Few other firm conclusions about the virus’s spread have emerged, in part because the wealth of genomes is still a tiny sample of the more than 100,000 cases worldwide. Although China accounts for 80% of all COVID-19 cases, only one-third of the published genomes are from China—and very few of them are from later cases. And because it’s early in the outbreak, most genomes are still very similar, which makes it hard to draw conclusions. “We just have this handful of mutations, which makes these groupings so ambiguous,” Neher says. “As the outbreak unfolds, we expect to see more and more diversity and more clearly distinct lineages,” he says. “And then it will become easier and easier to actually put things together.”

Scientists will also be scouring the genomic diversity for mutations that might change how dangerous the pathogen is or how fast it spreads. There, too, caution is warranted. A paper published by Lu Jian of Peking University and colleagues on 3 March in the journal National Science Review analyzed 103 virus genomes and argued that they fell into one of two distinct types, named S and L, distinguished by two mutations. Because 70% of sequenced SARS-CoV-2 genomes belong to L, the newer type, the authors concluded that the virus has evolved to become more aggressive and to spread faster.

But they lack evidence, Rambaut says. “What they’ve done is basically seen these two branches and said, ‘That one is bigger, [so that virus] must be more virulent or more transmissible,’” he says. However, just because a virus is exported and leads to a large outbreak elsewhere does not mean it is behaving differently: “One of these lineages is going to be bigger than the other just by chance.” Some researchers have called for the paper to be retracted. “The claims made in it are clearly unfounded and risk spreading dangerous misinformation at a crucial time in the outbreak,” four scientists at the University of Glasgow wrote in a response published on www.virological.org. (In a response, Lu wrote the four had misunderstood his study.)
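Rambaut's point about chance can be illustrated with a toy simulation. The sketch below assumes two lineages that behave identically: each starts with one case, and every new case joins a lineage with probability proportional to that lineage's current size. This urn-style model is an assumption made here for illustration, not the method used in the paper or by its critics. The sample size of 103 matches the number of genomes in the study, and the function names are hypothetical.

```python
import random


def simulate_larger_share(total_genomes: int = 103, rng=None) -> float:
    """Grow two equally fit lineages and return the larger one's final share."""
    if rng is None:
        rng = random.Random()
    sizes = [1, 1]  # both lineages start with a single case
    for _ in range(total_genomes - 2):
        # A new case descends from a lineage in proportion to its current size;
        # neither lineage has any transmission advantage.
        pick = 0 if rng.random() < sizes[0] / (sizes[0] + sizes[1]) else 1
        sizes[pick] += 1
    return max(sizes) / total_genomes


def fraction_at_least(threshold: float = 0.7, trials: int = 2000,
                      seed: int = 0) -> float:
    """Fraction of simulated outbreaks in which one lineage reaches the threshold."""
    rng = random.Random(seed)
    hits = sum(simulate_larger_share(rng=rng) >= threshold for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(fraction_at_least())
```

In this toy model, one lineage ends up with 70% or more of the genomes in more than half of the runs, even though the two are equally transmissible by construction. A lopsided split is therefore weak evidence of a fitness difference.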

Most genomic changes don’t alter the virus’s behavior, Drosten says. The only way to confirm that a mutation has an effect is to study it in cell cultures or animal models and show, for instance, that the mutated virus has become better at entering cells or transmitting, he says. And if the virus does change in an important way, it could go either way, making it more or less dangerous. In 2018, Drosten’s group published a paper showing that early in the SARS outbreak of 2002–03, that virus lost a small chunk of its genome, 29 base pairs in one gene. Adding those base pairs back in the lab made the virus much better at replicating in several cell culture models.

It might seem strange that a mutation that weakens the virus would become established, but that can happen when it has just entered the human population and isn’t competing with strains lacking the mutation, Drosten says. “Sadly, this new virus doesn’t have that deletion,” he adds. 


Source: Science Mag