New Resources for Norovirus

Jennifer Chang, Jover Lee, Kim Andrews, James Hadfield, Allison Li, Katie Kistler, John Huddleston, Trevor Bedford

Historically called "winter vomiting disease" (Adler and Zickl, 1969), norovirus is the bane of any parent, long-term care facility, or otherwise contained community. The name of the virus is a mangled reference to Norwalk, Ohio, where a 1968 outbreak surged through a population of school children within a 24-hour period. Despite its explosive symptomology, the etiological agent was not identified as a virus until 1972, by immune electron microscopy (Kapikian et al., 1972), and was not genetically sequenced until 1989 (Jiang et al., 1993). The subsequent classification of the virus was fraught, as it became clear that the virus undergoes recombination, frequently between the ORF1 and ORF2 regions (Bull et al., 2005), resulting in different evolutionary histories for the polymerase (RdRp in ORF1) and the surface capsid (VP1 in ORF2). Despite multiple attempts and discomfiture among those afflicted, there is no approved norovirus vaccine.

# Phylogenetic analysis

Nextstrain provides regularly updated phylogenetic monitoring of norovirus along several different facets. Since this is a highly recombining virus, we provide individual gene trees as well as the full genome tree of all norovirus samples, building on the prior effort of Allison Li, Katie Kistler, and John Huddleston, who also faceted the genome trees by the important VP1 types GII.2, GII.3, GII.4, GII.6, and GII.17. As of this writing, we therefore provide 14 regularly updated views of norovirus evolution.

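| group | genome | p48 | NTPase | p22 | VPg | 3CLpro | RdRp | VP1 | VP2 |
|---|---|---|---|---|---|---|---|---|---|
| all | genome | p48 | NTPase | p22 | VPg | 3CLpro | RdRp | VP1 | VP2 |
| GII.2 | genome | | | | | | | | |
| GII.3 | genome | | | | | | | | |
| GII.4 | genome | | | | | | | | |
| GII.6 | genome | | | | | | | | |
| GII.17 | genome | | | | | | | | |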
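The count of 14 views follows directly from the listing above: one full genome tree plus eight gene trees for the "all" group, and one genome tree for each of the five VP1 types. A minimal sketch of that enumeration is given here; the names `GENES`, `VP1_TYPES`, and `list_views` are hypothetical and only mirror the listing, they are not part of any Nextstrain workflow.

```python
# Gene and VP1-type names are copied from the listing above.
# The variable and function names are hypothetical, for illustration only.
GENES = ["p48", "NTPase", "p22", "VPg", "3CLpro", "RdRp", "VP1", "VP2"]
VP1_TYPES = ["GII.2", "GII.3", "GII.4", "GII.6", "GII.17"]


def list_views():
    """Return (group, target) pairs, one per regularly updated view."""
    views = [("all", "genome")]                      # full genome tree, all samples
    views += [("all", gene) for gene in GENES]       # one gene tree per gene
    views += [(t, "genome") for t in VP1_TYPES]      # genome tree per VP1 type
    return views


views = list_views()
print(len(views))
```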

The combination of highly diverged and recombined sequences made rooting the phylogenetic trees a challenge, and we advise that any results be interpreted with caution. Even so, the Nextstrain trees are provided as a broad summary of genetic diversity and relatedness; further biological interpretation may require targeted sampling, tuning of parameters, a different alignment reference, or a focus on particular gene combinations. The trees are annotated by both VP1 and RdRp types. From the map, norovirus types do not appear to be geographically segregated. From the frequency panel, we see indications that the leading types change over time and that the proportions of these types have not reached an equilibrium.

Figure 1. Norovirus is globally distributed and highly divergent. Phylogenetic trees are annotated by VP1 and RdRp types, host, country, date, and genome and gene coverage percentages. The full genome tree is shown here. From the map, norovirus types do not appear to be geographically segregated. From the frequency panel, we see indications that the leading types change over time and that the proportions of these types have not reached an equilibrium.

# Norovirus groups, types, and variants

Norovirus samples have a dual-typing system based on a polymerase region (RdRp) and a capsid region (VP1) of the genome, between which lies a known recombination site. The resolution of norovirus typing has undergone multiple changes (Zheng et al., 2006; Eden et al., 2013; Chhabra et al., 2019; Tatusov et al., 2020), but samples are generally classified by "genogroup", "genotype", and "variant" for VP1, and by "P-group", "P-type", and "variant" for RdRp. For the sake of naming Nextstrain trees, we refer to these as VP1 group, type, or variant and RdRp group, type, or variant, respectively.
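One way to picture the dual-typing scheme is as a pair of independent calls per sample, one for VP1 and one for RdRp, each with group, type, and variant resolution. The sketch below is an illustration only: the class names, field names, and helper function are hypothetical, and the sample labels are illustrative placeholders rather than output from the Nextstrain or Nextclade datasets.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionCall:
    """Call for one typed region (VP1 or RdRp); group and variant are optional."""
    type_name: str
    group: Optional[str] = None
    variant: Optional[str] = None


@dataclass(frozen=True)
class DualType:
    """A sample carries one call per region, since the regions can recombine."""
    vp1: RegionCall
    rdrp: RegionCall

    def pairing(self):
        # Type-level (VP1, RdRp) pair for this sample.
        return (self.vp1.type_name, self.rdrp.type_name)


def tally_pairings(samples):
    """Count how often each (VP1 type, RdRp type) pair occurs."""
    return Counter(s.pairing() for s in samples)


# Illustrative placeholder samples, not real data.
samples = [
    DualType(RegionCall("GII.4", group="GII"), RegionCall("GII.P16")),
    DualType(RegionCall("GII.4", group="GII"), RegionCall("GII.P31")),
    DualType(RegionCall("GII.2", group="GII"), RegionCall("GII.P16")),
]

pairings = tally_pairings(samples)
# RdRp types seen together with VP1 type GII.4 in the placeholder samples.
rdrp_partners = {rdrp for (vp1, rdrp) in pairings if vp1 == "GII.4"}
```

Keeping the two calls separate reflects why a single label per sample is not enough: after recombination, the same VP1 type can be paired with different RdRp types.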

Figure 2. Typing of norovirus samples is based on the VP1 and RdRp regions and is further split into group, type, and variant resolution.

Group, type, and variant levels of resolution were roughly classified using preliminary Nextclade datasets based on either the VP1 or the RdRp gene. These norovirus Nextclade datasets are preliminary, and further development is pending. They were built from scaffold strains listed at the Human Calicivirus Typing Tool as of September 16, 2025.

Figure 3. Preliminary norovirus classification into Group, Type, and Variant columns.

# Nextstrain resources

We curate sequence data and metadata from NCBI as the starting point for our analyses. We provide snapshots of the exact curated sequences and metadata for norovirus workflows at:

# Acknowledgments & request for comments

We welcome comments and suggestions from norovirus researchers to improve these Nextstrain and Nextclade datasets. Special thanks to Chao-Yang Pan and Erik Wolfsohn for their feedback, for answering questions, and for providing biological context.

All source code is freely available under the terms of the GNU Affero General Public License. Screenshots may be used under a CC-BY-4.0 license and attribution to nextstrain.org must be provided.

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Kristian Andersen, Josh Batson, David Blazes, Jesse Bloom, Peter Bogner, Anderson Brito, Matt Cotten, Ana Crisan, Tulio de Oliveira, Gytis Dudas, Vivien Dugan, Karl Erlandson, Nuno Faria, Jennifer Gardy, Nate Grubaugh, Becky Kondor, Dylan George, Ian Goodfellow, Betz Halloran, Christian Happi, Jeff Joy, Paul Kellam, Philippe Lemey, Nick Loman, Duncan MacCannell, Erick Matsen, Sebastian Maurer-Stroh, Placide Mbala, Danny Park, Oliver Pybus, Andrew Rambaut, Colin Russell, Pardis Sabeti, Katherine Siddle, Kristof Theys, Dave Wentworth, Shirlee Wohl and Cecile Viboud for comments, suggestions and data sharing.
