HI.nextflu / H3N2 / 20y

Real-time tracking of seasonal influenza virus evolution in humans

Phylogeny

Click on a branch (repeatedly, if needed) to zoom in.

Frequencies

Input mutations as position + amino acid (e.g. 159Y) and clades by clade name (e.g. 3c2.a). To restrict a query to a region, append /AS, /NA, /EU or /OC (e.g. 159Y/AS). By default, positions are interpreted as residues in HA1; to specify another subunit, prefix it, as in HA2:18V. Alternatively, simply click on variable positions in the graph below.

Mutations

Mutation	HI effect
K158N/N189K	3.53
S124G/N133D/I144V/Q156K/K158E/K276N	2.46
K158R	2.41
K189N	2.33
E158K/V196A/N276K	1.84
K145N	1.44
S193F	1.43
V186G	1.35
N121T	1.27
K145S	1.19
F159S	1.18
K140I	1.13
F159Y	1.07
H75Q/H155T	0.96
K144D	0.96
Q75H	0.94
G186S	0.94
K83E	0.9
N145S	0.85
N145Q	0.84
N262S	0.82
T212S	0.8
K144N	0.76
V144I	0.76
N8D	0.76
K145Q	0.74
D172E	0.65
S159Y	0.63
G5E	0.63
N188D	0.63
W222R/G225D	0.57
Q145S	0.55
N145K	0.49
K83R	0.48
S186G	0.47
S199P	0.45
N188Y	0.43
T212A	0.42
Q57R	0.41
N144S	0.41
Q156H	0.4
Y137S	0.38
Q145N	0.36
S137Y	0.35
K326T	0.34
K160T	0.34
S189N	0.33
H183L	0.32
K264R	0.31
I214T	0.31
A198S	0.29
S193N	0.28
S262N	0.28
S159F	0.27
K62E	0.25
R326K	0.25
A212T	0.24
S54R	0.24
D53N	0.23
R142G	0.23
E62K	0.22
V88I	0.22
K92T	0.22
N225D	0.2
I25V	0.18
N126D	0.17
N278K	0.14
D126N	0.13
S45N	0.13
P198S	0.12
K173Q	0.12
L157S	0.12
T192I	0.11
K27R	0.11
N124S	0.1
S199A	0.1
P198A	0.09
I144D	0.08
N6I	0.07
T30A	0.07
D144N	0.07
N122D	0.06
R57Q	0.06
L3I	0.05
G142R	0.05
N144D	0.05
E173K	0.03
E50G/I140K	0.03
Y159F	0.03
A199S	0.01
N144I/E172D	0.0
R261Q	0.0
D144K	0.0

Feature explanation

Click here for help with the nextflu interface or watch the tutorial video below.

HI data

HI data can be displayed as color on the tree or viewed via the tooltips that appear when moving the mouse over a circle corresponding to a virus. To explore the HI titer data, select HI distance from focus in the color by menu and click on one of the available reference viruses, indicated by grey squares. The tree will then be colored by log2 distance from this reference virus. The coloring reflects either the direct measurements of HI titers provided by the WHO collaborating centers (notably the annual and interim reports by the NIMR in London) or models that are fit to these data. Whether the raw data, the tree model or the mutation model is used to color the tree can be chosen via the radio buttons on the left. If more than one measurement is available, we take the average over all available measurements. In the process of fitting the models, column (serum potency) and row (virus avidity) effects are estimated. These corrections can be subtracted from the raw measurements to remove noise. To see all measurements of a virus relative to the chosen reference virus, place the mouse over that virus and an info box (tooltip) will pop up with a table that lists all measurements (and the autologous titers for the sera, to facilitate interpretation) and the model predictions.
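As a rough illustration of the quantities described above, the sketch below computes a log2 distance from a measured titer and the autologous titer of the same serum, averages repeated measurements, and subtracts the serum potency and virus avidity effects. The function names, the example titers and correction values, and the exact form of the distance (log2 of the autologous titer minus log2 of the measured titer) are assumptions made for illustration; they are not taken from the nextflu source code.

```python
import math
from statistics import mean

def log2_distance(titer, autologous_titer):
    """log2 drop of a measured titer relative to the serum's autologous titer."""
    return math.log2(autologous_titer) - math.log2(titer)

def corrected_distance(titers, autologous_titer,
                       serum_potency=0.0, virus_avidity=0.0):
    """Average repeated measurements, then subtract the two correction terms."""
    raw = mean(log2_distance(t, autologous_titer) for t in titers)
    return raw - serum_potency - virus_avidity

# Hypothetical example: two measurements of a test virus against a serum
# whose autologous titer is 1280, with made-up correction values.
print(corrected_distance([160, 320], 1280,
                         serum_potency=0.5, virus_avidity=0.25))
```

With both corrections left at zero, the function returns the plain average of the raw log2 distances.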

The tree can also be colored by cumulative antigenic change, similar to dimension 1 in antigenic cartography.

Phylogenetic tree

Use the date slider to select viruses sampled within the time interval indicated. To change the size of the interval, grab the left end of the bar with the mouse; to move the interval, use the right end of the slider.

Use the drop down menu to color viruses by number of epitope mutations, non-epitope mutations or receptor binding mutations relative to root, or to color viruses by local branching index or geographic region.

Use the input box to specify positions to color viruses by genotype. Amino acid positions must be separated by commas (e.g. 159,225). The default is HA1; to color by amino acid sequence in other regions, use HA2:18 or SigPep:6. To color by nucleotide sequence, use nuc:527.
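The input format described above can be read as a comma-separated list of optionally prefixed positions. The following is a minimal sketch of such a parser; the function name parse_positions and the returned (region, position) tuples are hypothetical and only meant to make the syntax concrete.

```python
def parse_positions(spec, default_region="HA1"):
    """Split input such as '159,225,HA2:18,nuc:527' into (region, position) pairs.

    Entries without a 'region:' prefix are assigned to the default region (HA1).
    """
    result = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        region, sep, pos = item.rpartition(":")
        if not sep:
            region = default_region
        result.append((region, int(pos)))
    return result

print(parse_positions("159,225"))
print(parse_positions("HA2:18, SigPep:6, nuc:527"))
```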

Mouse over a tip to show virus name, location and features.

Mouse over a branch to graph the frequency trajectory of the corresponding clade below, or click on a branch to zoom into its descendant clade. The tooltip will show amino acid mutations on this branch.

To restrict the displayed viruses to certain geographic regions, select the region in the drop down menu labeled region.

Frequencies

Enter a mutation or genotype above (e.g. 225D) and click plot frequencies to show the estimated frequency of this mutation through time. Geographic regions can be specified by appending AS (Asia), NA (North America), EU (Europe), or OC (Oceania), as in 159S/225D/AS. Several genotypes can be entered simultaneously when separated by commas: for example, 225D, 159S/225D/AS will graph the global frequency of 225D and the frequency of strains containing both 159S and 225D in Asia. Instead of a genotype, the common clades 3c3, 3c3.a, 3c2, 3c2.a can be used. Positions with very little variation are omitted. Beware that region-specific frequencies are noisy.
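To make the query syntax above concrete, here is a small sketch that splits an input string into individual queries, each with a list of mutations or clades and an optional region. The names parse_query and parse_item and the output layout are hypothetical; the region codes and clade names are the ones listed in the text.

```python
REGIONS = {"AS": "Asia", "NA": "North America", "EU": "Europe", "OC": "Oceania"}
CLADES = {"3c3", "3c3.a", "3c2", "3c2.a"}

def parse_item(item):
    """Interpret one entry as a clade name or as [subunit:]position+amino acid."""
    if item.lower() in CLADES:
        return ("clade", item.lower())
    subunit, sep, rest = item.rpartition(":")
    if not sep:
        subunit = "HA1"  # positions default to HA1
    return ("mutation", subunit, int(rest[:-1]), rest[-1].upper())

def parse_query(text):
    """Split comma-separated queries; a trailing /AS, /NA, /EU or /OC sets the region."""
    queries = []
    for entry in text.split(","):
        parts = [p.strip() for p in entry.split("/") if p.strip()]
        if not parts:
            continue
        region = "global"
        if parts[-1].upper() in REGIONS:
            region = parts.pop().upper()
        queries.append({"items": [parse_item(p) for p in parts], "region": region})
    return queries

for query in parse_query("225D, 159S/225D/AS, 3c2.a, HA2:18V/EU"):
    print(query)
```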

Variability

The second plot shows the variation in the multiple sequence alignment used to construct the tree. High bars indicate variable positions. Clicking on a bar will color the tree by amino acid at this position and plot the frequencies of the corresponding amino acids.

Video tutorial

Rationale and details

Epitope mutations are based on HA structure and exposed residues. Multiple recent mutations at epitope sites have been suggested to be predictive of strains dominating future seasons. Similarly, mutations outside of these epitopes (termed non-epitope sites) tend to be damaging and are suggested to be predictive of clade contraction.

Antigenic evolution has been shown to depend primarily on substitutions surrounding the receptor binding site of HA1. These seven positions (145, 155, 156, 158, 159, 189, 193 in HA1 numbering) are referred to here as receptor binding positions and changes at these positions could correspond to large changes in antigenic properties.
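Counting changes at these seven positions relative to the root can be sketched as follows. The sketch assumes that both sequences are aligned HA1 amino acid strings whose first character corresponds to HA1 position 1; the function name and the toy sequences are made up for illustration.

```python
# The seven receptor binding positions listed above (HA1 numbering, 1-based).
RBS_POSITIONS = (145, 155, 156, 158, 159, 189, 193)

def rbs_mutations(seq, root_seq):
    """Count receptor binding positions at which seq differs from the root."""
    return sum(1 for pos in RBS_POSITIONS if seq[pos - 1] != root_seq[pos - 1])

# Toy example: a root of 200 identical residues and a virus that differs
# at positions 100 (not counted), 145 and 189 (both counted).
root = "A" * 200
virus = list(root)
for pos in (100, 145, 189):
    virus[pos - 1] = "K"
virus = "".join(virus)

print(rbs_mutations(virus, root))
```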

The local branching index is the exponentially weighted tree length surrounding a node, which is associated with rapid branching and expansion of clades. A more detailed explanation is available here. Retrospective analysis has shown that LBI correlates with clade growth.
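One way to compute an exponentially weighted tree length around every node is a two-pass traversal: first collect the discounted tree length below each node, then pass the discounted length of the rest of the tree back down. The sketch below is our reading of that idea on a toy tree and not the nextflu implementation; the dictionary-based tree representation, the function name, the length scale tau and the example tree are all assumptions, and no normalization of the resulting values is applied.

```python
import math

def local_branching_index(children, branch_length, root, tau):
    """Exponentially weighted tree length around each node (unnormalized).

    children: dict mapping a node to a list of its child nodes
    branch_length: dict mapping a non-root node to the length of the branch above it
    tau: length scale of the exponential weighting
    """
    up, below, down = {}, {}, {root: 0.0}

    def discounted(node, beyond):
        # Weighted length of the node's own branch, plus whatever lies
        # beyond that branch, damped by the branch's decay factor.
        decay = math.exp(-branch_length[node] / tau)
        return tau * (1.0 - decay) + decay * beyond

    def pass_up(node):
        below[node] = sum(pass_up(c) for c in children.get(node, []))
        if node == root:
            return 0.0
        up[node] = discounted(node, below[node])
        return up[node]

    def pass_down(node):
        for c in children.get(node, []):
            # Everything seen from the parent except the subtree of c itself.
            down[c] = discounted(c, down[node] + below[node] - up[c])
            pass_down(c)

    pass_up(root)
    pass_down(root)
    return {node: down[node] + below[node] for node in below}

# Toy tree: node A sits at the base of three tips, node B is a lone tip.
tree = {"root": ["A", "B"], "A": ["A1", "A2", "A3"]}
lengths = {"A": 1.0, "B": 1.0, "A1": 1.0, "A2": 1.0, "A3": 1.0}
lbi = local_branching_index(tree, lengths, "root", tau=1.0)
print(lbi["A"] > lbi["B"])
```

In this toy tree the rapidly branching node A receives a larger value than the isolated tip B, which is the qualitative behaviour described above.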

Frequencies are estimated as maximum likelihood trajectories that penalize rapid changes in frequency and slope. The frequencies of large clades or abundant genotypes have sufficiently many observations to be robust, while frequencies of rare mutations cannot be reliably estimated.
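The kind of objective described above can be sketched as a negative log-likelihood plus two smoothness penalties, one on the change in frequency between neighbouring time points and one on the change in slope. The binomial sampling model, the squared-difference penalties, the weights stiffness and curvature, and the example counts are all assumptions chosen for illustration; the actual estimator may differ.

```python
import math

def penalized_neg_log_likelihood(freqs, counts, totals,
                                 stiffness=1.0, curvature=1.0):
    """Score a candidate frequency trajectory (lower is better).

    freqs:  candidate frequency at each time point
    counts: number of sampled viruses carrying the mutation at each time point
    totals: number of sampled viruses at each time point
    """
    eps = 1e-9  # keep log() away from 0 and 1
    nll = 0.0
    for x, k, n in zip(freqs, counts, totals):
        x = min(max(x, eps), 1.0 - eps)
        nll -= k * math.log(x) + (n - k) * math.log(1.0 - x)
    # Penalty on rapid changes in frequency (first differences).
    change = sum((b - a) ** 2 for a, b in zip(freqs, freqs[1:]))
    # Penalty on rapid changes in slope (second differences).
    slope_change = sum((c - 2 * b + a) ** 2
                       for a, b, c in zip(freqs, freqs[1:], freqs[2:]))
    return nll + stiffness * change + curvature * slope_change

counts, totals = [1, 2, 3, 4], [10, 10, 10, 10]
smooth = [0.1, 0.2, 0.3, 0.4]
jagged = [0.1, 0.9, 0.1, 0.9]
print(penalized_neg_log_likelihood(smooth, counts, totals)
      < penalized_neg_log_likelihood(jagged, counts, totals))
```

A trajectory estimate would then be obtained by minimizing this score over freqs with a numerical optimizer, which is left out here.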

Built by Richard Neher and Trevor Bedford. This work is made possible by the GISAID Initiative and the open sharing of genetic data by influenza research groups from all over the world. We gratefully acknowledge their contributions. Recent HI titer data was generated by John McCauley, Rod Daniels and colleagues at the Worldwide Influenza Centre at the Francis Crick Institute.

Give us a shout at @richardneher or @trvrb with questions or comments. All source code is freely available under the terms of the GNU Affero General Public License. A detailed description of methods is also available. Data updated and processed with commit .

Please cite: Neher RA, Bedford T. 2015. nextflu: real-time tracking of seasonal influenza virus evolution in humans. Bioinformatics 10.1093/bioinformatics/btv381.