Nowcasting and Forecasting Daily Hospital Deaths from Covid-19
Oliver Stoner is a Research Fellow at the Institute for Data Science and Artificial Intelligence, University of Exeter. He has a background in developing new statistical modelling frameworks for correcting under-reporting and delayed-reporting in count data, such as reports of infectious disease cases or deaths. In recent weeks he has applied this knowledge to develop an advanced model which corrects for reporting delays in NHS England data on daily hospital deaths from Covid-19.
Daily hospital deaths corrected for reporting delay
Official reports of daily hospital deaths from Covid-19 are subject to ‘reporting delay’. It can take several days for the number of deaths that occurred on a specific day to be accurately reported. Using our advanced statistical data model we can adjust for the delay and estimate (nowcast) the number of deaths which occurred today and in recent days before they are fully reported. We can also forecast daily deaths up to 3 days ahead.
In the plots above, the dots show the total number of deaths occurring on each day that have been reported so far. The total number of reported deaths is very low for recent days, as the reporting delay means many deaths still haven't been reported. Our model estimates the number of deaths which have already happened but haven't been reported yet, by learning about the reporting delay and how it changes by region, over time, and with the day of the week. These estimates are shown as the coloured lines in the plots above and the shaded areas represent the model's uncertainty. Estimates are very certain earlier in the time series, as the model knows most of the deaths have been reported by now. Closer to the present day (which is shown as the vertical lines), the estimates are more uncertain as fewer of the deaths which have already happened have been reported. The model can also make predictions for the next 3 days, and the uncertainty grows the further into the future we forecast.
Delayed reporting and the weekly cycle
The plot below illustrates the cumulative proportion of deaths reported as the delay increases, with different colours corresponding to different dates of death. Generally, very few deaths (less than 20%) are reported within 1 day but the vast majority of deaths (more than 90%) are reported after a week or so. We can also see that deaths which occurred in mid-April (lighter colours) tend to be reported with less delay (they are higher in the plot) than deaths which occurred in early-April (darker colours), suggesting the reporting is speeding up as time goes on.
Cumulative proportion of daily hospital deaths from Covid-19 in England reported after each day of delay (100% represents the total reported after 14 days).
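The curves in this plot can be computed directly from a table of reported counts. The following is a minimal sketch, assuming the reports are arranged as an array with one row per date of death and one column per day of delay; the function name and the example counts are made up for illustration and are not the NHS England data.

```python
import numpy as np

def cumulative_proportion_reported(z):
    """Cumulative share of deaths reported after each day of delay.

    z[t, d] is the number of deaths on day t that were first reported
    after d + 1 days of delay. 100% is the total over all delay columns.
    """
    z = np.asarray(z, dtype=float)
    cumulative = np.cumsum(z, axis=1)
    totals = cumulative[:, -1:]
    # Days with no reported deaths get NaN instead of a division by zero.
    return np.divide(cumulative, totals,
                     out=np.full_like(cumulative, np.nan),
                     where=totals > 0)

# Hypothetical counts: two dates of death, five days of delay.
example_counts = np.array([
    [10, 30, 40, 15, 5],
    [20, 40, 30, 8, 2],
])
proportions = cumulative_proportion_reported(example_counts)
```

In this made-up example the second row rises faster than the first, which is the pattern described above for later dates of death.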
There has been much discussion among experts and in the media about the impact of the so-called 'weekend effect' on daily deaths reports. It is thought that the reporting system typically slows down over the weekend and catches back up again in the following week. Our model is the first that we know of which explicitly accounts for the weekly cycle in the reporting delay to improve the accuracy of estimates. The points in the plot below show the model's estimates of how the day of death affects the reporting delay. Error bars represent 95% uncertainty intervals, higher values represent faster reporting (on average) than lower values, and values are shown relative to the effect of a death occurring on Monday. These estimates display a clear weekly cycle: the highest point in the week is Wednesday, meaning deaths occurring on Wednesday are reported most quickly; and the lowest point is Saturday, meaning deaths which happen on a Saturday are reported most slowly.
Estimated effect of the day of death on the reporting delay, relative to Monday and with 95% uncertainty intervals. Higher values mean faster reporting on average. The dashed line illustrates the estimated weekly cycle.
Daily deaths of hospital patients who have tested positive for Covid-19 are reported by NHS England, but this work was made possible by the compilation of these reports by Prof. Sheila Bird and Prof. Bent Nielsen into a format suitable for modelling. Prof. Bird and Prof. Nielsen have also developed their own model for nowcasting daily hospital deaths from Covid-19, details of which (as well as the data) can be found here.
Our statistical data model is based on our advanced multivariate hierarchical approach, which we recently extended to data with a spatial dimension. Here the number of deaths occurring on day t and in region r is denoted by y(t,r). This is assumed to be fully reported after a certain number of days have passed (the “cut-off”). Until this time has elapsed, y(t,r) is treated as an unknown quantity we would like to predict. Currently the cut-off is assumed to be 14 days after the date of death (t), meaning model predictions should be interpreted as the total number of deaths after 14 days of reporting delay. As we move into May, this cut-off will be increased for greater accuracy.
The number of deaths y(t,r) is modelled as arising from a probability distribution (called the Negative Binomial) p(y(t,r)∣λ(t,r),θ(r)), with a mean death rate λ(t,r) which follows a smooth trend in time, plus some noise determined by θ(r), both of which vary with geographical region.
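To illustrate what a Negative Binomial with a given mean and extra noise looks like, here is a minimal simulation sketch. It assumes the common mean-dispersion parametrisation, in which the variance is λ + λ²/θ; the text above does not spell out the parametrisation, so this choice, the function name and the numerical values are assumptions.

```python
import numpy as np

def sample_negative_binomial(mean, dispersion, size, rng):
    """Draw counts with E[y] = mean and Var[y] = mean + mean**2 / dispersion.

    NumPy parametrises the distribution by (n, p); setting n = dispersion
    and p = dispersion / (dispersion + mean) gives the moments above.
    """
    p = dispersion / (dispersion + mean)
    return rng.negative_binomial(dispersion, p, size=size)

rng = np.random.default_rng(1)
# Hypothetical values: a mean of 50 deaths per day and dispersion 10.
draws = sample_negative_binomial(mean=50.0, dispersion=10.0,
                                 size=200_000, rng=rng)
print(draws.mean(), draws.var())
```

Under this parametrisation a smaller θ means more day-to-day noise around the smooth trend, and a very large θ approaches a Poisson distribution.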
Then the number of deaths occurring on day t and in region r but reported on day t+d, denoted z(t,r,d), is modelled as arising from a multivariate count probability distribution (called the Generalized-Dirichlet-Multinomial) p(z(t,r)∣p(t,r),ϕ(r),y(t,r)), where z(t,r)=(z(t,r,1),…,z(t,r,12)) and p(t,r)=(p(t,r,1),…,p(t,r,12)), with the constraint that z(t,r,d) sums to y(t,r) over d. This model is characterized by the mean cumulative proportion reported after d days of delay, p(t,r,d), and by some noise which is determined by ϕ(r) and exhibits covariance across delays. The mean cumulative proportion varies with region and with time, and crucially incorporates a weekly cycle (in the form of a day-of-the-week factor) to potentially capture the widely discussed weekend effect on the reporting of deaths.
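One way to build intuition for this distribution is to simulate from it. The sketch below assumes the Generalized-Dirichlet-Multinomial can be written as a sequence of Beta-Binomial draws, each splitting the deaths not yet reported into "reported at this delay" and "still outstanding". It further assumes a single dispersion value shared across delays and strictly increasing cumulative proportions that reach 1 at the final delay. These simplifications, the function name and the numbers are illustrative assumptions, not the fitted model.

```python
import numpy as np

def sample_gdm(total, cumulative_p, dispersion, rng):
    """Split `total` deaths across delays, given mean cumulative proportions.

    cumulative_p[d] is the expected cumulative proportion reported by
    delay d + 1; it must be strictly increasing and end at 1.
    """
    cumulative_p = np.asarray(cumulative_p, dtype=float)
    previous = np.concatenate(([0.0], cumulative_p[:-1]))
    # Expected share of the still-unreported deaths reported at each delay.
    relative_p = (cumulative_p - previous) / (1.0 - previous)
    counts = np.zeros(len(cumulative_p), dtype=int)
    remaining = total
    for d, nu in enumerate(relative_p):
        if nu >= 1.0:
            # Final delay: everything left is reported, so counts sum to total.
            counts[d] = remaining
        else:
            # Beta-Binomial draw with mean nu and the given dispersion.
            q = rng.beta(nu * dispersion, (1.0 - nu) * dispersion)
            counts[d] = rng.binomial(remaining, q)
        remaining -= counts[d]
    return counts

rng = np.random.default_rng(0)
# Hypothetical values: 120 deaths, five delays, dispersion 20.
counts = sample_gdm(120, [0.15, 0.45, 0.70, 0.85, 1.0], 20.0, rng)
print(counts, counts.sum())
```

Because each draw acts only on the deaths still outstanding, the simulated counts always satisfy the constraint that they sum to the total.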
Bayesian inference then makes it straightforward to generate nowcast predictions of the number of deaths y(t,r) occurring today or in recent days, based on any partial reports z(t,r,d) which are available so far. These predictions also draw information from the estimated smooth time trends and regional effects in the mean number of deaths, and from the weekly cycle, smooth time trends and regional effects in the expected cumulative proportion reported. Finally, the smooth trends in time can be projected a few days into the future to generate forecasts.
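The logic of a nowcast can be shown with a deliberately simplified, single-day version of this calculation. The sketch below assumes the mean death rate, the dispersion and the proportion reported so far are all known, and it replaces the Generalized-Dirichlet-Multinomial with a plain Binomial for the partial report. The full model instead estimates all of these quantities jointly, so this is an illustration of the idea rather than the method itself, and every name and number in it is hypothetical.

```python
import math
import numpy as np

def nowcast_posterior(reported_so_far, proportion_reported, mean,
                      dispersion, max_total=2000):
    """Posterior over the eventual total y, given a partial count.

    Prior: Negative Binomial with the given mean and dispersion.
    Likelihood: reported_so_far ~ Binomial(y, proportion_reported).
    """
    totals = np.arange(reported_so_far, max_total + 1)
    log_post = np.empty(len(totals))
    for i, y in enumerate(totals):
        y = int(y)
        unreported = y - reported_so_far
        log_prior = (math.lgamma(y + dispersion) - math.lgamma(dispersion)
                     - math.lgamma(y + 1)
                     + dispersion * math.log(dispersion / (dispersion + mean))
                     + y * math.log(mean / (dispersion + mean)))
        log_lik = (math.lgamma(y + 1) - math.lgamma(reported_so_far + 1)
                   - math.lgamma(unreported + 1)
                   + reported_so_far * math.log(proportion_reported)
                   + unreported * math.log(1.0 - proportion_reported))
        log_post[i] = log_prior + log_lik
    # Normalise on the log scale to avoid numerical underflow.
    weights = np.exp(log_post - log_post.max())
    return totals, weights / weights.sum()

# Hypothetical day: 30 deaths reported so far, 40% expected to be reported.
totals, probs = nowcast_posterior(30, 0.4, mean=80.0, dispersion=10.0)
cdf = np.cumsum(probs)
median = totals[np.searchsorted(cdf, 0.5)]
lower = totals[np.searchsorted(cdf, 0.025)]
upper = totals[np.searchsorted(cdf, 0.975)]
print(lower, median, upper)
```

The interval narrows as the proportion reported approaches 1, which mirrors the shrinking uncertainty for earlier days in the plots.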
Predictions for the total hospital deaths in England are generated by summation of the predicted deaths for each region. To ensure these predictions are reasonable, smooth trends in time for each region are constrained by treating them as random effects with an overall smooth trend of time as the mean.
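In practice this summation is done on the predictive samples, not on the interval limits, because quantiles of a sum are not the sum of the quantiles. A minimal sketch, assuming the predictions are stored as an array of samples by region by day; the array layout, function name and stand-in samples are hypothetical:

```python
import numpy as np

def national_summary(regional_samples, levels=(0.025, 0.5, 0.975)):
    """Summarise national totals from regional predictive samples.

    regional_samples[s, r, t] is sample s of predicted deaths in region r
    on day t. Regions are summed within each sample before any quantiles
    are taken.
    """
    national_samples = np.asarray(regional_samples).sum(axis=1)
    return np.quantile(national_samples, levels, axis=0)

rng = np.random.default_rng(2)
# Stand-in samples: 1000 draws, 3 regions, 5 days (not model output).
regional_samples = rng.poisson(lam=40.0, size=(1000, 3, 5))
lower, median, upper = national_summary(regional_samples)
```

Summing within each sample keeps any dependence between regions in the national intervals.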
Full details of the models on which this is based can be found in Multivariate hierarchical frameworks for modeling delayed reporting in count data and A hierarchical modelling framework for correcting delayed reporting in spatio-temporal disease surveillance data.
Trends in the death rate and reporting delay
The plot below shows estimated (posterior median) smooth time trends from the part of the model for the true daily deaths (left panel) and estimated smooth time trends from the part of the model for the cumulative proportion reported (right panel, higher values mean faster reporting). A different colour is used for each region. In both panels the dashed line shows the overall mean effect for England. Uncertainty in these effects is quantified by the model and propagated into the nowcasts and forecasts, but it isn't plotted here so that the regions can be distinguished more easily.
Estimated smooth time effects on the daily Covid-19 hospital death rate (left) and estimated smooth time effects on the cumulative proportion reported (right, higher values mean faster reporting). Dashed lines show the overall mean effects for England.
All regions display a peak in the daily death rate between the 6th of April and the 13th of April, followed by a sustained decrease. The region with the fastest-decreasing death rate (on the log-scale) appears to be London. The trends in the cumulative proportion reported are generally increasing over the course of April, suggesting that reporting of daily deaths has sped up over time.
As the true number of deaths is not yet completely known, it is challenging to objectively assess the performance of the model’s nowcasts and forecasts. A subjective but reasonable way to assess these predictions is to treat the model’s current best estimate of the number of deaths occurring in recent days as the truth, and assess how well these pseudo-truths are captured by predictions from previous model fits (or more specifically fits to data which has been censored to reflect the data that would be available for modelling on a specific day).
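Censoring the data in this way amounts to hiding every report that would not yet have arrived on the chosen fit date. A minimal sketch, assuming counts are stored by date of death and delay and that days are indexed from 0; the function name and example counts are hypothetical:

```python
import numpy as np

def censor_to_fit_day(z, fit_day):
    """Hide reports that were not yet available on `fit_day`.

    z[t, d] is the number of deaths on day t reported after d + 1 days of
    delay, so that entry only becomes known on day t + d + 1.
    """
    z = np.asarray(z, dtype=float)
    days = np.arange(z.shape[0])[:, None]
    delays = np.arange(1, z.shape[1] + 1)[None, :]
    censored = z.copy()
    censored[days + delays > fit_day] = np.nan
    return censored

# Hypothetical counts: four dates of death, three days of delay.
example_counts = np.array([
    [5, 12, 3],
    [6, 10, 4],
    [4, 11, 5],
    [7, 9, 2],
])
censored = censor_to_fit_day(example_counts, fit_day=3)
print(censored)
```

Fitting the model to each censored copy reproduces what would have been predicted on that day, which can then be compared with the most recent estimates.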
The plots below show predictions (median of predictive distribution) from the most recent model fit as points, which we interpret as the pseudo-true values for the purpose of criticising previous nowcasts and forecasts. The coloured areas show 95% prediction intervals, corresponding to different model fit dates. Model nowcasts and forecasts are generally consistent with the pseudo-truths, with the exception of the set of predictions from the model fitted on Monday the 20th of April, which were very low compared to previous and subsequent sets of predictions. The data corresponding to this specific fit follow a weekend, meaning reports for the preceding few days were likely lower than expected, though our aim is that this weekly cycle should be captured by the day-of-the-week effect in the model for the cumulative proportion reported. The significant drop on a Monday compared to previous predictions does not seem to have been repeated on Monday the 27th, which we suspect is a sign that the model now has enough information to identify a strong weekly cycle in the reporting.
95% posterior prediction intervals for daily hospital deaths from Covid-19 in each region. Different colours show the set of predictions corresponding to a model fitted to the data available on that day (or equivalent). Points show the median predicted deaths from the most recent model run.
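A simple numerical companion to these plots is the share of pseudo-true values that fall inside an earlier fit's prediction intervals. The sketch below is one way this could be computed; the function name and all the numbers are made up for illustration and are not results from the model.

```python
import numpy as np

def interval_coverage(pseudo_truth, lower, upper):
    """Fraction of pseudo-true values lying inside [lower, upper]."""
    pseudo_truth = np.asarray(pseudo_truth)
    inside = (np.asarray(lower) <= pseudo_truth) & (pseudo_truth <= np.asarray(upper))
    return inside.mean()

# Hypothetical medians from the latest fit and intervals from an earlier fit.
pseudo_truth = [10, 12, 15, 20]
earlier_lower = [8, 9, 16, 15]
earlier_upper = [12, 14, 22, 25]
coverage = interval_coverage(pseudo_truth, earlier_lower, earlier_upper)
print(coverage)
```

With 95% intervals, coverage far below 0.95 for a particular fit date would flag a set of predictions like the one from the 20th of April.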
In summary, we believe the results from the validation are compelling and exhibit practical levels of uncertainty for use in decision-making.
Stoner, O and Economou, T. Multivariate hierarchical frameworks for modeling delayed reporting in count data (2019a). Biometrics. https://doi.org/10.1111/biom.13188
Stoner, O and Economou, T. A hierarchical modelling framework for correcting delayed reporting in spatio-temporal disease surveillance data (2019b). arXiv e-print. https://arxiv.org/abs/1912.05965
For more information or queries please contact Dr Oliver Stoner, Research Fellow, Institute for Data Science and Artificial Intelligence, University of Exeter.