Beta regression has received considerable attention in the last decade because of its applications to proportional data in several fields. We study the variability of coronavirus death rates in the first wave in twenty European countries using a beta regression with two systematic components for the mean and dispersion parameters. We show empirically that population density, proportion of urban population, hospital beds per 100 thousand inhabitants and running time explain the variability of the COVID-19 death rates in the first wave in these countries.
The new CoV, discovered in Wuhan, in China’s Hubei province, in late December 2019, was initially described as 2019-nCoV. After phylogenetic and pathophysiological analyses, it was named SARS-CoV-2 owing to its similarity to SARS-CoV.1 The disease caused by this agent came to be called COVID-19. It is characterized by a flu-like condition of fever and cough, which can progress to pneumonia and dyspnea in more severe cases.2 The incubation period varies from two to fourteen days. In many cases, infected individuals remain asymptomatic but become potential vectors of transmission3 (Lauer et al., 2020). Contagion is direct, that is, through contact with a sick person via handshakes, saliva droplets, sneezing, coughing or vomit.4 Recent studies showed that the virus can survive in the air for more than three hours and on surfaces such as plastics and metals for up to three days (Van Doremalen et al., 2020). The main forms of prophylaxis are environment and surface hygiene and social distancing. Hypertension, obesity, organ transplantation, respiratory diseases, blood cancer and diabetes are the most common comorbidities among patients.5,6
The adoption of systematic non-pharmaceutical interventions appears to have been associated with lower incidence and decreased mortality (Haug et al., 2020). However, to the best of our knowledge, there is no scientific evidence showing to what extent demographic and socioeconomic variables are related to the death rates in Europe.
We construct a beta regression with two systematic components to determine which independent variables most affect the COVID-19 mortality rates in the first wave in twenty countries of Western Europe. The remainder of the paper is structured as follows. In Section 2, we briefly review the pandemic in Western Europe, present the relevant data collected in the first wave in twenty European countries, and calculate some basic statistics. In Section 3, we introduce the beta regression, including the estimation of its parameters and the correlation matrix of the data. We also review the simplex regression, construct the best beta regression fitted to the mortality rates, and provide an influence analysis and useful findings. Finally, some concluding remarks are offered in Section 4.
2. COVID-19 in Europe
According to the World Health Organization (WHO), the first cases of COVID-19 in Europe were reported in France on 24 January (two patients in Paris and one in Bordeaux). All had traveled to China. Tobías (2020) argued that the first cases of SARS-CoV-2 in Italy and Spain were confirmed one week apart. On July 4, estimates from the European Centre for Disease Prevention and Control (ECDC) indicated 2,458,791 confirmed cases, with the following most affected countries: Russia (667,883), United Kingdom (284,276), Spain (250,545), Italy (241,184) and Germany (196,096).
ECDC provides risk assessments and categorizes areas according to updated epidemiological indicators such as transmission and detection rate. For instance, if the basic reproduction number ($R_0$) is below one and there is extensive testing, the risk of the virus is considered low. Conversely, in areas without social distancing policies and with widespread transmission, the risk is very high. In response to the current pandemic, many governments worldwide have implemented social distancing policies with different levels of both enforcement and compliance. On 27 May 2020, the European Commission presented a revised long-term budget of 1,100 billion euros over the next seven years to support EU citizens, businesses and countries in recovering from the economic downturn caused by the pandemic.
National governments around Europe adopted different institutional policies to curb transmission and offer economic support for infected citizens. For example, in Spain, the first confirmed case was diagnosed on 31 January in the Canary Islands. The national government subsequently approved an eight-piece legislation package to initially address the coronavirus crisis.7 The intervention included measures to promote economic development and to increase health infrastructure. On 12 February, the Ministry of Health recommended infection control measures for persons attending public events.8 Strict social distancing measures were implemented on 12 March, including rigid control over homes for the elderly. The policy response to the novel virus in Spain suggests that government interventions were primarily reactive rather than anticipatory. On July 9, Spain registered 299,593 confirmed cases and more than 28,000 deaths, which means a mortality rate of 607 per one million people.
Up until late February, the UK government repeatedly insisted that the virus was a moderate risk, and the prime minister himself argued in favor of herd immunity.9 On March 16, Imperial College released a report showing that an unmitigated epidemic could lead to 510,000 deaths in Great Britain (GB).10 On March 27, Prime Minister Boris Johnson announced that he had developed mild symptoms and tested positive for the virus. Lockdown measures were adopted around GB with varying degrees of compliance and enforcement.11 On 5 May, the United Kingdom became the European country with the most COVID-19 deaths, and estimates from July 9 suggest up to 45,000 fatalities, which represents a mortality rate of 656 per one million people.
Turning to Italy, the first case was diagnosed on 31 January. Data from July 9 indicate more than 242,000 confirmed cases and almost 35,000 deaths. With 577 deaths per one million people, the country is one of the most affected in Europe. According to official estimates, Bergamo is the epicenter of the SARS-CoV-2 outbreak in Lombardy, with more than 6,000 deaths due to COVID-19 (Bernucci et al., 2020).12 On March 8, the Italian government adopted strict social distancing measures aiming to reduce the likelihood that people who are not infected come into contact with infected people (Remuzzi and Remuzzi, 2020).13 Another analysis examined death trends in Bergamo and Brescia14 and suggested that the containment measures adopted on March 11 reduced the spread of the epidemic.
Unlike other countries in Europe, the Swedish government chose an alternative institutional response to the pandemic. After the first confirmed case on 31 January, the official decision not to adopt lockdown measures was extremely controversial among health specialists.15 An opinion poll conducted on 17–19 April by an independent agency indicated that 73% of respondents agreed with the government's strategy to contain the virus. On July 9, Sweden registered 73,858 confirmed cases and 5,482 deaths, which means a mortality rate of 543 per one million people. These figures put Sweden in a far worse position than neighboring countries such as Denmark (105), Finland (59) and Norway (46). On 15 June, for instance, Norway opened its borders to most Nordic countries but excluded Swedish citizens.16
The data for our study cover 20 European countries. To ensure comparability, we restrict our sample to the largest European economies. After ranking both EU and non-EU members according to population and gross domestic product (GDP), we arrive at the following twenty countries: Germany, France, Italy, the United Kingdom, Spain, Poland, Romania, Netherlands, Belgium, Greece, Czech Republic, Portugal, Sweden, Hungary, Austria, Switzerland, Bulgaria, Denmark, Finland and Slovakia. Data on the new CoV situation were retrieved from the “Our World in Data” (OWID) repository. At the time this article was written, the data had records on COVID-19 cases from January 25, 2020 up until May 14 for our sample of countries. We take as dependent variable the total deaths per million, which registers the accumulated number of deaths attributed to the virus per million people since the 20th confirmed case in each country.
The estimated mortality rates on 4 July 2020 indicate (in decreasing order of mortality) that five countries (Belgium, United Kingdom, Spain, Italy and Sweden) have rates over 500 deaths per million, six other countries (France, Netherlands, Switzerland, Portugal, Germany and Denmark) have rates between 100 and 500 deaths per million, whereas nine countries (Romania, Austria, Hungary, Finland, Poland, Bulgaria, Czech Republic, Greece and Slovakia) have rates below 100 deaths per million.
Clearly, the mortality rate in a country depends on the restrictions adopted to isolate the population and on other strategies that are complicated to introduce in a regression model. Sweden adopted no policy of isolating the population, relying instead on the well-known herd immunity strategy, and therefore had a mortality rate 4.9 times higher than Germany, which adopted more restrictive isolation measures, and 7.6 times higher than the average for Finland, Norway and Denmark.
We consider the following independent variables: population density, which divides population by territory area (in km²), percentage of urban population, median age of the population, Human Development Index (HDI), GDP per capita (in current USD) and available hospital beds per 100 thousand inhabitants. In addition to these explanatory variables, our analysis also accounts for the passage of time. The date of the first registered case and the rate of contagion were not equal across the 20 countries. While France, for instance, registered its first case on January 25, Bulgaria’s first record was on March 8. Consequently, the time period for each country varies considerably. To ensure that all cases could be compared over the same time span, we register for each country the total deaths per million at two moments: 30 days (20 observations) and 60 days (20 observations) after the date when the 20th detected case was recorded. Since we consider 20 countries at two time points, we obtain a balanced panel with two observations for each of the 20 countries, that is, 40 observations in total.
The variables involved in the study are described below:
• MR (y): Mortality rate;
• PD: Population density (population/km²);
• PUP: Percentage of Urban Population;
• HDI: Human Development Index;
• GDPPC: GDP per capita;
• BEDS: Hospital beds per 100 thousand inhabitants;
• TIME: 30 days and 60 days after the 20th confirmed case.
Some descriptive statistics for all variables are reported in Table 1, which was compiled by the authors based on data from OWID (Roser et al., 2020), the UN Population Division (2019), the UN Development Program (2019) and the World Bank (2018). The data for the variables PD, “median age” and BEDS are taken from Our World in Data, which in turn draws its indicators from multiple sources. Data for the variable PUP (%) came from the UN Population Division (2019), for the variable HDI from the UNDP website (2019) and for GDPPC from the World Bank (2018). For a full list, see footnote 17. The statistics in Table 1 indicate that the explanatory variables PD, PUP, GDPPC and BEDS show great variability, while the HDI does not change much across these twenty countries.
| MR (30 days) | 20 | 29.1 | 19.9 | 32.2 | 0.4 | 103.9 |
| MR (60 days) | 20 | 177.1 | 75.4 | 192.9 | 4.8 | 670.0 |
3. Methods and results
An important class of health problems involves proportional data such as mortality and infection rates of diseases. The beta distribution is useful for modeling random proportions measured continuously in the interval $(0,1)$. It is a very versatile distribution for modeling a variety of uncertainties in many epidemiological studies.
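Choosing an appropriate distribution for the response variable generally improves the standard errors of the estimated regression coefficients. We adopt the re-parameterized beta density function introduced by Bayer and Cribari-Neto (2015) in terms of the mean parameter $\mu$ and dispersion parameter $\sigma$, namely
$$f(y;\mu,\sigma)=\frac{\Gamma\left(\frac{1-\sigma^{2}}{\sigma^{2}}\right)}{\Gamma\left(\mu\,\frac{1-\sigma^{2}}{\sigma^{2}}\right)\Gamma\left((1-\mu)\,\frac{1-\sigma^{2}}{\sigma^{2}}\right)}\,y^{\mu(1-\sigma^{2})/\sigma^{2}-1}\,(1-y)^{(1-\mu)(1-\sigma^{2})/\sigma^{2}-1},\quad 0<y<1,\qquad (1)$$
where $0<\mu<1$, $0<\sigma<1$ and $\Gamma(\cdot)$ is the complete gamma function. The idea of the beta regression was pioneered by Ferrari and Cribari-Neto (2004).
The mean and variance of $Y$ are $E(Y)=\mu$ and $\mathrm{Var}(Y)=V(\mu)\,\sigma^{2}$, where $V(\mu)=\mu(1-\mu)$ is the variance function and $\sigma^{2}$ is a kind of dispersion parameter.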
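As a numerical illustration of a mean/dispersion beta density, the following minimal Python sketch assumes the mapping $a=\mu(1-\sigma^{2})/\sigma^{2}$ and $b=(1-\mu)(1-\sigma^{2})/\sigma^{2}$ to the usual beta shape parameters. The function names and the values $\mu=0.3$, $\sigma=0.2$ are hypothetical and chosen only for illustration; the sketch checks by a midpoint rule that the density integrates to one, has mean $\mu$ and has variance $\mu(1-\mu)\sigma^{2}$.

```python
import math

def beta_density(y, mu, sigma):
    """Beta density in a mean/dispersion form (assumed parameterization)."""
    # Assumed mapping from (mu, sigma) to the usual beta shape parameters.
    k = (1.0 - sigma**2) / sigma**2
    a, b = mu * k, (1.0 - mu) * k
    log_f = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return math.exp(log_f)

def moments(mu, sigma, n=20000):
    """Midpoint-rule approximations of total mass, mean and variance."""
    h = 1.0 / n
    ys = [(i + 0.5) * h for i in range(n)]
    fs = [beta_density(y, mu, sigma) for y in ys]
    mass = sum(fs) * h
    mean = sum(y * f for y, f in zip(ys, fs)) * h
    var = sum((y - mean) ** 2 * f for y, f in zip(ys, fs)) * h
    return mass, mean, var

# Hypothetical parameter values, not estimates from the paper.
mass, mean, var = moments(mu=0.3, sigma=0.2)
```

Under these assumptions the computed variance should agree with $0.3 \times 0.7 \times 0.2^{2}$, which is the sense in which $\sigma^{2}$ scales the variance function.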
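Let $y_1,\ldots,y_n$ be a set of independent observations from Eq. (1), where both mean and dispersion vary across observations, say $y_i$ has mean $\mu_i$ and dispersion $\sigma_i$ (for $i=1,\ldots,n$). We consider the beta regression with two systematic components given by
$$g(\mu_i)=\eta_{1i}=x_i^{\top}\beta \quad\text{and}\quad h(\sigma_i)=\eta_{2i}=z_i^{\top}\gamma. \qquad (2)$$
Here, $g(\cdot)$ and $h(\cdot)$ are strictly monotonic and twice differentiable functions, called the mean and dispersion link functions, respectively, $\eta_{1i}$ and $\eta_{2i}$ are linear predictors, $x_i$ is a $p$-vector of non-stochastic independent variables and $z_i$ is a $q$-vector of dispersion explanatory variables (both associated with the $i$th observation), and $\beta$ and $\gamma$ are $p\times 1$ and $q\times 1$ vectors of unknown regression coefficients.
Different link functions can be defined in Eq. (2), but the logits $g(\mu)=\log[\mu/(1-\mu)]$ and $h(\sigma)=\log[\sigma/(1-\sigma)]$ are most common in applications. We can choose $z_i$ as a subset of $x_i$. The linear predictor vectors are $\eta_1=X\beta$ and $\eta_2=Z\gamma$, where $X$ and $Z$ are the model matrices with rows $x_i^{\top}$ and $z_i^{\top}$.
The log-likelihood function for $\beta$ and $\gamma$ is
$$\ell(\beta,\gamma)=\sum_{i=1}^{n}\log f(y_i;\mu_i,\sigma_i), \qquad (3)$$
where $\mu_i=g^{-1}(\eta_{1i})$ and $\sigma_i=h^{-1}(\eta_{2i})$ are known functions of $\beta$ and $\gamma$. The maximum likelihood estimates (MLEs) $\hat\beta$ and $\hat\gamma$ of $\beta$ and $\gamma$ can be determined as solutions of the nonlinear system of score equations $\partial\ell/\partial\beta_r=0$ and $\partial\ell/\partial\gamma_s=0$ (for $r=1,\ldots,p$ and $s=1,\ldots,q$).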
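The log-likelihood of a beta regression with two systematic components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: logit links for both the mean and the dispersion, the mean/dispersion beta parameterization with shape parameters $\mu(1-\sigma^{2})/\sigma^{2}$ and $(1-\mu)(1-\sigma^{2})/\sigma^{2}$, and hypothetical names (`beta_loglik`, `X`, `Z`) and toy data that do not come from the paper. In practice the maximization itself would be delegated to a fitting routine such as the one in GAMLSS.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_loglik(beta, gamma, X, Z, y):
    """Log-likelihood with logit links for the mean and the dispersion (assumed)."""
    total = 0.0
    for x_i, z_i, y_i in zip(X, Z, y):
        # Linear predictors mapped back to the parameter scale.
        mu = inv_logit(sum(c * v for c, v in zip(beta, x_i)))
        sigma = inv_logit(sum(c * v for c, v in zip(gamma, z_i)))
        # Assumed mapping to the usual beta shape parameters.
        k = (1.0 - sigma**2) / sigma**2
        a, b = mu * k, (1.0 - mu) * k
        total += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(y_i) + (b - 1.0) * math.log(1.0 - y_i))
    return total

# Toy data (hypothetical): intercept plus one covariate in the mean,
# intercept only in the dispersion.
X = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]
Z = [[1.0], [1.0], [1.0]]
y = [0.10, 0.25, 0.40]
ll = beta_loglik([-2.0, 1.5], [-1.0], X, Z, y)
```

Passing the negative of this function to a general-purpose optimizer would give the MLEs for the toy data; the score equations are the first-order conditions of that optimization.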
All computational procedures for maximizing Eq. (3) can be performed in the GAMLSS package in R.
In order to study departures from the distributional assumption as well as the presence of outliers, we consider the normalized randomized quantile residuals (Dunn & Smyth, 1996) given by $r_i=\Phi^{-1}(\hat{F}(y_i))$, where $\Phi^{-1}(\cdot)$ is the inverse standard normal cumulative distribution and $\hat{F}(\cdot)$ is the estimated beta cumulative distribution.
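The quantile residual can be illustrated with a short Python sketch. It assumes the mean/dispersion beta parameterization with shape parameters $\mu(1-\sigma^{2})/\sigma^{2}$ and $(1-\mu)(1-\sigma^{2})/\sigma^{2}$, approximates the beta cumulative distribution by a midpoint rule (adequate only when both shape parameters are at least one), and uses hypothetical fitted values. The function names are illustrative and are not part of any package.

```python
import math
from statistics import NormalDist

def beta_cdf(y, mu, sigma, n=20000):
    """Midpoint-rule approximation of the beta CDF (assumes shapes >= 1)."""
    k = (1.0 - sigma**2) / sigma**2
    a, b = mu * k, (1.0 - mu) * k
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = y / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.exp(log_c + (a - 1.0) * math.log(t)
                          + (b - 1.0) * math.log(1.0 - t))
    return total * h

def quantile_residual(y, mu_hat, sigma_hat):
    """Map the fitted CDF value to the standard normal scale."""
    u = beta_cdf(y, mu_hat, sigma_hat)
    return NormalDist().inv_cdf(u)

# Hypothetical fitted values: an observation above its fitted mean
# gets a positive residual, one below gets a negative residual.
r_above = quantile_residual(0.7, 0.5, 0.3)
r_below = quantile_residual(0.3, 0.5, 0.3)
```

If the fitted model is adequate, residuals computed this way should behave approximately like a standard normal sample, which is what the normal probability plot examines.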
Further, we build envelopes to enable better interpretation of the normal probability plot of the residuals. These envelopes are simulated confidence bands (Atkinson, 1985) that contain the residuals such that if the regression is well-fitted, the majority of points will be randomly distributed within these bands.
We consider the mortality rate ($y$) per 100 million to restrict the response variable to the interval $(0,1)$. Figure 1 displays the histograms of this rate at the reported times.
• There is a discordant case corresponding to Belgium. This observation distorts the linear correlation between the response variable and some of the covariates. We discuss this case in the influence analysis.
There are simple computer programs to fit several kinds of regressions. We use the RS algorithm as described by Rigby and Stasinopoulos (2005) and Stasinopoulos and Rigby (2007) for maximizing Eq. (3) for the beta regression with two systematic components.
[Correlation plots; panels: y vs PD, y vs PUP, y vs HDI, y vs GDPPC, y vs BEDS]
The explanatory variables in the systematic components for the mean and dispersion were selected using the stepwise GAIC method. This strategy is described in Voudouris et al. (2012) and implemented in the GAMLSS package.
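The explanatory variables HDI and GDPPC are not significant for the mortality rates. Considering the variables PD, PUP, BEDS and TIME for both mean and dispersion parameters, the first two variables are not significant in the systematic component for the dispersion parameter. Therefore, the final systematic components for the beta regression, with mean link function $g$ and dispersion link function $h$, can be expressed as
$$g(\mu_i)=\beta_0+\beta_1\,\mathrm{PD}_i+\beta_2\,\mathrm{PUP}_i+\beta_3\,\mathrm{BEDS}_i+\beta_4\,\mathrm{TIME}_i \quad\text{and}\quad h(\sigma_i)=\gamma_0+\gamma_1\,\mathrm{BEDS}_i+\gamma_2\,\mathrm{TIME}_i,$$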
where and .
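In the following, we compare the beta and simplex regressions. The simplex density defined by Barndorff-Nielsen and Jørgensen (1991) for a univariate continuous random variable $Y\in(0,1)$ has the form
$$f(y;\mu,\sigma)=\left[2\pi\sigma^{2}V(y)\right]^{-1/2}\exp\left\{-\frac{1}{2\sigma^{2}}\,d(y;\mu)\right\},\quad 0<y<1,$$
where $\mu\in(0,1)$ is the mean parameter, $\sigma>0$ is the dispersion parameter, $V(\mu)=\mu^{3}(1-\mu)^{3}$ is called the variance function and
$$d(y;\mu)=\frac{(y-\mu)^{2}}{y(1-y)\,\mu^{2}(1-\mu)^{2}}$$
is the unit deviance. The systematic components for $\mu$ and $\sigma$ adopted for the simplex regression are those defined in Eq. (2).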
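For readers who want to see the simplex density in action, here is a minimal Python sketch of its standard form, $[2\pi\sigma^{2}\{y(1-y)\}^{3}]^{-1/2}\exp\{-d(y;\mu)/(2\sigma^{2})\}$, with unit deviance $d(y;\mu)=(y-\mu)^{2}/[y(1-y)\mu^{2}(1-\mu)^{2}]$. The function names and the values $\mu=0.3$, $\sigma=0.8$ are hypothetical; the midpoint rule is used only to check that the density integrates to one and has mean $\mu$.

```python
import math

def unit_deviance(y, mu):
    """Unit deviance of the simplex distribution."""
    return (y - mu) ** 2 / (y * (1.0 - y) * mu**2 * (1.0 - mu) ** 2)

def simplex_density(y, mu, sigma):
    """Simplex density with mean mu and dispersion sigma."""
    norm = 2.0 * math.pi * sigma**2 * (y * (1.0 - y)) ** 3
    return math.exp(-unit_deviance(y, mu) / (2.0 * sigma**2)) / math.sqrt(norm)

def integrate(func, n=20000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / n
    return sum(func((i + 0.5) * h) for i in range(n)) * h

# Hypothetical parameter values, not estimates from the paper.
mu, sigma = 0.3, 0.8
mass = integrate(lambda y: simplex_density(y, mu, sigma))
mean = integrate(lambda y: y * simplex_density(y, mu, sigma))
```

Because the beta and simplex densities share the same support and the same mean parameter, the same pair of linear predictors can be reused for both models, which is what makes the comparison by information criteria straightforward.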
We compare the fitted beta and simplex regressions using the global deviance (GD), Akaike information criterion (AIC) and Bayesian information criterion (BIC). These statistics, reported in Table 3, are lower for the beta regression, which is therefore chosen as the best model to explain the coronavirus data.
Next, a variable selection study was carried out using the stepwise method in the beta regression in order to find a reduced model. Further, the MLEs of the parameters, their standard errors (SEs) and the corresponding $p$-values from the fitted beta regression including only significant variables are given in Table 4.
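The three criteria are simple functions of the maximized log-likelihood. The sketch below uses the usual definitions GD $=-2\hat\ell$, AIC $=$ GD $+2k$ and BIC $=$ GD $+k\log n$, where $k$ is the number of estimated coefficients and $n$ the number of observations. The log-likelihood value is hypothetical and is not taken from Table 3; $k=8$ and $n=40$ are used only as plausible illustrative inputs.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Global deviance, AIC and BIC from a maximized log-likelihood."""
    gd = -2.0 * loglik
    return {
        "GD": gd,
        "AIC": gd + 2.0 * n_params,
        "BIC": gd + math.log(n_obs) * n_params,
    }

# Hypothetical inputs: the log-likelihood is made up for illustration.
crit = information_criteria(loglik=100.0, n_params=8, n_obs=40)
```

Lower values are preferred for all three criteria, so two models fitted to the same 40 observations can be ranked by comparing the returned dictionaries entry by entry.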
The first tool for sensitivity analysis, called global influence, was introduced by Cook (1977). This method studies the effect of the $i$th observation by eliminating it from the data set. The generalized Cook distance is a measure of global influence given by
$$GD_i(\theta)=(\hat\theta_{(i)}-\hat\theta)^{\top}\left[-\ddot{L}(\hat\theta)\right](\hat\theta_{(i)}-\hat\theta),$$
where $\theta$ denotes the vector of all regression coefficients, a quantity with subscript $(i)$ means the original quantity with the $i$th observation deleted and $-\ddot{L}(\hat\theta)$ is the observed information matrix.
An alternative measure to detect possible influential points is the likelihood displacement
$$LD_i(\theta)=2\left[\ell(\hat\theta)-\ell(\hat\theta_{(i)})\right].$$
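The case-deletion idea behind both measures can be shown on a deliberately simple stand-in model. The Python sketch below does not refit a beta regression; it assumes a one-parameter normal model with unit variance, whose MLE is the sample mean and whose observed information is $n$, so that each leave-one-out refit is available in closed form. The data are hypothetical, and the functions are illustrative only.

```python
def loglik(theta, ys):
    """Log-likelihood (up to a constant) of a unit-variance normal mean model."""
    return -0.5 * sum((y - theta) ** 2 for y in ys)

def mle(ys):
    """The MLE of the mean in this toy model is the sample average."""
    return sum(ys) / len(ys)

def influence_measures(ys):
    """Generalized Cook distance and likelihood displacement per deleted case."""
    n = len(ys)
    theta_hat = mle(ys)
    info = float(n)  # observed information of the toy model
    results = []
    for i in range(n):
        reduced = ys[:i] + ys[i + 1:]      # data with the i-th case deleted
        theta_i = mle(reduced)             # refit without case i
        cook = (theta_i - theta_hat) ** 2 * info
        ld = 2.0 * (loglik(theta_hat, ys) - loglik(theta_i, ys))
        results.append((cook, ld))
    return results

# Hypothetical data with one discordant value at index 3.
toy = [0.10, 0.20, 0.15, 0.90, 0.12]
measures = influence_measures(toy)
most_influential = max(range(len(toy)), key=lambda i: measures[i][1])
```

In this quadratic toy model the two measures coincide; in a beta regression they generally differ, and each deletion requires refitting the model numerically.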
Figure 3 provides the plots of these measures. The 4th observation, which corresponds to Belgium at 30 days, is identified as an influential case. This fact was already noted in the discussion of the correlation plots.
The plot of the residuals versus the fitted values is displayed in Fig. 4 for the two systematic components. We can note that there is a small trend in the systematic component of the parameter $\mu$, whereas there is randomness in the systematic component of the parameter $\sigma$.
The plot of the quantile residuals (see Section 3.1) versus the order of the observations for the fitted beta regression is displayed in Fig. 5a, which indicates that the residuals are randomly distributed. Further, we verify the quality of the fit of the beta regression by constructing a simulated envelope. The plot in Fig. 5b provides strong evidence of a good fit of the beta regression to the current data. There are no discrepant observations. Thus, even Belgium is well fitted by the beta regression.
The higher mortality rate in Belgium (one of the highest in the world) can be explained in large part by two factors outside our modeling: (i) the country has a global health security index (GHSI) lower than that of its neighbors. This indicator, developed by the Johns Hopkins Center for Health Security, assesses the overall protection capabilities of health systems worldwide. In order of safety, the scores are: Netherlands 75.6, France 68.2, Germany 66 and Belgium 61. For more information on the calculation and methodology of this index, see https://www.ghsindex.org/; (ii) low compliance with measures of social confinement and closure of economic sectors in comparison with its neighbors. According to data from community mobility reports, this country had one of the lowest levels of social isolation on the continent during the period of study.
In the following, we present a brief summary of the results from the fitted beta regression with two systematic components defined before.
(i) Findings from the systematic component for the parameter $\mu$:
– The population density is extremely significant (see Table 4) for the COVID-19 mean mortality rate. Thus, the mean mortality rate is much higher for more densely populated countries.
– The percentage of urban population is also significant at the 0.06 significance level. Thus, the mean mortality rate is higher for countries with large urban areas.
– The hospital beds proportion is very significant (see Table 4) for the mean mortality rate. In other words, the mean mortality rate increases for countries with fewer hospital beds per capita.
– The independent variable TIME is highly significant (see Table 4): the mortality rate increased between the 30th and 60th days (after the 20th confirmed case), since the pandemic was spreading across Europe during this period.
In practical terms, the countries that have low proportions of hospital beds per 100 thousand inhabitants will have to increase their numbers of beds to cope with a future pandemic. Of course, the different public policies adopted by countries to control the pandemic are not included in our modeling.
The numbers of hospital beds are considered fixed in the beta and simplex regressions fitted to the mortality rates. There may have been small fluctuations in these numbers during the studied periods. We did not consider such fluctuations because this independent variable is assumed to be fixed in the proposed regressions. Moreover, studying regressions with unit support and random explanatory variables is highly complex, and we are not aware of any theoretical development in this regard.
(ii) Findings from the systematic component for the parameter $\sigma$:
– The hospital beds proportion explains the dispersion of the mortality rate at the 0.10 significance level.
– Finally, there is a highly significant (see Table 4) difference in the variability of the mortality rate between the 30th and 60th days.
(iii) Marginal analysis from the estimated beta regression:
We can study the effects of the independent variables on the mortality rates marginally. The marginal effect of any independent variable (say $x_j$) on the response variable MR (assuming the other variables fixed) is obtained by adding one unit to $x_j$ and calculating the marginal relative difference in the mean of MR. We can write $\Delta_j=[\mu(x_j+1)-\mu(x_j)]/\mu(x_j)$. After some algebra, we obtain using the logit function
$$\Delta_j=\frac{e^{\beta_j}-1}{1+e^{\eta+\beta_j}},$$
where $\eta$ is the linear predictor for the mean evaluated at $x_j$ and $\beta_j$ is the coefficient of $x_j$.
The last equation can be approximated by $e^{\beta_j}-1$ when the mean mortality rate is small. So, an increase of one unit in the population density leads to an increase of 0.3% in the relative mean mortality rate. Further, an increase of 1% in the percentage of urban population yields an increase of 2.7% in the relative mean mortality rate. In a similar manner, an increase of one unit in the hospital beds per 100 thousand inhabitants reduces the relative mean mortality rate by about 21%.
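The marginal calculation can be checked numerically. The Python sketch below assumes a logit link for the mean, so that the relative change in the mean after a one-unit increase in a covariate with coefficient $\beta_j$ is $(e^{\beta_j}-1)/(1+e^{\eta+\beta_j})$, where $\eta$ is the baseline linear predictor; this is close to $e^{\beta_j}-1$ when the mean is small. The values of `eta` and `beta_j` are hypothetical and are not the estimates reported in Table 4.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def relative_change(eta, beta_j):
    """Exact relative change in the mean after a one-unit covariate increase."""
    mu0 = inv_logit(eta)
    mu1 = inv_logit(eta + beta_j)
    return (mu1 - mu0) / mu0

def approx_relative_change(beta_j):
    """Small-mean approximation of the relative change."""
    return math.exp(beta_j) - 1.0

# Hypothetical baseline predictor and coefficient (illustration only).
eta, beta_j = -6.0, 0.003
exact = relative_change(eta, beta_j)
approx = approx_relative_change(beta_j)
```

With a baseline mean this small, the exact and approximate relative changes are both about 0.3%, which illustrates why the approximation is adequate for mortality rates that are far from one.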
COVID-19 is a disease caused by a new type of coronavirus first identified in Wuhan (China) in December 2019. Most countries in Western Europe started reporting cases of infected people at the end of February 2020. Confirmed cases across Europe doubled over periods of three to four days, and even every two days in some countries. The main objective of this article is to identify some demographic, socio-economic and health-care variables that explain the mortality rate in 20 countries in the first wave in Western Europe using a beta regression with two systematic components. We find that population density, percentage of urban population, hospital beds per 100 thousand inhabitants and running time significantly explain the mean and variability of the death rates in these countries.
3 World Health Organization. Considerations for quarantine of individuals in the context of containment for coronavirus disease (COVID-19). Geneva: World Health Organization; 2020. https://www.who.int/publications/i/item/considerations-for-quarantine-of-individuals-in-the-context-of-containment-for-coronavirus-disease-(covid-19).
4 Transmission of COVID-19 virus by droplets and aerosols: A critical review on the unresolved dichotomy https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7293495/.
5 Comorbidity and its Impact on Patients with COVID-19 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7314621/.
6 Comorbidities and the risk of severe or fatal outcomes associated with coronavirus disease 2019: A systematic review and meta-analysis https://www.sciencedirect.com/science/article/pii/S1201971220305725.
9 See: https://www.businessinsider.com/times-world-leaders-downplayed-the-coronavirus-threat-2020-4#prime-minister-boris-johnson-and-the-uk-government-believed-the-coronavirus-was-a-moderate-risk-up-until-late-february-and-were-late-to-impose-a-national-lockdown-4.
11 For more information on the UK response to COVID-19, see: https://www.gov.uk/coronavirus.
12 According to Bernucci et al. (2020), “this epidemic is challenging the Italian Government and the health care system and is having a devastating impact on all the health activities, including the neurosurgical reality”. See: https://www.wfns.org/WFNSData/Uploads/files/Effects-of-the-COVID-19-Outbreak-in-Northern-Italy-Perspectives-from-the-Bergamo-Neurosurgery-Department.pdf.
15 See: https://doi.org/10.1136/bmj.m2376.
Atkinson, A. C. (1985). Plots, transformations, and regression: An introduction to graphical methods of diagnostic regression analysis. Oxford: Clarendon Press.
Barndorff-Nielsen, O. E. & Jørgensen, B. (1991). Some parametric models on the simplex, Journal of Multivariate Analysis, 39, 106-116.
Bayer, F. M. & Cribari-Neto, F. (2015). Bootstrap-based model selection criteria for beta regressions, Test, 24, 776-795.
Cook, R. D. (1977). Detection of influential observation in linear regression, Technometrics, 19, 15-18.
Dunn, P. K. & Smyth, G. K. (1996). Randomized quantile residuals, Journal of Computational and Graphical Statistics, 5, 236-244.
Ferrari, S. & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions, Journal of Applied Statistics, 31, 799-815.
Guan, Y., Zheng, B. J., He, Y. Q., Liu, X. L., Zhuang, Z. X., Cheung, C. L., Luo, S. W., Li, P. H., Zhang, L. J., Guan, Y. J., Butt, K. M., Wong, K. L., Chan, K. W., Lim, W., Shortridge, K. F., Yuen, K. Y., Peiris, J. S. M. & Poon, L. L. M. (2003). Isolation and characterization of viruses related to the SARS coronavirus from animals in Southern China, Science, 302, 276-279.
Haug, N., Geyrhofer, L., Londei, A., Dervic, E., Desvars-Larrive, A., Loreto, V., Pinior, B., Thurner, S. & Klimek, P. (2020). Ranking the effectiveness of worldwide COVID-19 government interventions, Nature Human Behaviour, 4, 1303-1312.
Lauer, S. A., Grantz, K. H., Bi, Q., Jones, F. K., Zheng, Q., Meredith, H. R., Azman, A. S., Reich, N. G. & Lessler, J. (2020). The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application, Annals of Internal Medicine, 172, 577-582.
Rigby, R. A. & Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion), Applied Statistics, 54, 507-554.
Roser, M., Ritchie, H., Ortiz-Ospina, E. & Hasell, J. (2020). Coronavirus pandemic (COVID-19), Published online at OurWorldInData.org. Retrieved from: https://covid.ourworldindata.org/data/owid-covid-data.csv. Accessed on 14 May 2020.
Stasinopoulos, D. M. & Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R, Journal of Statistical Software, 23, 1-46.
Tobías, A. (2020). Evaluation of the lockdowns for the SARS-CoV-2 epidemic in Italy and Spain after one month follow up, Science of the Total Environment, 725, 138539. Retrieved from: https://www.sciencedirect.com/science/article/pii/S0048969720320520. Accessed on 30 July 2020.
United Nations Development Program (2019). 2019 Human development index ranking. Retrieved from: http://hdr.undp.org/en/content/2019-human-development-index-ranking. Accessed on 14 May 2020.
United Nations, Department of Economic and Social Affairs, Population Division (2019). World urbanization prospects: The 2018 revision (ST/ESA/SER.A/420). New York: United Nations. Retrieved from: https://population.un.org/wup/Publications/Files/WUP2018-Report.pdf. Accessed on 14 May 2020.
Van Doremalen, N., Morris, D. H., Holbrook, M. G., Gamble, A., Williamson, B. N., Tamin, A., Harcourt, J. L., Thornburg, N. J., Gerber, S. I., Lloyd-Smith, J. O., de Wit, E. & Munster, V. J. (2020). Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1, The New England Journal of Medicine, 382, 1564-1567.
Voudouris, V., Gilchrist, R., Rigby, R., Sedgwick, J. & Stasinopoulos, D. (2012). Modelling skewness and kurtosis with the BCPE density in GAMLSS, Journal of Applied Statistics, 39, 1279-1293.
World Bank (2018). GDP per capita (current US$). Retrieved from: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD. Accessed on 14 May 2020.
Wu, D., Wu, T., Liu, Q. & Yang, Z. (2020). The SARS-CoV-2 outbreak: What we know, International Journal of Infectious Diseases. Retrieved from: https://www.ijidonline.com/article/S1201-9712(20)30123-5/fulltext. Accessed on 30 June 2020.