Scrutinising the COVID-19 Data on 10.676.000 Cases. A Novel Method using Retrospective, Population-based Descriptive Study for Data Quality Surveillance and a Review at 181.426.000 Cases

Author(s): Oriol Gallemí Rovira

Background: Reports of detected COVID-19-positive patients are, to date, the best estimate of how far the pandemic has spread in a country. To evaluate early indicators of true lethality and recovery time, the data on which a model is built must be quality checked. Each country applies different procedures and criteria for counting COVID-19 fatalities, and health systems are stressed by insufficient testing capacity, untracked infections, and premature discharges. This paper discusses the dynamics behind such data quality issues throughout the clinical course, to support better modeling and decision-making in a stressed healthcare system.

Methods: Based on data compiled and relayed by Johns Hopkins University (JHU), tracking 10.675.596 COVID-19 infections (as of July 1st, 2020), the data are clustered and compared using discrete regression. Regression parameters are restricted to a time resolution of 1 day and must be consistent with, and explanatory of, the diagnosis (i.e. a fatality cannot occur before the patient displays symptoms). Cumulative infection curves are built by setting the zero at the point when infections were lowest in the northern hemisphere. Data are taken from the JHU consolidated repository. Synthetic infection curves are built from the fatality count and the recovered-patient count. The adjusted parameters are τ = time to fatality (days), δ = time to discharge of recovered patients (days), and φ = case fatality rate (CFR, per unit, P.U.). The discharge rate (recovery rate) is therefore forced to be (1 − φ). In addition, a recovery coverage is set in order to determine the number of untracked discharged patients.
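The reconstruction of synthetic infection curves described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the list-based data layout, and the way the recovery coverage enters the formula (as a multiplicative fraction of tracked discharges) are assumptions made for illustration. The idea is that cumulative fatalities observed τ days after detection, divided by φ, and cumulative recoveries observed δ days after detection, divided by (1 − φ), should both approximate the cumulative detected infections.

```python
def synthetic_infections_from_fatalities(cum_deaths, tau, phi):
    """Rebuild cumulative infections from cumulative fatalities.

    Assumption: deaths reported on day t + tau belong to cases detected
    on day t, and a fraction phi of detected cases is fatal.
    Element i of the result refers to detection day i.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative (no fatality before detection)")
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return [d / phi for d in cum_deaths[tau:]]


def synthetic_infections_from_recoveries(cum_recovered, delta, phi, coverage=1.0):
    """Rebuild cumulative infections from cumulative recoveries.

    Assumption: recoveries reported on day t + delta belong to cases
    detected on day t; the recovery rate is (1 - phi); 'coverage' is the
    hypothetical fraction of discharges that is actually tracked.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    scale = (1.0 - phi) * coverage
    return [r / scale for r in cum_recovered[delta:]]
```

Comparing both reconstructed curves against the detected-infection curve for the same detection dates is what exposes inconsistencies: if the data were consistent, the three curves would overlap for some plausible choice of τ, δ, and φ.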

Whether the calculation runs forward or backward affects only the time reference. In both cases, the times of onset and symptoms are neglected and must be added if those dates are to be plotted. There is a gap of 10 days from exposure to hospital admission and detection. An early diagnosis is of paramount importance for slowing the progress of the infection. Cumulative figures are used to smooth out deviations and to provide the best estimator possible at the present time. The delay factors make it possible to compare figures belonging to the same date of detection by displacing the curves along the time axis, so that the shape of detected infections can be compared with that of reconstructed fatalities and reconstructed hospital discharges. In theory, all curves must be similar, but healthcare (HC) system capacity is limited and sometimes cannot keep up with exponential growth.
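As a hedged illustration of how a delay restricted to whole days could be estimated, the sketch below tries every integer τ in a range, fits φ by least squares through the origin, and keeps the pair with the lowest mean squared error. This is a generic grid search, assumed here for illustration; the abstract does not specify the actual regression procedure, and the function name and arguments are hypothetical.

```python
def fit_delay_and_cfr(cum_cases, cum_deaths, max_tau):
    """Grid-search an integer delay tau (days) and a CFR phi.

    Model assumed for illustration: cum_deaths[t + tau] ~ phi * cum_cases[t].
    tau starts at 0 because a fatality cannot precede detection.
    Returns (tau, phi, mse) for the best fit, or None if nothing could be fit.
    """
    n = min(len(cum_cases), len(cum_deaths))
    best = None
    for tau in range(0, max_tau + 1):
        cases = cum_cases[: n - tau]      # detection days with a matching death count
        deaths = cum_deaths[tau:n]        # deaths shifted back by tau days
        denom = sum(c * c for c in cases)
        if len(cases) < 2 or denom == 0:
            continue
        # Least-squares slope of a line through the origin.
        phi = sum(c * d for c, d in zip(cases, deaths)) / denom
        mse = sum((d - phi * c) ** 2 for c, d in zip(cases, deaths)) / len(cases)
        if best is None or mse < best[2]:
            best = (tau, phi, mse)
    return best
```

One caveat on this sketch: when the case curve is purely exponential, a time shift and a rescaling are indistinguishable, so τ and φ cannot be separated from the curve shape alone; the fit only becomes informative once growth departs from a pure exponential.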

Fast, daily models that could be integrated into a filtering stage of the parameter estimator are out of scope. Continuous models can also be used, but interpolation between data points is another source of noise to consider, especially when counting and detection methods change suddenly, as is the case with COVID-19. Countries were selected mainly to illustrate the methodology. Results are discussed and compared across the different groups, and potential indicators of this behavior are identified for further study.

Findings: Across the 181.425.785 cases in the sample and the 7 representative samples, recovery time and local CFR had previously been found to be negatively correlated [1]. An anomalous CFR can therefore be an indicator of data inconsistencies (e.g. Germany, with a CFR of 2.4% and a τ of 29 days). In the review part, the focus is on the inconsistencies detected in Germany, Belgium, and Spain, as well as on potential misfits in the US data. Overall, τ has increased from 6 days on average in 2020 to 12 days in 2021. Germany and the US have the longest delays from detection to fatality, at 29 and 26 days respectively, which is largely inconsistent with the average clinical course. Italy has the longest recovery time, at 31 days, with an average τ of 14 days since detection. To date, the average discharge occurs at the same time as τ. One potential cause is that positive individuals who pass the two-week interval after testing positive are considered safe, so freeing hospital beds is preferred.

Interpretation: One simple explanation for the correlation between local CFR and recovery time is to treat that rate as a measure of healthcare system overload. Anomalous CFR values point to a stressed healthcare system. The higher the overload, the more testing focuses on critical cases, and hence the higher the local CFR. By July 1st, 2021, the system is not overloaded in the northern hemisphere, showing a consistent CFR across countries, albeit with different discharge times, at 1.8% of positive patients. In Spain, positive tests account for 5.87% (yearly) [2]. The intrinsic CFR of COVID-19 is unlikely to vary by a factor of 10 between countries with similar lifestyles, GDP per capita, and health services. For this reason, early CFR values measured before the HC system is overwhelmed (COVID-19 free flow) are more accurate than CFR values measured while the outbreak is still ongoing. Finally, the synthetic infection indexes are an indirect measure of the real population infection rate and must be used for data quality audits. Any model built upon inconsistent data will be complex to explain and justify.

© 2016-2022, Copyrights Fortune Journals. All Rights Reserved!