Please use this link to cite or link to this document: https://nbn-resolving.org/urn:nbn:de:gbv:9-opus-61948

R Packages for Data Quality Assessments and Data Monitoring: A Software Scoping Review with Recommendations for Future Developments

  • Data quality assessments (DQA) are necessary to ensure valid research results. Despite the growing availability of tools of relevance for DQA in the R language, a systematic comparison of their functionalities is missing. Therefore, we review R packages related to data quality (DQ) and assess their scope against a DQ framework for observational health studies. Based on a systematic search, we screened more than 140 R packages related to DQA in the Comprehensive R Archive Network. From these, we selected packages which target at least three of the four DQ dimensions (integrity, completeness, consistency, accuracy) in a reference framework. We evaluated the resulting 27 packages for general features (e.g., usability, metadata handling, output types, descriptive statistics) and the possible assessment’s breadth. To facilitate comparisons, we applied all packages to a publicly available dataset from a cohort study. We found that the packages’ scope varies considerably regarding functionalities and usability. Only three packages follow a DQ concept, and some offer an extensive rule-based issue analysis. However, the reference framework does not include a few implemented functionalities, and it should be broadened accordingly. Improved use of metadata to empower DQA and user-friendliness enhancement, such as GUIs and reports that grade the severity of DQ issues, stand out as the main directions for future developments.

Metadata
Author: Joany Mariño, Elisa Kasbohm, Stephan Struckmann, Lorenz A. Kapsner, Carsten O. Schmidt
URN:urn:nbn:de:gbv:9-opus-61948
DOI:https://doi.org/10.3390/app12094238
ISSN:2076-3417
Parent Title (English):Applied Sciences
Publisher:MDPI
Place of publication:Basel
Document Type:Article
Language:English
Date of first Publication:2022/04/22
Release Date:2022/11/15
Tag:R project for statistical computing; data quality; data quality monitoring; data reporting; exploratory data analysis; initial data analysis
Volume:12
Issue:9
Article Number:4238
Page Number:26
Faculties:Universitätsmedizin / Institut für Community Medicine
Collections:Other articles eligible for DFG funding
Licence (German):Creative Commons - Attribution (Namensnennung)