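The table below lists available open-source data quality software distributions that cover some aspect of data quality assessment.
Inclusion criteria
- Any open-source distribution that is publicly accessible in one of the repositories. For brevity, when a repository contains many different tools, only one link is given.
- A library or framework does not have to focus solely on data quality, since this functionality is often bundled with data cleaning or exploratory data analysis.
- Data quality assessment matters in widely different settings and workflows (from validating Excel sheets to big data pipelines, offline and online, and so on), so the list is deliberately diverse.
- Star, issue, and fork counts serve as a rough measure of maturity. Use at your own risk.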
 
Open-source data quality software
| Name | Description | Language | Online Docs | URL | Stars | Issues | Forks |
|---|---|---|---|---|---|---|---|
| awslabs/deequ | Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets | Scala | | github | 1328 | 90 | 256 |
| data-cleaning/validate | validate: Data cleaning for statistical purposes | R | docs | github | 236 | 21 | 18 |
| datacleaner/DataCleaner | DataCleaner Community Edition | Java | docs | github | 371 | 172 | 136 |
| daveoncode/pyvaru | pyvaru: Rule based data validation library for python | Python | docs | github | 14 | 1 | 3 |
| great-expectations/great_expectations | Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling | Python | docs | github | 3127 | 147 | 348 |
| OpenRefine/OpenRefine | OpenRefine is a tool for working with messy data | Java | docs | github | 7735 | 595 | 1376 |
| pandas-profiling/pandas-profiling | pandas-profiling generates profile reports from a pandas DataFrame | Python | docs | github | 6338 | 44 | 962 |
| pyeve/cerberus | cerberus is a lightweight, extensible data validation library for Python | Python | docs | github | 2246 | 33 | 202 |
| ResidentMario/missingno | missingno is a missing data visualization module for Python | Python | | github | 2540 | 15 | 334 |
| WeBankFinTech/Qualitis | Qualitis is a data quality management platform that supports quality verification, notification, and management for various datasources | Java | docs | github | 208 | 16 | 107 |
| whylabs/whylogs-python | whylogs-python is a Python implementation of whylogs | Python | docs | github | 191 | 10 | 7 |
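
Several entries above describe rule-based checks, or "unit tests for data". As a rough illustration of that idea only, the sketch below uses plain Python and the standard library. It does not use or reproduce the API of any listed tool; the helper names (`completeness`, `is_unique`, `run_checks`), the sample rows, and the thresholds are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A row is a plain dict; a missing value is represented by None (an assumption of this sketch).
Row = Dict[str, Optional[int]]


def completeness(rows: List[Row], column: str) -> float:
    """Return the fraction of rows whose `column` is present and not None."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)


def is_unique(rows: List[Row], column: str) -> bool:
    """Return True if the non-missing values of `column` contain no duplicates."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))


def run_checks(rows: List[Row],
               checks: Dict[str, Callable[[List[Row]], bool]]) -> Dict[str, bool]:
    """Evaluate every named check against the rows and collect pass/fail results."""
    return {name: bool(check(rows)) for name, check in checks.items()}


# Hypothetical sample data: one missing age and one implausible age.
rows: List[Row] = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 151},
]

# Hypothetical rules; the 0.9 threshold and the 0-120 range are illustrative only.
checks = {
    "id_complete": lambda rs: completeness(rs, "id") == 1.0,
    "id_unique": lambda rs: is_unique(rs, "id"),
    "age_mostly_complete": lambda rs: completeness(rs, "age") >= 0.9,
    "age_in_range": lambda rs: all(
        0 <= r["age"] <= 120 for r in rs if r.get("age") is not None
    ),
}

report = run_checks(rows, checks)
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

On the sample rows, the two `id` checks pass and the two `age` checks fail. The listed libraries express comparable constraints through their own interfaces and add profiling, reporting, and scaling features that this sketch leaves out.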
Published: Sunday, January 31, 2021 - 20:29
Last modified: Thursday, September 7, 2023 - 22:17