54 results found
- Filter by Primary Language:
- Python (25)
- Jupyter Notebook (9)
- R (2)
- C++ (2)
- JavaScript (2)
- Julia (2)
- Vue (1)
- Go (1)
- HTML (1)
- TSQL (1)
- TypeScript (1)
- C# (1)
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created
2018-05-11
1,767 commits to master branch, last one 11 days ago
Refine high-quality datasets and visual AI models
Created
2020-04-22
22,731 commits to develop branch, last one a day ago
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Created
2015-05-03
8,978 commits to main branch, last one 6 days ago
A light-weight, flexible, and expressive statistical data testing library
Created
2018-11-01
816 commits to main branch, last one 5 days ago
Jupyter notebook and datasets from the pandas video series
Created
2016-03-31
88 commits to master branch, last one 11 months ago
General Assembly's 2015 Data Science course in Washington, DC
Created
2015-08-07
119 commits to master branch, last one 8 years ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
simple tools for data cleaning in R
Created
2016-04-12
985 commits to main branch, last one about a month ago
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Created
2017-11-27
510 commits to master branch, last one 3 months ago
Prepping tables for machine learning
Created
2018-03-12
1,745 commits to main branch, last one 3 days ago
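An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English large dialogue model for education. (General-purpose base model, GPU deployment, data cleaning.) Tribute to: LLaMA, MOSS, BELLE, Ziya, vLLM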
Created
2023-06-26
91 commits to main branch, last one 4 months ago
Schema-Inspector is a simple JavaScript object sanitization and validation module.
Created
2014-01-02
170 commits to master branch, last one 2 months ago
Easy to use Python library of customized functions for cleaning and analyzing data.
Created
2020-03-25
887 commits to main branch, last one about a month ago
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Created
2022-09-21
654 commits to main branch, last one 11 days ago
Professional data validation for the R environment
Created
2014-02-21
812 commits to master branch, last one 2 months ago
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
data-mining
correlations
data-science
spreadsheets
tabular-data
data-cleaning
data-analytics
data-cleansing
data-profiling
data-wrangling
data-engineering
data-exploration
anomaly-detection
feature-selection
data-preprocessing
feature-extraction
feature-engineering
knowledge-discovery
data-mining-algorithms
exploratory-data-analysis
Created
2020-04-09
1,635 commits to main branch, last one a day ago
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Created
2018-06-18
646 commits to master branch, last one 3 years ago
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Created
2018-10-05
30 commits to master branch, last one 3 years ago
Data Science Feature Engineering and Selection Tutorials
Created
2021-05-07
68 commits to main branch, last one 2 years ago
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Created
2019-09-30
22 commits to master branch, last one 2 years ago
Pydantic extension for annotating autocorrecting fields.
This repository has been archived
Created
2024-02-17
106 commits to main branch, last one 10 months ago
LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML files. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
Created
2023-09-05
890 commits to main branch, last one 9 months ago
CSV Lint plug-in for Notepad++ for syntax highlighting, CSV validation, automatic column and datatype detection, fixed-width datasets, change datetime format, decimal separator, sort data, count uniqu...
Created
2019-12-15
350 commits to master branch, last one about a month ago
🗺️ Data Cleaning and Textual Data Visualization 🗺️
Created
2022-05-19
1,023 commits to main branch, last one 8 months ago
An R package for data screening
Created
2016-09-26
493 commits to master branch, last one 3 years ago
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Created
2019-05-26
3,418 commits to master branch, last one about a year ago
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Created
2018-08-16
2,174 commits to develop branch, last one 2 years ago
Outlier Detection Thresholding
Created
2022-05-29
400 commits to main branch, last one 16 days ago
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Created
2021-04-02
4,068 commits to develop branch, last one 18 days ago
Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms
Created
2017-03-04
246 commits to master branch, last one 11 months ago