51 results found
- Filter by Primary Language:
- Python (24)
- Jupyter Notebook (9)
- R (2)
- C++ (2)
- JavaScript (2)
- Julia (2)
- Vue (1)
- Go (1)
- HTML (1)
- TypeScript (1)
- C# (1)
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Created
2018-05-11
1,743 commits to master branch, last one 28 days ago
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Created
2015-05-03
8,946 commits to main branch, last one a day ago
Refine high-quality datasets and visual AI models
Created
2020-04-22
21,897 commits to develop branch, last one a day ago
A light-weight, flexible, and expressive statistical data testing library
Created
2018-11-01
779 commits to main branch, last one 8 days ago
Jupyter notebook and datasets from the pandas video series
Created
2016-03-31
88 commits to master branch, last one 8 months ago
General Assembly's 2015 Data Science course in Washington, DC
Created
2015-08-07
119 commits to master branch, last one 8 years ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
Simple tools for data cleaning in R
Created
2016-04-12
981 commits to main branch, last one a day ago
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Created
2017-11-27
510 commits to master branch, last one 26 days ago
Prepping tables for machine learning
Created
2018-03-12
1,696 commits to main branch, last one a day ago
An open-source Chinese-English educational chat model from ICALK, East China Normal University (general-purpose base model, GPU deployment, data cleaning). With thanks to LLaMA, MOSS, BELLE, Ziya, vLLM.
Created
2023-06-26
91 commits to main branch, last one about a month ago
Schema-Inspector is a simple JavaScript object sanitization and validation module.
Created
2014-01-02
169 commits to master branch, last one about a year ago
Easy to use Python library of customized functions for cleaning and analyzing data.
Created
2020-03-25
884 commits to main branch, last one 19 days ago
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Created
2022-09-21
653 commits to main branch, last one 5 months ago
Professional data validation for the R environment
Created
2014-02-21
809 commits to master branch, last one about a month ago
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
Topics: data-mining, correlations, data-science, spreadsheets, tabular-data, data-cleaning, data-analytics, data-cleansing, data-profiling, data-wrangling, data-engineering, data-exploration, anomaly-detection, feature-selection, data-preprocessing, feature-extraction, feature-engineering, knowledge-discovery, data-mining-algorithms, exploratory-data-analysis
Created
2020-04-09
1,429 commits to main branch, last one 19 hours ago
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Created
2018-10-05
30 commits to master branch, last one 3 years ago
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Created
2018-06-18
646 commits to master branch, last one 3 years ago
Data Science Feature Engineering and Selection Tutorials
Created
2021-05-07
68 commits to main branch, last one 2 years ago
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Created
2019-09-30
22 commits to master branch, last one 2 years ago
Pydantic extension for annotating autocorrecting fields.
Created
2024-02-17
106 commits to main branch, last one 8 months ago
LLM-based text extraction from unstructured data such as PDF, Word, and HTML documents. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
Created
2023-09-05
890 commits to main branch, last one 6 months ago
CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count uniqu...
Created
2019-12-15
347 commits to master branch, last one 2 months ago
🗺️ Data Cleaning and Textual Data Visualization 🗺️
Created
2022-05-19
1,023 commits to main branch, last one 5 months ago
An R package for data screening
Created
2016-09-26
493 commits to master branch, last one 2 years ago
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Created
2018-08-16
2,174 commits to develop branch, last one 2 years ago
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Created
2019-05-26
3,418 commits to master branch, last one about a year ago
Outlier Detection Thresholding
Created
2022-05-29
383 commits to main branch, last one about a month ago
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Created
2021-04-02
3,956 commits to develop branch, last one 2 months ago
Cluster and merge similar string values: an R implementation of OpenRefine clustering algorithms
Created
2017-03-04
246 commits to master branch, last one 8 months ago