48 results found

2.0k
11.0k
bsd-3-clause
469
OpenRefine is a free, open source power tool for working with messy data and improving it
Created 2012-10-15
8,481 commits to master branch, last one 4 days ago
140
7.3k
mit
32
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Created 2020-09-22
710 commits to master branch, last one 2 months ago
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
Created 2018-08-09
520 commits to master branch, last one 3 months ago
Carefully curated resource links for data science in one place
Created 2018-12-27
112 commits to master branch, last one 2 years ago
74
2.6k
unlicense
18
Blazing-fast Data-Wrangling toolkit
Created 2020-12-11
11,203 commits to master branch, last one about an hour ago
93
2.1k
apache-2.0
15
ETL, Analytics, Versioning for Unstructured Data
Created 2024-06-25
419 commits to main branch, last one a day ago
A Python toolbox for gaining geometric insights into high-dimensional data
Created 2016-09-27
1,652 commits to master branch, last one 9 months ago
132
1.8k
other
29
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
Created 2018-08-09
5,506 commits to main branch, last one about a month ago
232
1.5k
apache-2.0
38
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created 2017-07-13
6,411 commits to develop branch, last one about a year ago
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Created 2017-11-27
510 commits to master branch, last one about a month ago
105
1.3k
bsd-3-clause
22
Prepping tables for machine learning
Created 2018-03-12
1,725 commits to main branch, last one 9 days ago
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Created 2016-08-29
2,222 commits to master branch, last one 7 days ago
99
621
other
56
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Mi...
Created 2015-10-21
238 commits to main branch, last one 17 days ago
Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
Created 2020-08-24
629 commits to master branch, last one 2 months ago
Materials for following along with Hands-On Data Analysis with Pandas.
Created 2018-09-15
228 commits to master branch, last one about a year ago
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
Created 2020-04-09
1,495 commits to main branch, last one 9 days ago
An introductory workshop on pandas with notebooks and exercises for following along. Slides contain all solutions.
Created 2021-05-15
310 commits to main branch, last one about a month ago
Data Analysis and Visualization in R for Ecologists
Created 2015-04-02
1,298 commits to main branch, last one 11 days ago
Package that processes and organizes data from the Cadastro Nacional da Pessoa Jurídica (CNPJ), Brazil's national registry of legal entities
Created 2019-03-26
77 commits to master branch, last one 3 years ago
14
310
mit
20
Like awk, but with SQL and table joins
Created 2015-01-16
241 commits to master branch, last one 26 days ago
13
294
other
12
Tools for test driven data-wrangling and data validation.
Created 2016-05-12
2,173 commits to master branch, last one 3 years ago
175
282
unknown
23
Data Cleaning Libraries with Python
Created 2017-04-24
13 commits to master branch, last one 5 years ago
32
180
unknown
23
Catmandu - a data processing toolkit
Created 2010-07-01
2,483 commits to dev branch, last one 2 months ago
13
168
gpl-3.0
6
CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count uniqu...
Created 2019-12-15
347 commits to master branch, last one 3 months ago
Plotting and Programming in Python
Created 2016-01-07
1,264 commits to main branch, last one 10 days ago
R for Reproducible Scientific Analysis
Created 2015-04-18
1,628 commits to main branch, last one 17 days ago
Programming with R
Created 2014-12-18
1,407 commits to main branch, last one 11 days ago
Data Analysis and Visualization in Python for Ecologists
Created 2015-03-19
1,176 commits to main branch, last one 2 months ago
24
159
gpl-3.0
13
Data transformation and utility functions for R
Created 2015-03-21
952 commits to master branch, last one 7 months ago
Springboard Program: Data Science Career Track - NLP
Created 2018-10-12
467 commits to master branch, last one 4 years ago