157 results found
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Created
2018-05-09
351 commits to main branch, last one 2 years ago
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
Created
2016-09-09
2,657 commits to master branch, last one 2 months ago
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Created
2023-08-01
349 commits to main branch, last one 11 hours ago
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
Created
2023-08-24
2,508 commits to main branch, last one 2 days ago
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Created
2024-07-23
1,521 commits to main branch, last one 6 hours ago
A procedural Blender pipeline for photorealistic training image generation
Created
2019-10-10
5,201 commits to main branch, last one 2 months ago
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Created
2023-10-16
840 commits to main branch, last one 2 months ago
Synthetic data generation for tabular data
Created
2018-05-11
1,858 commits to main branch, last one a day ago
Synthetic Patient Population Simulator
Created
2016-06-17
4,859 commits to master branch, last one 5 months ago
SDG is a specialized framework designed to generate high-quality structured tabular data.
Created
2023-08-10
281 commits to main branch, last one about a month ago
UnrealCV: Connecting Computer Vision to Unreal Engine
Created
2016-09-08
1,200 commits to 5.2 branch, last one about a month ago
Synthetic data generators for tabular and time-series data
Created
2020-05-04
260 commits to dev branch, last one about a month ago
The Declarative Data Generator
Created
2020-08-09
350 commits to master branch, last one 6 months ago
Conditional GAN for generating synthetic tabular data.
Created
2019-09-08
414 commits to main branch, last one 4 days ago
PostgreSQL database anonymization and synthetic data generation tool
Created
2023-12-01
488 commits to main branch, last one 24 hours ago
Synthetic data curation for post-training and structured data extraction
Created
2024-10-28
1,518 commits to main branch, last one 14 hours ago
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions
Created
2024-10-28
289 commits to main branch, last one 7 days ago
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Created
2023-06-02
83 commits to main branch, last one 2 months ago
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Created
2024-02-25
35 commits to main branch, last one about a month ago
Curated list of open source tooling for data-centric AI on unstructured data.
Topics: nlp, data-drift, awesome-list, noisy-labels, data-curation, deep-learning, bias-detection, explainable-ai, feature-vector, synthetic-data, active-learning, computer-vision, data-centric-ai, data-versioning, machine-learning, outlier-detection, data-visualization, documentation-only, uncertainty-estimation, robust-machine-learning
Created
2023-02-27
34 commits to main branch, last one about a year ago
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
Created
2024-08-02
6 commits to main branch, last one 8 months ago
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Created
2023-09-07
47 commits to master branch, last one 9 months ago
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Created
2024-06-12
74 commits to main branch, last one 25 days ago
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
Created
2020-03-02
360 commits to master branch, last one 24 days ago
A multi-purpose LLM framework for RAG and data creation.
This repository has been archived
Created
2023-09-15
196 commits to main branch, last one about a year ago
A library to model multivariate data using copulas.
Created
2017-11-13
895 commits to main branch, last one 8 days ago
A curated list of awesome projects which use Machine Learning to generate synthetic content.
Created
2019-02-19
50 commits to master branch, last one 2 years ago
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
Created
2022-03-18
172 commits to main branch, last one 3 months ago
[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
Created
2022-10-02
8 commits to main branch, last one 2 years ago
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
Created
2022-03-21
17 commits to main branch, last one 8 months ago