157 results found

Code for Machine Learning for Algorithmic Trading, 2nd edition.
Created 2018-05-09
351 commits to main branch, last one 2 years ago
337
4.5k
mit
59
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
Created 2016-09-09
2,657 commits to master branch, last one 2 months ago
226
4.2k
apache-2.0
20
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Created 2023-08-01
349 commits to main branch, last one 11 hours ago
152
3.8k
other
22
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
Created 2023-08-24
2,508 commits to main branch, last one 2 days ago
234
3.4k
other
35
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Created 2024-07-23
1,521 commits to main branch, last one 6 hours ago
462
3.0k
gpl-3.0
43
A procedural Blender pipeline for photorealistic training image generation
Created 2019-10-10
5,201 commits to main branch, last one 2 months ago
190
2.6k
apache-2.0
23
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Created 2023-10-16
840 commits to main branch, last one 2 months ago
717
2.5k
apache-2.0
77
Synthetic Patient Population Simulator
Created 2016-06-17
4,859 commits to master branch, last one 5 months ago
SDG is a specialized framework designed to generate high-quality structured tabular data.
Created 2023-08-10
281 commits to main branch, last one about a month ago
442
2.0k
mit
93
UnrealCV: Connecting Computer Vision to Unreal Engine
Created 2016-09-08
1,200 commits to 5.2 branch, last one about a month ago
Synthetic data generators for tabular and time-series data
Created 2020-05-04
260 commits to dev branch, last one about a month ago
109
1.4k
apache-2.0
25
The Declarative Data Generator
Created 2020-08-09
350 commits to master branch, last one 6 months ago
307
1.4k
other
23
Conditional GAN for generating synthetic tabular data.
Created 2019-09-08
414 commits to main branch, last one 4 days ago
32
1.3k
apache-2.0
4
PostgreSQL database anonymization and synthetic data generation tool
Created 2023-12-01
488 commits to main branch, last one 24 hours ago
90
1.2k
apache-2.0
9
Synthetic data curation for post-training and structured data extraction
Created 2024-10-28
1,518 commits to main branch, last one 14 hours ago
129
1.0k
apache-2.0
28
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions
Created 2024-10-28
289 commits to main branch, last one 7 days ago
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Created 2023-06-02
83 commits to main branch, last one 2 months ago
49
764
bsd-3-clause
12
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Created 2024-02-25
35 commits to main branch, last one about a month ago
414
684
mit
4
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
Created 2024-08-02
6 commits to main branch, last one 8 months ago
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Created 2023-09-07
47 commits to master branch, last one 9 months ago
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Created 2024-06-12
74 commits to main branch, last one 25 days ago
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
Created 2020-03-02
360 commits to master branch, last one 24 days ago
54
621
apache-2.0
13
A multi-purpose LLM framework for RAG and data creation.
This repository has been archived
Created 2023-09-15
196 commits to main branch, last one about a year ago
116
585
other
20
A library to model multivariate data using copulas.
Created 2017-11-13
895 commits to main branch, last one 8 days ago
A curated list of awesome projects which use Machine Learning to generate synthetic content.
Created 2019-02-19
50 commits to master branch, last one 2 years ago
71
532
apache-2.0
15
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
Created 2022-03-18
172 commits to main branch, last one 3 months ago
[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
Created 2022-10-02
8 commits to main branch, last one 2 years ago
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
Created 2022-03-21
17 commits to main branch, last one 8 months ago