55 results found

629
4.4k
mit
68
Context aware, pluggable and customizable data protection and de-identification SDK for text, images and structured data.
Created 2018-05-04
1,281 commits to main branch, last one 7 days ago
191
2.6k
apache-2.0
23
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Created 2023-10-16
840 commits to main branch, last one 2 months ago
215
2.5k
apache-2.0
16
A framework for prompt tuning using Intent-based Prompt Calibration
Created 2023-12-02
167 commits to main branch, last one 4 days ago
90
1.2k
apache-2.0
9
Synthetic data curation for post-training and structured data extraction
Created 2024-10-28
1,539 commits to main branch, last one 16 hours ago
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Created 2023-06-02
83 commits to main branch, last one 2 months ago
Perception toolkit for sim2real training and validation in Unity
Created 2020-04-03
1,439 commits to main branch, last one 5 months ago
49
765
bsd-3-clause
12
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Created 2024-02-25
35 commits to main branch, last one about a month ago
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Created 2023-09-07
47 commits to master branch, last one 9 months ago
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Created 2024-06-12
74 commits to main branch, last one 28 days ago
A curated list of awesome projects which use Machine Learning to generate synthetic content.
Created 2019-02-19
50 commits to master branch, last one 2 years ago
33
406
apache-2.0
7
Generate large synthetic data using an LLM
Created 2024-10-25
176 commits to main branch, last one 3 days ago
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Created 2021-05-31
1,020 commits to dev branch, last one 14 days ago
55
373
apache-2.0
17
SynthDet - An end-to-end object detection pipeline using synthetic data
This repository has been archived
Created 2020-03-26
158 commits to master branch, last one 4 months ago
13
337
unknown
7
Compose multimodal datasets 🎹
Created 2024-02-17
148 commits to main branch, last one 8 days ago
Random dataframe and database table generator
Created 2018-03-10
73 commits to master branch, last one 3 years ago
74
305
bsd-3-clause-clear
5
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Created 2019-09-28
26 commits to master branch, last one about a year ago
4
287
cc-by-4.0
29
[NeurIPS D&B Track 2024] Official implementation of HumanVid
Created 2024-07-19
22 commits to main branch, last one about a month ago
awesome synthetic (text) datasets
Created 2024-02-21
41 commits to main branch, last one 5 months ago
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
Created 2022-11-07
37 commits to main branch, last one about a month ago
10
197
apache-2.0
2
✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework
Created 2025-02-21
25 commits to main branch, last one 12 days ago
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Created 2024-03-21
3 commits to main branch, last one about a year ago
24
172
mit
4
[CVPR 2021] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
Created 2021-02-06
52 commits to master branch, last one 3 years ago
BEDLAM (CVPR 2023) render pipeline tools
Created 2023-06-20
9 commits to main branch, last one 12 months ago
This is the dataset and code release of the OpenRooms Dataset. For more information, please refer to our webpage below. Thanks a lot for your interest in our research!
Created 2021-05-17
110 commits to main branch, last one about a year ago
12
145
apache-2.0
2
Solving data for LLMs - Create quality synthetic datasets!
Created 2024-06-22
126 commits to main branch, last one 2 months ago
13
130
mit
7
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
Created 2025-02-03
11 commits to main branch, last one 27 days ago
NVIDIA Dataset Utilities (NVDU)
Created 2018-07-12
15 commits to master branch, last one 5 years ago
25
125
mit
3
Optimize Document Retrieval with Fine-Tuned KnowledgeBases
Created 2025-01-22
145 commits to main branch, last one about a month ago