43 results found

A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun
Created 2021-01-18
159 commits to main branch, last one 2 years ago
163
1.9k
mit
26
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
35 commits to main branch, last one 14 days ago
88
1.3k
unknown
29
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Created 2024-02-23
130 commits to main branch, last one 3 months ago
144
886
apache-2.0
26
A Comparative Framework for Multimodal Recommender Systems
Created 2018-07-17
1,368 commits to master branch, last one 2 months ago
125
881
mit
13
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Created 2021-04-13
29 commits to master branch, last one 2 years ago
59
784
unknown
47
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
Created 2021-12-04
218 commits to main branch, last one about a year ago
88
647
bsd-3-clause
11
Automated modeling and machine learning framework FEDOT
Created 2020-01-13
903 commits to master branch, last one 5 days ago
29
611
unknown
15
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Created 2023-09-26
106 commits to main branch, last one 5 months ago
31
527
apache-2.0
7
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
31 commits to main branch, last one 16 days ago
25
507
other
8
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Created 2023-07-06
14 commits to main branch, last one 5 months ago
A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.
Created 2021-07-21
366 commits to main branch, last one 2 years ago
51
450
bsd-3-clause
5
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Created 2021-11-15
31 commits to master branch, last one about a year ago
A knowledge base construction engine for richly formatted data
Created 2018-02-02
1,397 commits to master branch, last one 3 years ago
18
363
mit
22
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Created 2023-07-14
73 commits to main branch, last one 11 months ago
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Created 2023-11-17
346 commits to main branch, last one 6 days ago
25
359
apache-2.0
4
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Created 2023-11-23
127 commits to main branch, last one 16 hours ago
36
351
bsd-2-clause
7
DANCE: a deep learning library and benchmark platform for single-cell analysis
Created 2022-06-07
787 commits to main branch, last one 4 months ago
54
338
mit
10
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created 2020-10-30
20 commits to main branch, last one about a year ago
Attention-based multimodal fusion for sentiment analysis
Created 2018-07-06
69 commits to master branch, last one 3 years ago
Towards Generalist Biomedical AI
Created 2023-07-31
118 commits to main branch, last one 9 months ago
A survey of multimodal learning research.
Created 2021-09-20
79 commits to main branch, last one about a year ago
LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA than ever.
Created 2024-07-22
43 commits to main branch, last one a day ago
44
243
unknown
3
PyTorch implementation of the CVPR 2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”
Created 2020-06-19
2 commits to master branch, last one 2 years ago
10
185
mit
7
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Created 2023-09-28
18 commits to main branch, last one 9 months ago
17
163
unknown
12
This repository contains code and metadata for the How2 dataset
Created 2018-10-27
160 commits to master branch, last one about a month ago
13
161
agpl-3.0
5
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
Created 2023-08-16
252 commits to main branch, last one 2 months ago
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Created 2024-03-29
19 commits to main branch, last one about a month ago
Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
Created 2023-10-16
44 commits to main branch, last one 8 months ago
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Created 2024-03-15
2 commits to main branch, last one 8 months ago
34
137
apache-2.0
12
A Python framework that accelerates ML-based discovery in the medical field by encouraging code reuse. Batteries included :)
Created 2021-06-23
518 commits to master branch, last one 3 days ago