Search Results - RepositoryStats

303

2.6k

mit

47

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

deep-learning multimodality text-to-image artificial-intelligence generative-adversarial-networks

Created 2021-01-18

159 commits to main branch, last one 3 years ago

Cradle BAAI-Agents

183

2.1k

mit

27

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai gcc llm lmm vlm cradle ai-agent grounding personoid generative-ai multimodality computer-control foundation-agent ai-agents-framework large-language-models vision-language-model general-computer-control

Created 2024-03-03

35 commits to main branch, last one 4 months ago

RAG-Survey hymie122

109

1.6k

unknown

34

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

llm rag aigc survey multimodality diffusion-models

Created 2024-02-23

130 commits to main branch, last one 7 months ago

cornac PreferredAI

153

942

apache-2.0

25

A Comparative Framework for Multimodal Recommender Systems

multimodality recommender-system multimodal-learning matrix-factorization recommendation-engine recommendation-system collaborative-filtering recommendation-algorithms

Created 2018-07-17

1,384 commits to master branch, last one about a month ago

CLIP4Clip ArrowLuo

126

928

mit

12

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

clip msvd lsmdc didemo msrvtt search ranking retrieval multimodal activitynet multimodality retrieval-model multimodal-learning video-clip-retrieval video-text-retrieval

Created 2021-04-13

29 commits to master branch, last one 2 years ago

Ovis AIDC-AI

56

858

apache-2.0

13

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

qwen llama3 chatbot multimodal multimodality vision-language-model vision-language-learning multimodal-large-language-models

Created 2024-06-13

40 commits to main branch, last one 9 days ago

Generative-AI fnzhan

57

753

unknown

39

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

aigc gans nerfs multimodality diffusion-model

Created 2021-12-04

218 commits to main branch, last one about a year ago

FEDOT aimclub

88

663

bsd-3-clause

11

Automated modeling and machine learning framework FEDOT

fedot automl automation multimodality machine-learning parameter-tuning genetic-programming structural-learning evolutionary-algorithms automated-machine-learning hyperparameter-optimization

Created 2020-01-13

924 commits to master branch, last one 3 days ago

Woodpecker BradyFU

31

634

unknown

16

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

llm mllm hallucination multimodality hallucinations large-language-models multimodal-large-language-models

Created 2023-09-26

107 commits to main branch, last one 3 months ago

GPT4RoI jshilong

27

525

other

8

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

gpt llm roi multimodality computer-vision

Created 2023-07-06

14 commits to main branch, last one 9 months ago

LLM2CLIP microsoft

25

498

mit

13

LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.

clip multimodality fundation-models

Created 2024-07-22

56 commits to main branch, last one 9 days ago

X-VLM zengyan-97

52

476

bsd-3-clause

4

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

x-vlm multimodality vision-and-language

Created 2021-11-15

31 commits to master branch, last one 2 years ago

clip-guided-diffusion afiaka87

61

462

mit

12

A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.

openai diffusion multimodal openai-clip deep-learning multimodality text-to-image image-generation artificial-intelligence text-to-image-synthesis

Created 2021-07-21

366 commits to main branch, last one 3 years ago

Awesome-LLMs-meet-Multimodal-Generation YingqingHe

26

448

unknown

17

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

llm aigc lvlm mllm text-to-3d multimodality text-to-audio text-to-image text-to-music text-to-sound text-to-video text-to-speech multimodal-models large-language-models multimodal-generation large-vision-language-models multimodal-large-language-models

Created 2023-11-17

356 commits to main branch, last one 3 days ago

fonduer HazyResearch

78

409

mit

26

A knowledge base construction engine for richly formatted data

multimodality machine-learning knowledge-base-construction

Created 2018-02-02

1,397 commits to master branch, last one 3 years ago

MMMU MMMU-Benchmark

33

407

apache-2.0

3

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

llm llms stem evaluation multimodal deep-learning multimodality computer-vision machine-learning foundation-models question-answering multimodal-learning deep-neural-networks large-language-models large-multimodal-models multimodal-deep-learning visual-question-answering natural-language-processing

Created 2023-11-23

147 commits to main branch, last one 24 days ago

Med-PaLM kyegomez

52

372

mit

7

Towards Generalist Biomedical AI

gpt4 biomedical multimodal opensource deep-learning multimodality multimodal-deep-learning

Created 2023-07-31

118 commits to main branch, last one about a year ago

CM3Leon kyegomez

18

359

mit

21

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI model that uses just a decoder to generate both text and images

dalle attention multimodal multimodality imagegeneration multimodal-learning attention-is-all-you-need

Created 2023-07-14

73 commits to main branch, last one about a year ago

dance OmicsML

37

355

bsd-2-clause

6

DANCE: a deep learning library and benchmark platform for single-cell analysis

dance python benchmark single-cell data-science deep-learning multimodality bioinformatics machine-learning single-cell-rna-seq computational-biology graph-neural-networks spatial-transcriptomics single-cell-rna-sequencing

Created 2022-06-07

787 commits to main branch, last one 8 months ago

UniVL microsoft

55

349

mit

9

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

coin joint video msrvtt caption pretrain alignment youcookii video-text pretraining caption-task localization segmentation multimodality retrieval-task video-language video-text-retrieval multimodal-sentiment-analysis

Created 2020-10-30

20 commits to main branch, last one 2 years ago

multimodal-sentiment-analysis soujanyaporia

74

346

mit

6

Attention-based multimodal fusion for sentiment analysis

lstm attention tensorflow multimodality dialogue-systems sentiment-analysis attention-mechanism conversational-agents sentiment-classification natural-language-processing

Created 2018-07-06

69 commits to master branch, last one 3 years ago

Awesome-Multimodality Yutong-Zhou-cv

22

323

unknown

12

A Survey on multimodal learning research.

awesome-list multimodality multimodal-deep-learning

Created 2021-09-20

79 commits to main branch, last one about a year ago

MedTrinity-25M UCSC-VLAA

18

291

unknown

3

[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"

mllms dataset multimodality

Created 2024-08-06

23 commits to master branch, last one about a month ago

VectorNet Liang-ZX

52

258

unknown

3

PyTorch implementation of the CVPR 2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”

gnn multimodality trajectory-prediction

Created 2020-06-19

2 commits to master branch, last one 2 years ago

NaViT kyegomez

10

225

mit

6

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

vit clip gpt4 multimodal multimodality attention-mechanism multimodal-learning multimodal-deep-learning

Created 2023-09-28

18 commits to main branch, last one about a year ago

fusilli florencejt

14

178

agpl-3.0

4

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

cnn imaging pytorch multi-view multimodal data-fusion multimodality machine-learning pytorch-lightning attention-mechanism multi-view-learning graph-neural-network multivariate-analysis variational-autoencoder multimodal-deep-learning

Created 2023-08-16

252 commits to main branch, last one 6 months ago

how2-dataset srvk

18

172

unknown

11

This repository contains code and metadata for the How2 dataset

video corpus dataset language how2-dataset multimodality speech-recognition machine-translation

Created 2018-10-27

161 commits to master branch, last one 3 months ago

MMStar MMStar-Benchmark

5

170

unknown

1

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

llm llms lvlm lvlms evaluation multimodal multimodality multimodal-learning large-language-models large-multimodal-models visual-question-answering large-vision-language-model large-vision-language-models

Created 2024-03-29

19 commits to main branch, last one 6 months ago

GenerateU FoundationVision

7

167

mit

7

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

mllm open-world multimodality open-vocabulary object-detection open-vocabulary-detection

Created 2024-03-15

3 commits to main branch, last one 5 days ago

PALI3 kyegomez

4

145

mit

5

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

gpt4 autogpt multimodal multimodality machine-learning multimodal-learning artificial-intelligence multimodal-deep-learning

Created 2023-10-16

44 commits to main branch, last one about a year ago