Search Results - RepositoryStats

2.3k

20.8k

apache-2.0

157

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

gpt-4 llama llava llama2 chatbot chatgpt llama-2 multimodal multi-modality foundation-models instruction-tuning vision-language-model visual-language-learning

Created 2023-04-17

460 commits to main branch, last one 7 months ago

Awesome-Multimodal-Large-Language-Models BradyFU

839

13.2k

unknown

256

✨✨ Latest Advances on Multimodal Large Language Models

multi-modality chain-of-thought instruction-tuning in-context-learning instruction-following large-language-models visual-instruction-tuning large-vision-language-model multimodal-chain-of-thought large-vision-language-models multimodal-instruction-tuning multimodal-in-context-learning multimodal-large-language-models

Created 2023-05-19

782 commits to main branch, last one 8 hours ago

clip-as-service jina-ai

2.1k

12.5k

other

222

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

bert onnx openai pytorch image2vec clip-model sentence2vec deep-learning neural-search cross-modality multi-modality bert-as-service clip-as-service sentence-encoding cross-modal-retrieval

Created 2018-11-12

1,960 commits to main branch, last one about a year ago

deep-daze lucidrains

325

4.4k

mit

75

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

siren transformers deep-learning text-to-image multi-modality artificial-intelligence implicit-neural-representation

Created 2021-01-17

231 commits to main branch, last one 2 years ago

Otter Luodian

243

3.6k

mit

100

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

gpt-4 chatgpt embodied-ai deep-learning multi-modality machine-learning foundation-models instruction-tuning large-scale-models artificial-inteligence visual-language-learning

Created 2023-04-01

626 commits to main branch, last one 9 months ago

InternLM-XComposer InternLM

159

2.6k

apache-2.0

43

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

gpt llm mllm gpt-4 chatgpt foundation multimodal language-model multi-modality instruction-tuning vision-transformer large-language-model supervised-finetuning vision-language-model visual-language-learning large-vision-language-model

Created 2023-09-26

409 commits to main branch, last one 3 days ago

swarms kyegomez

280

2.0k

agpl-3.0

36

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503

Created 2023-05-11

3,277 commits to master branch, last one 3 days ago

3DObjectTracking DLR-RM

141

788

mit

26

Algorithms and Publications on 3D Object Tracking

ijcv rgbd paper tpami accv2020 cvpr2022 iros2023 tracking real-time multi-body articulated multi-modality computer-vision object-tracking pose-estimation

Created 2020-09-21

40 commits to master branch, last one about a year ago

VisRAG OpenBMB

36

505

apache-2.0

9

Parsing-free RAG supported by VLMs

rag retrieval multi-modal multi-modality document-retrieval vision-language-model document-understanding retrieval-augmented-generation

Created 2024-10-14

73 commits to master branch, last one 10 days ago

Multi-Modality-Arena OpenGVLab

36

479

unknown

6

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

vqa chat llms gradio chatbot chatgpt multi-modality large-language-models vision-language-model

Created 2023-05-10

86 commits to main branch, last one 8 months ago

Gemini kyegomez

56

434

mit

12

An open-source implementation of Gemini, the model by Google that will "eclipse ChatGPT"

ai ml gpt4 gemini multimodla multi-modality machine-learning artificial-intelligence

Created 2023-08-29

149 commits to main branch, last one 6 months ago

Collaborative-Diffusion ziqihuangg

32

410

other

9

[CVPR 2023] Collaborative Diffusion

aigc gen-ai face-editing image-editing multi-modality face-generation diffusion-models image-generation stable-diffusion latent-diffusion-models

Created 2023-03-22

15 commits to master branch, last one about a year ago

MM-Diffusion researchmm

22

403

mit

6

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

multi-modality audio-generation content-creation diffusion-models video-generation

Created 2022-12-11

16 commits to main branch, last one 6 months ago

Open-LLaVA-NeXT xiaoachen98

20

401

unknown

13

An open-source implementation for training LLaVA-NeXT.

gpt-4 gpt4o llama llava llama3 chatbot chatgpt llava-next multimodal multi-modality vision-language-model large-multimodal-models visual-language-learning

Created 2024-05-11

36 commits to master branch, last one about a month ago

Sophia kyegomez

26

377

apache-2.0

8

Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.

chatgpt optimizer deep-learning multi-modality neural-network artificial-intelligence

Created 2023-05-24

58 commits to main branch, last one 6 months ago

CRIS.pytorch DerrickWang005

36

253

mit

1

An official PyTorch implementation of the CRIS paper

multi-modality contrastive-learning referring-image-segmentation

Created 2022-06-01

31 commits to master branch, last one 8 months ago

RLHF-V RLHF-V

6

250

unknown

2

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

gpt-4 llama rlhf-v chatbot multimodal multi-modality visual-language-learning

Created 2023-11-29

73 commits to main branch, last one 3 months ago

UVTR dvlab-research

18

228

unknown

6

Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

pytorch 3d-detection multi-modality

Created 2022-06-01

13 commits to main branch, last one 2 years ago

CVPR21Chal-SLR jackyjsy

51

213

cc0-1.0

3

This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

cvpr2021 multi-modality skeleton-features sign-language-recognition sign-language-recognition-system

Created 2021-03-15

89 commits to main branch, last one 2 years ago

VisionZip dvlab-research

7

191

apache-2.0

3

Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"

vlms efficiency multi-modality vision-language-model

Created 2024-12-02

8 commits to main branch, last one 5 days ago

CoDA_NeurIPS2023 yangcaoai

16

186

mit

11

Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

3d-vision detection transformer 3d-detection deep-learning multi-modality open-vocabulary artificial-intelligence

Created 2023-10-05

83 commits to main branch, last one about a month ago

multi_token sshh12

12

176

apache-2.0

3

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

llm llava multimodal large-context multi-modality large-language-models vision-language-model large-multimodal-models

Created 2023-10-11

84 commits to main branch, last one 8 months ago

rungpt jina-ai

21

159

apache-2.0

22

An open-source, cloud-native serving framework for large multi-modal models (LMMs).

gpt-4 llama opengpt flamingo llm-serve lmm-serve llm-hosting self-hosting transformers multi-modality large-language-models large-multimadality-models

Created 2023-04-04

456 commits to main branch, last one about a year ago

MEDIAR Lee-Gihun

32

144

mit

5

(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"

monai pytorch biomedical miscroscopy cell-biology neurips-2022 multi-modality multi-resolution cell-segmentation vision-transformer pytorch-segmentation instance-segmentation pytorch-implementation

Created 2022-11-16

79 commits to main branch, last one 8 months ago

the-compiler kyegomez

19

143

apache-2.0

4

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

agora autogpt chatgpt deep-learning multi-modality chain-of-thought tree-of-thoughts multi-modal-fusion prompt-engineering artficial-intelligence reinforcement-learning deep-learning-algorithms multimodal-deep-learning

Created 2023-05-22

48 commits to main branch, last one about a year ago

Andromeda kyegomez

20

137

gpl-3.0

9

An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast

agi gpt-4 multimodal transformer deep-learning language-model multi-modality neural-networks large-language-models artificial-intelligence artificial-general-intelligence artificial-intelligence-algorithms

Created 2023-05-05

428 commits to master branch, last one 9 months ago

Prompt-Highlighter dvlab-research

2

135

mit

2

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

llm-inference multi-modality text-generation

Created 2023-11-28

18 commits to main branch, last one 5 months ago

MambaByte kyegomez

6

110

mit

4

Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta

ai ml gpt4v mamba megabyte tokenizer multi-modality machine-learning artificial-intelligence

Created 2024-01-26

9 commits to main branch, last one 10 months ago

MMGL SsGood

15

93

mit

3

Multi-modal Graph learning for Disease Prediction (IEEE Trans. on Medical imaging, TMI2022)

pytorch transformer graph-learning multi-modality disease-prediction

Created 2022-01-01

66 commits to main branch, last one 9 months ago

MoE-Mamba kyegomez

5

89

mit

5

Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta

ai ml moe swarms multi-modality multi-modal-fusion

Created 2024-01-21

14 commits to main branch, last one 11 months ago