39 results found
- Filter by Primary Language:
- Python (34)
- C++ (2)
- Jupyter Notebook (2)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created
2023-04-17
460 commits to main branch, last one 6 months ago
✨✨ Latest Advances on Multimodal Large Language Models
multi-modality
chain-of-thought
instruction-tuning
in-context-learning
instruction-following
large-language-models
visual-instruction-tuning
large-vision-language-model
multimodal-chain-of-thought
large-vision-language-models
multimodal-instruction-tuning
multimodal-in-context-learning
multimodal-large-language-models
Created
2023-05-19
728 commits to main branch, last one a day ago
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Created
2018-11-12
1,960 commits to main branch, last one 11 months ago
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
Created
2021-01-17
231 commits to main branch, last one 2 years ago
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Created
2023-04-01
626 commits to main branch, last one 8 months ago
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Created
2023-09-26
395 commits to main branch, last one about a month ago
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503
Created
2023-05-11
3,186 commits to master branch, last one a day ago
Algorithms and Publications on 3D Object Tracking
Created
2020-09-21
40 commits to master branch, last one about a year ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created
2023-05-10
86 commits to main branch, last one 7 months ago
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
Created
2023-08-29
149 commits to main branch, last one 5 months ago
[CVPR 2023] Collaborative Diffusion
Created
2023-03-22
15 commits to master branch, last one 11 months ago
Parsing-free RAG supported by VLMs
Created
2024-10-14
66 commits to master branch, last one a day ago
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Created
2022-12-11
16 commits to main branch, last one 5 months ago
An open-source implementation for training LLaVA-NeXT.
Created
2024-05-11
36 commits to master branch, last one 29 days ago
Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
Created
2023-05-24
58 commits to main branch, last one 5 months ago
An official PyTorch implementation of the CRIS paper
Created
2022-06-01
31 commits to master branch, last one 7 months ago
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Created
2023-11-29
73 commits to main branch, last one 2 months ago
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
Created
2022-06-01
13 commits to main branch, last one 2 years ago
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
Created
2021-03-15
89 commits to main branch, last one 2 years ago
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Created
2023-10-05
83 commits to main branch, last one 28 days ago
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Created
2023-10-11
84 commits to main branch, last one 7 months ago
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Created
2023-04-04
456 commits to main branch, last one about a year ago
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Created
2023-05-22
48 commits to main branch, last one about a year ago
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"
Created
2022-11-16
79 commits to main branch, last one 7 months ago
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
Created
2023-05-05
428 commits to master branch, last one 8 months ago
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Created
2023-11-28
18 commits to main branch, last one 4 months ago
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
Created
2024-01-26
9 commits to main branch, last one 9 months ago
Multi-modal Graph learning for Disease Prediction (IEEE Trans. on Medical imaging, TMI2022)
Created
2022-01-01
66 commits to main branch, last one 8 months ago
Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta
Created
2024-01-21
14 commits to main branch, last one 10 months ago
[TCSVT] CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence. The code of CorrI2P.
Created
2022-08-25
41 commits to main branch, last one about a month ago