42 results found
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created
2023-04-17
460 commits to main branch, last one 7 months ago
✨✨ Latest Advances on Multimodal Large Language Models
Topics: multi-modality, chain-of-thought, instruction-tuning, in-context-learning, instruction-following, large-language-models, visual-instruction-tuning, large-vision-language-model, multimodal-chain-of-thought, large-vision-language-models, multimodal-instruction-tuning, multimodal-in-context-learning, multimodal-large-language-models
Created
2023-05-19
782 commits to main branch, last one 8 hours ago
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Created
2018-11-12
1,960 commits to main branch, last one about a year ago
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
Created
2021-01-17
231 commits to main branch, last one 2 years ago
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Created
2023-04-01
626 commits to main branch, last one 9 months ago
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created
2023-09-26
409 commits to main branch, last one 3 days ago
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503
Created
2023-05-11
3,277 commits to master branch, last one 3 days ago
Algorithms and Publications on 3D Object Tracking
Created
2020-09-21
40 commits to master branch, last one about a year ago
Parsing-free RAG supported by VLMs
Created
2024-10-14
73 commits to master branch, last one 10 days ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created
2023-05-10
86 commits to main branch, last one 8 months ago
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
Created
2023-08-29
149 commits to main branch, last one 6 months ago
[CVPR 2023] Collaborative Diffusion
Created
2023-03-22
15 commits to master branch, last one about a year ago
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Created
2022-12-11
16 commits to main branch, last one 6 months ago
An open-source implementation for training LLaVA-NeXT.
Created
2024-05-11
36 commits to master branch, last one about a month ago
Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
Created
2023-05-24
58 commits to main branch, last one 6 months ago
An official PyTorch implementation of the CRIS paper
Created
2022-06-01
31 commits to master branch, last one 8 months ago
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Created
2023-11-29
73 commits to main branch, last one 3 months ago
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
Created
2022-06-01
13 commits to main branch, last one 2 years ago
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
Created
2021-03-15
89 commits to main branch, last one 2 years ago
Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"
Created
2024-12-02
8 commits to main branch, last one 5 days ago
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Created
2023-10-05
83 commits to main branch, last one about a month ago
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Created
2023-10-11
84 commits to main branch, last one 8 months ago
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Created
2023-04-04
456 commits to main branch, last one about a year ago
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"
Created
2022-11-16
79 commits to main branch, last one 8 months ago
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Created
2023-05-22
48 commits to main branch, last one about a year ago
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
Created
2023-05-05
428 commits to master branch, last one 9 months ago
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Created
2023-11-28
18 commits to main branch, last one 5 months ago
Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta
Created
2024-01-26
9 commits to main branch, last one 10 months ago
Multi-modal Graph learning for Disease Prediction (IEEE Trans. on Medical imaging, TMI2022)
Created
2022-01-01
66 commits to main branch, last one 9 months ago
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta
Created
2024-01-21
14 commits to main branch, last one 11 months ago