42 results found
✨✨ Latest Advances on Multimodal Large Language Models
multi-modality
chain-of-thought
instruction-tuning
in-context-learning
instruction-following
large-language-models
visual-chain-of-thought
visual-instruction-tuning
visual-in-context-learning
large-vision-language-model
multimodal-chain-of-thought
large-vision-language-models
multimodal-instruction-tuning
multimodal-in-context-learning
multimodal-large-language-models
Created
2023-05-19
645 commits to main branch, last one a day ago
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created
2024-01-26
98 commits to main branch, last one 2 days ago
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Created
2023-08-03
368 commits to master branch, last one a day ago
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG)
Created
2024-01-22
50 commits to main branch, last one 22 days ago
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Created
2023-07-04
117 commits to main branch, last one 25 days ago
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Created
2024-06-17
24 commits to main branch, last one 4 days ago
A family of lightweight multimodal models.
Created
2024-01-31
102 commits to main branch, last one 4 days ago
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Created
2023-11-07
404 commits to main branch, last one 6 months ago
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Created
2023-09-26
106 commits to main branch, last one 11 days ago
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Created
2023-06-26
93 commits to main branch, last one 13 days ago
Speech, Language, Audio, Music Processing with Large Language Model
Created
2023-10-23
612 commits to main branch, last one 17 days ago
A collection of resources on applications of multi-modal learning in medical imaging.
Created
2022-07-13
130 commits to main branch, last one 6 days ago
Research Trends in LLM-guided Multimodal Learning.
Created
2023-05-29
16 commits to main branch, last one 8 months ago
A Gradio demo of MGIE
Created
2023-09-28
1 commit to main branch, last one 4 months ago
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Created
2023-10-24
129 commits to master branch, last one 2 days ago
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Created
2024-06-02
46 commits to main branch, last one 11 days ago
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created
2023-06-06
18 commits to main branch, last one 5 months ago
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Created
2023-11-17
325 commits to main branch, last one 23 hours ago
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Created
2023-10-22
90 commits to main branch, last one 3 months ago
Curated papers on Large Language Models in Healthcare and Medical domain
Created
2023-06-28
35 commits to main branch, last one 13 days ago
From-scratch implementation of a vision language model in pure PyTorch
Created
2024-04-17
47 commits to main branch, last one about a month ago
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
Created
2023-10-10
32 commits to main branch, last one 4 months ago
The code of “Unveiling Encoder-Free Vision-Language Models”
Created
2024-06-14
8 commits to main branch, last one 6 days ago
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Created
2023-07-02
35 commits to main branch, last one 6 months ago
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Created
2024-01-18
14 commits to main branch, last one 5 months ago
Reading list for Multimodal Large Language Models
Created
2023-06-06
39 commits to main branch, last one 10 months ago
Explore the Limits of Omni-modal Pretraining at Scale
Created
2024-06-11
4 commits to master branch, last one a day ago
Matryoshka Multimodal Models
Created
2024-05-27
466 commits to main branch, last one 25 days ago
This is the official implementation of the paper "Needle In A Multimodal Haystack"
Created
2024-06-05
40 commits to main branch, last one 9 days ago
An Easy-to-use Hallucination Detection Framework for LLMs.
Created
2023-12-31
56 commits to main branch, last one 2 months ago