41 results found
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Created 2019-07-23
1,190 commits to master branch, last one about a month ago
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created 2024-01-26
139 commits to main branch, last one 2 days ago
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Created 2023-09-26
394 commits to main branch, last one 29 days ago
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Created 2024-06-17
55 commits to main branch, last one 9 days ago
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓
Created 2022-11-05
163 commits to main branch, last one 16 hours ago
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Created 2023-07-04
134 commits to main branch, last one 9 hours ago
A family of lightweight multimodal models.
Created 2024-01-31
112 commits to main branch, last one 24 days ago
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17
29 commits to main branch, last one about a month ago
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Created 2023-09-26
106 commits to main branch, last one 3 months ago
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 3 months ago
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Created 2024-06-27
81 commits to main branch, last one 9 days ago
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
Created 2023-11-27
2 commits to master branch, last one about a month ago
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Created 2024-01-24
236 commits to main branch, last one 4 days ago
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Created 2023-10-24
133 commits to master branch, last one 4 days ago
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created 2023-06-06
18 commits to main branch, last one 8 months ago
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context l...
Created 2023-06-11
35 commits to main branch, last one about a month ago
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Created 2023-05-22
4 commits to main branch, last one about a year ago
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Created 2024-06-14
17 commits to main branch, last one 2 months ago
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
Created 2024-04-12
202 commits to main branch, last one 19 days ago
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Created 2024-07-03
16 commits to main branch, last one 12 days ago
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
Created 2024-07-11
22 commits to main branch, last one 2 months ago
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Created 2024-03-15
2 commits to main branch, last one 6 months ago
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
Created 2024-04-15
98 commits to main branch, last one 2 months ago
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Created 2024-07-05
8 commits to main branch, last one a day ago
Multimodal chatbot with integrated computer vision capabilities
Created 2023-05-19
19 commits to V1.0 branch, last one 4 months ago
Official Repo of Graphist
Created 2024-03-24
3 commits to main branch, last one 5 months ago
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
Created 2024-06-09
156 commits to main branch, last one 2 days ago
An RLHF Infrastructure for Vision-Language Models
Created 2023-12-27
5 commits to main branch, last one 3 months ago
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
Created 2023-11-30
12 commits to main branch, last one 2 months ago
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Created 2024-05-22
61 commits to main branch, last one about a month ago