14 results found

Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Created 2023-11-24
71 commits to develop branch, last one 5 months ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
42 commits to main branch, last one 27 days ago
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
17 commits to main branch, last one 2 days ago
LLaVA-Interactive-Demo
Created 2023-10-12
36 commits to main branch, last one 4 months ago
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Created 2023-11-20
8 commits to main branch, last one 5 months ago
3
200
bsd-3-clause
4
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Created 2023-10-22
132 commits to main branch, last one 3 months ago
😎 up-to-date & curated list of awesome LMM hallucinations papers, methods & resources.
Created 2023-10-11
57 commits to main branch, last one 3 months ago
2
113
unknown
8
[LMM + AIGC] What do we expect from LMMs as AIGI evaluators and how do they perform?
Created 2024-05-29
44 commits to main branch, last one 8 days ago
8
109
apache-2.0
8
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
Created 2024-04-12
164 commits to main branch, last one a day ago
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Created 2024-01-08
59 commits to main branch, last one about a month ago
Official Repo of Graphist
Created 2024-03-24
3 commits to main branch, last one 2 months ago
1
57
apache-2.0
3
An RLHF Infrastructure for Vision-Language Models
Created 2023-12-27
5 commits to main branch, last one 17 days ago
LLaVA inference with multiple images at once for cross-image analysis.
Created 2023-11-30
18 commits to main branch, last one 3 months ago
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
Created 2024-06-13
2 commits to main branch, last one 15 days ago