5 results found

33
584
apache-2.0
7
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
32 commits to main branch, last one 25 days ago
28
296
mit
3
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Created 2023-11-30
50 commits to main branch, last one 3 months ago
10
259
unknown
6
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
Created 2024-05-13
57 commits to main branch, last one 14 days ago
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Created 2024-10-09
17 commits to main branch, last one 24 days ago
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Created 2024-06-11
16 commits to main branch, last one 12 days ago