5 results found

57
846
apache-2.0
13
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
40 commits to main branch, last one a day ago
13
327
unknown
5
[CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
Created 2024-05-13
62 commits to main branch, last one 22 days ago
28
323
mit
3
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Created 2023-11-30
50 commits to main branch, last one 7 months ago
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Created 2024-10-09
17 commits to main branch, last one 3 months ago
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Created 2024-06-11
16 commits to main branch, last one 3 months ago