20 results found

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Created 2024-06-06
44 commits to master branch, last one 3 months ago
47
687
other
12
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Created 2024-04-11
45 commits to main branch, last one 4 months ago
18
448
unknown
6
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Created 2024-06-02
50 commits to main branch, last one about a month ago
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Created 2023-11-17
348 commits to main branch, last one 12 days ago
Curated papers on Large Language Models in Healthcare and Medical domain
Created 2023-06-28
45 commits to main branch, last one 6 months ago
8
269
bsd-3-clause
4
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Created 2023-10-22
136 commits to main branch, last one 2 months ago
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Created 2024-06-06
3 commits to master branch, last one 7 months ago
A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
Created 2024-01-10
47 commits to main branch, last one 5 months ago
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Created 2024-03-29
19 commits to main branch, last one 4 months ago
10
106
bsd-3-clause
2
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
Created 2023-09-15
33 commits to main branch, last one 2 months ago
2
76
apache-2.0
2
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strategy.
Created 2024-01-23
7 commits to main branch, last one 10 months ago
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
Created 2024-09-04
14 commits to master branch, last one 3 months ago
2
52
unknown
3
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
Created 2024-02-03
14 commits to main branch, last one 11 days ago
4
43
apache-2.0
1
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
Created 2023-07-17
112 commits to main branch, last one about a year ago
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabi...
Created 2025-01-23
6 commits to main branch, last one 6 days ago
✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).
Created 2024-11-27
8 commits to main branch, last one 17 days ago
1
26
unknown
7
Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024)
Created 2024-06-25
36 commits to main branch, last one 2 months ago