28 results found
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Created
2024-06-06
44 commits to master branch, last one 2 months ago
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Created
2024-08-10
99 commits to main branch, last one 17 hours ago
Open Source Generative Process Automation (i.e., Generative RPA): AI-first process automation with large Language (LLMs), Action (LAMs), Multimodal (LMMs), and Visual Language (VLMs) models
Created
2023-04-12
925 commits to main branch, last one 12 days ago
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Created
2023-11-07
404 commits to main branch, last one about a year ago
A Framework of Small-scale Large Multimodal Models
Created
2024-02-21
222 commits to main branch, last one 3 days ago
A collection of resources on applications of multi-modal learning in medical imaging.
Created
2022-07-13
151 commits to main branch, last one about a month ago
An open-source implementation for training LLaVA-NeXT.
Created
2024-05-11
36 commits to master branch, last one about a month ago
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Created
2023-11-23
131 commits to main branch, last one 11 days ago
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Created
2023-11-30
50 commits to main branch, last one 3 months ago
Open Platform for Embodied Agents
Created
2024-03-13
126 commits to main branch, last one 2 months ago
The official evaluation suite and dynamic data release for MixEval.
Created
2024-06-01
120 commits to main branch, last one about a month ago
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
Created
2024-07-20
104 commits to main branch, last one 5 days ago
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Created
2023-10-11
84 commits to main branch, last one 8 months ago
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Created
2024-06-06
3 commits to master branch, last one 5 months ago
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Created
2024-03-29
19 commits to main branch, last one 2 months ago
A curated list of awesome Multimodal studies.
Created
2024-04-05
60 commits to main branch, last one 3 days ago
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Created
2024-10-09
17 commits to main branch, last one 24 days ago
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Created
2023-11-20
93 commits to main branch, last one 4 months ago
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
Created
2024-09-04
14 commits to master branch, last one 2 months ago
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Created
2024-04-02
26 commits to main branch, last one 2 months ago
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
Created
2024-04-15
20 commits to main branch, last one 2 months ago
An LMM that is a strict superset of its embedded LLM
Created
2024-08-23
5 commits to main branch, last one about a month ago
A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Created
2024-07-26
15 commits to main branch, last one about a month ago
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
Created
2024-06-12
11 commits to main branch, last one 5 months ago
CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
Created
2024-10-23
21 commits to main branch, last one 14 days ago
[NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking"
Created
2024-04-17
71 commits to master branch, last one about a month ago
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Topics: llm, llms, benchmark, evaluation, multimodal, deep-learning, multimodality, computer-vision, machine-learning, foundation-models, deep-neural-networks, large-language-models, long-context-modeling, large-multimodal-models, long-context-transformers, visual-question-answering, natural-language-processing, multimodal-large-language-models
Created
2024-04-12
17 commits to main branch, last one 5 months ago
Awesome multi-modal large language model papers/projects; collections of popular training strategies, e.g., PEFT, LoRA.
Created
2023-06-13
29 commits to main branch, last one 4 months ago