81 results found
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Created
2023-03-09
84 commits to main branch, last one 3 months ago
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Created
2022-01-25
64 commits to main branch, last one 2 years ago
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Created
2022-08-01
1,239 commits to mainline branch, last one 10 hours ago
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Created
2022-07-08
382 commits to master branch, last one 3 months ago
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Created
2022-01-29
712 commits to main branch, last one about a year ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Topics: ocr, document, documentai, multimodal, end-to-end-ocr, text-detection, computer-vision, vision-language, text-recognition, document-analysis, document-recognition, scene-text-detection, document-intelligence, vision-language-model, document-understanding, scene-text-recognition, artificial-intelligence, multimodal-deep-learning, vision-language-transformer, scene-text-detection-recognition
Created
2022-09-28
63 commits to main branch, last one 11 days ago
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...
Created
2023-05-18
43 commits to main branch, last one 3 months ago
Overview of Japanese LLMs (日本語LLMまとめ)
Created
2023-07-09
478 commits to main branch, last one 2 days ago
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Created
2023-05-18
136 commits to main branch, last one about a month ago
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
Created
2023-08-08
412 commits to main branch, last one 2 months ago
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Created
2022-03-08
35 commits to main branch, last one about a year ago
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Created
2024-04-26
11 commits to main branch, last one 7 months ago
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created
2023-11-27
97 commits to main branch, last one 4 months ago
[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
Created
2023-09-30
40 commits to main branch, last one 3 months ago
A Framework of Small-scale Large Multimodal Models
Created
2024-02-21
219 commits to main branch, last one 2 days ago
Official implementation of SEED-LLaMA (ICLR 2024).
Created
2023-07-15
81 commits to main branch, last one 2 months ago
A third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Created
2023-10-14
12 commits to main branch, last one 5 months ago
CLIPort: What and Where Pathways for Robotic Manipulation
Created
2021-09-20
91 commits to master branch, last one about a year ago
Multimodal Chinese LLaMA & Alpaca large language models (VisualCLA)
Created
2023-06-16
16 commits to main branch, last one about a year ago
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Created
2021-07-20
263 commits to main branch, last one 3 months ago
METER: A Multimodal End-to-end TransformER Framework
Created
2021-11-03
20 commits to main branch, last one 2 years ago
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Created
2021-07-23
7 commits to main branch, last one 3 years ago
🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
Created
2023-07-15
39 commits to main branch, last one 5 months ago
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Created
2022-03-10
64 commits to main branch, last one about a year ago
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Created
2023-12-02
44 commits to main branch, last one 4 months ago
Tools for movie and video research
Created
2019-06-05
91 commits to master branch, last one 2 years ago
💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Created
2021-03-05
153 commits to main branch, last one 2 years ago
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Created
2022-10-07
12 commits to main branch, last one about a year ago
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Created
2023-05-29
13 commits to master branch, last one 8 months ago
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Created
2024-06-13
6 commits to main branch, last one 4 months ago