Search Results - RepositoryStats

awesome-generative-ai-guide aishwaryanr

2.4k

11.7k

mit

371

A one stop repository for generative AI research updates, interview resources, notebooks and much more!

llms awesome awesome-list generative-ai notebook-jupyter interview-questions vision-and-language large-language-models

Created 2024-02-06

267 commits to main branch, last one 13 days ago

LAVIS salesforce

1.0k

10.4k

bsd-3-clause

95

LAVIS - A One-stop Library for Language-Vision Intelligence

salesforce deep-learning image-captioning vision-framework multimodal-datasets vision-and-language deep-learning-library multimodal-deep-learning visual-question-anwsering vision-language-pretraining vision-language-transformer

Created 2022-08-24

492 commits to main branch, last one 4 months ago

maestro roboflow

203

2.5k

apache-2.0

34

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

vqa qwen2-vl paligemma captioning florence-2 multimodal fine-tuning phi-3-vision transformers objectdetection vision-and-language

Created 2023-11-24

505 commits to develop branch, last one 7 days ago

OmAgent om-ai-lab

271

2.5k

apache-2.0

133

Build multimodal language agents for fast prototype and production

Created 2024-07-04

531 commits to main branch, last one about a month ago

ALBEF salesforce

205

1.6k

bsd-3-clause

12

Code for ALBEF: a new vision-language pre-training method

image-text vision-and-language contrastive-learning representation-learning weakly-supervised-learning

Created 2021-07-13

39 commits to main branch, last one 2 years ago

Multimodal-GPT open-mmlab

131

1.5k

apache-2.0

13

Multimodal-GPT

gpt gpt-4 llama flamingo multimodal transformer vision-and-language

Created 2023-04-26

24 commits to main branch, last one about a year ago

ViLT dandelin

216

1.4k

apache-2.0

14

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

vision-and-language

Created 2021-03-25

19 commits to master branch, last one 3 years ago

OmDet om-ai-lab

111

1.3k

apache-2.0

69

Real-time and accurate open-vocabulary end-to-end object detection

coco lvis real-time zero-shot computer-vision open-vocabulary object-detection vision-and-language zero-shot-object-detection

Created 2024-03-11

32 commits to main branch, last one 3 months ago

prismer NVlabs

73

1.3k

other

16

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa language-model image-captioning multi-task-learning vision-and-language multi-modal-learning vision-language-model

Created 2023-03-02

36 commits to main branch, last one about a year ago

awesome-vision-language-pretraining-papers yuewang-cuhk

104

1.2k

unknown

51

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

bert vl-ptms pretraining vision-and-language multimodal-deep-learning

Created 2020-03-25

38 commits to master branch, last one 3 years ago

awesome-japanese-llm llm-jp

34

1.1k

apache-2.0

26

Overview of Japanese LLMs

llm llms japanese multimodal japanese-llm llm-japanese generative-ai language-model language-models vision-language generative-model foundation-models generative-models japanese-language vision-and-language large-language-model large-language-models vision-language-model japanese-language-model

Created 2023-07-09

519 commits to main branch, last one a day ago

Oscar microsoft

252

1.0k

mit

25

Oscar and VinVL

vqa oscar vinvl pre-training image-captioning image-text-search vision-and-language

This repository has been archived

Created 2020-05-14

28 commits to master branch, last one about a year ago

Aria rhymes-ai

86

1.0k

apache-2.0

20

Codebase for Aria - an Open Multimodal Native MoE

multimodal mixture-of-experts vision-and-language

Created 2024-09-29

207 commits to main branch, last one 2 months ago

ONE-PEACE OFA-Sys

71

1.0k

apache-2.0

14

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

multimodal audio-language vision-language contrastive-loss foundation-models vision-transformer vision-and-language representation-learning

Created 2023-05-18

136 commits to main branch, last one 6 months ago

xmodaler YehLi

105

970

other

28

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

tden pretraining image-captioning video-captioning vision-and-language cross-modal-retrieval visual-question-answering

Created 2021-06-25

84 commits to master branch, last one 2 years ago

groundingLMM mbzuai-oryx

46

863

unknown

30

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

lmm llm-agent foundation-models vision-and-language vision-language-model

Created 2023-11-02

43 commits to main branch, last one 4 months ago

DL-NLP-Readings 26hzhang

262

850

mit

80

My Reading Lists of Deep Learning and Natural Language Processing

paper robotics commonsense deep-learning language-model machine-learning source-code-link bibtex-references vision-and-language reinforcement-learning natural-language-processing

Created 2018-02-02

18 commits to master branch, last one 2 years ago

AlphaCLIP SunzeY

55

802

apache-2.0

12

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

deep-learning vision-language machine-learning vision-transformer vision-and-language vision-language-model

Created 2023-11-27

97 commits to main branch, last one 8 months ago

UNITER ChenRocks

112

792

unknown

17

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

pytorch pre-training transformers vision-and-language

Created 2020-01-28

70 commits to master branch, last one 3 years ago

DoRA NVlabs

53

763

other

10

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

lora deep-learning instruction-tuning vision-and-language deep-neural-networks commonsense-reasoning large-language-models parameter-efficient-tuning large-vision-language-models parameter-efficient-fine-tuning

Created 2024-04-11

45 commits to main branch, last one 6 months ago

PointLLM OpenRobotLab

37

760

unknown

13

[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds

3d gpt-4 llama chatbot pointllm objaverse multimodal point-cloud foundation-models vision-and-language large-language-models representation-learning

Created 2023-08-17

51 commits to master branch, last one 5 months ago

VL-BERT jackroos

111

740

mit

14

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

bert pytorch vl-bert iclr2020 pre-training vision-and-language representation-learning self-supervised-learning

Created 2019-11-22

25 commits to master branch, last one 4 years ago

ClipBERT jayleicn

86

718

mit

8

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

vqa pytorch cvpr2021 video-retrieval vision-and-language video-question-answering

Created 2021-02-10

14 commits to main branch, last one 2 years ago

top-cvpr-2024-papers SkalskiP

59

711

cc0-1.0

16

This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

cvpr paper cvpr2024 transformers computer-vision object-detection image-segmentation vision-and-language

Created 2024-04-10

56 commits to master branch, last one 9 months ago

top-cvpr-2023-papers SkalskiP

64

646

cc0-1.0

12

This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code]

cvpr paper cvpr2023 transformers computer-vision object-detection image-segmentation vision-and-language

Created 2023-06-15

30 commits to master branch, last one 9 months ago

Proctoring-AI vardanagarwal

339

573

mit

29

Creating software for automatic monitoring in online proctoring

ssd dlib nltk opencv tflite yolov3 mobilenet automation proctoring eye-tracking face-spoofing hacktoberfest proctoring-ai face-detection speech-to-text phone-detection vision-and-language

Created 2020-05-04

86 commits to master branch, last one 4 months ago