258 results found
- Filter by Primary Language:
- Python (194)
- Jupyter Notebook (42)
- C++ (3)
- TeX (3)
- C# (1)
- Shell (1)
OpenMMLab Detection Toolbox and Benchmark
Created
2018-08-22
2,706 commits to main branch, last one 10 months ago
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Created
2020-12-11
323 commits to main branch, last one 15 days ago
This repository contains demos I made with the Transformers library by HuggingFace.
Created
2020-08-31
431 commits to master branch, last one 2 months ago
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simp...
Created
2024-04-01
45 commits to main branch, last one 15 days ago
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Created
2024-06-04
122 commits to main branch, last one about a month ago
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
Created
2021-09-15
1,600 commits to main branch, last one 4 months ago
SwinIR: Image Restoration Using Swin Transformer (official repository)
Created
2021-08-16
66 commits to main branch, last one 2 years ago
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
Created
2019-11-16
152 commits to master branch, last one 21 days ago
OpenMMLab Pre-training Toolbox and Benchmark
Created
2020-07-09
974 commits to main branch, last one about a month ago
Scenic: A Jax Library for Computer Vision Research and Beyond
Created
2021-07-12
715 commits to main branch, last one 3 days ago
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Created
2021-07-13
1,586 commits to main branch, last one 2 months ago
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created
2023-09-26
409 commits to main branch, last one 3 days ago
Efficient vision foundation models for high-resolution generation and perception.
Created
2023-04-05
134 commits to master branch, last one 12 days ago
EVA Series: Visual Representation Fantasies from BAAI
Created
2022-11-14
276 commits to master branch, last one 4 months ago
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
Created
2020-11-23
86 commits to main branch, last one about a year ago
An all-in-one toolkit for computer vision
Created
2022-04-02
304 commits to master branch, last one 5 months ago
This is a collection of our NAS and Vision Transformer work.
Created
2020-10-12
222 commits to main branch, last one 11 months ago
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Topics: benchmark, multimodal, video-clip, video-data, video-dataset, self-supervised, video-retrieval, foundation-models, action-recognition, instruction-tuning, masked-autoencoder, vision-transformer, video-understanding, zero-shot-retrieval, contrastive-learning, open-set-recognition, video-question-answering, zero-shot-classification, temporal-action-localization, spatio-temporal-action-localization
Created
2022-11-23
229 commits to main branch, last one 10 days ago
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Created
2022-03-23
64 commits to main branch, last one about a year ago
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Created
2022-04-27
19 commits to main branch, last one about a year ago
VRT: A Video Restoration Transformer (official repository)
Created
2022-01-18
15 commits to main branch, last one 2 years ago
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
Created
2022-05-16
62 commits to main branch, last one 11 months ago
Extract clean data from anywhere, powered by vision-language models ⚡
Created
2024-03-22
312 commits to main branch, last one about a month ago
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Created
2021-01-23
63 commits to main branch, last one 2 years ago
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
Created
2021-01-10
110 commits to main branch, last one about a year ago
Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
Created
2023-02-21
47 commits to main branch, last one about a year ago
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Created
2023-05-18
136 commits to main branch, last one 2 months ago
A curated list of foundation models for vision and language tasks
Created
2023-04-04
282 commits to main branch, last one 2 days ago
Explainability for Vision Transformers
Created
2020-12-29
19 commits to main branch, last one 3 years ago
SOTA Semantic Segmentation Models in PyTorch
Created
2021-06-02
98 commits to main branch, last one 9 months ago