Trending repositories for topic computer-vision
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
[ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.
[CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Cross-platform, customizable ML solutions for live and streaming media.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for healthcare professionals, researchers and patients.
The official implementation of 'FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant'
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
Zone Evaluation: Revealing Spatial Bias in Object Detection (TPAMI 2024)
[CVPR 2025] A unified framework for Scene Coordinate Regression-based visual localization
Pixeltable — AI Data infrastructure providing a declarative, incremental approach for multimodal workloads.
[CVPR 2025] Complexity Experts are Task-Discriminative Learners for Any Image Restoration
[CVPR 2025] DarkIR: Robust Low-Light Image Restoration - State-of-the-art low-light deblurring [Official PyTorch Implementation]
A Comprehensive Framework for Visual SLAM Systems and Datasets
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
LightlyTrain is the first PyTorch framework to pretrain computer vision models on unlabeled data for industrial applications
Tutorials on computer vision with PyTorch and FiftyOne
[Neural Networks 2025] Dual Selective Fusion Transformer Network for Hyperspectral Image Classification
Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation
Lazyeat is a touch-free controller for use while eating. Don't want greasy hands while watching shows or browsing the web during meals? Gesture at the camera to pause a video, go full screen, or switch videos.
A one-click tool for the daily tasks of Arknights (《明日方舟》), supporting all clients.
[CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
Object detection, tracking, and 6DoF Pose Estimation in web browser - Integrated Training Environment to train your own neural network models
[CVPR 2025 Oral] Alias-free Latent Diffusion Models (official implementation)
(CVPR 2025) Adversarial Diffusion Compression for Real-World Image Super-Resolution [PyTorch]
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learnable...
CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
AI Productivity Tool - Free and open source, improve user productivity, protect privacy and data security. Provide efficient and convenient AI solutions, built-in local exclusive ChatGPT, Phi, DeepSee...
[ICLR 2025] Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
[TMLR 2025🔥] A survey for the autoregressive models in vision.
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge...
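PyTorch Practical Tutorial (《Pytorch实用教程》, 2nd edition): from getting started with zero background, to CV, NLP, and LLM project applications, to advanced engineering deployment, it is all covered here. With the help of this book, readers should be able to master PyTorch with ease and become excellent deep learning engineers.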
A hub for various industry-specific schemas to be used with VLMs.
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
This project is dedicated to the implementation and research of Kolmogorov-Arnold convolutional networks. The repository includes implementations of 1D, 2D, and 3D convolutions with different kernels...
Official repo for VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset.
This is a repository for MobileNetV4-Pytorch-model, which can be used to train on your image datasets for vision tasks.
Real-time and accurate open-vocabulary end-to-end object detection
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis (ECCV 2024 Oral) - Official Implementation
Gaussian Haircut: Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).
[CVPR 2025] MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM