Trending repositories for topic computer-vision
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
One API to get all user desktop data (local, cross-platform, 24/7, screen, voice, keyboard, mouse, camera recording). Sandboxed JS plugin system. Keyboard and mouse control.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Cross-platform, customizable ML solutions for live and streaming media.
Arknights assistant (《明日方舟》小助手): a one-click tool for the daily tasks of Arknights, supporting all clients.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
The world's 1st free and open source palm recognition SDK for Windows and Linux (Palm detection, ROI extraction, Template extraction, Template matching)
Explore a collection of resources and projects in Computer Science, covering algorithms, data structures, programming languages, and emerging technologies. Ideal for learners and enthusiasts looking t...
[TIP2024] MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers
Official PyTorch implementation of the WACV 2025 paper "Composed Image Retrieval for Training-FREE DOMain Conversion".
Official Implementation of the paper: "A Distractor-Aware Memory for Visual Object Tracking with SAM2"
Pixeltable — AI Data infrastructure providing a declarative, incremental approach for multimodal workloads.
Material Anything: Generating Materials for Any 3D Object via Diffusion
[arXiv 2024] Official implementation of the paper "Robots Pre-Train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets".
This repository is the official code for ResEmoteNet. The project is written in Python using PyTorch on a MacBook Pro (M2 Pro, 10-core CPU and 16-core GPU).
Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models
Official code for "Amodal Completion via Progressive Mixed Context Diffusion" [CVPR 2024 Highlight]
A machine learning framework for reconstructing articulated 3D animals from images
(ROS, C++) YOLOv9 detection using TensorRT, now supporting TensorRT 10
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
[NeurIPS2024] Multiview Scene Graph (topologically representing a scene from unposed images by interconnected place and object nodes)
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer (CVPR 2024)
A repository for a PyTorch MobileNetV4 model that can be used to train on your own image datasets for vision tasks.
Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.
Open source digital rocks software platform for micro-CT, CT, thin sections and borehole image analysis. Includes tools for: annotation, AI, HPC, porous media flow simulation, porosity analysis, perme...
[ECCV 2024] Monocular Occupancy Prediction for Scalable Indoor Scenes
The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".
[3DV 2025] GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
an inference lib for image/video restoration with VapourSynth support
[CoRL 24] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy
A paper list for Robotics / Embodied AI - Tianxing Chen
🔥 Aurora Series: A more efficient multimodal large language model series for video.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.
MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learnable...
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, codes, and related websites
AI Productivity Tool - Free and open-source, enhancing user productivity while ensuring privacy and data security. It provides efficient and convenient AI solutions, including but not limited to: buil...
CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)
Superfast AI decision making and intelligent processing of multi-modal data.
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
[NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis
YoloDotNet - A C# .NET 8.0 project for Classification, Object Detection, OBB Detection, Segmentation and Pose Estimation in both images and videos.
A curated list of data science & AI guided projects to start building your portfolio
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
This project is dedicated to the implementation and research of Kolmogorov-Arnold convolutional networks. The repository includes implementations of 1D, 2D, and 3D convolutions with different kernels...
Official repo for VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset.
[CVPR 2024 Highlight] Official repository for paper "SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction"