Trending repositories for topic computer-vision
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
rewind.ai x cursor.com = your AI assistant that has all the context
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Cross-platform, customizable ML solutions for live and streaming media.
Arknights assistant: finish all daily tasks in one click! | A one-click tool for the daily tasks of Arknights, supporting all clients.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
MACVO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry
Training a YOLOv5 model with custom data
Deploying an Android application for image classification
[arXiv 2024] This is the official implementation of the paper "Robots Pre-Train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets".
Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models".
Mamba in Vision: A Comprehensive Survey of Techniques and Applications
[CoRL 24 Oral] D^3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement
Official implementation of "Align and Distill: Unifying and Improving Domain Adaptive Object Detection"
[ECCV 2024] This is the official code for the paper "Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations"
STB-VMM: Swin Transformer Based Video Motion Magnification (official repository)
[ECCV 2024] UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow 2.14.0 and Python 3.10.12
👀 Apply YOLOv8 exported with ONNX or TensorRT (FP16, INT8) to a real-time camera feed
Demonstrates Voice Recognition, Text to Speech, Language Translation, OAuth2, Image Generation, Face Detection and Voice Chatbot. Source code and Documentation for my 2023 ADUG Symposium Talk.
[ICIP'24 Lecture Presentation] Official implementation of "CST-YOLO: A Novel Method for Blood Cell Detection Based on Improved YOLOv7 and CNN-Swin Transformer".
Deploying an Android application for object detection
[CoRL 24] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy
Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like...
A comprehensive tool for processing and analyzing video footage, producing detailed insights into gameplay and player performance to improve game understanding and performance evaluation.
[NeurIPS 2024] NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
A toolkit for quantitative evaluation of data attribution methods.
🔥 Aurora Series: A more efficient multimodal large language model series for video.
Official code base for the paper "EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance"
ML Nexus is an open-source collection of machine learning projects, covering topics like neural networks, computer vision, and NLP. Whether you're a beginner or expert, contribute, collaborate, and gr...
Gaussian Haircut: Human Hair Reconstruction with Strand-Aligned 3D Gaussians
YoloDotNet - A C# .NET 8.0 project for Classification, Object Detection, OBB Detection, Segmentation and Pose Estimation in both images and videos.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Desktop app for automatically translating comics (BDs, manga, manhwa, fumetti, and more) in a variety of formats (image, PDF, EPUB, CBR, CBZ, etc.) and in multiple languages.
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
MixTeX: multimodal LaTeX, Chinese-English (ZhEn), and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learnable...
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
The official implementation of SAGA (Segment Any 3D GAussians)
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
[NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis
A curated list of data science & AI guided projects to start building your portfolio
[CVPR 2024 Highlight] Official repository for paper "SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction"
DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an experim...
This project is dedicated to the implementation and research of Kolmogorov-Arnold convolutional networks. The repository includes implementations of 1D, 2D, and 3D convolutions with different kernels...
Official repo for VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads.