Trending repositories for topic computer-vision
rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of superintelligence. Get your data ready or be left behind.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Cross-platform, customizable ML solutions for live and streaming media.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
A repository for a MobileNetV4 PyTorch model that can be used to train on your own image datasets for vision tasks.
[ECCV 2024] Monocular Occupancy Prediction for Scalable Indoor Scenes
Mamba in Vision: A Comprehensive Survey of Techniques and Applications
🔥 [CVPR 2024] Color Shift Estimation-and-Correction for Image Enhancement
A collection of ebooks on data science, machine learning, and related topics.
Robust Video Matting for high-fidelity human segmentation in Unity Engine.
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
A Collection of Low Level Vision Research Groups
Motion capture for the character models of Honkai: Star Rail, based on Unity and MediaPipe. Currently face only (no iPhone required).
[ECCV 2024] UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
SuperSLAM: Open Source Framework for Deep Learning based Visual SLAM (Work in Progress)
[NeurIPS 2024] Multiview Scene Graph (topologically representing a scene from unposed images by interconnected place and object nodes)
MultiCorrupt: A benchmark for robust multi-modal 3D object detection, evaluating LiDAR-Camera fusion models in autonomous driving. Includes diverse corruption types (e.g., misalignment, miscalibration...
Gradio UI for running Meta AI's Segment Anything on your own hardware. Promptable segmentation via keypoints and bounding boxes.
Notebooks and code about generative AI, LLMs, MLOps, NLP, CV, and graph databases
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
Arknights (明日方舟) assistant: a one-click tool that handles all of the game's daily tasks, supporting all clients.
An inference library for image/video restoration with VapourSynth support
[CoRL 24] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy
MobileNet for Image Classification
Multimodal Brain mpMRI segmentation on BraTS 2023 and BraTS 2021 datasets.
A camera ISP (image signal processor) pipeline that contains modules with simple to complex algorithms implemented at the application level.
Symbolic Continuous-Time Gaussian Belief Propagation Framework with Ceres Interoperability
An all-weather, day-and-night, collision avoidance simulator that can be implemented as a digital twin for the autonomous COLREG-compliant navigation of maritime vessels.
Resources Related to Event-based Vision | Event Cameras | DVS
Project Code for the paper "Learning Visual Locomotion with Cross-Modal Supervision" (ICRA2023)
Deploying an Android application for object detection
[arXiv 2024] This is the official implementation of paper "Robots Pre-Train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets".
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Training a YOLOv5 model with custom data
Deploying an Android application for image classification
MACVO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry
[NeurIPS 2024] NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models".
[ECCV 2024] Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
An OCR tool to translate Old Persian cuneiform (Achaemenid language)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
Desktop app for automatically translating comics (BDs, manga, manhwa, fumetti, and more) in a variety of formats (image, PDF, EPUB, CBR, CBZ, etc.) and in multiple languages.
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
MixTeX: multimodal OCR for LaTeX, mixed Chinese/English (ZhEn) text, and tables. It performs efficient CPU-based inference locally and offline on Windows.
This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learnable...
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
The official implementation of SAGA (Segment Any 3D GAussians)
An Unreal Engine 5 (UE5) based plugin aiming to provide real-time visualization, management, editing, and scalable hybrid rendering of Gaussian Splatting models.
Superfast AI decision making and intelligent processing of multi-modal data.
[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)
Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
[NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis
A curated list of data science & AI guided projects to start building your portfolio
[CVPR 2024 Highlight] Official repository for paper "SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction"
DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an experim...
This project is dedicated to the implementation and research of Kolmogorov-Arnold convolutional networks. The repository includes implementations of 1D, 2D, and 3D convolutions with different kernels...
Official repo for VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads.
[CVPR 2024✨Highlight] Official repository for HOLD, the first method that jointly reconstructs articulated hands and objects from monocular videos without assuming a pre-scanned object template and 3D...