Trending repositories for topic computer-vision
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Cross-platform, customizable ML solutions for live and streaming media.
A toolkit for making real world machine learning and data analysis applications in C++
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
Code for our paper "VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters".
The best collection of AI tutorials to make you a boss of Data Science!
Welcome to the "Top 100 Computer Vision Projects Idea for 2024" repository! This repository contains a curated list of computer vision project ideas that you can explore, implement, and experiment wit...
[ECCV 2024] UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
[CVPR'24] DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation.
A user-friendly library for reproducible video moment retrieval and highlight detection.
:zap: Cloud-native, AI-powered, document processing pipelines on AWS.
🚗 VehicleDetectionTracker: Real-time vehicle detection and tracking powered by YOLO. 🚙🚕 Enhance your computer vision projects with speed, precision, and adaptability.
Multimodal Brain mpMRI segmentation on BraTS 2023 and BraTS 2021 datasets.
PyTorch implementation of the YOLOv1 architecture presented in "You Only Look Once: Unified, Real-Time Object Detection" by Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
Dlib compiled binary (.whl) for Python 3.7-3.12 and Windows x64
A curated list of awesome self-supervised learning methods in videos
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
Python scripts for the Segment Anything 2 (SAM2) model in ONNX
The official PyTorch implementation of the IEEE/CVF International Conference on Computer Vision (ICCV) '23 paper Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detectio...
Tello drone object tracking using object detection (YOLO) and reinforcement learning (DDPG)
Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Cross-platform, customizable ML solutions for live and streaming media.
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
Code for our paper "VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters".
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
The best collection of AI tutorials to make you a boss of Data Science!
[ECCV 2024] UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
Welcome to the "Top 100 Computer Vision Projects Idea for 2024" repository! This repository contains a curated list of computer vision project ideas that you can explore, implement, and experiment wit...
A user-friendly library for reproducible video moment retrieval and highlight detection.
[ECCV 2024] GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
An end-to-end solution to digitize piping and instrument diagrams using Azure Services including Azure Machine Learning and AKS.
A curated list of papers on the applications of RWKV in computer vision.
Python scripts for the Segment Anything 2 (SAM2) model in ONNX
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
[ECCV24] Keypoint Promptable Re-Identification: SOTA ReID method robust to occlusions and multi-person ambiguity
Official repo for Recursion's accepted spotlight paper at NeurIPS 2023 Generative AI & Biology workshop.
Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment
🚗 VehicleDetectionTracker: Real-time vehicle detection and tracking powered by YOLO. 🚙🚕 Enhance your computer vision projects with speed, precision, and adaptability.
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Code for our paper "VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters".
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.
MixTeX: multimodal OCR for LaTeX, Chinese-English (ZhEn) text, and tables. It performs efficient CPU-based inference locally and offline on Windows.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
Cross-platform, customizable ML solutions for live and streaming media.
Search over large image datasets with natural language and computer vision!
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
Official repo for VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads.
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
MixTeX: multimodal OCR for LaTeX, Chinese-English (ZhEn) text, and tables. It performs efficient CPU-based inference locally and offline on Windows.
A visual sudoku solver which runs in the web browser. Built using OpenCV and Tensorflow to identify the sudoku grid, recognise digits, and overlay the solution.
[NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Resources Related to Event-based Vision | Event Cameras | DVS
A user-friendly library for reproducible video moment retrieval and highlight detection.
Search over large image datasets with natural language and computer vision!
[ECCV24] Keypoint Promptable Re-Identification: SOTA ReID method robust to occlusions and multi-person ambiguity
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
🚗 VehicleDetectionTracker: Real-time vehicle detection and tracking powered by YOLO. 🚙🚕 Enhance your computer vision projects with speed, precision, and adaptability.
Python scripts for the Segment Anything 2 (SAM2) model in ONNX
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
Welcome to the "Top 100 Computer Vision Projects Idea for 2024" repository! This repository contains a curated list of computer vision project ideas that you can explore, implement, and experiment wit...
Superfast AI decision making and intelligent processing of multi-modal data.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!
This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learnable...
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.
The open-source tool for building high-quality datasets and computer vision models
Cross-platform, customizable ML solutions for live and streaming media.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
PointMamba: A Simple State Space Model for Point Cloud Analysis
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
[CVPR 2024 Highlight] Official repository for paper "SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction"
[CVPR 2024] Official repository of "Material Palette: Extraction of Materials from a Single Real-world Image"
DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an experim...
A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing,...
This project is dedicated to the implementation and research of Kolmogorov-Arnold convolutional networks. The repository includes implementations of 1D, 2D, and 3D convolutions with different kernels...
🧙 A list of papers curated for you to dive into Awesome Radiance Field-based 3D Editing.