Trending repositories for topic computer-vision

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

mediar-ai/screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind

9,104 (+188)

mit

d2l-ai/d2l-zh

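"Dive into Deep Learning": aimed at Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.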

63,767 (+105)

apache-2.0

Tohrusky/Final2x

2^x Image Super-Resolution

5,840 (+79)

bsd-3-clause

lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

20,692 (+51)

mit

opencv/opencv

Open Source Computer Vision Library

79,199 (+50)

apache-2.0

graphdeco-inria/gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

14,725 (+50)

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

19,405 (+43)

apache-2.0

AccumulateMore/CV

✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu's Dive into Deep Learning] [Andrew Ng's Deep Learning]

6,275 (+43)

roboflow/supervision

We write your reusable computer vision tools. 💜

24,239 (+36)

mit

d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

24,004 (+34)

rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

6,646 (+34)

apache-2.0

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

27,652 (+34)

apache-2.0

mlfoundations/open_clip

An open source implementation of CLIP.

10,349 (+33)

amusi/CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects

18,328 (+33)

carla-simulator/carla

Open-source simulator for autonomous driving research.

11,420 (+32)

mit

clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

5,862 (+28)

mit

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

1,371 (+28)

apache-2.0

CMU-Perceptual-Computing-Lab/openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

31,293 (+28)

junyanz/pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

23,118 (+25)

wasmvision/wasmvision

wasmVision gets you going with computer vision.

118 (+18)

Last 3 days (relative gain)

wasmvision/wasmvision

wasmVision gets you going with computer vision.

118 (+18%)

TensoRaws/FinalRip

a distributed AI video processing tool

38 (+15%)

gpl-3.0

AIEngineersDev/solo-server

Simple server to manage compound AI

25 (+9%)

jiaowoguanren0615/MobileNetV4

A repository for a MobileNetV4 PyTorch model that can be used to train on your image datasets for vision tasks.

97 (+8%)

mit

hongxiaoy/ISO

[ECCV 2024] Monocular Occupancy Prediction for Scalable Indoor Scenes

29 (+7%)

apache-2.0

maklachur/Mamba-in-Computer-Vision

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

71 (+4%)

yiyulics/CSEC

🔥 [CVPR 2024] Color Shift Estimation-and-Correction for Image Enhancement

48 (+4%)

mit

aniketpotabatti/Data-Science-EBooks

This repository contains resources in the form of ebooks, which are related to Data Science, Machine Learning, and similar topics.

438 (+4%)

natmlx/robust-video-matting-unity

Robust Video Matting for high-fidelity human segmentation in Unity Engine.

30 (+3%)

gpl-3.0

simpler-env/SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)

327 (+3%)

mit

fqhank/CS231n-2021spring

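[Update complete] Self-study materials for CS231n, Stanford University's classic computer vision course, summarizing problems encountered and key points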

40 (+3%)

Kobaayyy/Awesome-Low-Level-Vision-Research-Groups

A Collection of Low Level Vision Research Groups

161 (+3%)

stalomeow/StarRailMotionCapture

Motion capture for the character models of Honkai: Star Rail, based on Unity and MediaPipe. Currently face only. (No iPhone needed)

81 (+3%)

mit

yuvalH9/UMERegRobust

[ECCV 2024] UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration

41 (+3%)

mit

adityamwagh/SuperSLAM

SuperSLAM: Open Source Framework for Deep Learning based Visual SLAM (Work in Progress)

88 (+2%)

ai4ce/MSG

[NeurIPS2024] Multiview Scene Graph (topologically representing a scene from unposed images by interconnected place and object nodes)

47 (+2%)

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

1,371 (+2%)

apache-2.0

ika-rwth-aachen/MultiCorrupt

MultiCorrupt: A benchmark for robust multi-modal 3D object detection, evaluating LiDAR-Camera fusion models in autonomous driving. Includes diverse corruption types (e.g., misalignment, miscalibration...

53 (+2%)

mit

dogeplusplus/sam-at-home

Gradio UI for running Meta AI's Segment Anything on own hardware. Promptable segmentation via keypoints and bounding boxes.

59 (+2%)

apache-2.0

olonok69/LLM_Notebooks

Notebooks and code about generative AI, LLMs, MLOps, NLP, CV, and graph databases

59 (+2%)

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

mediar-ai/screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind

9,104 (+250)

mit

d2l-ai/d2l-zh

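"Dive into Deep Learning": aimed at Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.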

63,767 (+210)

apache-2.0

lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

20,692 (+147)

mit

roboflow/supervision

We write your reusable computer vision tools. 💜

24,239 (+144)

mit

graphdeco-inria/gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

14,725 (+117)

opencv/opencv

Open Source Computer Vision Library

79,199 (+115)

apache-2.0

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

19,405 (+96)

apache-2.0

AccumulateMore/CV

✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu's Dive into Deep Learning] [Andrew Ng's Deep Learning]

6,275 (+90)

Tohrusky/Final2x

2^x Image Super-Resolution

5,840 (+86)

bsd-3-clause

ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

20,585 (+86)

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

27,652 (+83)

apache-2.0

Developer-Y/cs-video-courses

List of Computer Science courses with video lectures.

67,269 (+74)

rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

6,646 (+73)

apache-2.0

amusi/CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects

18,328 (+70)

microsoft/AI-For-Beginners

12 Weeks, 24 Lessons, AI for All!

34,898 (+66)

mit

MaaAssistantArknights/MaaAssistantArknights

Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.

14,234 (+64)

agpl-3.0

d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

24,004 (+61)

katanaml/sparrow

Data processing with ML, LLM and Vision LLM

3,703 (+57)

gpl-3.0

mlfoundations/open_clip

An open source implementation of CLIP.

10,349 (+56)

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

1,371 (+49)

apache-2.0

Last week (relative gain)

wasmvision/wasmvision

wasmVision gets you going with computer vision.

118 (+53%)

ai4ce/MSG

[NeurIPS2024] Multiview Scene Graph (topologically representing a scene from unposed images by interconnected place and object nodes)

47 (+27%)

AIEngineersDev/solo-server

Simple server to manage compound AI

25 (+25%)

TensoRaws/FinalRip

a distributed AI video processing tool

38 (+23%)

gpl-3.0

TensoRaws/ccrestoration

an inference lib for image/video restoration with VapourSynth support

26 (+18%)

mit

WangYixuan12/gendp

[CoRL 24] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy

33 (+14%)

mit

TruongNV-hut/AIcandy_MobileNet_ImageClassification_gargdlos

MobileNet for Image Classification

28 (+12%)

yiyulics/CSEC

🔥 [CVPR 2024] Color Shift Estimation-and-Correction for Image Enhancement

48 (+12%)

mit

faizan1234567/Brain-Tumors-Segmentation

Multimodal Brain mpMRI segmentation on BraTS 2023 and BraTS 2021 datasets.

54 (+8%)

mit

10x-Engineers/Infinite-ISP

A camera ISP (image signal processor) pipeline that contains modules with simple to complex algorithms implemented at the application level.

150 (+8%)

apache-2.0

jiaowoguanren0615/MobileNetV4

A repository for a MobileNetV4 PyTorch model that can be used to train on your image datasets for vision tasks.

97 (+8%)

mit

maklachur/Mamba-in-Computer-Vision

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

71 (+8%)

VIS4ROB-lab/hyperion

Symbolic Continuous-Time Gaussian Belief Propagation Framework with Ceres Interoperability

82 (+6%)

bsd-3-clause

aavek/Aeolus-Ocean

An all-weather, day-and-night, collision avoidance simulator that can be implemented as a digital twin for the autonomous COLREG-compliant navigation of maritime vessels.

34 (+6%)

bsd-3-clause

chakravarthi589/Event-based-Vision_Resources

Resources Related to Event-based Vision | Event Cameras | DVS

104 (+6%)

apache-2.0

TruongNV-hut/AIcandy_SSD300_ObjectDetection_urentmnt

SSD300 for Object Detection

36 (+6%)

RuiyangJu/Fracture_Detection_Improved_YOLOv8

ICONIP 2024

73 (+6%)

mit

yuvalH9/UMERegRobust

[ECCV 2024] UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration

41 (+5%)

mit

TruongNV-hut/AIcandy_LSTM_Stock_iiyiedys

Stock Price Prediction using LSTM

41 (+5%)

antonilo/vision_locomotion

Project Code for the paper "Learning Visual Locomotion with Cross-Modal Supervision" (ICRA2023)

83 (+5%)

Last month (new repositories)

TruongNV-hut/AIcandy_DQN_FlappyBird_xcrtkuqo

Deep Q network to play flappy bird game

TruongNV-hut/AIcandy_Android_ObjectDetection_vrnthbny

Deploying Android application for object detection

DanialSoleimany/Real-Time-Sign-Language-Detection-Numbers

No description

mit

luccachiang/robots-pretrain-robots

[arXiv 2024] This is the official implementation of paper "Robots Pre-Train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets".

mit

Last month (absolute gain)

mediar-ai/screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind

9,104 (+924)

mit

d2l-ai/d2l-zh

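"Dive into Deep Learning": aimed at Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.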

63,767 (+888)

apache-2.0

opencv/opencv

Open Source Computer Vision Library

79,199 (+539)

apache-2.0

graphdeco-inria/gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

14,725 (+525)

lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

20,692 (+499)

mit

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

19,405 (+484)

apache-2.0

ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

20,585 (+463)

d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

24,004 (+435)

roboflow/supervision

We write your reusable computer vision tools. 💜

24,239 (+424)

mit

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

27,652 (+375)

apache-2.0

AccumulateMore/CV

✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu's Dive into Deep Learning] [Andrew Ng's Deep Learning]

6,275 (+365)

MaaAssistantArknights/MaaAssistantArknights

Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.

14,234 (+339)

agpl-3.0

microsoft/AI-For-Beginners

12 Weeks, 24 Lessons, AI for All!

34,898 (+324)

mit

amusi/CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects

18,328 (+311)

Developer-Y/cs-video-courses

List of Computer Science courses with video lectures.

67,269 (+272)

mlfoundations/open_clip

An open source implementation of CLIP.

10,349 (+258)

exadel-inc/CompreFace

Leading free and open-source face recognition system

5,689 (+249)

apache-2.0

autogluon/autogluon

Fast and Accurate ML in 3 Lines of Code

8,060 (+231)

apache-2.0

cvat-ai/cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

12,647 (+225)

mit

CMU-Perceptual-Computing-Lab/openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

31,293 (+221)

Last month (relative gain)

TruongNV-hut/AI_U_P_Cl_YOLO5_ObjectDetection_usgmsdsh

Training YOLO5 model with custom data

31 (+675%)

TruongNV-hut/AIcandy_Android_ObjectDetection_vrnthbny

Deploying Android application for object detection

42 (+600%)

TruongNV-hut/AIcandy_DQN_FlappyBird_xcrtkuqo

Deep Q network to play flappy bird game

48 (+586%)

TruongNV-hut/AIcandy_Android_ImageClassification_smdkrohy

Deploying Android application for image classification

27 (+575%)

wasmvision/wasmvision

wasmVision gets you going with computer vision.

118 (+556%)

TruongNV-hut/AIcandy_SSD300_ObjectDetection_urentmnt

SSD300 for Object Detection

36 (+500%)

TruongNV-hut/AIcandy_LSTM_Stock_iiyiedys

Stock Price Prediction using LSTM

41 (+413%)

TensoRaws/FinalRip

a distributed AI video processing tool

38 (+375%)

gpl-3.0

TruongNV-hut/AIcandy_MobileNet_ImageClassification_gargdlos

MobileNet for Image Classification

28 (+367%)

luccachiang/robots-pretrain-robots

[arXiv 2024] This is the official implementation of paper "Robots Pre-Train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets".

33 (+200%)

mit

MAC-VO/MAC-VO

MACVO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry

59 (+195%)

bsd-3-clause

koi953215/NaRCan

[NeurIPS 2024] NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

136 (+157%)

Correr-Zhou/MagicTailor

Official implementation of the paper "MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models".

68 (+134%)

apache-2.0

DanialSoleimany/Real-Time-Sign-Language-Detection-Numbers

No description

39 (+117%)

mit

maklachur/Mamba-in-Computer-Vision

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

71 (+97%)

WangYixuan12/gendp

[CoRL 24] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy

33 (+94%)

mit

AaronCIH/APGCC

ECCV24 - Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance

48 (+55%)

mit

Joao-M-Silva/padel_analytics

AI-powered padel analytics

112 (+51%)

Melanee-Melanee/Old-Persian-Cuneiform-OCR

an OCR tool to translate Old Persian cuneiform (Achaemenid language)

132 (+50%)

10x-Engineers/Infinite-ISP

A camera ISP (image signal processor) pipeline that contains modules with simple to complex algorithms implemented at the application level.

150 (+50%)

apache-2.0

Last 12 months (new repositories)

mediar-ai/screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind

9,104

mit

roboflow/sports

computer vision and sports

2,529

mit

cambrian-mllm/cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

1,765

apache-2.0

GaParmar/img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more

1,650

mit

spla-tam/SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)

1,577

bsd-3-clause

muskie82/MonoGS

[CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM

1,392

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

1,371

apache-2.0

robertknight/ocrs

Rust library and CLI tool for OCR (extracting text from images)

1,253

apache-2.0

ogkalu2/comic-translate

Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.

1,115

apache-2.0

3DTopia/OpenLRM

An open-source impl. of Large Reconstruction Models

972

apache-2.0

VladimirYugay/Gaussian-SLAM

Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

933

mit

RQLuo/MixTeX-Latex-OCR

MixTeX: multimodal OCR for LaTeX, Chinese-English text, and tables. It performs efficient CPU-based inference locally and offline on Windows.

863

agpl-3.0

AntonioTepsich/Convolutional-KANs

This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing the classic linear transformation of the convolution to learnable...

781

mit

baegwangbin/DSINE

[CVPR 2024 Oral] Rethinking Inductive Biases for Surface Normal Estimation

728

SkalskiP/top-cvpr-2024-papers

This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

663

cc0-1.0

nianticlabs/acezero

[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.

660

lpiccinelli-eth/UniDepth

Universal Monocular Metric Depth Estimation

630

Adamdad/kat

Kolmogorov-Arnold Transformer: A PyTorch Implementation with CUDA kernel

604

mit

Jumpat/SegAnyGAussians

The official implementation of SAGA (Segment Any 3D GAussians)

598

apache-2.0

xverse-engine/XV3DGS-UEPlugin

An Unreal Engine 5 (UE5) based plugin aiming to provide real-time visualization, management, editing, and scalable hybrid rendering of Gaussian Splatting models.

566

apache-2.0

Last 12 months (absolute gain)

roboflow/supervision

We write your reusable computer vision tools. 💜

24,239 (+17,233)

mit

microsoft/AI-For-Beginners

12 Weeks, 24 Lessons, AI for All!

34,898 (+13,914)

mit

d2l-ai/d2l-zh

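"Dive into Deep Learning": aimed at Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.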

63,767 (+12,771)

apache-2.0

mediar-ai/screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind

9,104 (+9,103)

mit

graphdeco-inria/gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

14,725 (+7,357)

ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

20,585 (+6,846)

opencv/opencv

Open Source Computer Vision Library

79,199 (+6,712)

apache-2.0

Developer-Y/cs-video-courses

List of Computer Science courses with video lectures.

67,269 (+5,479)

lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

20,692 (+4,739)

mit

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

19,405 (+4,592)

apache-2.0

amusi/CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects

18,328 (+4,227)

d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

24,004 (+4,156)

AccumulateMore/CV

✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu's Dive into Deep Learning] [Andrew Ng's Deep Learning]

6,275 (+4,148)

MaaAssistantArknights/MaaAssistantArknights

Arknights assistant: a one-click tool for the daily tasks of Arknights, supporting all clients.

14,234 (+3,695)

agpl-3.0

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

27,652 (+3,685)

apache-2.0

voxel51/fiftyone

Refine high-quality datasets and visual AI models

8,892 (+3,477)

apache-2.0

mlfoundations/open_clip

An open source implementation of CLIP.

10,349 (+3,440)

katanaml/sparrow

Data processing with ML, LLM and Vision LLM

3,703 (+3,175)

gpl-3.0

rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

6,646 (+3,140)

apache-2.0

CMU-Perceptual-Computing-Lab/openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

31,293 (+2,663)

Last 12 months (relative gain)

cambrian-mllm/cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

1,765 (+44,025%)

apache-2.0

ogkalu2/comic-translate

Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.

1,115 (+27,775%)

apache-2.0

aurelio-labs/semantic-router

Superfast AI decision making and intelligent processing of multi-modal data.

2,121 (+23,467%)

mit

VladimirYugay/Gaussian-SLAM

Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

933 (+15,450%)

mit

limuloo/MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

546 (+13,550%)

ohayonguy/PMRF

Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

534 (+13,250%)

mit

lpiccinelli-eth/UniDepth

Universal Monocular Metric Depth Estimation

630 (+12,500%)

roboflow/sports

computer vision and sports

2,529 (+10,896%)

mit

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

1,371 (+9,040%)

apache-2.0

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

359 (+8,875%)

apache-2.0

zubair-irshad/Awesome-Robotics-3D

A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites

554 (+7,814%)

LMD0311/PointMamba

[NeurIPS 2024] PointMamba: A Simple State Space Model for Point Cloud Analysis

361 (+7,120%)

apache-2.0

youssefHosni/Awesome-AI-Data-Guided-Projects

A curated list of data science & AI guided projects to start building your portfolio

337 (+6,640%)

gpl-3.0

SkalskiP/top-cvpr-2024-papers

This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

663 (+6,530%)

cc0-1.0

LMD0311/Awesome-World-Model

A collection of papers on world models for autonomous driving.

546 (+5,360%)

River-Zhang/SIFU

[CVPR 2024 Highlight] Official repository for paper "SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction"

209 (+5,125%)

mit

google/diffseg

DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an experim...

271 (+4,417%)

mit

IvanDrokin/torch-conv-kan

This project is dedicated to the implementation and research of Kolmogorov-Arnold convolutional networks. The repository includes implementations of 1D, 2D, and 3D convolutions with different kernels...

417 (+4,070%)

mit

KupynOrest/head_detector

Official repo for VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads.

147 (+3,575%)

zc-alexfan/hold

[CVPR 2024✨Highlight] Official repository for HOLD, the first method that jointly reconstructs articulated hands and objects from monocular videos without assuming a pre-scanned object template and 3D...

312 (+3,367%)

mit
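
How the relative-gain figures relate to the absolute ones

The listing does not say how the percentages are computed. The figures are consistent with dividing the stars gained in a window by the star count at the start of that window: wasmvision/wasmvision shows 118 (+18) and 118 (+18%) for the last 3 days, and 18 / (118 - 18) = 18%. The sketch below assumes that formula; the function name is hypothetical and is not taken from the site.

```python
def relative_gain_percent(current_stars: int, gained_stars: int) -> float:
    """Growth in percent relative to the star count at the start of the window.

    Assumption: the baseline is current_stars - gained_stars. This is inferred
    from the listed numbers, not documented by the listing itself.
    """
    previous = current_stars - gained_stars
    if previous <= 0:
        # A repository with no stars at the start of the window has no baseline.
        raise ValueError("no positive baseline to compare against")
    return 100.0 * gained_stars / previous


# wasmvision/wasmvision, last 3 days: 118 stars, +18
print(round(relative_gain_percent(118, 18)))  # 18
```

This also suggests why small repositories dominate the relative-gain lists: a handful of new stars on a baseline of a few dozen yields a large percentage.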