Trending repositories for topic cuda
A high-throughput and memory-efficient inference and serving engine for LLMs
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
SGLang is a fast serving framework for large language models and vision language models.
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
High-Performance Cross-Platform Monte Carlo Renderer Based on LuisaCompute
Samples for CUDA developers demonstrating features in the CUDA Toolkit
High-Performance Rendering Framework on Stream Architectures
🚀 Your YOLO deployment powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speed.
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A highly optimized LLM inference acceleration engine for Llama and its variants.
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
Parallel, highly efficient code (CPU and GPU) for DEM and CFD-DEM simulations.
A fast communication-overlapping library for tensor parallelism on GPUs.
A throughput-oriented high-performance serving framework for LLMs
State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Unbiased & physically-based GPU HIPRT (C++/HIP) interactive path tracing renderer
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
Publish some small parts in my personal daily-used Houdini accessories
Run serverless GPU workloads with fast cold starts on bare-metal servers, anywhere in the world
Original reference implementation of the CUDA rasterizer from the paper "StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering"
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API (Write for Fun 👀~)
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Best practices & guides on how to write distributed pytorch training code
Multi-platform high-performance compute language extension for Rust.
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
A model deployment white paper (CUDA | ONNX | TensorRT | C++) 🚀🚀🚀
NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes
From-zero-to-hero CUDA for accelerating math and machine learning on the GPU.
3DGS-LM accelerates Gaussian-Splatting optimization by replacing the ADAM optimizer with Levenberg-Marquardt.
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
PyTorch native quantization and sparsity for training and inference
YoloDotNet - A C# .NET 8.0 project for Classification, Object Detection, OBB Detection, Segmentation and Pose Estimation in both images and videos.
PhantomFHE: A CUDA-Accelerated Homomorphic Encryption Library
Gradio-based tool to run open-source LLM models directly from Hugging Face
Official implementation of "Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting" (https://arxiv.org/abs/2405.06419)
A collection of GTSAM factors and optimizers for point cloud SLAM