23 results found

192
3.4k
bsd-2-clause
37
Efficient Triton Kernels for LLM Training
Created 2024-08-06
269 commits to main branch, last one 14 hours ago
95
1.5k
apache-2.0
29
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Created 2022-08-05
122 commits to main branch, last one about a year ago
152
1.4k
gpl-3.0
13
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
Created 2022-12-17
299 commits to main branch, last one 2 days ago
A service for autodiscovery and configuration of applications running in containers
Created 2015-10-22
697 commits to master branch, last one 3 years ago
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
Created 2016-10-28
39 commits to master branch, last one 11 months ago
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Created 2023-07-13
158 commits to main branch, last one 12 days ago
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
Created 2021-05-13
176 commits to main branch, last one 10 days ago
38
324
apache-2.0
19
FlagGems is an operator library for large language models implemented in Triton Language.
Created 2024-03-21
311 commits to master branch, last one a day ago
7
224
apache-2.0
3
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Created 2021-07-05
10 commits to main branch, last one 4 months ago
18
190
gpl-3.0
5
LLVM based static binary analysis framework
Created 2022-03-12
146 commits to master branch, last one about a month ago
11
179
apache-2.0
4
A performance library for machine learning applications.
This repository has been archived
Created 2023-04-30
234 commits to main branch, last one about a year ago
10
158
gpl-3.0
5
Ozoz dotfiles for bspwm, i3WM
Created 2022-04-16
261 commits to master branch, last one 3 months ago
40
138
apache-2.0
11
ClearML - Model-Serving Orchestration and Repository Solution
Created 2021-04-12
140 commits to main branch, last one 4 months ago
28
138
bsd-2-clause
8
(WIP) The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework for algorithm service that ensures reliability, high concurrency and scalability of...
Created 2021-03-07
408 commits to master branch, last one 3 years ago
NVIDIA-accelerated, deep learned model support for image space object detection
Created 2022-03-22
37 commits to main branch, last one about a month ago
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Created 2021-10-13
40 commits to main branch, last one about a month ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created 2024-07-20
18 commits to main branch, last one 2 months ago
9
63
mit
12
Triton Operating System
Created 2015-06-02
94,340 commits to master branch, last one 4 years ago
4
58
apache-2.0
7
Binary Ninja plugin that can be used to apply Triton's dead store elimination pass on basic blocks or functions.
Created 2022-06-06
21 commits to main branch, last one 3 months ago
Three examples of recommendation system pipelines with NVIDIA Merlin and Redis
Created 2022-11-21
10 commits to master branch, last one about a year ago
A step-by-step guide to setting up Nvidia GPUs with CUDA support running on Docker (and Compose) containers on NixOS host
Created 2021-08-07
18 commits to main branch, last one 3 months ago
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
Created 2024-01-19
2 commits to main branch, last one 9 months ago