24 results found

206
3.5k
bsd-2-clause
40
Efficient Triton Kernels for LLM Training
Created 2024-08-06
299 commits to main branch, last one a day ago
94
1.5k
apache-2.0
29
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Created 2022-08-05
122 commits to main branch, last one about a year ago
162
1.5k
gpl-3.0
13
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
Created 2022-12-17
360 commits to main branch, last one a day ago
A service for autodiscovery and configuration of applications running in containers
Created 2015-10-22
697 commits to master branch, last one 3 years ago
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
Created 2016-10-28
39 commits to master branch, last one about a year ago
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Created 2023-07-13
158 commits to main branch, last one 26 days ago
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
Created 2021-05-13
177 commits to main branch, last one 6 days ago
46
343
apache-2.0
19
FlagGems is an operator library for large language models implemented in Triton Language.
Created 2024-03-21
319 commits to master branch, last one 9 hours ago
7
224
apache-2.0
3
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Created 2021-07-05
10 commits to main branch, last one 4 months ago
19
192
gpl-3.0
5
LLVM based static binary analysis framework
Created 2022-03-12
146 commits to master branch, last one about a month ago
11
180
apache-2.0
4
A performance library for machine learning applications.
This repository has been archived
Created 2023-04-30
234 commits to main branch, last one about a year ago
10
157
gpl-3.0
5
Ozoz dotfiles for bspwm, i3WM
Created 2022-04-16
261 commits to master branch, last one 4 months ago
28
137
bsd-2-clause
8
(WIP) The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework for algorithm service that ensures reliability, high concurrency and scalability of...
Created 2021-03-07
408 commits to master branch, last one 3 years ago
40
137
apache-2.0
11
ClearML - Model-Serving Orchestration and Repository Solution
Created 2021-04-12
140 commits to main branch, last one 4 months ago
NVIDIA-accelerated, deep learned model support for image space object detection
Created 2022-03-22
37 commits to main branch, last one about a month ago
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Created 2021-10-13
40 commits to main branch, last one about a month ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created 2024-07-20
18 commits to main branch, last one 3 months ago
9
63
mit
12
Triton Operating System
Created 2015-06-02
94,340 commits to master branch, last one 4 years ago
4
58
apache-2.0
7
Binary Ninja plugin that can be used to apply Triton's dead store elimination pass on basic blocks or functions.
Created 2022-06-06
21 commits to main branch, last one 4 months ago
Three examples of recommendation system pipelines with NVIDIA Merlin and Redis
Created 2022-11-21
10 commits to master branch, last one about a year ago
A step-by-step guide to setting up Nvidia GPUs with CUDA support running on Docker (and Compose) containers on NixOS host
Created 2021-08-07
18 commits to main branch, last one 4 months ago
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
Created 2024-01-19
2 commits to main branch, last one 10 months ago
Transformers components but in Triton
Created 2024-10-14
254 commits to main branch, last one 2 days ago