29 results found

267
4.4k
bsd-2-clause
46
Efficient Triton Kernels for LLM Training
Created 2024-08-06
394 commits to main branch, last one 2 days ago
96
1.6k
apache-2.0
28
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Created 2022-08-05
122 commits to main branch, last one about a year ago
A service for autodiscovery and configuration of applications running in containers
Created 2015-10-22
697 commits to master branch, last one 4 years ago
58
952
apache-2.0
24
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Created 2024-10-03
81 commits to main branch, last one a day ago
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
Created 2016-10-28
39 commits to master branch, last one about a year ago
🚀🚀🚀 A collection of some awesome public projects about Large Language Model(LLM), Vision Language Model(VLM), Vision Language Action(VLA), AI Generated Content(AIGC), the related Datasets and Applica...
Created 2023-02-15
154 commits to main branch, last one 3 days ago
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Created 2023-07-13
162 commits to main branch, last one 2 days ago
64
420
apache-2.0
19
FlagGems is an operator library for large language models implemented in Triton Language.
Created 2024-03-21
407 commits to master branch, last one 2 days ago
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
Created 2021-05-13
202 commits to main branch, last one 2 months ago
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
Created 2024-12-07
81 commits to main branch, last one a day ago
7
225
apache-2.0
4
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Created 2021-07-05
10 commits to main branch, last one 7 months ago
19
211
gpl-3.0
5
LLVM based static binary analysis framework
Created 2022-03-12
146 commits to master branch, last one 4 months ago
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
Created 2023-02-23
28 commits to main branch, last one 3 days ago
12
183
apache-2.0
3
A performance library for machine learning applications.
This repository has been archived
Created 2023-04-30
234 commits to main branch, last one about a year ago
10
164
gpl-3.0
5
Ozoz dotfiles for bspwm, i3WM
Created 2022-04-16
261 commits to master branch, last one 7 months ago
40
143
apache-2.0
11
ClearML - Model-Serving Orchestration and Repository Solution
Created 2021-04-12
143 commits to main branch, last one about a month ago
NVIDIA-accelerated, deep learned model support for image space object detection
Created 2022-03-22
41 commits to main branch, last one 3 days ago
28
137
bsd-2-clause
8
(WIP) The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework for algorithm service that ensures reliability, high concurrency and scalability of...
Created 2021-03-07
408 commits to master branch, last one 3 years ago
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Created 2021-10-13
44 commits to main branch, last one 3 days ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created 2024-07-20
18 commits to main branch, last one 6 months ago
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
Created 2024-02-29
1,146 commits to deepauto/dev branch, last one 3 days ago
9
66
mit
13
Triton Operating System
Created 2015-06-02
94,340 commits to master branch, last one 4 years ago
4
58
apache-2.0
7
Binary Ninja plugin that can be used to apply Triton's dead store elimination pass on basic blocks or functions.
Created 2022-06-06
21 commits to main branch, last one 7 months ago
Three examples of recommendation system pipelines with NVIDIA Merlin and Redis
Created 2022-11-21
10 commits to master branch, last one 2 years ago
Triton Documentation in Chinese Simplified / Triton 中文文档
Created 2024-09-19
57 commits to master branch, last one 2 months ago
A step-by-step guide to setting up Nvidia GPUs with CUDA support running on Docker (and Compose) containers on NixOS host
Created 2021-08-07
18 commits to main branch, last one 7 months ago
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
Created 2024-01-19
2 commits to main branch, last one about a year ago
Transformers components but in Triton
Created 2024-10-14
254 commits to main branch, last one 3 months ago