23 results found
Efficient Triton Kernels for LLM Training
Created
2024-08-06
269 commits to main branch, last one 14 hours ago
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Created
2022-08-05
122 commits to main branch, last one about a year ago
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
Created
2022-12-17
299 commits to main branch, last one 2 days ago
A service for autodiscovery and configuration of applications running in containers
Created
2015-10-22
697 commits to master branch, last one 3 years ago
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
Created
2016-10-28
39 commits to master branch, last one 11 months ago
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Created
2023-07-13
158 commits to main branch, last one 12 days ago
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
Created
2021-05-13
176 commits to main branch, last one 10 days ago
FlagGems is an operator library for large language models implemented in Triton Language.
Created
2024-03-21
311 commits to master branch, last one a day ago
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Created
2021-07-05
10 commits to main branch, last one 4 months ago
LLVM based static binary analysis framework
Created
2022-03-12
146 commits to master branch, last one about a month ago
A performance library for machine learning applications.
This repository has been archived
Created
2023-04-30
234 commits to main branch, last one about a year ago
Ozoz dotfiles for bspwm, i3WM
Created
2022-04-16
261 commits to master branch, last one 3 months ago
ClearML - Model-Serving Orchestration and Repository Solution
Created
2021-04-12
140 commits to main branch, last one 4 months ago
(WIP) The deployment framework aims to provide a simple, lightweight, quickly integrated, pipelined deployment framework for algorithm services that ensures reliability, high concurrency and scalability of...
Created
2021-03-07
408 commits to master branch, last one 3 years ago
NVIDIA-accelerated, deep learned model support for image space object detection
Created
2022-03-22
37 commits to main branch, last one about a month ago
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Created
2021-10-13
40 commits to main branch, last one about a month ago
Deploy DL/ML inference pipelines with minimal extra code.
Created
2020-04-09
493 commits to master branch, last one 3 days ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created
2024-07-20
18 commits to main branch, last one 2 months ago
Triton Operating System
Created
2015-06-02
94,340 commits to master branch, last one 4 years ago
Binary Ninja plugin that can be used to apply Triton's dead store elimination pass on basic blocks or functions.
Created
2022-06-06
21 commits to main branch, last one 3 months ago
Three examples of recommendation system pipelines with NVIDIA Merlin and Redis
Created
2022-11-21
10 commits to master branch, last one about a year ago
A step-by-step guide to setting up Nvidia GPUs with CUDA support running on Docker (and Compose) containers on NixOS host
Created
2021-08-07
18 commits to main branch, last one 3 months ago
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
Created
2024-01-19
2 commits to main branch, last one 9 months ago