29 results found
Efficient Triton Kernels for LLM Training
Created
2024-08-06
394 commits to main branch, last one 2 days ago
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Created
2022-08-05
122 commits to main branch, last one about a year ago
A service for autodiscovery and configuration of applications running in containers
Created
2015-10-22
697 commits to master branch, last one 4 years ago
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Created
2024-10-03
81 commits to main branch, last one a day ago
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
Created
2016-10-28
39 commits to master branch, last one about a year ago
A collection of some awesome public projects about Large Language Model(LLM), Vision Language Model(VLM), Vision Language Action(VLA), AI Generated Content(AIGC), the related Datasets and Applica...
Created
2023-02-15
154 commits to main branch, last one 3 days ago
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Created
2023-07-13
162 commits to main branch, last one 2 days ago
FlagGems is an operator library for large language models implemented in Triton Language.
Created
2024-03-21
407 commits to master branch, last one 2 days ago
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
Created
2021-05-13
202 commits to main branch, last one 2 months ago
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
Created
2024-12-07
81 commits to main branch, last one a day ago
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
Created
2021-07-05
10 commits to main branch, last one 7 months ago
LLVM based static binary analysis framework
Created
2022-03-12
146 commits to master branch, last one 4 months ago
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
Created
2023-02-23
28 commits to main branch, last one 3 days ago
A performance library for machine learning applications.
This repository has been archived
Created
2023-04-30
234 commits to main branch, last one about a year ago
Ozoz dotfiles for bspwm, i3WM
Created
2022-04-16
261 commits to master branch, last one 7 months ago
ClearML - Model-Serving Orchestration and Repository Solution
Created
2021-04-12
143 commits to main branch, last one about a month ago
NVIDIA-accelerated, deep learned model support for image space object detection
Created
2022-03-22
41 commits to main branch, last one 3 days ago
(WIP) The deployment framework aims to provide a simple, lightweight, quickly integrated, pipelined deployment framework for algorithm services that ensures reliability, high concurrency and scalability of...
Created
2021-03-07
408 commits to master branch, last one 3 years ago
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Created
2021-10-13
44 commits to main branch, last one 3 days ago
Deploy DL/ML inference pipelines with minimal extra code.
Created
2020-04-09
503 commits to master branch, last one 2 months ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created
2024-07-20
18 commits to main branch, last one 6 months ago
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
Created
2024-02-29
1,146 commits to deepauto/dev branch, last one 3 days ago
Triton Operating System
Created
2015-06-02
94,340 commits to master branch, last one 4 years ago
Binary Ninja plugin that can be used to apply Triton's dead store elimination pass on basic blocks or functions.
Created
2022-06-06
21 commits to main branch, last one 7 months ago
Three examples of recommendation system pipelines with NVIDIA Merlin and Redis
Created
2022-11-21
10 commits to master branch, last one 2 years ago
Triton Documentation in Simplified Chinese / Triton 中文文档
Created
2024-09-19
57 commits to master branch, last one 2 months ago
A step-by-step guide to setting up Nvidia GPUs with CUDA support running on Docker (and Compose) containers on NixOS host
Created
2021-08-07
18 commits to main branch, last one 7 months ago
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
Created
2024-01-19
2 commits to main branch, last one about a year ago
Transformers components but in Triton
Created
2024-10-14
254 commits to main branch, last one 3 months ago