13 results found
Filter by Primary Language:
- Python (6)
- Jupyter Notebook (3)
- Rust (2)
- C++ (1)
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.
Created: 2023-08-27
471 commits to main branch, last one 3 days ago
A nearly-live implementation of OpenAI's Whisper.
Created: 2023-05-04
495 commits to main branch, last one 5 days ago
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Created: 2023-12-16
81 commits to main branch, last one 7 months ago
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Created: 2023-04-26
713 commits to main branch, last one 2 months ago
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
Created: 2023-02-23
39 commits to main branch, last one a day ago
OpenAI-compatible API for the TensorRT-LLM Triton backend
Created: 2023-11-06
33 commits to main branch, last one 8 months ago
Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offeri...
Created: 2024-07-04
62 commits to master branch, last one 27 days ago
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, d...
Created: 2024-08-21
158 commits to master branch, last one 3 days ago
This repository contains AI Bootcamp material that consists of a workflow for LLM
Created: 2022-10-31
48 commits to main branch, last one 8 months ago
Chat With RTX Python API
Created: 2024-02-23
18 commits to master branch, last one 4 months ago
TensorRT-LLM server with Structured Outputs (JSON) built with Rust
Created: 2024-10-04
123 commits to main branch, last one 16 days ago
LLM-Inference-Bench
Created: 2024-07-29
85 commits to main branch, last one 3 months ago
Add-in for the new Outlook that adds new LLM features (Composition, Summarizing, Q&A). It uses a local LLM via Nvidia TensorRT-LLM
Created: 2024-02-21
28 commits to main branch, last one 4 months ago