11 results found
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Created
2023-08-27
414 commits to main branch, last one 3 days ago
Local AI API Platform
Created
2023-09-11
1,861 commits to dev branch, last one a day ago
A nearly-live implementation of OpenAI's Whisper.
Created
2023-05-04
450 commits to main branch, last one a day ago
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Created
2023-12-16
81 commits to main branch, last one 2 months ago
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Created
2023-04-26
692 commits to main branch, last one about a month ago
OpenAI-compatible API for the TensorRT-LLM Triton backend
Created
2023-11-06
33 commits to main branch, last one 3 months ago
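[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, supports dynamic batching and streaming modes, and supports both Python and C++. Limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.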
Created
2024-07-04
58 commits to master branch, last one 7 days ago
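[grps integration with trtllm] Implements a pure C++ high-performance OpenAI LLM service using GPRS+TensorRT-LLM+Tokenizers.cpp. Supports chat and function call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat interface.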
Created
2024-08-21
101 commits to master branch, last one 18 days ago
This repository contains AI Bootcamp material that consists of a workflow for LLM
Created
2022-10-31
48 commits to main branch, last one 3 months ago
Chat With RTX Python API
Created
2024-02-23
16 commits to master branch, last one 2 months ago
Add-in for the new Outlook that adds new LLM features (composition, summarizing, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM.
Created
2024-02-21
26 commits to main branch, last one about a month ago