Search Results - RepositoryStats

275

3.9k

gpl-3.0

122

📚 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc.

mla vllm deepseek flash-mla minimax-01 awesome-llm deepseek-r1 deepseek-v3 tensorrt-llm llm-inference flash-attention paged-attention flash-attention-3

Created 2023-08-27

471 commits to main branch, last one 3 days ago

WhisperLive collabora

355

2.7k

mit

33

A nearly-live implementation of OpenAI's Whisper.

obs openai whisper openvino tensorrt dictation translation tensorrt-llm openvino-intel text-to-speech whisper-tensorrt voice-recognition

Created 2023-05-04

495 commits to main branch, last one 5 days ago

WhisperS2T shashikg

49

394

mit

15

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

asr vad whisper tensorrt tensorrt-llm deep-learning speech-to-text speech-recognition voice-activity-detection

Created 2023-12-16

81 commits to main branch, last one 7 months ago

optimum-benchmark huggingface

58

295

apache-2.0

4

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

pytorch openvino benchmark onnxruntime tensorrt-llm neural-compressor text-generation-inference

Created 2023-04-26

713 commits to main branch, last one 2 months ago

awesome-cuda-and-hpc coderonion

29

249

unknown

7

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

Created 2023-02-23

39 commits to main branch, last one a day ago

openai_trtllm npuichigo

28

205

mit

8

OpenAI-compatible API for the TensorRT-LLM Triton backend

llm langchain openai-api tensorrt-llm triton-inference-server

Created 2023-11-06

33 commits to main branch, last one 8 months ago

grps NetEase-Media

13

157

apache-2.0

9

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offeri...

vllm torch serving tensorrt tensorflow tensorrt-llm dynamic-batching triton-inference-server

Created 2024-07-04

62 commits to master branch, last one 27 days ago

grps_trtllm NetEase-Media

8

130

apache-2.0

4

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, d...

llm phi qwq qwen2 llama3 olmocr openai chatglm ai-agent internvl qwen2-vl janus-pro minicpm-v deepseek-r1 internvideo llama-index multi-modal tensorrt-llm function-call

Created 2024-08-21

158 commits to master branch, last one 3 days ago