4 results found

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Created 2023-08-27
279 commits to main branch, last one 2 days ago
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Created 2023-12-26
116 commits to main branch, last one 2 months ago
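🚀 Reverse-engineered API for the DeepSeek-V2 large model, for free-usage testing [Strength: GPT-4 alternative]. Supports high-speed streaming output, multi-turn conversation, zero-configuration deployment, and multiple tokens.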
Created 2024-05-07
18 commits to master branch, last one 24 days ago
13
128
apache-2.0
6
RAG-GPT, leveraging LLM and RAG technology, learns from user-customized knowledge bases to provide contextually relevant answers for a wide range of queries, ensuring rapid and accurate information re...
Created 2024-04-09
162 commits to main branch, last one 3 days ago