Search Results - RepositoryStats

2 results found

427

8.2k

License: MIT

High-speed Large Language Model Serving for Local Deployment

llm llama llm-inference local-inference large-language-models

Created 2023-12-15

1,586 commits to main branch, last one about a month ago

206

License: Apache-2.0

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

llm mixtral-8x7b llm-inference local-inference mixture-of-experts

Created 2024-02-05

49 commits to main branch, last one 11 months ago