2 results found

High-speed Large Language Model Serving for Local Deployment
Created 2023-12-15
1,586 commits to main branch, last one 5 days ago
17
194
apache-2.0
8
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
Created 2024-02-05
49 commits to main branch, last one 10 months ago