2 results found
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Created: 2023-12-15
1,584 commits to main branch, last one 2 months ago
Fast Inference of MoE Models with CPU-GPU Orchestration
Created: 2024-02-05
49 commits to main branch, last one 6 months ago