4 results found
- Filter by Primary Language:
- C++ (1)
- Jupyter Notebook (1)
- Python (1)
- Shell (1)
A high-throughput and memory-efficient inference and serving engine for LLMs
Created: 2023-02-09
5,531 commits to main branch, last one 17 hours ago
Large-scale LLM inference engine
Created: 2023-06-23
1,253 commits to main branch, last one 3 days ago
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.
Created: 2024-01-09
1,491 commits to main branch, last one 26 days ago
This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It addresses the basic implementation requirements as well as ways...
Created: 2021-08-11
108 commits to main branch, last one 5 months ago