Search Results - RepositoryStats

2 results found

579

apache-2.0

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.

t5 nlp fast onnx fastt5 python pytorch inference onnxruntime transformer translation quantization deep-learning inference-speed question-answering quantized-onnx-models

Created 2021-03-11

38 commits to master branch, last one 3 years ago

unknown

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

llms inference-speed large-language-models

Created 2024-12-11

65 commits to main branch, last one 3 months ago