

678

apache-2.0

An FP16xINT4 LLM inference kernel that can achieve near-ideal speedups of ~4x at batch sizes of up to a medium 16-32 tokens.
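"FP16xINT4" means FP16 activations are multiplied by weights stored as 4-bit integers. At small batch sizes a matrix multiply is memory-bound: most of the time goes to loading weights. INT4 weights take a quarter of the bytes of FP16 weights, so the ideal speedup is about 4x. As the batch grows, compute starts to dominate and this advantage shrinks. The sketch below is a minimal, unfused NumPy reference that shows the data flow: quantize, pack eight 4-bit values per 32-bit word, unpack, dequantize, multiply. It assumes symmetric per-group quantization with a group size of 128, a hypothetical packing layout, and FP32 accumulation. It is not the kernel's actual implementation. A real GPU kernel fuses dequantization into the matmul and uses its own weight layout.

```python
import numpy as np

def quantize_int4(w, group_size):
    """Symmetric per-group INT4 quantization of w (K x N), grouping along K (assumed scheme)."""
    K, N = w.shape
    assert K % group_size == 0
    wg = w.astype(np.float32).reshape(K // group_size, group_size, N)
    scales = np.abs(wg).max(axis=1) / 7.0          # one scale per (group, column)
    scales[scales == 0] = 1.0                       # avoid division by zero
    q = np.clip(np.round(wg / scales[:, None, :]), -8, 7).astype(np.int8)
    return q.reshape(K, N), scales.astype(np.float16)

def pack_int4(q):
    """Pack signed INT4 values (K divisible by 8) into uint32 words along K (hypothetical layout)."""
    K, N = q.shape
    u = (q.astype(np.int32) + 8).astype(np.uint32).reshape(K // 8, 8, N)  # offset to 0..15
    packed = np.zeros((K // 8, N), dtype=np.uint32)
    for i in range(8):
        packed |= u[:, i, :] << np.uint32(4 * i)    # nibble i occupies bits 4i..4i+3
    return packed

def unpack_int4(packed):
    """Inverse of pack_int4: recover signed INT4 values as int8."""
    Kp, N = packed.shape
    out = np.empty((Kp, 8, N), dtype=np.int8)
    for i in range(8):
        nib = (packed >> np.uint32(4 * i)) & np.uint32(0xF)
        out[:, i, :] = nib.astype(np.int8) - 8      # undo the +8 offset
    return out.reshape(Kp * 8, N)

def fp16_int4_matmul(x, packed, scales, group_size):
    """Unfused reference: dequantize INT4 weights to FP16, multiply, accumulate in FP32."""
    q = unpack_int4(packed)
    K, N = q.shape
    w = (q.reshape(K // group_size, group_size, N).astype(np.float16)
         * scales[:, None, :]).reshape(K, N)
    return (x.astype(np.float32) @ w.astype(np.float32)).astype(np.float16)

# Usage: a batch of 16 tokens against a 256 x 64 weight matrix.
rng = np.random.default_rng(0)
K, N, g = 256, 64, 128
w = rng.standard_normal((K, N)).astype(np.float16)
x = rng.standard_normal((16, K)).astype(np.float16)
q, s = quantize_int4(w, g)
p = pack_int4(q)
print(w.nbytes / p.nbytes)   # 4.0: packed weights use a quarter of the FP16 bytes (scales excluded)
y = fp16_int4_matmul(x, p, s, g)
```

The 4x ratio printed above is the weight-traffic saving that bounds the ideal speedup; the per-group scales add a small overhead on top of it.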

llm 4bit kernel quantization

Created 2024-01-17

14 commits to master branch, last one 4 months ago