11 results found

577
3.6k
gpl-3.0
35
A Redis server and distributed cluster implemented in Go.
Created 2019-06-01
264 commits to master branch, last one 17 days ago
Unified KV Cache Compression Methods for Auto-Regressive Models
Created 2024-06-05
123 commits to main branch, last one 26 days ago
38
423
unknown
6
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
Created 2024-09-18
212 commits to main branch, last one 5 days ago
51
419
unknown
5
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Created 2023-06-12
41 commits to main branch, last one 7 months ago
24
359
apache-2.0
10
LLM KV cache compression made easy
Created 2024-11-06
23 commits to main branch, last one 9 days ago
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
Created 2024-07-24
17 commits to main branch, last one about a month ago
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Created 2024-05-29
33 commits to main branch, last one about a month ago
3
73
apache-2.0
1
Completion After Prompt Probability. Make your LLM make a choice
Created 2023-02-22
448 commits to main branch, last one 2 months ago
4
60
unknown
3
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
Created 2024-01-14
54 commits to main branch, last one 11 months ago
9
59
unknown
3
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...
Created 2023-10-01
5 commits to main branch, last one about a year ago
Notes about LLaMA 2 model
Created 2023-08-21
4 commits to main branch, last one about a year ago