13 results found

583
3.6k
gpl-3.0
35
A Redis server and distributed cluster implemented in Go.
Created 2019-06-01
280 commits to master branch, last one 4 days ago
Unified KV Cache Compression Methods for Auto-Regressive Models
Created 2024-06-05
123 commits to main branch, last one 2 months ago
68
683
unknown
7
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
Created 2024-09-18
295 commits to main branch, last one 4 hours ago
Implement Llama 3 inference step by step: grasp the core concepts, work through the derivation, and write the code.
Created 2025-02-19
9 commits to main branch, last one about a month ago
31
446
apache-2.0
13
LLM KV cache compression made easy
Created 2024-11-06
33 commits to main branch, last one 14 days ago
54
433
unknown
5
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Created 2023-06-12
41 commits to main branch, last one 9 months ago
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
Created 2024-07-24
19 commits to main branch, last one about a month ago
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Created 2024-05-29
33 commits to main branch, last one 3 months ago
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on hig...
Created 2022-06-15
206 commits to master branch, last one 8 days ago
3
76
apache-2.0
1
Completion After Prompt Probability. Make your LLM make a choice
Created 2023-02-22
448 commits to main branch, last one 5 months ago
9
64
unknown
4
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...
Created 2023-10-01
5 commits to main branch, last one about a year ago
4
60
unknown
2
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
Created 2024-01-14
54 commits to main branch, last one about a year ago
Notes about LLaMA 2 model
Created 2023-08-21
4 commits to main branch, last one about a year ago