Search Results - RepositoryStats

godis HDT3213

583

3.6k

gpl-3.0

35

A Redis server and distributed cluster implemented in Go.

go godis redis golang cluster kv-cache redis-server redis-cluster

Created 2019-06-01

280 commits to master branch, last one 4 days ago

KVCache-Factory Zefan-Cai

126

974

mit

80

Unified KV Cache Compression Methods for Auto-Regressive Models

llm kv-cache kv-cache-compression

Created 2024-06-05

123 commits to main branch, last one 2 months ago

llm_note harleyszhang

68

683

unknown

7

LLM notes covering model inference, transformer model structure, and LLM framework code analysis.

llm vllm kv-cache llm-inference triton-kernels cuda-programming transformer-models

Created 2024-09-18

295 commits to main branch, last one 4 hours ago

Deepdive-llama3-from-scratch therealoliver

42

562

mit

4

Implement llama3 inference step by step: grasp the core concepts, master the derivation, and write the code.

Created 2025-02-19

9 commits to main branch, last one about a month ago

kvpress NVIDIA

31

446

apache-2.0

13

LLM KV cache compression made easy

llm python pytorch kv-cache inference long-context transformers kv-cache-compression large-language-models

Created 2024-11-06

33 commits to main branch, last one 14 days ago

H2O FMInference

54

433

unknown

5

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

gpt-3 kv-cache sparsity heavy-hitters high-throughput large-language-models

Created 2023-06-12

41 commits to main branch, last one 9 months ago

Awesome-LLM-KV-Cache Zefan-Cai

14

257

gpl-3.0

6

Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache papers with code.

llm kv-cache kv-cache-compression kv-cache-quantization

Created 2024-07-24

19 commits to main branch, last one about a month ago

block-transformer itsnamgyu

7

150

mit

5

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

llm kv-cache llm-inference llm-architecture kv-cache-compression

Created 2024-05-29

33 commits to main branch, last one 3 months ago

HierarchicalKV NVIDIA-Merlin

27

141

apache-2.0

19

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on hig...

gpu cuda kv-cache hashtable key-value-store dynamic-embedding embedding-storage recommender-system

Created 2022-06-15

206 commits to master branch, last one 8 days ago

cappr kddubey

3

76

apache-2.0

1

Completion After Prompt Probability. Make your LLM make a choice

kv-cache llamacpp zero-shot huggingface probability llm-inference prompt-engineering text-classification

Created 2023-02-22

448 commits to main branch, last one 5 months ago

LLaMA2 aju22

9

64

unknown

4

This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...

gpt llm rope llama llama2 kv-cache rms-norm attention transformer natural-language-processing

Created 2023-10-01

5 commits to main branch, last one about a year ago

EasyKV DRSY

4

60

unknown

2

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

llm kv-cache cache-eviction cache-management

Created 2024-01-14

54 commits to main branch, last one about a year ago

pytorch-llama-notes hkproj

6

58

unknown

7

Notes about the LLaMA 2 model

llama2 rmsprop kv-cache study-notes rotary-position-encoding attention-is-all-you-need

Created 2023-08-21

4 commits to main branch, last one about a year ago