11 results found
- Filter by Primary Language:
- Python (9)
- Go (1)
A Redis server and distributed cluster implemented in Go.
Created
2019-06-01
264 commits to master branch, last one 17 days ago
Unified KV Cache Compression Methods for Auto-Regressive Models
Created
2024-06-05
123 commits to main branch, last one 26 days ago
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
Created
2024-09-18
212 commits to main branch, last one 5 days ago
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Created
2023-06-12
41 commits to main branch, last one 7 months ago
LLM KV cache compression made easy
Created
2024-11-06
23 commits to main branch, last one 9 days ago
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
Created
2024-07-24
17 commits to main branch, last one about a month ago
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Created
2024-05-29
33 commits to main branch, last one about a month ago
Completion After Prompt Probability. Make your LLM make a choice
Created
2023-02-22
448 commits to main branch, last one 2 months ago
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
Created
2024-01-14
54 commits to main branch, last one 11 months ago
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...
Created
2023-10-01
5 commits to main branch, last one about a year ago
Notes about LLaMA 2 model
Created
2023-08-21
4 commits to main branch, last one about a year ago