7 results found

995
12.3k
apache-2.0
96
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Created 2023-08-03
487 commits to main branch, last one 6 days ago
569
7.0k
apache-2.0
75
Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project) with 64K long-context models
Created 2023-07-18
263 commits to main branch, last one about a month ago
398
5.5k
apache-2.0
49
Official release of InternLM2 7B and 20B base and chat models. 200K context support
Created 2023-07-06
220 commits to main branch, last one 8 days ago
📖A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Created 2023-08-27
292 commits to main branch, last one 2 days ago
🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Created 2022-12-17
108 commits to main branch, last one 3 days ago
59
742
apache-2.0
12
FlashInfer: Kernel Library for LLM Serving
Created 2023-07-22
601 commits to main branch, last one 2 days ago
8
88
apache-2.0
1
Train LLMs (bloom, llama, baichuan2-7b, chatglm3-6b) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
Created 2023-06-24
27 commits to master branch, last one 4 months ago