11 results found

1. The official repo of the Qwen (通义千问) chat and pretrained large language models proposed by Alibaba Cloud.
   1.2k · 14.2k · apache-2.0 · 104
   Created 2023-08-03; 497 commits to main branch, last one 12 days ago
2. Phase 2 of the Chinese LLaMA-2 & Alpaca-2 large-model project, plus 64K long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models).
   576 · 7.1k · apache-2.0 · 79
   Created 2023-07-18; 264 commits to main branch, last one about a month ago
3. Official release of the InternLM2.5 base and chat models, with 1M-token context support.
   458 · 6.5k · apache-2.0 · 58
   Created 2023-07-06; 234 commits to main branch, last one 3 days ago
4. 📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
   Created 2023-08-27; 414 commits to main branch, last one 3 days ago
5. 📚 Modern CUDA learning notes with PyTorch: Tensor/CUDA Cores, 📖 150+ CUDA kernels with PyTorch bindings, 📖 HGEMM/SGEMM (95%–99% of cuBLAS performance), 📖 100+ LLM/CUDA blogs.
   162 · 1.5k · gpl-3.0 · 13
   Created 2022-12-17; 360 commits to main branch, last one a day ago
6. FlashInfer: Kernel Library for LLM Serving.
   140 · 1.5k · apache-2.0 · 19
   Created 2023-07-22; 816 commits to main branch, last one a day ago
7. InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
   52 · 310 · apache-2.0 · 10
   Created 2024-01-16; 479 commits to develop branch, last one 3 days ago
8. The official CLIP training codebase for Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
   9 · 178 · apache-2.0 · 7
   Created 2024-10-16; 29 commits to main branch, last one 22 days ago
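The Inf-CL entry above claims that contrastive-loss training need not materialize the full batch-by-batch similarity matrix. As an illustration of the underlying idea only (not Inf-CL's actual implementation), here is a NumPy sketch of a CLIP-style InfoNCE loss computed tile by tile with an online log-sum-exp; all sizes and names are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, tile = 8, 4, 3          # hypothetical batch size, embed dim, tile width
I = rng.normal(size=(n, d)); I /= np.linalg.norm(I, axis=1, keepdims=True)
T = rng.normal(size=(n, d)); T /= np.linalg.norm(T, axis=1, keepdims=True)
tau = 0.07                    # temperature

# Full-matrix reference: materializes the entire n x n logit matrix.
logits = I @ T.T / tau
full_loss = np.mean(np.log(np.exp(logits).sum(axis=1)) - np.diag(logits))

# Tiled version: never holds more than an n x tile block of logits,
# using an online log-sum-exp so the result matches the reference.
m = np.full(n, -np.inf)               # running row-wise max
s = np.zeros(n)                       # running sum of exp(logit - m)
diag = np.sum(I * T, axis=1) / tau    # positive-pair logits
for j0 in range(0, n, tile):
    block = I @ T[j0:j0 + tile].T / tau          # one n x tile block
    m_new = np.maximum(m, block.max(axis=1))
    s = s * np.exp(m - m_new) + np.exp(block - m_new[:, None]).sum(axis=1)
    m = m_new
tiled_loss = np.mean(m + np.log(s) - diag)

assert np.allclose(full_loss, tiled_loss)
```

Because the running max and sum are updated exactly, the tiled loss is numerically equal to the full-matrix loss while peak memory scales with the tile width instead of the batch size.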
9. Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline-parallel mode; faster than ZeRO/ZeRO++/FSDP.
   8 · 90 · apache-2.0 · 1
   Created 2023-06-24; 27 commits to master branch, last one 9 months ago
10. A Triton implementation of FlashAttention-2 that adds support for custom masks.
    Created 2024-07-20; 18 commits to main branch, last one 3 months ago
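For context on what "custom masks" means in the entry above: a reference (non-fused) attention with an arbitrary boolean mask takes only a few lines of NumPy, and a fused FlashAttention-style kernel must reproduce exactly these semantics while never materializing the full score matrix. This sketch is illustrative only and is not code from the repo:

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Reference attention with an arbitrary boolean mask.

    mask[i, j] == True means query i may attend to key j. A causal
    (lower-triangular) mask is just one special case; any boolean
    pattern works here, which is what "custom masks" enables.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)     # masked positions get -inf
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(1)
L, d = 5, 4                                   # hypothetical sequence length, head dim
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
causal = np.tril(np.ones((L, L), dtype=bool))  # one example of a "custom" mask
out = masked_attention(q, k, v, causal)
```

Under the causal mask, query 0 can attend only to key 0, so `out[0]` equals `v[0]`; that kind of invariant is a handy sanity check when validating a fused kernel against the reference.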
11. Benchmarks of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
    Created 2023-08-16; 1 commit to master branch, last one 2 months ago