2 results found

152
1.4k
gpl-3.0
13
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
Created 2022-12-17
299 commits to main branch, last one 2 days ago
12
212
bsd-3-clause
4
Root Mean Square Layer Normalization
Created 2019-09-24
7 commits to master branch, last one about a year ago