2 results found
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
Created 2022-12-17
299 commits to main branch, last one 2 days ago
Root Mean Square Layer Normalization
Created 2019-09-24
7 commits to master branch, last one about a year ago