Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
Created: 2024-01-22
11 commits to the main branch; last commit 10 months ago