Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)*
Created: 2023-06-14
554 commits to main branch, last one 11 months ago