1 result found
PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture incor...
Created: 2024-10-08
7 commits to main branch, last one 26 days ago