PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture incor...
Created 2024-10-08
7 commits to the main branch, the most recent one 26 days ago