Trending repositories for the topic "transformers"
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultr...
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga...
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Open source real-time translation app for Android that runs locally
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. W...
State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
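For orientation, a minimal sketch of compiling and running a model with OpenVINO's Python API; the IR path, device, and input shape below are placeholders, not taken from the project's docs.

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")                 # placeholder path to an OpenVINO IR model
compiled = core.compile_model(model, "CPU")          # target device, e.g. "CPU" or "GPU"

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # input shape depends on the model
result = compiled([dummy])                           # run inference on one input
print(result[compiled.output(0)].shape)              # first output tensor
```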
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
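A minimal sketch of the vit-pytorch usage pattern; the hyperparameter values here are illustrative.

```python
import torch
from vit_pytorch import ViT

# Build a ViT classifier from patch/embedding hyperparameters (values are illustrative)
v = ViT(
    image_size=256,
    patch_size=32,
    num_classes=1000,
    dim=1024,
    depth=6,
    heads=16,
    mlp_dim=2048,
    dropout=0.1,
    emb_dropout=0.1,
)

img = torch.randn(1, 3, 256, 256)   # (batch, channels, height, width)
preds = v(img)                      # logits of shape (1, 1000)
```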
Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics / ...
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete...
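A minimal sketch of the drop-in, transformers-style loading path that ipex-llm exposes, assuming the `ipex_llm.transformers` wrapper; the model path and 4-bit flag are illustrative.

```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement for transformers
from transformers import AutoTokenizer

model_path = "path/to/local-llm-checkpoint"  # placeholder Hugging Face model directory
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)  # low-bit weights
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is a transformer?", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```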
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Run a local LLM from Hugging Face in React Native or Expo using onnxruntime.
Official repository of my book "A Hands-On Guide to Fine-Tuning LLMs with PyTorch and Hugging Face"
🤹 MultiTRON: Pareto Front Approximation for Multi-Objective Session-Based Recommender Systems, accepted at ACM RecSys 2024.
Open Source Generative AI Models for Automatic Rhythm Game Beatmap Generation (for acoustic people)
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Official Implementation of SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation (CVPRW 2024)
Inference and fine-tuning examples for vision models from 🤗 Transformers
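For a sense of the baseline workflow such examples build on, a generic 🤗 Transformers image-classification call; the checkpoint name is just one common choice, not necessarily what the repository uses.

```python
from transformers import pipeline

# Single-call inference with a pretrained ViT checkpoint (model choice is illustrative)
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
predictions = classifier("path/to/image.jpg")   # local file path or URL
for p in predictions[:3]:
    print(f"{p['label']}: {p['score']:.3f}")
```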
Official implementation for "UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction" (KDD 2024)
[ECCV2024 Oral] Official implementation of the paper "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection"
A text-to-speech (TTS) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech synthesis on Apple Silicon.
Implementation of the new SOTA for model-based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in PyTorch
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
[CVPR 2025] Official code and models for Encoder-only Mask Transformer (EoMT).
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
MedViTV2: Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
An AI-powered natural-language and reverse image search engine built on CLIP & qdrant.
[AAAI 2025] This repository contains MAPF-GPT, a deep-learning-based model for solving multi-agent pathfinding (MAPF) problems. Trained with imitation learning on trajectories produced by LaCAM, it generates actions under part...
A multi-model framework that generates fully featured osu! beatmaps for all gamemodes from spectrogram inputs.
Vision Transformers Need Registers. And Gated MLPs. And +20M params. A tiny modality gap ensues!
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification
A flexible, adaptive classification system for dynamic text classification
Comprehensive repository of Data Science projects spanning Machine Learning, Deep Learning, and Natural Language Processing. Demonstrates practical applications of algorithms and tools on real-world d...
SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"
VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration
A set of scripts and configurations for pretraining large language models (LLMs)
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR...
[ICLR 2025] From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
Implementation of AlphaFold 3 from Google DeepMind, in PyTorch
A collection of 🤗 Transformers.js demos and example applications
A curriculum for learning about foundation models, from scratch to the frontier
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization"
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
MOMENT: A Family of Open Time-series Foundation Models
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
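As a pointer to what the library does, a minimal LoRA sketch with PEFT; the base model and hyperparameters are illustrative choices, not recommendations.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # illustrative base model
lora = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(base, lora)                    # wraps the base model with trainable adapters
model.print_trainable_parameters()                    # only a small fraction of weights is trainable
```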
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
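To illustrate the core idea, a small semantic-search sketch with txtai; the embedding model and sample texts are assumptions for the example.

```python
from txtai.embeddings import Embeddings

# Embeddings index backed by a sentence-transformers model (model choice is illustrative)
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})

data = [
    "US tops 5 million confirmed virus cases",
    "Maine man wins $1M from $25 lottery ticket",
    "Beijing mobilises invasion craft along coast",
]
embeddings.index([(i, text, None) for i, text in enumerate(data)])

# Returns (id, score) pairs ranked by semantic similarity
print(embeddings.search("public health crisis", 1))
```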
This repository contains demos I made with the Transformers library by HuggingFace.
An MLX port of FLUX based on the Hugging Face Diffusers implementation.
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
Resources and topics necessary for Data Scientist and Machine Learning roles
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
🤖✨ChatMLX is a modern, open-source, high-performance chat application for macOS based on large language models.
Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.
[Three Years of Interviews, Five Years of Mock Tests] An interview handbook for AI algorithm engineers, covering interview and written-test experience and practical knowledge across the AI industry: AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, embodied AI, the metaverse, AGI, and more.