Search Results - RepositoryStats

412

4.7k

apache-2.0

23

Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL...

Created 2023-08-01

1,273 commits to main branch, last one 6 hours ago

MedicalGPT shibing624

513

3.4k

apache-2.0

38

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. 训练医疗大模型，实现了包括增量预训练(PT)、有监督微调(SFT)、RLHF、DPO、ORPO。

dpo gpt llm llama chatgpt medical medicalgpt

Created 2023-06-02

540 commits to main branch, last one 4 days ago

HALOs ContextualAI

46

768

apache-2.0

8

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

dpo kto ppo rlhf halos alignment

Created 2023-12-03

212 commits to main branch, last one 9 days ago

LLamaTuner jianzhnie

63

583

apache-2.0

9

Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM , Falcon) 大模型高效量化训练+部署.

dpo ppo qwen rlhf llama qlora llama3 chatgpt mixtral

Created 2023-05-25

537 commits to main branch, last one 5 months ago

tensorflow-nlp-tutorial ukairia777

270

534

unknown

5

tensorflow를 사용하여 텍스트 전처리부터, Topic Models, BERT, GPT, LLM과 같은 최신 모델의 다운스트림 태스크들을 정리한 Deep Learning NLP 저장소입니다.

dpo llm nlp sft bert lora llama trainer bert-ner tensorflow huggingface nlp-tutorial transformers keras-tutorial question-answering named-entity-recognition natural-language-processing

Created 2021-12-30

259 commits to main branch, last one 3 months ago

Step-DPO dvlab-research

11

310

unknown

2

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

dpo llm math reasoning

Created 2024-06-24

19 commits to main branch, last one 5 months ago

mLoRA TUDB-Labs

54

284

apache-2.0

3

An Efficient "Factory" to Build Multiple LoRA Adapters

dpo gpu llm lora peft rlhf llama mlora llama2 chatglm baichuan finetune

Created 2023-08-24

306 commits to main branch, last one 12 days ago

align-anything PKU-Alignment

55

274

apache-2.0

9

Align Anything: Training All-modality Model with Feedback

dpo rlhf chameleon multimodal large-language-models vision-language-model

Created 2024-07-14

66 commits to main branch, last one 2 days ago

SiLLM armbues

23

234

mit

8

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

dpo llm mlx lora llm-training apple-silicon llm-inference large-language-models

Created 2024-01-11

485 commits to main branch, last one 2 days ago

SPO RockeyCoss

4

164

mit

5

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

dpo sdxl text-to-image diffusion-models text-to-image-generation

Created 2024-05-26

1 commits to main branch, last one 4 days ago

notus argilla-io

14

163

mit

6

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

dpo trl zephyr fine-tuning lm-alignment preference-data alignment-handbook

Created 2023-11-16

52 commits to main branch, last one about a year ago

IterComp YangLing0818

9

152

mit

3

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

dpo rlhf text-to-image reward-modeling

Created 2024-10-09

10 commits to main branch, last one about a month ago

VL-RLHF TideDra

7

140

apache-2.0

4

A RLHF Infrastructure for Vision-Language Models

dpo llm lmm vlm mllm rlhf

Created 2023-12-27

7 commits to main branch, last one about a month ago

NetTrader.Indicator anilca

52

140

unknown

16

Technical anaysis library for .NET

Created 2016-06-30

64 commits to master branch, last one 3 years ago

Vision-LLM-Alignment NiuTrans

6

90

unknown

3

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

dpo llm ppo sft mllm rlhf llava reward vision alignment multi-model llama3-vision

Created 2024-06-29

40 commits to master branch, last one 2 months ago

oat sail-sg

6

72

apache-2.0

5

🌾 OAT: Online AlignmenT for LLMs

dpo llm rlhf alignment llm-aligment distributed-rl dueling-bandits llm-exploration online-alignment thompson-sampling distributed-training

Created 2024-10-15

22 commits to main branch, last one 13 hours ago

CodeUltraFeedback martin-wey

5

67

mit

3

CodeUltraFeedback: aligning large language models to coding preferences

dpo alignment codal-bench llm-as-a-judge code-generation codeultrafeedback large-language-models

Created 2024-01-25

51 commits to main branch, last one 5 months ago

SuperCorrect-llm YangLing0818

1

41

unknown

3

SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

dpo llm reflection llm-reasoning

Created 2024-10-11

9 commits to main branch, last one 2 months ago

beta-DPO junkangwu

0

36

unknown

2

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

dpo rlhf alignment preference-alignment

Created 2024-05-22

9 commits to main branch, last one about a month ago

DPO-ST TianduoWang

5

30

mit

3

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

dpo chain-of-thought math-word-problem

Created 2024-06-04

9 commits to main branch, last one 4 months ago

Simple-Trl-Training sugarandgugu

1

29

unknown

1

基于DPO算法微调语言大模型，简单好上手。

dpo llm trl rlhf simple

Created 2024-03-27

16 commits to master branch, last one 5 months ago

Dutch-LLMs RobinSmits

0

28

mit

3

Various training, inference and validation code and results related to Open LLM's that were pretrained (full or partially) on the Dutch language.

dpo trl peft qwen2 alpaca polylm pytorch open-llama transformers large-language-models

Created 2023-07-02

29 commits to main branch, last one 8 months ago

openpo dannylee1020

0

25

apache-2.0

3

Framework for building synthetic datasets with AI feedback

ai dpo llm rlhf rlaif python evaluation finetuning ai-feedback huggingface llm-evaluation synthetic-data synthetic-data-generation

Created 2024-10-28

231 commits to master branch, last one 18 hours ago