Search Results - RepositoryStats

24

201

apache-2.0

2

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

gpt ppo trl lora peft rlhf gpt-4 llama adapter chatgpt transformer

Created 2023-03-30

66 commits to main branch, last one about a year ago

14

164

mit

6

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

dpo trl zephyr fine-tuning lm-alignment preference-data alignment-handbook

Created 2023-11-16

52 commits to main branch, last one about a year ago

1

30

unknown

1

Fine-tune large language models with the DPO algorithm; simple and easy to get started.

dpo llm trl rlhf simple

Created 2024-03-27

16 commits to master branch, last one 7 months ago

0

28

mit

3

Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.

dpo trl peft qwen2 alpaca polylm pytorch open-llama transformers large-language-models

Created 2023-07-02

29 commits to main branch, last one 10 months ago

2

26

unknown

1

Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, BLOOM, and others.

llm trl lora rlhf trlx reward llm-rlhf

Created 2023-04-19

123 commits to main branch, last one about a year ago