3 results found

Multi-Agent Constrained Policy Optimisation (MACPO; MAPPO-L).
Created 2021-10-06
71 commits to main branch, last one 11 months ago
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
Created 2023-12-12
1 commit to master branch, last one about a year ago
Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (Moalla et al. 2024). Uses TorchRL and provides extensive tools f...
Created 2024-04-30
5 commits to main branch, last one 4 months ago