10 results found

What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Created 2023-05-21
62 commits to main branch, last one 4 months ago
5
84
unknown
3
Official repository of MMGenBench
Created 2024-11-18
6 commits to main branch, last one 11 days ago
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models
Created 2024-08-21
79 commits to main branch, last one 3 months ago
6
75
apache-2.0
2
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Created 2023-07-24
1,067 commits to main branch, last one 2 months ago
How good are LLMs at chemistry?
Created 2023-05-16
1,083 commits to main branch, last one about a month ago
Language Model for Mainframe Modernization
Created 2024-08-02
30 commits to main branch, last one 3 months ago
CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, st...
Created 2024-07-23
4 commits to main branch, last one 3 months ago
The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI".
Created 2023-08-02
6 commits to main branch, last one 9 months ago
1
26
unknown
2
Restore safety in fine-tuned language models through task arithmetic
Created 2024-02-17
83 commits to main branch, last one 8 months ago
Training and Benchmarking LLMs for Code Preference.
Created 2024-10-22
10 commits to main branch, last one 17 days ago