5 results found

A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
This repository has been archived
Created 2024-11-21
2 commits to main branch, last one 22 days ago
40
504
mit
5
A library for making RepE control vectors
Created 2024-01-21
27 commits to main branch, last one 7 days ago
This repository collects all relevant resources about interpretability in LLMs
Created 2024-06-30
56 commits to main branch, last one about a month ago
4
36
apache-2.0
8
SANSA - sparse EASE for millions of items
Created 2023-07-11
70 commits to main branch, last one 16 days ago
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
Created 2024-02-17
13 commits to main branch, last one 2 months ago