5 results found

A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
This repository has been archived.
Created 2024-11-21
2 commits to main branch, last one about a month ago
42
537
MIT
5
A library for making RepE control vectors
Created 2024-01-21
28 commits to main branch, last one 19 days ago
This repository collects all relevant resources about interpretability in LLMs
Created 2024-06-30
56 commits to main branch, last one 2 months ago
4
40
Apache-2.0
8
SANSA - sparse EASE for millions of items
Created 2023-07-11
71 commits to main branch, last one 19 days ago
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
Created 2024-02-17
13 commits to main branch, last one 3 months ago