14 results found
- Filter by Primary Language:
- Python (6)
- Jupyter Notebook (5)
- HTML (1)
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Created
2023-02-06
649 commits to main branch, last one 2 days ago
This repository collects all relevant resources about interpretability in LLMs
Created
2024-06-30
56 commits to main branch, last one 3 months ago
Decomposing and Editing Predictions by Modeling Model Computation
Created
2024-04-17
12 commits to main branch, last one 7 months ago
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
Created
2024-03-19
522 commits to main branch, last one 4 days ago
Steering vectors for transformer language models in PyTorch / Hugging Face
Created
2024-01-18
65 commits to main branch, last one 2 months ago
Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs to computer code and discovering new algorithms which generalize out-of-distribution and outperform ...
Created
2024-01-13
5 commits to main branch, last one 11 months ago
Interpreting how transformers simulate agents performing RL tasks
Created
2022-12-17
725 commits to main branch, last one about a year ago
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
Created
2024-02-16
24 commits to main branch, last one 10 months ago
🧠 Starter templates for doing interpretability research
Created
2022-10-31
17 commits to main branch, last one about a year ago
Sparse and discrete interpretability tool for neural networks
Created
2022-12-13
20 commits to main branch, last one 11 months ago
Sparse probing paper full code.
Created
2023-05-02
5 commits to main branch, last one about a year ago
Generating and validating natural-language explanations.
Created
2023-01-30
387 commits to main branch, last one 23 days ago
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Created
2023-10-10
309 commits to main branch, last one 2 months ago
Universal Neurons in GPT2 Language Models
Created
2023-12-26
5 commits to main branch, last one 8 months ago