Statistics for topic interpretability
RepositoryStats tracks 631,350 GitHub repositories; of these, 180 are tagged with the interpretability topic. The most common primary language for repositories using this topic is Python (91). Other languages include Jupyter Notebook (49).
Stargazers over time for topic interpretability
Most starred and trending repositories for topic interpretability
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more. (A Grad-CAM usage sketch follows the list.)
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Stanford NLP Python library for Representation Finetuning (ReFT)
An Open-Source Library for the interpretability of time series classifiers
Wanna know what your model sees? Here's a package for applying EigenCAM and generating heatmaps from the new YOLO V11 model
An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
A game theoretic approach to explain the output of any machine learning model. (A SHAP usage sketch follows the list.)
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs). (A BatchTopK sketch follows the list.)
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
PyTorch Implementation of CausalFormer: An Interpretable Transformer for Temporal Causal Discovery
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
A curated list of resources for activation engineering
A JAX research toolkit for building, editing, and visualizing neural networks.
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
Decomposing and Editing Predictions by Modeling Model Computation
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
The nnsight package enables interpreting and manipulating the internals of deep learned models.
ADHDeepNet is a model that integrates temporal and spatial characterization, attention modules, and explainability techniques, optimized for ADHD diagnosis from EEG data. Neural Architecture Search (NAS), ...
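The pytorch-grad-cam entry above ("Advanced AI Explainability for computer vision...") is used by pointing a CAM method at one or more target layers of a vision model. A minimal sketch follows, based on the pattern in that library's documentation; the ResNet-50 backbone, the random input tensor, and ImageNet class 281 are illustrative choices, and argument names can differ between library versions.

import torch
from torchvision.models import resnet50
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Pretrained classifier and the layer whose activations Grad-CAM will weight.
model = resnet50(weights="IMAGENET1K_V1").eval()
target_layers = [model.layer4[-1]]          # last conv block of the backbone
input_tensor = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch

cam = GradCAM(model=model, target_layers=target_layers)
# ClassifierOutputTarget(281) asks: what in the image supports ImageNet class 281?
grayscale_cam = cam(input_tensor=input_tensor, targets=[ClassifierOutputTarget(281)])
print(grayscale_cam.shape)  # one heatmap per input image, upsampled to the input size

The same call pattern applies to the other CAM variants the library ships (ScoreCAM, EigenCAM, and so on) by swapping in the corresponding class.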
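The shap entry ("A game theoretic approach to explain the output of any machine learning model") refers to Shapley-value attributions that, together with a base value, sum to the model's prediction. Below is a minimal sketch assuming shap and scikit-learn are installed; the random-forest model and synthetic regression data are placeholders chosen here, not part of the library.

import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy model and data; any model shap supports could be substituted.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# shap.Explainer auto-selects an algorithm (a tree explainer for this model);
# X serves as the background data the attributions are measured against.
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Additivity check: base value plus per-feature attributions recovers the prediction.
pred = model.predict(X[:1])[0]
approx = shap_values.base_values[0] + shap_values.values[0].sum()
print(round(pred, 3), round(approx, 3))
# shap.plots.beeswarm(shap_values) would visualize the attributions (needs matplotlib).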
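The BatchTopK entry describes an activation for sparse autoencoders in which the top-k constraint is enforced across the whole batch rather than per sample, so individual samples can use more or fewer latents while the batch average stays at k. The function below is a from-scratch PyTorch illustration of that idea, not the repository's own implementation; the batch shape (8, 512) and k=32 are arbitrary.

import torch

def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest pre-activations across the batch, zero the rest."""
    batch, n_latents = pre_acts.shape
    flat = pre_acts.flatten()
    top = flat.topk(k * batch)              # global top-k over every (sample, latent) pair
    mask = torch.zeros_like(flat)
    mask[top.indices] = 1.0
    return (flat * mask).reshape(batch, n_latents)

acts = batch_topk(torch.randn(8, 512), k=32)
print((acts != 0).sum(dim=1))  # per-sample counts vary, but they sum to 8 * 32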