Trending repositories for topic interpretability
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
A game theoretic approach to explain the output of any machine learning model.
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libr...
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
A curated list of resources for activation engineering
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Fit interpretable models. Explain blackbox machine learning.
An awesome repository and a comprehensive survey on interpretability of LLM attention heads.
The nnsight package enables interpreting and manipulating the internals of deep learned models.
Stanford NLP Python library for understanding and improving PyTorch models via interventions
A curated list of awesome responsible machine learning resources.
Stanford NLP Python library for Representation Finetuning (ReFT)
Wanna know what your model sees? Here's a package for applying EigenCAM and generating heatmaps from the new YOLO V11 model
Zennit is a high-level framework in Python using PyTorch for explaining/exploring neural networks using attribution methods like LRP.
A collection of research materials on explainable AI/ML
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024
An Open-Source Library for the interpretability of time series classifiers
This repository introduces MentaLLaMA, the first open-source instruction following large language model for interpretable mental health analysis.
Adversarial attacks on explanations and how to defend them
A JAX research toolkit for building, editing, and visualizing neural networks.
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
PyTorch Implementation of CausalFormer: An Interpretable Transformer for Temporal Causal Discovery
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety a...
The source code of the paper: Trend attention fully convolutional network for remaining useful life estimation in the turbofan engine PHM of CMAPSS dataset. Signal selection, Attention mechanism, and Inte...
This is the official repository for HypoGeniC (Hypothesis Generation in Context) and HypoRefine, which are automated, data-driven tools that leverage large language models to generate hypothesis for o...
A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
[NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models".
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
Interpretable ML package for concise, transparent, and accurate predictive modeling (sklearn-compatible).
[ICCV 2021 Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-bas...
Decomposing and Editing Predictions by Modeling Model Computation
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
ADHDeepNet is a model that integrates temporal and spatial characterization, attention modules, and explainability techniques, optimized for EEG-based ADHD diagnosis. Neural Architecture Search (NAS), ...
Transparent medical image AI via an image-text foundation model grounded in medical literature
TrustyAI Explainability Toolkit
Time series explainability via self-supervised model behavior consistency
Stanford NLP Python library for understanding and improving PyTorch models via interventions
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Generating and validating natural-language explanations for the brain.
Sparse and discrete interpretability tool for neural networks