Trending repositories for topic interpretability
A game theoretic approach to explain the output of any machine learning model.
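This description matches the SHAP library. Assuming that is the repository in question, here is a minimal sketch of the usual Shapley-value workflow; the dataset and model below are illustrative placeholders, not part of the listing.

```python
# Hedged sketch: attributing a model's predictions to features with Shapley values.
# Dataset and model are placeholders; the API shown is SHAP's high-level Explainer.
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier().fit(X, y)

explainer = shap.Explainer(model, X)    # auto-selects an explainer for the model type
shap_values = explainer(X.iloc[:100])   # Shapley values for the first 100 rows

shap.plots.waterfall(shap_values[0])    # per-feature attribution for one prediction
```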
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
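This description matches the pytorch-grad-cam project. Assuming so, a hedged sketch of a typical Grad-CAM call follows; the backbone, target layer, and class index are placeholders, and exact constructor arguments vary across library versions.

```python
# Hedged sketch: Grad-CAM heatmap for one image and one class.
# Model, layer, and class index are placeholders; check the library docs for exact signatures.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
target_layers = [model.layer4[-1]]            # last conv block of the backbone
input_tensor = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed image batch

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])  # 281 = "tabby cat" in ImageNet
heatmap = grayscale_cam[0]                    # HxW map of regions driving the prediction
```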
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
A curated list of awesome responsible machine learning resources.
The nnsight package enables interpreting and manipulating the internals of deep learned models.
Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libr...
A JAX research toolkit for building, editing, and visualizing neural networks.
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
Fit interpretable models. Explain blackbox machine learning.
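This description matches InterpretML. Assuming so, a brief sketch of fitting one of its glass-box models (an Explainable Boosting Machine); the dataset is a placeholder.

```python
# Hedged sketch: fitting a glass-box EBM and viewing its global explanation.
# Dataset is a placeholder; class names follow InterpretML's documented API.
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier()   # additive model: one learned shape function per feature
ebm.fit(X_train, y_train)

show(ebm.explain_global())              # per-feature contribution curves and importances
```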
An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
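This description matches the transformers-interpret package. Assuming so, a hedged sketch of the advertised two-line usage; the checkpoint and input text are placeholders.

```python
# Hedged sketch: word-level attributions for a 🤗 transformers classifier.
# Checkpoint and text are placeholders; the explainer class is transformers-interpret's.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

# The "two lines": build an explainer, then call it on raw text.
explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = explainer("The movie was surprisingly good.")
print(word_attributions)   # list of (token, attribution score) pairs
```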
Generating and validating natural-language explanations.
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
The code of the NeurIPS 2021 paper "Scalable Rule-Based Representation Learning for Interpretable Classification" and the TPAMI paper "Learning Interpretable Rules for Scalable Data Representation and Classif...
Awesome Resources for Advanced Computer Vision Topics
A collection of research materials on explainable AI/ML
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
[NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models".
Time series explainability via self-supervised model behavior consistency
Python library to explain Tree Ensemble models (TE) like XGBoost, using a rule list.
This is the official repository for HypoGeniC (Hypothesis Generation in Context), which is an automated, data-driven tool that leverages large language models to generate hypotheses for open-domain re...
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Decomposing and Editing Predictions by Modeling Model Computation
Scikit-learn friendly library to interpret and prompt-engineer text datasets using large language models.
Wanna know what your model sees? Here's a package for applying EigenCAM on the new YOLO V8 model
TrustyAI Explainability Toolkit
Sparse and discrete interpretability tool for neural networks
This repository introduces MentaLLaMA, the first open-source instruction-following large language model for interpretable mental health analysis.
Tree prompting: easy-to-use scikit-learn interface for improved prompting.
Code for paper: Are Large Language Models Post Hoc Explainers?