Trending repositories for topic interpretability
A game theoretic approach to explain the output of any machine learning model.
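A minimal usage sketch for this library (SHAP), assuming scikit-learn is available; the dataset and model below are illustrative stand-ins, not part of the repository itself:

    # Shapley-value attributions for a tree ensemble.
    import shap
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    model = GradientBoostingRegressor(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)    # exact, fast path for tree models
    shap_values = explainer.shap_values(X)   # one attribution per feature per row
    shap.summary_plot(shap_values, X)        # global importance and direction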
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
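A hedged sketch of a typical Grad-CAM call with this package (pytorch-grad-cam); the backbone, target layer, and class index 281 are illustrative choices, and the random tensor stands in for a preprocessed image:

    import torch
    from torchvision.models import resnet50
    from pytorch_grad_cam import GradCAM
    from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

    model = resnet50(weights="DEFAULT").eval()
    target_layers = [model.layer4[-1]]          # last convolutional block
    input_tensor = torch.randn(1, 3, 224, 224)  # placeholder for a real image batch

    cam = GradCAM(model=model, target_layers=target_layers)
    heatmap = cam(input_tensor=input_tensor,
                  targets=[ClassifierOutputTarget(281)])
    # heatmap[0] is an HxW saliency map to overlay on the original image.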
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
This repository introduces MentaLLaMA, the first open-source instruction-following large language model for interpretable mental health analysis.
Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries…
A curated list of awesome responsible machine learning resources.
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
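For comparison, torch-cam hooks the model once and extracts maps from a normal forward pass; a rough sketch in which the model and input are placeholders and exact names may vary across versions:

    import torch
    from torchvision.models import resnet18
    from torchcam.methods import GradCAM

    model = resnet18(weights="DEFAULT").eval()
    cam_extractor = GradCAM(model)              # registers forward/backward hooks
    input_tensor = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image

    out = model(input_tensor)
    # Retrieve the CAM for the top-scoring class from the recorded activations.
    activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)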
Fit interpretable models. Explain blackbox machine learning.
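A short glassbox sketch with InterpretML's Explainable Boosting Machine; the dataset and split are illustrative:

    from interpret.glassbox import ExplainableBoostingClassifier
    from interpret import show
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    ebm = ExplainableBoostingClassifier()            # interpretable GAM-style model
    ebm.fit(X_train, y_train)

    show(ebm.explain_global())                       # per-feature shape functions
    show(ebm.explain_local(X_test[:5], y_test[:5]))  # per-prediction breakdowns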
The nnsight package enables interpreting and manipulating the internals of deep learned models.
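A rough nnsight sketch for reading an internal activation of GPT-2 during a forward pass; the model name and layer index are illustrative, and depending on the nnsight version the saved value may need to be read via .value:

    from nnsight import LanguageModel

    model = LanguageModel("openai-community/gpt2", device_map="auto")

    with model.trace("The Eiffel Tower is in the city of"):
        # Block outputs are tuples; element 0 is the hidden-state tensor.
        hidden = model.transformer.h[8].output[0].save()

    print(hidden)  # saved activation; some versions expose it as hidden.value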
This is the official repository for HypoGeniC (Hypothesis Generation in Context), an automated, data-driven tool that leverages large language models to generate hypotheses for open-domain research…
Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
[NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models".
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
A JAX research toolkit for building, editing, and visualizing neural networks.
A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
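A small imodels sketch using its sklearn-style interface; the RuleFit estimator and dataset are illustrative picks from the package's model zoo:

    from imodels import RuleFitClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RuleFitClassifier()         # sparse, rule-based classifier
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))  # standard sklearn API throughout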
A collection of research materials on explainable AI/ML
A toolkit for quantitative evaluation of data attribution methods.
Time series explainability via self-supervised model behavior consistency
Code for paper: Are Large Language Models Post Hoc Explainers?
TrustyAI Explainability Toolkit
Generating and validating natural-language explanations.
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
Decomposing and Editing Predictions by Modeling Model Computation
Want to know what your model sees? Here's a package for applying EigenCAM to the new YOLOv8 model.
Sparse and discrete interpretability tool for neural networks
Scikit-learn friendly library to interpret and prompt-engineer text datasets using large language models.
Tree prompting: easy-to-use scikit-learn interface for improved prompting.
CVPR 2023: Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
Python library to explain Tree Ensemble models (TE) like XGBoost, using a rule list.
Repository for our NeurIPS 2022 paper "Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off" and our NeurIPS 2023 paper "Learning to Receive Help: Intervention-Aware Concept Embedding Models".