Statistics for topic interpretability
RepositoryStats tracks 642,777 GitHub repositories; of these, 186 are tagged with the interpretability topic. The most common primary language for repositories using this topic is Python (92). Other languages include Jupyter Notebook (52).
Stargazers over time for topic interpretability
Most starred repositories for topic interpretability
A game theoretic approach to explain the output of any machine learning model. (See the SHAP sketch after this list.)
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
Wanna know what your model sees? Here's a package for applying EigenCAM and generating heatmaps from the new YOLO V11 model. (See the EigenCAM sketch after this list.)
This repository introduces MentaLLaMA, the first open-source instruction following large language model for interpretable mental health analysis.
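The SHAP repository above implements Shapley-value attribution from cooperative game theory. A minimal sketch of how it is typically used, assuming the shap package and a scikit-learn tree model; the dataset and model here are illustrative stand-ins, not anything from the page:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
import shap

# Fit any model; TreeExplainer computes exact Shapley values for tree ensembles.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])  # one attribution per feature per row

# Local accuracy: the base value plus a row's attributions recovers that
# row's model prediction, which is what makes the attributions "fair shares".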
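The EigenCAM package above builds heatmaps without gradients: the saliency map is the projection of a convolutional feature map onto its first principal component. A minimal sketch of that idea in PyTorch; the function name and the random stand-in activations are assumptions for illustration, not the package's API:

import torch

def eigen_cam(activations: torch.Tensor) -> torch.Tensor:
    # activations: (channels, height, width) from any conv layer
    C, H, W = activations.shape
    A = activations.reshape(C, H * W)
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)
    cam = Vh[0].reshape(H, W)        # first right singular vector as a 2D map
    if cam.sum() < 0:                # singular vectors have arbitrary sign
        cam = -cam
    cam = torch.relu(cam)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]

heatmap = eigen_cam(torch.randn(256, 20, 20))  # stand-in for a YOLO conv output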
Trending repositories for topic interpretability
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
A curated list of resources for activation engineering
Love2D LSP (VS Code / Neovim / Zed / etc.) extension for live coding and live variable tracking
PyTorch Implementation of CausalFormer: An Interpretable Transformer for Temporal Causal Discovery
An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
Stanford NLP Python library for Representation Finetuning (ReFT)
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. (See the SAE sketch after this list.)
ADHDeepNet is a model that integrates temporal and spatial characterization, attention modules, and explainability techniques, optimized for EEG-based ADHD diagnosis. Neural Architecture Search (NAS), ...
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
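The OpenMOSS repository above studies sparse autoencoders for mechanistic interpretability. A minimal SAE sketch in PyTorch: an overcomplete ReLU bottleneck trained with a reconstruction loss plus an L1 sparsity penalty. The dimensions and the penalty coefficient are illustrative assumptions, not that repo's configuration:

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_hidden: int = 8 * 768):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # sparse, overcomplete features
        return self.decoder(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(32, 768)                  # stand-in for LLM residual activations
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # MSE + L1

The L1 term pushes most features to zero on any given input, so each surviving feature can be inspected as a candidate interpretable direction in the model's activation space.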