Trending repositories for topic sre
A curated list of amazingly awesome open-source sysadmin resources.
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
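⭐ [Open-source book] An in-depth explanation of kernel networking, Kubernetes, Service Mesh, containers, and other cloud-native technologies. A practice-tested guide to DevOps and SRE. If you find an error, please open an issue.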
A curated list of Site Reliability and Production Engineering resources.
On-Call Assistant for Prometheus Alerts - Get a head start on fixing alerts with AI investigation
A curated list of awesome DevOps platforms, tools, practices and resources
Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts
StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 i...
NixOS Guide. Learn all about the immutable Nix Operating System and the declarative Nix Expression Language.
A Frida based tool that traces usage of the JNI API in Android apps.
(Chinese only) Everything I know: DevOps & CloudNative, Linux, Embedded, Homelab, Music, Blockchain, AI, etc.
A curated list of Site Reliability and Production Engineering Tools
At LinkedIn, we use this curriculum to onboard our entry-level talent into the SRE role.
CDN Up and Running - Building a CDN from Scratch to Learn about CDN, Nginx, Lua, Prometheus, Grafana, Load balancing, and Containers.
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
A checklist for anyone practicing Site Reliability Engineering
Website, courses, documentation, blog, and YouTube video tracker.
Slo-exporter computes standardized SLI and SLO metrics based on events coming from various data sources.
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, ...
Automatically capture and surface your team's tribal knowledge
Automatic SRE Superpowers within your Kubernetes cluster
Linux Bash shell scripts and Python scripts for Ops and DevOps
Telegram channels & groups about DevOps, SRE, and Platform Engineering.
A curated list of Platform Engineering Tools
Welcome to the World of DevOps. An ongoing, curated collection of awesome software, libraries, tutorials, tools, and other resources about DevOps.
A sample tool for users of Microsoft SQL Server to aid in troubleshooting otherwise difficult to diagnose issues. Provided AS-IS - see SUPPORT.md.
Active monitoring software that detects failures before your customers do.
Curated Lists of Learning Resources for DevOps, SRE, Cloud & Engineering Management
Kaytu's AI platform boosts cloud efficiency by analyzing historical usage and delivering intelligent recommendations—such as optimizing instance sizes—that maintain reliability. Pay for what you need,...
Highly scalable and available reference architecture for Terragrunt.
A Kubernetes controller that modifies the CPU and/or memory resources of containers depending on whether they're starting up, according to the startup/post-startup settings you supply.
TerraDagger is a Go package for managing your infrastructure-as-code through containers.
Terraform modules for rapidly building production-grade Kubernetes clusters following SRE practices.