• Resources
  • 413 Site Reliability and Production Engineering Resources & Tools

What is Site Reliability Engineering (SRE)? Fundamentally, it’s what happens when you ask a software engineer to design an operations function. SRE is a discipline focused on the reliability, availability, and performance of software systems, whether web applications or systems software. It is a specialized team role rather than a generic job description, and it is also a methodology for designing, building, and operating large distributed systems reliably.

Site Reliability Engineering is a management philosophy that originated at Google in 2003 to describe its internal operations model. The goal of the site reliability engineering team is to create and maintain a platform that can be deployed and updated easily and frequently without disrupting either services or users. To achieve this goal, the SRE team usually works closely with other teams, such as developers and designers. On large sites, the SRE team also maintains an organizational structure that allows it to move quickly and coordinate projects.

This post is a curated list of awesome Site Reliability and Production Engineering resources. These resources include books, articles, blogs, and newsletters covering topics such as culture, reliability, monitoring, capacity planning, SLAs, and more.

Books

  1. 3 Free SRE Ebooks by Google

  2. Post-Incident Reviews: Learning from Failure for Improved Incident Responses

  3. How to Monitor the SRE Golden Signals (E-Book)

  4. Engineering Reliable Mobile Applications: Strategies for Developing Resilient Native Mobile Applications

Culture

  1. What is Site Reliability Engineering?

  2. Keys To SRE by Ben Treynor

  3. Google SRE Resources

  4. Notes from Production Engineering by Pedro Canahuati

  5. PostOps: Recovery from Operations

  6. Love DevOps? Wait ’till you meet SRE

  7. How Google Does Planet-Scale Engineering for Planet-Scale Infra

  8. Site Reliability Engineering at Facebook

  9. A History of Site Reliability Engineering at Uber

  10. Case Study: Adopting SRE Principles at StackOverflow

  11. Site Reliability Engineering at Dropbox

  12. Site Reliability Engineers — Keeping Google up and running 24/7

  13. From Sys Admin to Netflix SRE

  14. SRE@Google: Thousands of DevOps Since 2004

  15. Transactional System Administration Is Killing Us and Must be Stopped

  16. A hierarchy of SRE needs

  17. PostOps: A Non-Surgical Tale of Software, Fragility, and Reliability

  18. SRE: An incomplete guide to cultural Narnia

  19. Putting Together Great SRE Teams

  20. Toil: A Word Every Engineer Should Know

  21. Engineering Reliability into Web Sites: Google SRE

  22. DEVOPS & SRE AMA – Building High Performance Organizations

  23. John Allspaw’s AMA on Incident Analysis and Postmortems

  24. Site Reliability Engineering with Paul Newson

  25. How SysAdmins Devalue Themselves

  26. The Softer Side of DevOps

  27. SRE, noun. See also: confidence, trust.

  28. Site Reliability Engineering with Stephen Weinberg

  29. We are the Google Site Reliability team. We make Google’s websites work. Ask us Anything!

  30. We are the Google Site Reliability Engineering team. Ask us Anything!

  31. The Ops Identity Crisis

  32. The Irreproducibility Of Bugs In Large-Scale Production Systems

  33. SE-Radio Episode 276: Björn Rabenstein on Site Reliability Engineering

  34. Microservices, DevOps and Production Complexity

  35. Introducing Google Customer Reliability Engineering

  36. Evolution or Rebellion? The rise of Site Reliability Engineers (SRE)

  37. The difference between Site Reliability Engineering, System Administration, and DevOps

  38. SRE in the Small and in the Large

  39. SBSRE Meetup: Different SRE roles and challenges (Netflix)

  40. Panel: Who/What Is SRE?

  41. Hope Is Not a Strategy

  42. Tenets of SRE

  43. Site Reliability Engineering Demystified

  44. Is Site Reliability Engineering the True ‘Ops’ in DevOps?

  45. SRE vs. DevOps vs. Cloud Native: The Server Cage Match

  46. SRE: What’s The Big Idea?

  47. Building the SRE Culture at LinkedIn

  48. Podcast #111 – SRE: Occasionally Maintaining Infrastructure That You Hate

  49. Splicing SRE DNA Sequences in the Biggest Software Company on the Planet

  50. Why should your app get SRE support? – CRE life lessons

  51. How SREs find the landmines in a service – CRE life lessons

  52. Making the most of an SRE service takeover – CRE life lessons

  53. The Cloudcast #301: SRE and Infrastructure Operations (Podcast)

  54. The SRE model

  55. Onboarding New Site Reliability Engineers

  56. Building Blocks for Site Reliability At Google

  57. Beyond Google SRE: What is Site Reliability Engineering like at Medium?

  58. Intelligent Site Reliability Engineering – A Machine Learning Perspective

  59. A crash course in LinkedIn’s global site operations

  60. Google’s Site Reliability Engineering with Todd Underwood

  61. What is Site Reliability Engineering? (VMware)

  62. A Gentle Introduction to SRE

  63. Understanding Site Reliability Engineering through Movies and Books

  64. GOTO 2017 • Site Reliability Engineering at Google • Christof Leng

  65. The Makeup of Successful Geographically-Distributed SRE Teams

  66. Tech Leadership in SRE

  67. The Azure Podcast: Episode 227 – Azure SRE

  68. The human scalability of “DevOps”

  69. Podcast: Site Reliability Management with Mike Hiraga

  70. How a cat inspired system reliability at Knowlarity

  71. Getting Started with Site Reliability Engineering

  72. Practical Applications of the Dickerson Pyramid by Nat Welch

  73. LinkedIn’s Kurt Andersen Uncovers Blindspots in SRE Implementations

  74. Interview with Betsy Beyer, Stephen Thorne of Google

  75. Less Risk Through Greater Humanity – Dave Rensin

  76. Getting Started with SRE – Stephen Thorne, Google

  77. Building Successful SRE in Large Enterprises

  78. Solving Reliability Fears with Site Reliability Engineering

  79. SRE vs. DevOps: competing standards or close friends?

  80. How to Avoid the 5 SRE Implementation Traps that Catch Even the Best Teams

  81. Reliability Engineering – The Essential Discipline for Complex Systems

  82. The Modern Site Reliability Workbench on Top of OCI

  83. SRE in the Third Age

  84. About SRE and how (not) to apply it

  85. Transitioning a typical engineering ops team into an SRE powerhouse

  86. Making a Lion Bulletproof: SRE in Banking

  87. Identifying and tracking toil using SRE principles

  88. From Ops to SRE: Evolution of the OpenShift Dedicated

Team

  1. Meeting reliability challenges with SRE principles

  2. A quick introduction to SRE principles

  3. The SRE I Aspire to Be

  4. Taming Operational Load with VMware CRE

  5. SRE Cultural Values

  6. Are we there yet? Thoughts on assessing an SRE team’s maturity

Education

  1. Panel: Educating SRE

  2. From Zero to Hero: Recommended Practices for Training your Ever-Evolving SRE Teams

  3. New to an SRE team?

  4. The Systems Engineering Side of Site Reliability Engineering

  5. Graduating from Bootcamp and interested in becoming a Site Reliability Engineer?

  6. So you want to be a Site Reliability Engineer?

  7. Spiraling Ops Debt & the SRE Coding Imperative

  8. So you want to be an SRE?

  9. Career Profiles/Site Reliability Engineer

  10. What is the role of a Site Reliability Engineer?

  11. Lynda.com: DevOps Foundations: Site Reliability Engineering

  12. Incident Management Training: Wheel of Misfortune

  13. The Ultimate Guide to Structuring a 90-Day Onboarding Plan

  14. SRE fundamentals: SLIs, SLAs and SLOs

  15. How to Get Into SRE

  16. Do you have an SRE team yet? How to start and assess your journey

  17. How SRE teams are organized, and how to get started

  18. Why SRE Documents Matter

  19. How to get started with site reliability engineering (SRE)

  20. Duties of a Site Reliability Engineering Manager

  21. Designing distributed systems using NALSD flashcards

  22. Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program

  23. SRE Classroom: Distributed PubSub workshop

  24. School of SRE: Curriculum for onboarding non-traditional hires and new grads

Hiring

  1. SRE Hiring

  2. Hiring SREs at LinkedIn

  3. Hiring Site Reliability Engineers

  4. Hiring your first SRE

  5. Growing the Site Reliability Team at LinkedIn: Hiring is Hard

  6. Engineering Manager – Site Reliability Engineering Interview Preparation

Reliability

  1. The Realities of the Job of Delivering Reliability

  2. Fail at Scale by Ben Maurer

  3. Embracing Failure: Fault-Injection and Service Reliability

  4. 10 Years of Crashing Google

  5. How we break things at Twitter: failure testing

  6. Reliable Cron across the Planet

  7. Push our limits – reliability testing at Twitter

  8. Weathering the Unexpected

  9. SRE Hour: Tech Talks by Box & Yelp

  10. Simplicity: A Prerequisite for Reliability

  11. The Two Sides to Google Infrastructure for Everyone Else

  12. How Embracing Continuous Release Reduced Change Complexity

  13. Making “Push On Green” a Reality

  14. BeyondCorp: A New Approach to Enterprise Security

  15. Brainstorming Failure by Jeff Smith

  16. The Ripple Effect Of Outages And Downtime Cannot Be Underestimated

  17. The infrastructure behind Twitter: efficiency and optimization

  18. Dickerson’s Hierarchy of Reliability

  19. The Morning Paper on Operability

  20. Production is all that matters

  21. Using load shedding to survive a success disaster – CRE life lessons

  22. How to avoid a self-inflicted DDoS Attack – CRE life lessons

  23. Don’t gamble when it comes to reliability

  24. Resilience Engineering: Learning to Embrace Failure

  25. The Infrastructure Behind Twitter: Scale

  26. Scaling Reliability at Twitter: So You Want to Add a 9

  27. Principles Of Chaos Engineering

  28. Chaos Engineering

  29. Available…or not? That is the question – CRE life lessons

  30. How Google Backs Up The Internet Along With Exabytes Of Other Data

  31. Performance, Scalability, And High Availability: 3 Key Infrastructure Adaptability Requirements

  32. The Production Environment at Google

  33. Reliable releases and rollbacks – CRE life lessons

  34. How release canaries can save your bacon – CRE life lessons

  35. Things I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites

  36. Every Day Is Monday in Operations

  37. Under the Hood: Ensuring Site Reliability

  38. Designing reliable systems with cloud infrastructure (Google Cloud Next ’17)

  39. A Google SRE explores GitHub reliability with BigQuery

  40. Know thy enemy: how to prioritize and communicate risks – CRE life lessons

  41. Chaos Engineering resources

  42. CRE life lessons: What is a dark launch, and what does it do for me?

  43. Why you should pick strong consistency, whenever possible

  44. The Network is Reliable

  45. Are You Load Balancing Wrong?

  46. How production engineers support global events on Facebook

  47. Google: A Collection Of Best Practices For Production Services

  48. Canary Analysis Service

  49. Tips for High Availability

  50. Progressive Service Architecture At Auth0

  51. Google Cloud Production Guideline

  52. production readiness

  53. Trust By Design: The Fusion of Operational Maturity and Risk Modeling

  54. Top Seven Myths of Robust Systems

  55. Taming chaos: Preparing for your next incident

  56. PID Loops and the Art of Keeping Systems Stable

  57. Are you ready for production?

  58. Production Checklist for Web Apps on Kubernetes

  59. Finding a problem at the bottom of the Google stack

  60. Rethinking Task Size in SRE

  61. How maintenance windows affect your error budget

  62. The Production Readiness Spectrum

  63. Generic mitigations

Monitoring & Observability & Alerting

  1. A Working Theory-of-Monitoring

  2. The Evolution of Monitoring Systems at Google – Tony Rippy

  3. Monitoring without Infrastructure @ Airbnb

  4. Monitoring distributed systems

  5. Observability at Uber Engineering: Past, Present, Future

  6. The 4 Golden Signals of API Health and Performance in Cloud-Native Applications

  7. My Philosophy on Alerting by Rob Ewaschuk

  8. Time To Detect – Netflix

  9. Why Percentiles Don’t Work the Way you Think

  10. Building Twitter’s Next-Gen Alerting System

  11. Instrumentation: Worst case performance matters

  12. Instrumentation: What does ‘uptime’ mean?

  13. Incidents + Outages at CircleCI: Our Playbook and What We’ve Learned

  14. An introduction to monitoring and alerting with timeseries at scale, with Prometheus

  15. Detecting outliers and anomalies in realtime at Datadog

  16. How to Monitor the SRE Golden Signals

  17. Monitoring in a DevOps World

  18. Monitoring Your Monitoring’s Monitoring

  19. Observability: the new wave or buzzword?

  20. Monitoring Isn’t Observability

  21. Monitoring in the time of Cloud Native

  22. Principles of Monitoring Microservices

  23. The Many Ways Your Monitoring Is Lying to You

  24. GitOps Part 3 – Observability

  25. Want to Debug Latency?

  26. Debugging Latency in Go 1.11

  27. Alerting on SLOs like Pros

  28. Applied Alerting Philosophy

  29. Observations on Observability

  30. Deploys: It’s Not Actually About Fridays

  31. Site Reliability Engineering Best Practices for Data Pipelines

  32. Elastic Observability in SRE and Incident Response

On-Call

  1. Being an On-Call Engineer: A Google SRE Perspective

  2. Inside Atlassian: how our site reliability engineers do incident management

  3. Inside Atlassian: how IT & SRE use ChatOps to run incident management

  4. Incident Response at Heroku

  5. Who’s On Call?

  6. SysAdvent – Day 6 – No More On-Call Martyrs

  7. On Being On Call

  8. The On-Call Handbook

  9. Incident management at Google — adventures in SRE-land

  10. How Spotify and GOV.UK handle on call, and more

  11. Run Book / Operations Manual template

  12. Automating Your Oncall: Open Sourcing Fossor and Ascii Etch

  13. Project STAR*: Streamlining Our On-Call Process

  14. SRE@Xero: Managing Incidents Part I

  15. SRE@Xero: Managing Incidents Part II

  16. How To Establish a High Severity Incident Management Program

  17. How Your Systems Keep Running Day After Day – John Allspaw

  18. On-call doesn’t have to suck

  19. Why, as a Netflix infrastructure manager, am I on call?

  20. Oncall and Sustainable Software Development

  21. On Call Rotations: How Best to Wake Devs Up in the Middle of the Night

  22. Understanding The Role Of The Incident Manager On-Call (IMOC)

  23. 3 Ways to Minimize the Impact of High Severity Incidents

  24. Advice to Management Teams While Enrolling Changes to On-Call Systems

  25. Moving Past Shallow Incident Data

  26. Sustainable On-Call

  27. dotScale 2017 – Aish Raj Dahal – Chaos management during a major incident

  28. Incident Management at Netflix Velocity

  29. Incidents, fixes, and the day after

  30. 10 Steps to Develop an Incident Response Plan You’ll ACTUALLY Use

  31. Checklists: a stupidly simple but valuable operational gift

  32. How to write a status page update

  33. Atlassian Incident Handbook

  34. PagerDuty Incident Response Handbook

  35. Avoiding Burnout for SREs

  36. Better On-Call the SRE way

  37. Managing Incidents at Monzo

  38. Making On-Call Not Suck

  39. How we (Monzo) respond to incidents

  40. How we’ve evolved on-call at Monzo

  41. Code Yellow: When Operations Isn’t Perfect

  42. MTTR is dead, long live CIRT

  43. Extended Dreyfus Model for Incident Lifecycles

  44. Inhumanity of Root Cause Analysis

  45. Incident insights from NASA, NTSB, and the CDC

  46. My week shadowing a GitLab Site Reliability Engineer

  47. How our production team runs the weekly on-call handover

  48. Writing Runbook Documentation When You’re An SRE

  49. Incident response, programs and you(r startup)

  50. An Incident Command Training Handbook

  51. Shrinking the time to mitigate production incidents

  52. Incident writeup as sociological storytelling

Post-Mortem

  1. A collection of post-mortems

  2. Collection of Kubernetes Failure Stories

  3. Blameless PostMortems and a Just Culture

  4. A Tale of Postmortems

  5. Building a Blameless Post-Mortem Culture with Jason Hand

  6. The infinite hows

  7. Failure is Always An Option: How a Blameless Culture Leads to Better Results

  8. SysAdvent – Day 1 – Why You Need a Postmortem Process

  9. Etsy’s Debriefing Facilitation Guide for Blameless Postmortems

  10. Writing Your First Postmortem

  11. How to Write Great Outage Post-Mortems

  12. A collection of postmortem templates

  13. Embracing Feedback

  14. Postmortem Action Items: Plan the Work and Work the Plan

  15. Social Issues In Postmortems

  16. Google Has an Official Process in Place for Learning From Failure–and It’s Absolutely Brilliant

  17. Postmortem culture: how you can learn from failure

  18. re:Work – Postmortem discussion template

  19. Post-mortems to the rescue

  20. Postmortem Action Items: Plan the Work and Work the Plan

  21. Why Every Company Can Benefit from a Blameless Culture

  22. It’s dead, Jim: How we write an incident postmortem

  23. Our incident postmortem template

  24. Learn out of mistakes. Postmortems to the rescue.

  25. Improving Postmortem Practices with Veteran Google SRE, Steve McGhee

Capacity Planning

  1. Capacity Planning

  2. SouthBay SRE: Cloud Capacity Planning

  3. How do you do Capacity Planning

  4. How Back Market SREs prepared for Black Friday

Service Level Agreement

  1. If It’s in the Cloud, Get It on Paper: Cloud Computing Contract Issues

  2. Service Level Agreements in the Cloud: Who cares?

  3. Making a point with SLAs

  4. SysAdvent – Day 20 – How to set and monitor SLAs

  5. SLOs, SLIs, SLAs, oh my – CRE life lessons

  6. Service Levels and Error Budgets

  7. (Un)Reliability Budgets – Finding Balance between Innovation and Reliability

  8. The Calculus of Service Availability

  9. Availability Calculator: Calculate how much downtime should be permitted in your SLA

  10. Best practices to develop SLAs for cloud computing

  11. A Practical Guide to SLAs

  12. Building good SLOs – CRE life lessons

  13. No Grumpy Humans and Other Site Reliability Engineering Lessons from Google

  14. Consequences of SLO violations — CRE life lessons

  15. Service Level Objectives in Practice

  16. SRE Consensus Building

  17. An example escalation policy — CRE life lessons

  18. Error Budget Calculator

  19. Understanding error budget overspend – part one – CRE life lessons

  20. Good housekeeping for error budgets – part two – CRE life lessons

  21. SRE fundamentals: SLIs, SLAs and SLOs

  22. SLOs & You: A Guide To Service Level Objectives

  23. Earning Our Wings: Stories and Findings From Operating a Large-scale Concourse Deployment

  24. Nines are Not Enough: Meaningful Metrics for Clouds

  25. How many nines is my storage system?

  26. Don’t follow the sun.

  27. The Tyranny of the SLA

  28. Backblaze Durability is 99.999999999% — And Why It Doesn’t Matter

  29. DevOpsDays Chicago 2019 – The Art of SLOs

  30. The Art of SLOs Workshop Materials

  31. How to Include Latency in SLO-Based Alerting

  32. Succeeding With Service Level Objectives

  33. Putting customers first with SLIs and SLOs

  34. SRE Leadership: Have Tiered SLAs

  35. How SLOs Enable Fast, Reliable Application Delivery

  36. The Tail at Scale

  37. The Tail at Scale Revisited

  38. Defining SLOs for services with dependencies

Performance

  1. Performance Checklists for SREs

  2. South Bay SRE Meetup – Netflix Cloud Performance Team

  3. Software Performance Analysis Guided By SLOs

  4. A framework for pragmatic performance engineering

Programming

  1. Go Language for Ops and Site Reliability Engineering

  2. Go for SREs using Python

  3. Operability in Go

  4. Go Reliability and Durability at Dropbox

Misc Articles

  1. What is SRE (Site Reliability Engineering)?

  2. Here’s How Google Makes Sure It (Almost) Never Goes Down

  3. Site Reliability Engineers: “solving the most interesting problems”

  4. Site Reliability Engineers: the “world’s most intense pit crew”

  5. Site reliability engineering kicks rote tasks out of IT ops

  6. Notes on Site Reliability Engineering

  7. Adventures in SRE-land: Welcome to Google Mission Control

  8. Book Review: Site Reliability Engineering – How Google Runs Production Systems

  9. Site Reliability Engineers: “We solve cooler problems”

  10. SREcon17: Brave new world of site reliability engineering

  11. Open AWS guide

  12. 20 SRE / Devops / System Engineer Tricks

  13. Commentary on Site Reliability Engineering

  14. Site Reliability Engineering: 4 Things to Know

  15. Looking for SRE Success? Then Find the Intrapreneurs!

  16. What Team Structure is Right for DevOps to Flourish?

  17. Injured on Vacation? Applying Principles from Site Reliability Engineering to a Travel Emergency

  18. Building blameless working environment

  19. SRE Adoption Report

  20. SREs: The Happiest – and Highest Paid – in the Industry

  21. The Role of Site Reliability Engineering, Today and Tomorrow

  22. SRE as a Lifestyle Choice

  23. SRECon EMEA 2019 Recap

  24. Life of an SRE at Google – JC van Winkel

  25. Site Reliability Engineering for Native Mobile Apps – Abhijith Krishnappa
    Case study: Halodoc adaptation of SRE principles for Native Mobile Apps

  26. SRE Best Practices by InfraCloud

Blogs

  1. Brendan Gregg’s Blog
    Highly Technical Blog Posts About Systems Internals, Performance and SRE.

  2. Everything Sysadmin
    Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.

  3. High Scalability
    Technical Blog Posts About Systems Architecture.

  4. rachelbythebay
    Technical Blog Posts.

  5. Susan J. Fowler
    Various blog posts about SRE, Software Engineering and Microservices.

  6. SysAdvent
    One article for each day of December, ending on the 25th.

  7. Stephen Thorne’s Blog
    Blog Posts About SRE.

  8. Increment
    A digital magazine about how teams build and operate software systems at scale.

  9. GopherSRE
    Blog Posts about Go and SRE.

  10. Cindy Sridharan
    Blog posts about distributed systems and their management.

  11. Blameless Blog
    Blog posts about SRE culture and practices.

  12. Resilience Roundup
    Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.

  13. Squadcast Blog
    Blog posts about SRE best practices, reliability, on-call and incident management.

  14. FireHydrant Blog
    Posts about complex systems, incident response, and SRE best practices.

  15. Rootly Blog
    Incident management best practices and guides.

Newsletters

  1. DevOpsLinks
    A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.

  2. KubeWeekly
    The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.

  3. SRE Weekly
    Weekly Site Reliability Newsletter.

  4. O’Reilly Systems Engineering and Operations Newsletter
    Weekly systems engineering and operations news and insights from industry insiders.

  5. ChaosEngineering.news
    Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

  1. SRECon Conferences
    The Official SRE Conference.

  2. LISA Conferences
    Prominent Conference About SysAdmin/DevOps/SRE.

  3. SRE Tech Talks
    SRE Talks Hosted by Google.

  4. South Bay Site Reliability Engineering (Sunnyvale, CA) Meetup
    A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.

  5. San Francisco Reliability Engineering
    A Group Of People Who Are Passionate About Reliable, Performant Software Systems.

  6. Site Reliability Engineering Munich, Germany
    SRE Meetup in the greater area of Oktoberfest city.

  7. ADDO – All Day DevOps
    A 24-hour conference that is completely online and free.

  8. Site Reliability Engineering Paris, France
    SRE Meetup in the city of light.

  9. Site Reliability Engineering India
    SRE Meetup in India.

Twitter

  1. Google SRE Twitter Account
    Google’s SRE Twitter Account.

  2. SREBook
    The Official Twitter Account of Site Reliability Engineering Book.

  3. SREcon
    SRECon’s Official Twitter Account.

  4. SREWorkbook
    The Official Twitter Account of Site Reliability Workbook.

  5. The SRE Dev
    SRE-related Posts from dev.to.

  6. Twitter SRE
    The Official Twitter Account of Twitter’s SRE team.

  7. Twitter SRE Weekly
    The Official Twitter Account of SRE Weekly Newsletter.

  8. USENIX Association
    The Official USENIX Twitter Account.

2 years later

Can we please update this resource?
