All Episodes

How Experienced SREs Make High-Stakes Decisions in Uncertain Situations

Join us on Site Reliability Engineering Crashcasts as we delve into the critical art of decision-making under uncertainty with expert Victor. In this episode, we expl...

Effective Strategies and Resources for Continuous Learning in SRE

Ready to supercharge your Site Reliability Engineering skills? In this episode, Sheila and Victor delve into the best strategies and resources for continuous learning ...

The Evolution of Containerization: Insights on Docker and Kubernetes

Curious about how containerization has revolutionized application deployment and management? Welcome to Site Reliability Engineering Crashcasts! In this episode, we e...

Designing Highly Available Systems: Insights from Leading Companies

Ever wondered how leading tech companies achieve near-perfect uptime? Tune in to this episode of Site Reliability Engineering Crashcasts as Sheila and Victor break dow...

Comparing Prometheus, Grafana, ELK Stack & Emerging Trends in Observability

Dive into the essentials of monitoring and logging in this episode of Site Reliability Engineering Crashcasts with Sheila and Victor! In this episode, we explore: Th...

Techniques for Performance Troubleshooting and Latency Diagnosis in SRE

Ready to unravel the mysteries of performance troubleshooting and latency diagnosis in SRE? Join host Sheila and expert Victor as they dive deep into essential techniq...

Maximizing SRE Efficiency: Harnessing Automation for Self-Healing Systems

Unlock the potential of automation in Site Reliability Engineering in this episode of Site Reliability Engineering Crashcasts! In this episode, we explore: What auto...

DevOps vs. SRE: Exploring Their Similarities, Differences, and Professional Perspectives

Dive deep into the world of DevOps and Site Reliability Engineering (SRE) with us in this enlightening episode of Site Reliability Engineering Crashcasts! In this epi...

Defining Reliability Beyond 99.999%: SLOs, SLAs, and Error Budgets Explained

Join us on Site Reliability Engineering Crashcasts as we delve into the nuanced world of reliability metrics that go beyond the typical uptime percentages. Hosted by S...

SRE War Stories: Effective Strategies for Troubleshooting Complex Production Issues

Get ready for an action-packed episode of Site Reliability Engineering Crashcasts! Join Sheila and SRE expert Victor as they unravel the thrilling world of war stories...

Mastering Terraform for SRE: Streamline Cloud and Multi-Cloud Management

Unlock the full potential of cloud management with Terraform in our latest episode of Site Reliability Engineering Crashcasts. Join Sheila and Victor as they delve int...

Puppet in SRE: Streamlining Infrastructure Management & Continuous Delivery

We're diving deep into how Puppet can revolutionize your SRE practices. In this episode, we explore: Discover how Puppet streamlines infrastructure management and en...

Chef's Role in SRE Configuration Management: Comparing Infrastructure Automation Tools

Get ready to untangle the complexities of configuration management with Chef in this engaging episode of Site Reliability Engineering Crashcasts! In this episode, we ...

How Ansible Powers Infrastructure as Code and Automation in SRE Practices

Discover how Ansible revolutionizes infrastructure management and powers automation in SRE practices in this exciting episode. In this episode, we explore: Learn wha...

Demystifying SLIs and SLOs: A Guide to Service Level Indicators and Objectives

Dive into the world of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with our expert guest, Victor, as we unravel these crucial concepts in Softw...
