Showing results for tags 'reliability'.

Found 9 results

  1. Improve Kubernetes cost and reliability with the new Policy Controller policy bundle. View the full article
  2. Microservices architecture has become extremely popular in recent years because it allows complex applications to be built as a collection of discrete, independent services, which improves scalability, flexibility, and resilience. The distributed nature of microservices, however, presents special difficulties for testing and quality control, so comprehensive testing is essential to guarantee the reliability and scalability of the software. In this thorough guide, we’ll delve into the world of microservices testing and examine its significance, methodologies, and best practices to guarantee the smooth operation of these interconnected parts. View the full article
  3. Most folks working in DevOps or SRE roles are familiar with metrics like mean-time-to-recovery (MTTR). Keeping track of the average time a team takes to respond to incidents is crucial to identifying bottlenecks in the support process. It’s also something executives like to show higher-ups when sharing a snapshot of overall platform performance. However, focusing […] The post 5 Mean-Time Reliability Metrics To Follow appeared first on DevOps.com. View the full article
  4. In cloud native computing, the applications are expected to be resilient, loosely coupled, scalable, manageable and observable. Because of containerization, there is a proliferation of microservices and they ship quickly. Microservices environments are more dynamic. In such an environment, making applications resilient means deploying the applications in a fault tolerant manner, but it also means […] The post LitmusChaos Enhances Developer Experience for Cloud Native Reliability appeared first on DevOps.com. View the full article
  5. When designing and building software, service reliability is always at the top of the list of critical focus areas for development teams. Every team that builds software typically has, either directly or indirectly, service level agreements with their customers. These are, essentially, agreed-upon metrics or performance criteria that teams use to measure and ensure the […] The post How Real-Time Debugging Improves Reliability appeared first on DevOps.com. View the full article
  6. In a recent fireside chat, Mohan Bhatkar, Head of Engineering for the Customer Reliability Platform at Mercari, Inc., sat down with Blameless Co-Founder Ashar Rizqi. They talked about scaling while avoiding silos, exciting day-to-day challenges, instilling a culture of empowerment, and more. Here are their top insights and the lightly edited transcript of their conversation. View the full article
  7. Join Us for a Complimentary Live Webinar Sponsored by Datadog
     Live Webinar and Q&A: Better Reliability with Service Level Objectives (SLOs)
     Date/Time: Tuesday, November 10 • 10:00 - 10:50 am PST
     Cost: Free to attend
     Abstract: Service Level Objectives (SLOs) are a measurement of the reliability and general experience your end users and customers can expect. In this talk, you’ll learn how to define SLOs by choosing the correct service level indicators (SLIs) and defining appropriate agreements with stakeholders. We’ll explain the key concept of error budgets, which give you a solid, actionable metric for balancing innovation and velocity with reliability and safety. You’ll also learn how to have meaningful conversations around realistic availability, which will enable you to define high quality SLOs for your own organization. This webinar is sponsored by Datadog and hosted by The Linux Foundation. Full Details & Registration
  8. Are you excited about reliability? Is your significant other tired of hearing about distributed systems? Are you the one being paged when systems go down? Have you had “aha!” moments when reading the SRE books? If you answered ‘yes’ to any of these questions, join us for a virtual conference on everything SRE! We’re looking for presenters on topics such as building reliable systems, monitoring and alerting, distributed systems, chaos engineering, and automated testing. https://www.papercall.io/conf42-sre-2021
  9. There are two main types of reliability work. The first is mitigation, which is a linear fix that’s often referred to as firefighting. In other words, you’re fixing problems as they come. The second is change management, which is a non-linear fix that proactively reduces the defect rates through pro.. View the full article
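
For anyone landing on this tag who wants a concrete picture of the error budget idea mentioned in the SLO webinar abstract (result 7), here is a minimal sketch in Python. It assumes a request-based availability SLO; the function name `error_budget`, its parameters, and the sample numbers are hypothetical illustrations, not taken from the webinar or from any Datadog API.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based error budget for one SLO window.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    All names here are illustrative assumptions, not a vendor API.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        # Fraction of the budget already spent; above 1.0 means the SLO is breached.
        "budget_consumed": failed_requests / allowed_failures,
        "slo_met": failed_requests <= allowed_failures,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
    report = error_budget(0.999, 1_000_000, 250)
    for key, value in report.items():
        print(f"{key}: {value}")
```

A team might read `budget_consumed` as a signal for how much room is left to ship risky changes in the current window, which is the balance between velocity and reliability the abstract describes.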