Search the Community

Showing results for tags 'reliability'.

Found 9 results

cost optimization: Improve Kubernetes cost and reliability with the new Policy Controller policy bundle

Google Cloud Platform posted a topic in Kubernetes & Container Orchestration

Improve Kubernetes cost and reliability with the new Policy Controller policy bundle. View the full article
- October 16, 2023
- Tagged with: k8s, reliability, policy controllers, policies
microservices: Comprehensive Guide to Microservices Testing: Ensuring Reliable and Scalable Software

DZone posted a topic in Docker, Containers, Microservices, Serverless & Virtualization

Microservices architecture has become extremely popular in recent years because it allows complex applications to be built as a collection of discrete, independent services, which improves scalability, flexibility, and resilience. The increased complexity and distributed nature of microservices, however, present special difficulties for testing and quality control, so comprehensive testing is essential to guarantee the reliability and scalability of the software. In this guide, we’ll delve into microservices testing and examine its significance, methodologies, and best practices to guarantee the smooth operation of these interconnected parts. View the full article
- October 11, 2023
- Tagged with: architecture, reliability, scalability
5 Mean-Time Reliability Metrics To Follow

Devops.com posted a topic in General Discussion

Most folks working in DevOps or SRE roles are familiar with metrics like mean-time-to-recovery (MTTR). Keeping track of the average time a team takes to respond to incidents is crucial to identifying bottlenecks in the support process. It’s also something executives like to show higher-ups when sharing a snapshot of overall platform performance. However, focusing […] The post 5 Mean-Time Reliability Metrics To Follow appeared first on DevOps.com. View the full article
- July 7, 2022
- Tagged with: reliability, metrics
LitmusChaos Enhances Developer Experience for Cloud Native Reliability

Devops.com posted a topic in DevOps & SRE General Discussion

In cloud native computing, the applications are expected to be resilient, loosely coupled, scalable, manageable and observable. Because of containerization, there is a proliferation of microservices and they ship quickly. Microservices environments are more dynamic. In such an environment, making applications resilient means deploying the applications in a fault tolerant manner, but it also means […] The post LitmusChaos Enhances Developer Experience for Cloud Native Reliability appeared first on DevOps.com. View the full article
- May 18, 2022
- Tagged with: microservices, reliability
How Real-Time Debugging Improves Reliability

Devops.com posted a topic in Development & Programming

When designing and building software, service reliability is always at the top of the list of critical focus areas for development teams. Every team that builds software typically has, either directly or indirectly, service level agreements with their customers. These are, essentially, agreed-upon metrics or performance criteria that teams use to measure and ensure the […] The post How Real-Time Debugging Improves Reliability appeared first on DevOps.com. View the full article
How Mercari Scales Vision, Culture, & Reliability

The Chief I/O posted a topic in DevOps & SRE General Discussion

In a recent fireside chat, Mohan Bhatkar, Head of Engineering for the Customer Reliability Platform at Mercari, Inc., sat down with Blameless Co-Founder Ashar Rizqi. They talked about scaling while avoiding silos, exciting day-to-day challenges, instilling a culture of empowerment, and more. Here are their top insights and the lightly edited transcript of their conversation. View the full article
Webinar: Better Reliability with Service Level Objectives (SLOs)

James posted an event in DevOps Events

Tuesday 10 November 2020, 02:00 AM until 02:50 AM
Join Us for a Complimentary Live Webinar Sponsored by Datadog

Live Webinar and Q&A: Better Reliability with Service Level Objectives (SLOs)

Date/Time: Tuesday, November 10 • 10:00 - 10:50 am PST
Cost: Free to attend

Abstract: Service Level Objectives (SLOs) are a measurement of the reliability and general experience your end users and customers can expect. In this talk, you’ll learn how to define SLOs by choosing the correct service level indicators (SLIs) and defining appropriate agreements with stakeholders. We’ll explain the key concept of error budgets, which give you a solid, actionable metric for balancing innovation and velocity with reliability and safety. You’ll also learn how to have meaningful conversations around realistic availability, which will enable you to define high quality SLOs for your own organization.

This webinar is sponsored by Datadog and hosted by The Linux Foundation.

Full Details & Registration
- October 28, 2020
- Tagged with: webinar, slo, reliability
Conf42.com: SRE 2021

James posted an event in DevOps Events

Wednesday 29 September 2021, 11:00 PM
Are you excited about reliability? Is your significant other tired of hearing about distributed systems? Are you the one being paged when systems go down? Have you had “aha!” moments when reading the SRE books? If you answered ‘yes’ to any of these questions, join us for a virtual conference on everything SRE! We’re looking for presenters on topics such as: building reliable systems, monitoring and alerting, distributed systems, chaos engineering, and automated testing. https://www.papercall.io/conf42-sre-2021
- October 24, 2020
- Tagged with: event, sre, reliability, systems performance
Structuring Your Teams for Software Reliability

The Chief I/O posted a topic in DevOps & SRE General Discussion

There are two main types of reliability work. The first is mitigation, which is a linear fix that’s often referred to as firefighting. In other words, you’re fixing problems as they come. The second is change management, which is a non-linear fix that proactively reduces the defect rates through pro.. View the full article
- June 4, 2020
- Tagged with: reliability

Forum Statistics

43.1k
Total Topics

42.4k
Total Posts
