Monitoring & Observability

  • Metrics & Time Series Databases (e.g., Prometheus, Grafana, InfluxDB)

  • Logging & Log Management (e.g., ELK Stack, Loki, Splunk)

  • Tracing & Distributed Systems Monitoring (e.g., Jaeger, Zipkin, OpenTelemetry)

  • Alerting & Incident Management (e.g., PagerDuty, Opsgenie)

  • Synthetic Monitoring & Uptime Checks

  1. Amazon Managed Service for Prometheus now supports label-based active series limits within your workspace. This feature helps you manage active series volume across different producers, such as applications, services, or teams, that share a workspace. You can now allocate specific active series limits to different metric producers in your workspace, which helps protect your critical metrics. If a subset of metrics experiences an unexpected surge, only the metrics sharing the same label-based active series limit are throttled. For example, you can set different limits for metrics from different applications using label sets like {app="payment-service", environment="pr…

  2. Amazon CloudWatch agent support for Red Hat OpenShift Service on AWS (ROSA) enables monitoring of applications and infrastructure using familiar CloudWatch tools such as Container Insights and Application Signals. ROSA is a fully managed cloud service that helps customers quickly deploy, operate, and scale containerized applications on AWS with the same consistent OpenShift experience they have on-premises. This new capability allows DevOps teams and application owners to gain deep visibility into their ROSA clusters' performance, health, and resource utilization using AWS's native observability tools. CloudWatch agent on ROSA enables the collection and analysi…

  3. AWS recently announced support for a new Apache Flink connector for Prometheus. The new connector, contributed by AWS to the Flink open source project, adds Prometheus and Amazon Managed Service for Prometheus as a new destination for Flink. In this post, we explain how the new connector works. We also show how you can manage your Prometheus metrics data cardinality by preprocessing raw data with Flink to build real-time observability with Amazon Managed Service for Prometheus and Amazon Managed Grafana... View the full article

  4. Amazon CloudWatch Application Signals now supports creating Service Level Objectives (SLOs) using metrics from your service dependencies. You can now monitor the performance of your services' dependencies and proactively resolve problems by setting SLO goals. Using Application Signals, you can create period-based or request-based SLOs that track key metrics like latency and faults for the outgoing requests from your services to their dependencies. You can see how your dependencies perform and how this impacts the reliability of your overall service. For example, if your e-commerce service relies on a payment processor, you can set an SLO…

  5. Amazon Web Services (AWS) today revealed it is streamlining IT incident management by adding generative artificial intelligence (AI) capabilities to the Amazon OpenSearch Service. The service is widely used to troubleshoot IT incidents involving AWS cloud computing environments, and the generative AI capabilities are being provided via integrations with Amazon Q Developer, a set of AI agents that automate […] View the full article

    • 0 replies
    • 23 views
  6. John Willis, celebrated for his influential “The DevOps Handbook” and his newest adventure, “Rebels of Reason,” kicked things off by spotlighting the lightning-fast progress of AI. In just two short years, from 2022 to 2024, AI’s capabilities in reasoning and coding have soared dramatically, creating an unstoppable wave that’s transforming industries almost overnight. Willis humorously described how […] View the full article

    • 0 replies
    • 69 views
  7. Started by Logz.io

    Explore improvements. Enhanced Lucene Query Editor: syntax highlighting and auto suggestions to streamline query building and troubleshooting. Date Picker Updates: now located next to the query line, the date picker features an improved timestamp for better readability. Trace Tab Indicator: an indicator now appears on the Traces tab whenever trace data is linked to the […] View the full article

    • 0 replies
    • 73 views
  8. I remember a time when digging through logs, events, and dashboards felt like trying to find a single sock in a pile of laundry fresh out of the dryer: frustrating, time-consuming, and somehow the answer was always just out of reach. That’s where AI Agents come in. Instead of wasting hours sifting through data, you simply […] View the full article

    • 0 replies
    • 57 views
  9. AWS CloudWatch is a widely used observability tool that comes built into AWS. It provides easy access to logs, metrics, and alarms, making it a convenient choice for teams monitoring AWS workloads. But while CloudWatch offers a lot of power, many teams unknowingly misconfigure or misuse it, leading to unexpected costs, limited visibility, and operational […] View the full article

    • 0 replies
    • 59 views
  10. CloudWatch Database Insights now supports databases hosted on Amazon Relational Database Service (RDS). Database Insights is a database observability solution that provides a curated experience designed for DevOps engineers, application developers, and database administrators (DBAs) to expedite database troubleshooting and gain a holistic view into their database fleet health... View the full article

  11. As an observability leader, we at Logz.io pride ourselves on continuous innovation. That’s why, last year, we released our AI agents to revolutionize observability by helping businesses, and their engineering and DevOps teams, automate data analysis and root cause analysis. The primary way in which engineering and DevOps teams interact with the agents is by […] View the full article

    • 0 replies
    • 56 views
  12. Observability Costs Are Out of Control – Here’s How to Fix It: In today’s cloud-native world, keeping logs, metrics, and traces under control isn’t just about monitoring performance – it’s about managing costs. And if you’re an engineering leader or platform owner, you know that observability budgets can spiral fast. That’s exactly what we tackled […] View the full article

    • 0 replies
    • 51 views
  13. We’re excited to announce a series of upgrades to our AI Agent, Log Management Explore UI and core integrations designed to empower you with even deeper observability and streamlined operations. These updates enhance account visibility, multi-telemetry trace insights, and logging capabilities while ensuring seamless compatibility with OpenTelemetry. Read on to discover how these enhancements can […] View the full article

    • 0 replies
    • 52 views
  14. Today’s distributed, cloud-native systems generate logs at a high rate, making it increasingly difficult to derive actionable insights. AI and Generative AI (GenAI) technologies, particularly large language models (LLMs), are transforming log management tools by enabling teams to sift through this data, identify anomalies, and deliver real-time, context-rich intelligence to streamline troubleshooting. By applying transformer-based architectures, which […] View the full article

    • 0 replies
    • 56 views
  15. Introduction to OTel: In formal terms, OpenTelemetry is an open source framework used for instrumenting, generating, collecting, and exporting telemetry data for applications, services, and infrastructure. It provides vendor-neutral tools, SDKs and APIs for generating, collecting, and exporting telemetry data such as traces, metrics, and logs to any observability backend, including both open source and […] View the full article

    • 0 replies
    • 54 views
  16. We’re thrilled to announce that Logz.io received a Special Mention for Best Use of AI from the 2024 O11ys Awards, a celebration of innovation and excellence in observability. The 2024 O11ys Awards recognized our AI Agent, calling it: This recognition validates our mission to simplify observability with AI, empowering teams to troubleshoot faster, optimize costs, […] View the full article

    • 0 replies
    • 49 views
  17. Introducing Our New Support Help Center: We’re thrilled to launch our brand-new and improved Support Help Center, designed to streamline how you interact with our support team and access the resources you need. This enhanced platform empowers users to: This is more than just a support portal; it’s a centralized hub to enhance your experience, provide […] View the full article

    • 0 replies
    • 54 views
  18. Managing modern systems requires a constant balance between operational efficiency and innovation; going a little further, maintaining seamless operations and delivering exceptional customer experiences increasingly depend on ensuring robust observability. For years, the ELK stack (Elasticsearch, Logstash, Kibana) has been the go-to solution for many organizations for log management and observability, offering flexibility, control and […] View the full article

    • 0 replies
    • 54 views
  19. Complexity rules the day within the world of data systems and pipelines. A goal for any observability practice is to help reduce complexity and give users and administrators a clear view of what’s happening in any system. This is the path to unified observability, a mature system where monitoring and troubleshooting are streamlined. This has […] View the full article

  20. As digital applications and infrastructures grow more complex, managing and understanding log data has become increasingly vital in achieving practical observability, enabling organizations to detect, diagnose, and prevent issues across their systems. However, traditional log analysis methods often struggle with the volume and complexities of modern log data in cloud-native environments. Further, many organizations have […] View the full article

  21. In today’s rapidly evolving digital landscape, organizations heavily rely on their applications and systems to deliver optimal performance. As such, driving down the key metric of Mean Time to Resolution (MTTR) is clearly one of the biggest challenges facing observability practitioners today. According to the 2024 Observability Pulse Report, based on our annual survey of […] View the full article

    • 0 replies
    • 52 views
  22. These days, one of the most important decisions that organizations can make as it relates to their observability strategy is: “How much data do we want to retain in Hot storage to ensure we have everything needed for real time analysis, without running up associated costs?” The reality is, for their immediate troubleshooting efforts, […] View the full article

    • 0 replies
    • 35 views
  23. A critical component of any monitoring and observability system is alerting. But alerts in and of themselves aren’t enough: when something goes wrong, time is of the essence, and your team needs to figure out not just what’s going on but how to fix it, and fast. Additionally, constantly chasing down alerts can be the bane […] View the full article

    • 0 replies
    • 31 views
  24. Explore Upgrades and Improvements. New Visualization Features: We’re rolling out new visualization capabilities in the Explore log management interface that are available now in some accounts and will be added to all in the coming weeks and months. With these updates you can: Warm Tier: There is now a new option for log storage and access […] View the full article

    • 0 replies
    • 34 views
  25. Too much time spent troubleshooting? You’re not alone. Manual investigation, jumping between dashboards, and piecing together scattered data are time-consuming and frustrating. That’s why we built the Logz.io AI Agent: to simplify root cause analysis (RCA), optimize performance, and help teams de-risk deployments. In our recent webinar, we explored how AI is reshaping observability workflows. The […] View the full article

  26. Cloud native technologies have made it harder to understand how systems are behaving. Logs are the answer, but they can be voluminous and complex in any environment. How do you make sense of them? Logz.io co-founder and CTO Asaf Yigal recently participated in a webinar hosted by LeadDev.com on this topic alongside fellow experts Tanvi […] View the full article

  27. Explore Upgrades and Improvements: We’ve improved the filter pane to include: Additionally, a new time-picker option lets you mix absolute and relative times and manually set the date and time to the second. You can also view your data in either UTC or your local time zone. Saved searches from Explore can now be used […] View the full article

    • 0 replies
    • 33 views
  28. Logz.io introduces its AI Agent in Beta, using GenAI to revolutionize observability. The AI Agent simplifies monitoring with automated data analysis and root cause detection, accelerating issue resolution by 3-5x for beta users, marking a critical step toward fully autonomous observability. Today, we’re thrilled to announce the launch of the Logz.io AI Agent, as we blaze […] View the full article

  29. Started by Logz.io

    In conversations about cloud observability today, discussions often shift from “what’s possible” to “what’s practical.” Too often, these conversations highlight the shortcomings of current observability processes, tools and financial models. As observability data workloads continue to grow at an unprecedented pace, traditional dashboarding and alerting-based approaches are struggling to keep up. This hinders decision-making, extends […] View the full article

  30. At Logz.io, we believe that observability should be simple, smart, and fast, powered by AI to help teams move with confidence. This Fall, our users recognized that commitment by awarding Logz.io 15 badges on G2 across multiple categories and global markets. From ease of use to fast implementation, users and businesses alike are experiencing how AI-driven […] View the full article

    • 0 replies
    • 22 views
  31. Observability is critical to innovation, but for many organizations, soaring costs make it impossible to achieve. Logz.io is here to change that. The Problem with Traditional Observability Pricing: You don’t have to look very far these days to see that the issue of ever-increasing costs is getting in the way of many organizations achieving their […] View the full article

  32. The emergence of generative AI in observability tools was inevitable, but there’s already been an extreme degree of hype in the market. Monitoring, DevOps and ITOps have never been immune to trends, and with GenAI capabilities, the hype machine is running out of control. Organizations looking to ride the wave of GenAI undoubtedly recall the […] View the full article

  33. When Gartner publishes their annual observability industry research, it’s always exciting to find your company named among the most successful and high-profile providers in this space. That’s why Logz.io is thrilled to find itself listed as a Visionary for the third consecutive year in the Gartner® Magic Quadrant™ for Observability Platforms (previously known as the […] View the full article

    • 0 replies
    • 24 views
  34. “Logz.io Observability IQ Assistant helps us to find the root cause of the issues faster, and it reduces a lot of the manual processes that we were doing before.” That’s the assessment of Senior DevOps Engineer and Logz.io user Armin Morattab when discussing the impact of AI on his day-to-day job. He dives deep on […] View the full article

    • 0 replies
    • 48 views
  35. Your team is responsible for ensuring the reliability and performance of your organization’s critical applications and infrastructure. What keeps you up at night? Your applications are more complex, distributed and cloud-native than ever, meaning that understanding what’s happening under the hood has never been harder than it is now. Is it system bugs, or […] View the full article

    • 0 replies
    • 27 views
  36. At Logz.io, we’ve found that for most organizations observability challenges start with log management. Today more than ever, log management is a highly complex practice that involves mountains of ephemeral data, and the related obstacles are preventing people from achieving their observability goals, full stop. That’s why we designed our new log management UI to […] View the full article

    • 0 replies
    • 95 views
  37. Kubernetes has just reached its 10th anniversary, signifying the maturity of the containers movement. Now it’s time to explore the next frontier in cloud-native evolution: WebAssembly, a.k.a. WASM or Wasm. Moving beyond containers and Kubernetes, WASM promises to revolutionize the cloud landscape with unparalleled performance, portability, and security. In a recent episode of […] View the full article

    • 0 replies
    • 64 views
  38. Started by Logz.io

    Despite advances in the world of observability, log management hasn’t evolved much in recent years. In our webinar, Your Faster, Easier Log Management UI is Here, we share our vision for log management and provide a demo of Logz.io Explore, the new UI for our Log Management solution. NOTE: Watch to the end of the […] View the full article

    • 0 replies
    • 61 views
  39. There’s no question that achieving end-to-end observability is among the most challenging tasks facing engineering and ops teams today. A quick look back at the 2024 Observability Pulse survey throws this conclusion into stark relief as: Logz.io is committed to making observability smarter, faster, and easier, from data ingestion, to troubleshooting, to managing costs. […] View the full article

    • 0 replies
    • 60 views
  40. The business of observability is all about data: what you’re observing in the data, how you’re visualizing it, what it indicates about the state of your environment, and how to address issues that may occur. Creating your own perspective for observability, and understanding what you’re seeing, can be difficult. The use of both generative and […] View the full article

    • 0 replies
    • 50 views
  41. Redis is no longer open source. In March 2024 the project was relicensed, leaving its vast community confused. But the community did not give up, and started work to fork Redis to keep it open. On my recent OpenObservability Talks episode, I delved into Valkey, a prominent fork of Redis. Valkey was established under The […] View the full article

    • 0 replies
    • 57 views
  42. There’s so much hype around the use of AI in observability, but how does that translate into making tangible progress with your day-to-day tasks? At Logz.io we’ve introduced an AI-based chatbot assistant to the Open 360™ platform that automatically delves into your stack, fine-tunes your workflows and enables conversation directly with your systems and […] View the full article

    • 0 replies
    • 84 views
  43. It may sound complicated and daunting, but so much of observability is about discovering the unknown unknowns in your critical systems. The capabilities of observability engineering can help you make those discoveries. Most organizations have some form of monitoring, alerting and troubleshooting, which can be adequate to a point but fall short when trying to […] View the full article

    • 0 replies
    • 92 views
  44. Intro – LogMetrics Feature: At Logz.io we provide an observability platform with the ability to ship logs, metrics, and traces and then interact with them using our app. LogMetrics is an integral part of our observability offering that bridges the gap between logs and metrics. It provides the seamless conversion of one type of signal […] View the full article

    • 0 replies
    • 69 views
  45. In the past few years we’ve been witnessing tectonic shifts in the open source realm, with established projects taken off open source or otherwise turning to the dark side. On the other hand, we’ve seen active forks aiming to keep these projects open gaining momentum. What does it mean for the Free and Open Source […] View the full article

    • 0 replies
    • 79 views
  46. At Logz.io, we believe the future of observability will center on the rapid advancement of automation, innovations around artificial intelligence, and streamlining processes that currently remain far too complex. This is no different than many other areas of technology, but the opportunities in observability are vast, and we see all of these areas connecting and […] View the full article

    • 0 replies
    • 70 views
  47. In the fast-paced realm of DevOps and Site Reliability Engineering (SRE), success starts with effective monitoring. Understanding the fundamental metrics is crucial for identifying and mitigating issues proactively. In this article, we’ll delve into the leading metrics frameworks (R.E.D., U.S.E., and the “Four Golden Signals”), which will provide you with a solid foundation […] View the full article

    • 0 replies
    • 88 views
  48. Generative AI and large language models (LLMs) are fundamentally changing the way we interact with data, especially in the realm of Kubernetes and observability. These technologies are reshaping our field, and there is a lot to understand and unpack so organizations like yours can make sense of it all. What data is important, and what […] View the full article

    • 0 replies
    • 89 views
  49. AI has been the biggest macro-trend in technology for some time now, and the observability space is no exception to this rule. Just look at the findings of the 2024 Observability Pulse Report; it’s evident that organizations are hungry for AI capabilities that help address pervasive issues of observability process maturity, the talent shortage, ever-increasing […] View the full article

    • 0 replies
    • 83 views