Showing results for tags 'amazon cloudwatch logs'.

  1. You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, and application troubleshooting. By default, CloudWatch Logs are delivered as gzip-compressed objects. You might want the data to be decompressed, or you might want the logs delivered to Splunk, which requires decompressed data input, for application monitoring and auditing.

AWS released a feature to support decompression of CloudWatch Logs in Firehose. With this feature, you can specify an option in Firehose to decompress CloudWatch Logs. You no longer have to perform additional processing using AWS Lambda or post-processing to get decompressed logs, and you can deliver decompressed data to Splunk. Additionally, you can use optional Firehose features such as record format conversion to convert CloudWatch Logs to Parquet or ORC, and dynamic partitioning to automatically group streaming records based on keys in the data (for example, by month) and deliver the grouped records to corresponding Amazon S3 prefixes.

In this post, we look at how to enable the decompression feature for Splunk and Amazon S3 destinations. We start with Splunk and then Amazon S3 for new streams, then we address migration steps to take advantage of this feature and simplify your existing pipeline.

Decompress CloudWatch Logs for Splunk

You can use a subscription filter on CloudWatch log groups to ingest data into Firehose either directly or through Amazon Kinesis Data Streams.

Note: For the CloudWatch Logs decompression feature, you need an HTTP Event Collector (HEC) data input created in Splunk, with indexer acknowledgement enabled and the source type configured. This is required to map the decompressed logs to the right source type. When creating the HEC input, include the source type mapping (for example, aws:cloudtrail).

To create a Firehose delivery stream for the decompression feature, complete the following steps:

Provide your destination settings and select Raw endpoint as the endpoint type. You can use a raw endpoint for the decompression feature to ingest both raw and JSON-formatted event data to Splunk. For example, VPC Flow Logs data is raw data, and AWS CloudTrail data is in JSON format.

Enter the HEC token for Authentication token.

To enable the decompression feature, deselect Transform source records with AWS Lambda under Transform records.

Select Turn on decompression and Turn on message extraction for Decompress source records from Amazon CloudWatch Logs. Select Turn on message extraction for the Splunk destination.

Message extraction feature

After decompression, CloudWatch Logs are in JSON format, as shown in the following figure. The decompressed data has metadata information such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following example shows CloudTrail events in the CloudWatch Logs). When you enable message extraction, Firehose extracts just the contents of the message fields and concatenates the contents with a new line between them, as shown in the following figure. With the CloudWatch Logs metadata filtered out by this feature, Splunk can successfully parse the actual log data and map it to the source type configured in the HEC token.
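If you are wiring up the pipeline programmatically rather than in the console, the subscription filter that sends a log group to the Firehose stream can be created with the CloudWatch Logs API. The following is a minimal boto3 sketch; the log group, stream, and role names are placeholders, and the IAM role must allow CloudWatch Logs to call firehose:PutRecordBatch on the stream.

    import boto3

    logs = boto3.client("logs", region_name="us-east-1")

    # Placeholder names -- replace with your own log group, Firehose stream, and role.
    logs.put_subscription_filter(
        logGroupName="/aws/cloudtrail/example",
        filterName="to-firehose-decompression",
        filterPattern="",  # empty pattern forwards every log event
        destinationArn="arn:aws:firehose:us-east-1:111122223333:deliverystream/cw-logs-to-splunk",
        roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",
    )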
Additionally, if you want to deliver these CloudWatch events to your Splunk destination in real time, you can use zero buffering, a feature that was launched recently in Firehose. You can set the buffer interval to 0 seconds, or to any value between 0 and 60 seconds, to deliver data to the Splunk destination within seconds. With these settings, you can seamlessly ingest decompressed CloudWatch log data into Splunk using Firehose.

Decompress CloudWatch Logs for Amazon S3

The CloudWatch Logs decompression feature for an Amazon S3 destination works similarly to the Splunk destination: you turn off data transformation using Lambda and turn on the decompression and message extraction options. You can use the decompression feature to write the log data as a text file to the Amazon S3 destination, or use it with other Amazon S3 destination features such as record format conversion to Parquet or ORC, or dynamic partitioning to partition the data.

Dynamic partitioning with decompression

For the Amazon S3 destination, Firehose supports dynamic partitioning, which enables you to continuously partition streaming data by using keys within the data, and then deliver the data grouped by these keys into corresponding Amazon S3 prefixes. This enables you to run high-performance, cost-efficient analytics on streaming data in Amazon S3 using services such as Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and Amazon QuickSight. Partitioning your data minimizes the amount of data scanned, optimizes performance, and reduces the cost of your analytics queries on Amazon S3.

With the new decompression feature, you can perform dynamic partitioning without any Lambda function for mapping the partitioning keys on CloudWatch Logs. You can enable the Inline parsing for JSON option, scan the decompressed log data, and select the partitioning keys. The following screenshot shows an example where inline parsing is enabled for CloudTrail log data with a partitioning schema selected for account ID and AWS Region in the CloudTrail record.

Record format conversion with decompression

For CloudWatch Logs data, you can use the record format conversion feature on decompressed data for the Amazon S3 destination. Firehose can convert the input data format from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON. You can use the record format conversion options under the Transform and convert records settings to convert the CloudWatch log data to Parquet or ORC format. The following screenshot shows an example of record format conversion settings for Parquet format using an AWS Glue schema and table for CloudTrail log data. When dynamic partitioning is configured, record format conversion works along with dynamic partitioning to create the files in the output format with a partition folder structure in the target S3 bucket.

Migrate existing delivery streams for decompression

If you want to migrate an existing Firehose stream that uses Lambda for decompression to this new decompression feature of Firehose, refer to the steps outlined in Enabling and disabling decompression.

Pricing

The Firehose decompression feature decompresses the data and is billed per GB of decompressed data. To understand decompression pricing, refer to Amazon Data Firehose pricing.
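As a rough illustration of how the inline-parsing setup described above maps to the API, the following boto3 sketch creates an S3 delivery stream with dynamic partitioning enabled and a JQ expression that pulls the account ID and Region out of each CloudTrail record. The ARNs and names are placeholders, and the processor configuration for your own stream (including the decompression toggles shown in the console steps) should be checked against the Firehose API reference before use.

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    firehose.create_delivery_stream(
        DeliveryStreamName="cw-logs-partitioned",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseS3Role",    # placeholder
            "BucketARN": "arn:aws:s3:::example-cloudtrail-analytics",      # placeholder
            # Partition keys extracted below become part of the object prefix.
            "Prefix": "logs/account=!{partitionKeyFromQuery:account_id}/region=!{partitionKeyFromQuery:region}/",
            "ErrorOutputPrefix": "errors/",
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
            "DynamicPartitioningConfiguration": {"Enabled": True},
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {
                        "Type": "MetadataExtraction",
                        "Parameters": [
                            # JQ expression that selects the partition keys from each record.
                            {"ParameterName": "MetadataExtractionQuery",
                             "ParameterValue": "{account_id:.recipientAccountId,region:.awsRegion}"},
                            {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                        ],
                    }
                ],
            },
        },
    )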
Clean up

To avoid incurring future charges, delete the resources you created in the following order: delete the CloudWatch Logs subscription filter, delete the Firehose delivery stream, and delete the S3 buckets.

Conclusion

The decompression and message extraction feature of Firehose simplifies delivery of CloudWatch Logs to Amazon S3 and Splunk destinations without requiring any code development or additional processing. For an Amazon S3 destination, you can use Parquet or ORC conversion and dynamic partitioning capabilities on decompressed data. For more information, refer to the following resources: Record Transformation and Format Conversion, Enabling and disabling decompression, and Message extraction after decompression of CloudWatch Logs.

About the Authors

Ranjit Kalidasan is a Senior Solutions Architect with Amazon Web Services based in Boston, Massachusetts. He is a Partner Solutions Architect helping security ISV partners co-build and co-market solutions with AWS. He brings over 25 years of experience in information technology helping global customers implement complex solutions for security and analytics. You can connect with Ranjit on LinkedIn.

Phaneendra Vuliyaragoli is a Product Management Lead for Amazon Data Firehose at AWS. In this role, Phaneendra leads the product and go-to-market strategy for Amazon Data Firehose.

View the full article
  2. Amazon CloudWatch Logs now allows customers to use Internet Protocol version 6 (IPv6) addresses for their new and existing domains. Customers moving to IPv6 can simplify their network stack by running their CloudWatch log groups on a dual-stack network that supports both IPv4 and IPv6. View the full article
  3. Amazon CloudWatch Logs is excited to announce support for creating account-level subscription filters using the put-account-policy API. This new capability enables you to deliver real-time log events that are ingested into Amazon CloudWatch Logs to an Amazon Kinesis Data Stream, Amazon Kinesis Data Firehose, or AWS Lambda for custom processing, analysis, or delivery to other destinations using a single account level subscription filter. View the full article
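A minimal boto3 sketch of what such an account-level subscription filter might look like is shown below. The destination and role ARNs are placeholders, and the exact shape of the policy document should be verified against the PutAccountPolicy documentation.

    import boto3, json

    logs = boto3.client("logs", region_name="us-east-1")

    # Account-level subscription filter: applies to all log groups in the account.
    policy = {
        "DestinationArn": "arn:aws:kinesis:us-east-1:111122223333:stream/central-logs",  # placeholder
        "RoleArn": "arn:aws:iam::111122223333:role/CWLtoKinesisRole",                    # placeholder
        "FilterPattern": "",        # empty pattern forwards every log event
        "Distribution": "Random",
    }

    logs.put_account_policy(
        policyName="account-wide-subscription",
        policyType="SUBSCRIPTION_FILTER_POLICY",
        policyDocument=json.dumps(policy),
        scope="ALL",
    )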
  4. Introduction

We have observed a growing adoption of container services among both startups and established companies. This trend is driven by the ease of deploying applications and migrating from on-premises environments to the cloud. One platform of choice for many of our customers is Amazon Elastic Container Service (Amazon ECS). The powerful simplicity of Amazon ECS allows customers to scale from managing a single task to overseeing their entire enterprise application portfolio and to reach thousands of tasks. Amazon ECS eliminates the management overhead associated with running your own container orchestration service.

When working with customers, we have observed that there is a valuable opportunity to enhance the utilization of Amazon ECS events. Lifecycle events offer troubleshooting insights by linking service events with metrics and logs. Amazon ECS displays only the latest 100 events, making it tricky to review them retrospectively. Using Amazon CloudWatch Container Insights resolves this by storing Amazon ECS lifecycle events in an Amazon CloudWatch log group. This integration lets you analyze events retroactively, enhancing operational efficiency. Amazon EventBridge is a serverless event bus that connects applications seamlessly. Along with Container Insights, Amazon ECS can serve as an event source while Amazon CloudWatch Logs acts as the target in Amazon EventBridge. This enables post-incident analysis using Amazon CloudWatch Logs Insights. This post explains how to effectively analyze Amazon ECS service events via Container Insights, Amazon EventBridge, or both, using Amazon CloudWatch Logs Insights queries. These queries can significantly enhance your development and operational workflows.

Prerequisites

To work through the techniques presented in this guide, you must have the following in your account: an Amazon ECS cluster with an active workload, and Amazon EventBridge configured to stream events either to Amazon CloudWatch Logs directly or through Amazon ECS CloudWatch Container Insights. A detailed guide is available for setting up Amazon EventBridge to stream events to Amazon CloudWatch Logs or Container Insights, and a short sketch of that wiring follows the event overview below.

Walkthrough

Useful lifecycle event patterns

The events that Amazon ECS emits can be categorized into four groups:

Container instance state change events – These events are triggered when there is a change in the state of an Amazon ECS container instance. This can happen for various reasons, such as starting or stopping a task, upgrading the Amazon ECS agent, or other scenarios.

Task state change events – These events are emitted whenever there is a change in the state of a task, such as when it transitions from pending to running or from running to stopped. Additionally, events are triggered when a container within a task stops or when a termination notice is received for AWS Fargate Spot capacity.

Service action events – These events provide information about the state of the service and are categorized as info, warning, or error. They are generated when the service reaches a steady state, when the service consistently cannot place a task, when the Amazon ECS APIs are throttled, or when there are insufficient resources to place a task.

Service deployment state change events – These events are emitted when a deployment is in progress, completed, or failed. They are typically triggered by the circuit breaker logic and rollback settings.
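For reference, a minimal boto3 sketch of the prerequisite EventBridge wiring might look like the following: a rule that matches ECS lifecycle events and a target that writes them to a CloudWatch Logs log group. The rule name and log group name are placeholders, and the log group also needs a resource policy that allows events.amazonaws.com to deliver events, which is omitted here.

    import boto3, json

    logs = boto3.client("logs", region_name="us-east-1")
    events = boto3.client("events", region_name="us-east-1")

    log_group = "/aws/events/ecs-lifecycle"          # placeholder name
    logs.create_log_group(logGroupName=log_group)

    # Match ECS task, service, deployment, and container instance state change events.
    pattern = {
        "source": ["aws.ecs"],
        "detail-type": [
            "ECS Task State Change",
            "ECS Service Action",
            "ECS Deployment State Change",
            "ECS Container Instance State Change",
        ],
    }

    events.put_rule(Name="ecs-lifecycle-to-logs", EventPattern=json.dumps(pattern))
    events.put_targets(
        Rule="ecs-lifecycle-to-logs",
        Targets=[{
            "Id": "cw-logs",
            "Arn": f"arn:aws:logs:us-east-1:111122223333:log-group:{log_group}",
        }],
    )
    # Note: a CloudWatch Logs resource policy permitting EventBridge to create log
    # streams and put log events into this log group is also required (not shown).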
For a more detailed explanation and examples of these events and their potential use cases, please refer to the Amazon ECS events documentation. Let's dive into some real-world examples of how to use events for operational support. We've organized these examples into four categories based on event patterns: task patterns, service action patterns, service deployment patterns, and ECS container instance patterns. Each category includes common use cases and demonstrates specific queries and results.

Running Amazon CloudWatch Logs Insights queries

Follow these steps to run the Amazon CloudWatch Logs Insights queries covered in the later sections of this post: Open the Amazon CloudWatch console, choose Logs, and then choose Logs Insights. Choose the log groups containing Amazon ECS events and performance logs to query. Enter the desired query and choose Run to view the results.

Task event patterns

Scenario 1: In this scenario, the operations team encounters a situation where they need to investigate the cause of HTTP status 5XX (server-side issue) errors that have been observed in their environment. To do so, they want to confirm whether an Amazon ECS task correctly followed its intended task lifecycle. The team suspects that a task's lifecycle events might be contributing to the 5XX errors, and they need to narrow down the exact source of these issues to implement effective troubleshooting and resolution.

Required query

Query inputs: detail.containers.0.taskArn: intended task ARN

fields time as Timestamp, `detail-type` as Type, detail.lastStatus as `Last Status`, detail.desiredStatus as `Desired Status`, detail.stopCode as StopCode, detail.stoppedReason as Reason | filter detail.containers.0.taskArn = "arn:aws:ecs:us-east-1:111122223333:task/CB-Demo/6e81bd7083ad4d559f8b0b147f14753f" | sort @timestamp desc | limit 10

Result: Let's see how service events can aid confirmation of the task lifecycle. From the results we can see that the Last Status of the task progressed as shown in the following:

PROVISIONING > PENDING > ACTIVATING > RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED

This conforms to the documented task lifecycle flow: the task was first DEACTIVATED and then STOPPED, and we can see that stoppage of this task was initiated by the scheduler (ServiceSchedulerInitiated) because of the reason Task failed container health checks.

Similarly, the query can also fetch the lifecycle details of a task failing load balancer health checks. In the query below, replace detail.containers.0.taskArn with the intended task ARN:

fields time as Timestamp, `detail-type` as Type, detail.lastStatus as `Last Status`, detail.desiredStatus as `Desired Status`, detail.stopCode as StopCode, detail.stoppedReason as Reason | filter detail.containers.0.taskArn = "arn:aws:ecs:us-east-1:111122223333:task/CB-Demo/649e1d63f0db482bafa0087f6a3aa5ed" | sort @timestamp desc | limit 10

Let's see an example of another task which was stopped manually by calling StopTask; here the action was UserInitiated and the reason is Task stopped by user. In both cases we can see how the Desired Status (irrespective of who initiated the stop) drives the Last Status of the task. (Task lifecycle diagram shown for reference.)

Scenario 2: Consider the scenario where you encounter frequent task failures within a service and need a means to diagnose the root causes behind these issues. Tasks might be terminating for various reasons, such as resource limitations or application errors.
To address this, you can query for the stop reasons of all tasks in the service to uncover underlying issues.

Required query

Query inputs: detail.group: your intended service name

filter `detail-type` = "ECS Task State Change" and detail.desiredStatus = "STOPPED" and detail.group = "service:circuit-breaker-demo" |fields detail.stoppingAt as stoppingAt, detail.stoppedReason as stoppedReason,detail.taskArn as Task | sort @timestamp desc | limit 200

Tip: If you have service auto scaling enabled and there are frequent scaling events for the service, you can add another filter to the query above to exclude scaling-related events and focus solely on other stop reasons:

filter `detail-type` = "ECS Task State Change" and detail.desiredStatus = "STOPPED" and detail.stoppedReason not like "Scaling activity initiated by" and detail.group = "service:circuit-breaker-demo" |fields detail.stoppingAt as stoppingAt, detail.stoppedReason as stoppedReason,detail.taskArn as Task | sort @timestamp desc | limit 200

Result: In the results, we can see the task stop reasons for tasks within the service, along with their respective task IDs. By analyzing these stop reasons, you can identify the specific issues leading to task terminations. Depending on the stop reasons, potential solutions might involve application tuning, adjusting resource allocations, optimizing task definitions, or fine-tuning scaling strategies.

Scenario 3: Consider a scenario where your security team needs critical information about the usage of specific network interfaces, MAC addresses, or attachment IDs. It's important to note that Amazon ECS automatically provisions and deprovisions elastic network interfaces (ENIs) when tasks start and stop. However, once a task is stopped, there are no readily available records or associations to trace back to a specific task ID using the ENI or the MAC address assigned to it. This poses a challenge in meeting the security team's request for such data, as the automatic nature of ENI management in Amazon ECS may limit historical tracking capabilities for these identifiers.
Required query

Query inputs: detail.attachments.1.details.1.value: intended ENI ID. Additionally, replace the task ARN and cluster ARN details.

fields @timestamp, `detail.attachments.1.details.1.value` as ENIId,`detail.attachments.1.status` as ENIStatus, `detail.lastStatus` as TaskStatus | filter `detail.attachments.1.details.1.value` = "eni-0e2b348058ae3d639" | parse @message "arn:aws:ecs:us-east-1:111122223333:task/CB-Demo/*\"" as TaskId | parse @message "arn:aws:ecs:us-east-1:111122223333:cluster/*\"," as Cluster | parse @message "service:*\"," as Service | display @timestamp, ENIId, ENIStatus, TaskId, Service, Cluster, TaskStatus

To look up by MAC address instead, replace the value of detail.attachments.1.details.2.value with the intended MAC address:

fields @timestamp, `detail.attachments.1.details.1.value` as ENIId, `detail.attachments.1.details.2.value` as MAC ,`detail.attachments.1.status` as ENIStatus, `detail.lastStatus` as TaskStatus | filter `detail.attachments.1.details.2.value` = '12:eb:5f:5a:83:93' | parse @message "arn:aws:ecs:us-east-1:111122223333:task/CB-Demo/*\"" as TaskId | parse @message "arn:aws:ecs:us-east-1:111122223333:cluster/*\"," as Cluster | parse @message "service:*\"," as Service | display @timestamp, ENIId, MAC, ENIStatus, TaskId, Service, Cluster, TaskStatus

Result: When looking up by ENI ID, the results show details of the task, service, and cluster for which the ENI was provisioned, along with the state of the task to correlate. Just like the ENI, we can query by MAC address and get the same details.

Service action event patterns

Scenario 4: You may encounter a situation where you need to identify and prioritize resolution for services with the highest number of faults. To achieve this, you want to query and determine the top N services that are experiencing issues.

Required query:

filter `detail-type` = "ECS Service Action" and @message like /(?i)(WARN)/ | stats count(detail.eventName) as countOfWarnEvents by resources.0 as serviceArn, detail.eventName as eventFault | sort countOfWarnEvents desc | limit 20

Result: By filtering for WARN events and aggregating service-specific occurrences, you can pinpoint the services that require immediate attention and prioritize resolution efforts. For example, the service ecsdemo-auth-no-sd in this case is facing the SERVICE_TASK_START_IMPAIRED error. This ensures that you can focus your resources on mitigating the most impactful issues and enhancing the overall reliability of your microservices ecosystem.

Service deployment event patterns

Scenario 5: Since any Amazon ECS service event comes with an event type of INFO, WARN, or ERROR, we can use this as a search pattern to analyze our workloads for troubled services.

Required query:

fields @timestamp as Time, `resources.0` as Service, `detail-type` as `lifecycleEvent`, `detail.reason` as `failureReason`, @message | filter `detail.eventType` = "ERROR" | sort @timestamp desc | display Time, Service, lifecycleEvent, failureReason | limit 100

Result: In the results below, the ecsdemo-backend service is failing to successfully deploy tasks, which activates the Amazon ECS circuit breaker mechanism that stops the deployment of the service. Using the expand arrow to the left of the table, we can get more details about the event.

Scenario 6: In this scenario, you have received a notification from the operations team indicating that, following a recent deployment to an Amazon ECS service, the previous version of the application is still visible.
They are experiencing a situation where the new deployment did not replace the old one as expected, leading to confusion and potential issues. The operations team seeks to understand the series of events that occurred during the deployment process to determine what went wrong, identify the source of the issue, and implement the necessary corrective measures to ensure a successful deployment.

Required query

Query inputs: resources.0: intended service ARN

fields time as Timestamp, detail.deploymentId as DeploymentId , detail.eventType as Severity, detail.eventName as Name, detail.reason as Detail, `detail-type` as EventType | filter `resources.0` ="arn:aws:ecs:us-east-1:12345678910:service/CB-Demo/circuit-breaker-demo" | sort @timestamp desc | limit 10

Result: Let's analyze the service events to understand what went wrong during the deployment. By examining the sequence of events, a clear timeline emerges: The service was initially in a steady state (line 7) and there was a good deployment (ecs-svc/6629184995452776901 in line 6). A new deployment (ecs-svc/4503003343648563919) occurs, possibly with a code bug (line 5). Tasks from this deployment were failing to start (line 3). This problematic deployment triggers the circuit breaker logic, which initiates a rollback to the previously known good deployment (ecs-svc/6629184995452776901 in line 4). The service eventually returns to a steady state (lines 1 and 2).

This sequence of events not only provides a chronological view of what happened but also offers specific insights into the deployments involved and the potential reasons for the issue. By analyzing these service events, the operations team can pinpoint the problematic deployment (ecs-svc/4503003343648563919) and investigate further to identify and address the underlying code issues, ensuring a more reliable deployment process in the future.

ECS container instance event patterns

Scenario 7: You want to track the history of Amazon ECS agent updates for container instances in the cluster. A trackable history ensures compliance with security standards by verifying that the agent has the necessary patches and updates installed, and it also allows for the verification of rollbacks in the event of problematic updates. This information is valuable for operational efficiency and service reliability.

Required query:

fields @timestamp, detail.agentUpdateStatus as agentUpdateStatus, detail.containerInstanceArn as containerInstanceArn,detail.versionInfo.agentVersion as agentVersion | filter `detail-type` = "ECS Container Instance State Change" | sort @timestamp desc | limit 200

Result: As we can see from the results, the container instance initially operated with ECS agent version 1.75.0. When the agent update was triggered, indicating the presence of a new Amazon ECS agent version, the update process started at sequence 9 and, after a series of update actions, successfully concluded at sequence 1. This information offers a clear snapshot of the version transition and update procedure, underlining the importance of tracking Amazon ECS agent updates to ensure the security, reliability, and functionality of the ECS cluster.

Cleaning up

Once you've finished exploring the sample queries, please ensure you disable any Amazon EventBridge rules and Amazon ECS CloudWatch Container Insights so that you do not incur any further cost. A minimal sketch of this cleanup follows.
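For completeness, a boto3 sketch of that cleanup might look like the following; the rule name, target ID, cluster name, and log group are the placeholder names used in the earlier sketch and should be replaced with your own.

    import boto3

    events = boto3.client("events", region_name="us-east-1")
    ecs = boto3.client("ecs", region_name="us-east-1")
    logs = boto3.client("logs", region_name="us-east-1")

    # Remove the EventBridge rule that streamed ECS events to CloudWatch Logs.
    events.remove_targets(Rule="ecs-lifecycle-to-logs", Ids=["cw-logs"])
    events.delete_rule(Name="ecs-lifecycle-to-logs")

    # Turn off Container Insights on the cluster.
    ecs.update_cluster_settings(
        cluster="CB-Demo",   # placeholder cluster name
        settings=[{"name": "containerInsights", "value": "disabled"}],
    )

    # Optionally delete the log group that received the events.
    logs.delete_log_group(logGroupName="/aws/events/ecs-lifecycle")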
Conclusion

In this post, we've explored ways to harness the full potential of Amazon ECS events, a valuable resource for troubleshooting. Amazon ECS provides useful information about tasks, services, deployments, and container instances. Analyzing ECS events in Amazon CloudWatch Logs enables you to identify patterns over time, correlate events with other logs, discover recurring issues, and conduct various forms of analysis. We've outlined straightforward yet powerful methods for searching and utilizing Amazon ECS events. This includes tracking the lifecycle of tasks to swiftly diagnose unexpected stoppages, identifying tasks with specific network details to bolster security, pinpointing problematic services, understanding deployment issues, and ensuring the Amazon ECS agent is up to date for reliability. This broader perspective on your system's operations equips you to proactively address problems, gain insights into your container performance, facilitate smooth deployments, and fortify your system's security.

Additional references

Now that we have covered the basics of these lifecycle events and how to query them in the Amazon CloudWatch Logs Insights console for troubleshooting, you can learn more about the Amazon CloudWatch query domain-specific language (DSL) in the documentation (CloudWatch Logs Insights query syntax). You can also set up anomaly detection by further processing Amazon ECS events with Amazon EventBridge, which is explained in detail in Amazon Elastic Container Service Anomaly Detector using Amazon EventBridge.

View the full article
  5. We are excited to announce regular expression support for Amazon CloudWatch Logs filter pattern syntax, making it easier to search and match relevant logs in the AWS GovCloud (US) Regions. Customers use filter pattern syntax today to search logs, extract metrics using metric filters, and send specific logs to other destinations with subscription filters. With today’s launch, customers will be able to further customize these operations to meet their needs with flexible and powerful regular expressions within filter patterns. Now customers can define one filter to match multiple IP subnets or HTTP status codes using a regular expression such as ‘{ $.statusCode=%4[0-9]{2}% }’ rather than having to define multiple filters to cater to each variation, reducing the configuration and management overhead on their logs. View the full article
  6. To make it easy to interact with your operational data, Amazon CloudWatch is introducing today natural language query generation for Logs and Metrics Insights. With this capability, powered by generative artificial intelligence (AI), you can describe in English the insights you are looking for, and a Logs or Metrics Insights query will be automatically generated.

This feature provides three main capabilities for CloudWatch Logs and Metrics Insights:

Generate new queries from a description or a question to help you get started easily.

Query explanation to help you learn the language including more advanced features.

Refine existing queries using guided iterations.

Let's see how these work in practice with a few examples. I'll cover logs first and then metrics.

Generate CloudWatch Logs Insights queries with natural language

In the CloudWatch console, I select Log Insights in the Logs section. I then select the log group of an AWS Lambda function that I want to investigate. I choose the Query generator button to open a new Prompt field where I enter what I need using natural language:

Tell me the duration of the 10 slowest invocations

Then, I choose Generate new query. The following Logs Insights query is automatically generated:

fields @timestamp, @requestId, @message, @logStream, @duration | filter @type = "REPORT" and @duration > 1000 | sort @duration desc | limit 10

I choose Run query to see the results. I find that now there's too much information in the output. I prefer to see only the data I need, so I enter the following sentence in the Prompt and choose Update query.

Show only timestamps and latency

The query is updated based on my input and only the timestamp and duration are returned:

fields @timestamp, @duration | filter @type = "REPORT" and @duration > 1000 | sort @duration desc | limit 10

I run the updated query and get a result that is easier for me to read. Now, I want to know if there are any errors in the log. I enter this sentence in the Prompt and generate a new query:

Count the number of ERROR messages

As requested, the generated query is counting the messages that contain the ERROR string:

fields @message | filter @message like /ERROR/ | stats count()

I run the query and find out that there are more errors than I expected. I need more information. I use this prompt to update the query and get a better distribution of the errors:

Show the errors per hour

The updated query uses the bin() function to group the result in one hour intervals.

fields @timestamp, @message | filter @message like /ERROR/ | stats count(*) by bin(1h)

Let's see a more advanced query about memory usage. I select the log groups of a few Lambda functions and type:

Show invocations with the most over-provisioned memory grouped by log stream

Before generating the query, I choose the gear icon to toggle the options to include my prompt and an explanation as comment. Here's the result (I split the explanation over multiple lines for readability):

# Show invocations with the most over-provisioned memory grouped by log stream
fields @logStream, @memorySize/1000/1000 as memoryMB, @maxMemoryUsed/1000/1000 as maxMemoryUsedMB, (@memorySize/1000/1000 - @maxMemoryUsed/1000/1000) as overProvisionedMB | stats max(overProvisionedMB) as maxOverProvisionedMB by @logStream | sort maxOverProvisionedMB desc
# This query finds the amount of over-provisioned memory for each log stream by
# calculating the difference between the provisioned and maximum memory used.
# It then groups the results by log stream and calculates the maximum
# over-provisioned memory for each log stream. Finally, it sorts the results
# in descending order by the maximum over-provisioned memory to show
# the log streams with the most over-provisioned memory.

Now, I have the information I need to understand these errors. On the other side, I also have EC2 workloads. How are those instances running? Let's look at some metrics.

Generate CloudWatch Metrics Insights queries with natural language

In the CloudWatch console, I select All metrics in the Metrics section. Then, in the Query tab, I use the Editor. If you prefer, the Query generator is available also in the Builder. I choose Query generator like before. Then, I enter what I need using plain English:

Which 10 EC2 instances have the highest CPU utilization?

I choose Generate new query and get a result using the Metrics Insights syntax.

SELECT AVG("CPUUtilization") FROM SCHEMA("AWS/EC2", InstanceId) GROUP BY InstanceId ORDER BY AVG() DESC LIMIT 10

To see the graph, I choose Run. Well, it looks like my EC2 instances are not doing much. This result shows how those instances are using the CPU, but what about storage? I enter this in the prompt and choose Update query:

How about the most EBS writes?

The updated query replaces the average CPU utilization with the sum of bytes written to all EBS volumes attached to the instance. It keeps the limit to only show the top 10 results.

SELECT SUM("EBSWriteBytes") FROM SCHEMA("AWS/EC2", InstanceId) GROUP BY InstanceId ORDER BY SUM() DESC LIMIT 10

I run the query and, by looking at the result, I have a better understanding of how storage is being used by my EC2 instances. Try entering some requests and run the generated queries over your logs and metrics to see how this works with your data.

Things to know

Amazon CloudWatch natural language query generation for logs and metrics is available in preview in the US East (N. Virginia) and US West (Oregon) AWS Regions. There is no additional cost for using natural language query generation during the preview. You only pay for the cost of running the queries according to CloudWatch pricing.

When generating a query, you can include your original request and an explanation of the query as comments. To do so, choose the gear icon in the bottom right corner of the query edit window and toggle those options.

This new capability can help you generate and update queries for logs and metrics, saving you time and effort. This approach allows engineering teams to scale their operations without worrying about specific data knowledge or query expertise. Use natural language to analyze your logs and metrics with Amazon CloudWatch.

— Danilo

View the full article
  7. Searching through log data to find operational or business insights often feels like looking for a needle in a haystack. It usually requires you to manually filter and review individual log records. To help you with that, Amazon CloudWatch has added new capabilities to automatically recognize and cluster patterns among log records, extract noteworthy content and trends, and notify you of anomalies using advanced machine learning (ML) algorithms trained using decades of Amazon and AWS operational data.

Specifically, CloudWatch now offers the following:

The Patterns tab on the Logs Insights page finds recurring patterns in your query results and lets you analyze them in detail. This makes it easier to find what you're looking for and drill down into new or unexpected content in your logs.

The Compare button in the time interval selector on the Logs Insights page lets you quickly compare the query result for the selected time range to a previous period, such as the previous day, week, or month. In this way, it takes less time to see what has changed compared to a previous stable scenario.

The Log Anomalies page in the Logs section of the navigation pane automatically surfaces anomalies found in your logs while they are processed during ingestion.

Let's see how these work in practice with a typical troubleshooting journey. I will look at some application logs to find key patterns, compare two time periods to understand what changed, and finally see how detecting anomalies can help discover issues.

Finding recurring patterns in the logs

In the CloudWatch console, I choose Logs Insights from the Logs section of the navigation pane. To start, I have selected which log groups I want to query. In this case, I select a log group of a Lambda function that I want to inspect and choose Run query. In the Pattern tab, I see the patterns that have been found in these log groups. One of the patterns seems to be an error. I can select it to quickly add it as a filter to my query and focus on the logs that contain this pattern. For now, I choose the magnifying glass icon to analyze the pattern.

In the Pattern inspect window, a histogram with the occurrences of the pattern in the selected time period is shown. After the histogram, samples from the logs are provided. The variable parts of the pattern (such as numbers) have been extracted as "tokens." I select the Token values tab to see the values for a token. I can select a token value to quickly add it as a filter to the query and focus on the logs that contain this pattern with this specific value. I can also look at the Related patterns tab to see other logs that typically occurred at the same time as the pattern I am analyzing. For example, if I am looking at an ERROR log that was always written alongside a DEBUG log showing more details, I would see that relationship there.

Comparing logs with a previous period

To better understand what is happening, I choose the Compare button in the time interval selector. This updates the query to compare results with a previous period. For example, I choose Previous day to see what changed compared to yesterday. In the Patterns tab, I notice that there has actually been a 10 percent decrease in the number of errors, so the current situation might not be too bad. I choose the magnifying glass icon on the pattern with severity type ERROR to see a full comparison of the two time periods.
The graph overlaps the occurrences of the pattern over the two periods (now and yesterday in this case) inside the selected time range (one hour). Errors are decreasing but are still there. To reduce those errors, I make some changes to the application. I come back after some time to compare the logs, and a new ERROR pattern is found that was not present in the previous time period. My update probably broke something, so I roll back to the previous version of the application. For now, I'll keep it as it is because the number of errors is acceptable for my use case.

Detecting anomalies in the log

I am reassured by the decrease in errors that I discovered comparing the logs. But how can I know if something unexpected is happening? Anomaly detection for CloudWatch Logs looks for unexpected patterns in the logs as they are processed during ingestion and can be enabled at log group level.

I select Log groups in the navigation pane and type a filter to see the same log group I was looking at before. I choose Configure in the Anomaly detection column and select an Evaluation frequency of 5 minutes. Optionally, I can use a longer interval (up to 60 minutes) and add patterns to process only specific log events for anomaly detection.

After I activate anomaly detection for this log group, incoming logs are constantly evaluated against historical baselines. I wait for a few minutes and, to see what has been found, I choose Log anomalies from the Logs section of the navigation pane. To simplify this view, I can suppress anomalies that I am not interested in following. For now, I choose one of the anomalies in order to inspect the corresponding pattern in a way similar to before. After this additional check, I am convinced there are no urgent issues with my application. With all the insights I collected with these new capabilities, I can now focus on the errors in the logs to understand how to solve them.

Things to know

Amazon CloudWatch automated log pattern analytics is available today in all commercial AWS Regions where Amazon CloudWatch Logs is offered, excluding the China (Beijing), China (Ningxia), and Israel (Tel Aviv) Regions.

The patterns and compare query features are charged according to existing Logs Insights query costs. Comparing a one-hour time period against another one-hour time period is equivalent to running a single query over a two-hour time period. Anomaly detection is included as part of your log ingestion fees, and there is no additional charge for this feature. For more information, see CloudWatch pricing.

Simplify how you analyze logs with CloudWatch automated log pattern analytics.

— Danilo

View the full article
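If you prefer to enable anomaly detection outside the console, CloudWatch Logs exposes it through the API as well. The sketch below is a minimal boto3 example, assuming a boto3 version recent enough to include create_log_anomaly_detector; the log group ARN and detector name are placeholders, and the parameter names should be checked against the current API reference.

    import boto3

    logs = boto3.client("logs", region_name="us-east-1")

    # Enable anomaly detection for one log group, evaluated every five minutes.
    response = logs.create_log_anomaly_detector(
        detectorName="lambda-app-anomalies",   # placeholder name
        logGroupArnList=[
            "arn:aws:logs:us-east-1:111122223333:log-group:/aws/lambda/example-function"
        ],
        evaluationFrequency="FIVE_MIN",
    )
    print(response["anomalyDetectorArn"])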
  8. We are excited to announce regular expression support for Amazon CloudWatch Logs filter pattern syntax, making it easier to search and match relevant logs. Customers use filter pattern syntax today to search logs, extract metrics using metric filters, and send specific logs to other destinations with subscription filters. With today’s launch, customers will be able to further customize these operations to meet their needs with flexible and powerful regular expressions within filter patterns. Now customers can define one filter to match multiple IP subnets or HTTP status codes using a regular expression such as ‘{ $.statusCode=%4[0-9]{2}% }’ rather than having to define multiple filters to cater to each variation, reducing the configuration and management overhead on their logs. View the full article
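As an illustration of how such a regex-based filter pattern might be used from code, the following boto3 sketch searches a log group for any 4xx status code with the pattern quoted in the announcement; the log group name is a placeholder.

    import boto3, time

    logs = boto3.client("logs", region_name="us-east-1")

    # Search JSON access logs for any 4xx status code using a regex inside the filter pattern.
    response = logs.filter_log_events(
        logGroupName="/aws/apigateway/example-access-logs",      # placeholder
        filterPattern="{ $.statusCode = %4[0-9]{2}% }",
        startTime=int((time.time() - 3600) * 1000),               # last hour, in milliseconds
        endTime=int(time.time() * 1000),
    )
    for event in response["events"]:
        print(event["message"])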
  9. Amazon CloudWatch Logs is excited to announce a new Logs Insights command, dedup, which enables customers to eliminate duplicate results when analyzing logs. Customers frequently want to query their logs and view only unique results based on one or more fields. You can now use the new dedup command in your Amazon CloudWatch Logs Insights queries to view unique results based on one or more fields. For example, you can view the most recent error message for each hostname by executing the dedup command on the hostname field. View the full article
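For example, a query using dedup could be run programmatically with boto3 as sketched below; the log group name and the field names (hostname, errorMessage) are hypothetical and depend on your own log schema.

    import boto3, time

    logs = boto3.client("logs", region_name="us-east-1")

    # Most recent error message per hostname, keeping only one result per host.
    query = (
        "fields @timestamp, hostname, errorMessage "
        "| filter ispresent(errorMessage) "
        "| sort @timestamp desc "
        "| dedup hostname"
    )

    start = logs.start_query(
        logGroupName="/example/app-logs",                 # placeholder
        startTime=int(time.time()) - 3600,                # query window: last hour (seconds)
        endTime=int(time.time()),
        queryString=query,
    )

    # Poll until the query finishes, then print the results.
    while True:
        result = logs.get_query_results(queryId=start["queryId"])
        if result["status"] in ("Complete", "Failed", "Cancelled"):
            break
        time.sleep(1)
    for row in result["results"]:
        print({f["field"]: f["value"] for f in row})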
  10. Amazon CloudWatch Logs is excited to announce support for account-level data protection policy configuration. You can now create a data protection policy that will be applied to all existing and future log groups within your AWS account. View the full article
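A minimal sketch of creating such an account-level policy with boto3 might look like the following; the data identifier and the empty audit destination are examples, and the policy document structure should be verified against the CloudWatch Logs data protection documentation.

    import boto3, json

    logs = boto3.client("logs", region_name="us-east-1")

    # Mask email addresses in every current and future log group in the account.
    policy_document = {
        "Name": "account-data-protection",
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": ["arn:aws:dataprotection::aws:data-identifier/EmailAddress"],
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact",
                "DataIdentifier": ["arn:aws:dataprotection::aws:data-identifier/EmailAddress"],
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }

    logs.put_account_policy(
        policyName="account-data-protection",
        policyType="DATA_PROTECTION_POLICY",
        policyDocument=json.dumps(policy_document),
        scope="ALL",
    )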
  11. We are excited to announce Amazon CloudWatch Logs Live Tail, a new interactive log analytics experience feature that helps you detect and debug anomalies in applications. You can now view your logs interactively in real-time as they’re ingested, which helps you to analyze and resolve issues across your systems and applications. View the full article
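Live Tail is primarily a console experience, but it is also exposed through the StartLiveTail API. The sketch below is a rough boto3 example, assuming a boto3 version recent enough to include start_live_tail; the log group ARN is a placeholder and the response-stream handling should be checked against the API reference.

    import boto3

    logs = boto3.client("logs", region_name="us-east-1")

    # Stream log events for one log group as they are ingested.
    response = logs.start_live_tail(
        logGroupIdentifiers=[
            "arn:aws:logs:us-east-1:111122223333:log-group:/aws/lambda/example-function"  # placeholder
        ],
        logEventFilterPattern="ERROR",   # optional: only tail events containing ERROR
    )

    # The call returns an event stream; print messages as they arrive.
    for event in response["responseStream"]:
        if "sessionUpdate" in event:
            for log_event in event["sessionUpdate"]["sessionResults"]:
                print(log_event["message"])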
  12. We are excited to announce Amazon CloudWatch Logs data protection is now available in Middle East (UAE), Asia Pacific (Hyderabad), Europe (Spain), Europe (Zurich), and Asia Pacific (Melbourne). Data protection is a feature that leverages pattern matching and machine learning to detect and protect sensitive log data in transit. Amazon CloudWatch Logs enables you to centralize the logs from all of your systems, applications, and AWS services in a single, highly scalable service. With log data protection in Amazon CloudWatch Logs, you can now detect and protect sensitive log data in transit, such as credit card numbers or government IDs logged by your systems and applications. View the full article
  13. Amazon CloudWatch Logs now supports ingesting the enriched metadata introduced in Amazon Virtual Private Cloud (Amazon VPC) flow logs versions 3 to 5, in addition to the default fields. This launch includes metadata fields that provide more insight about the network interface, traffic type, and the path of egress traffic to the destination. View the full article
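As a rough example of capturing those additional fields, the following boto3 sketch creates a VPC flow log that delivers to CloudWatch Logs with a custom format including some version 3 to 5 fields; the VPC ID, role, log group, and the particular field selection are placeholders to adapt.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Custom log format including a few of the enriched v3-v5 fields.
    log_format = (
        "${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} "
        "${srcaddr} ${dstaddr} ${pkt-srcaddr} ${pkt-dstaddr} "
        "${flow-direction} ${traffic-path} ${action} ${log-status}"
    )

    ec2.create_flow_logs(
        ResourceIds=["vpc-0123456789abcdef0"],            # placeholder VPC ID
        ResourceType="VPC",
        TrafficType="ALL",
        LogDestinationType="cloud-watch-logs",
        LogGroupName="/vpc/flow-logs/enriched",           # placeholder log group
        DeliverLogsPermissionArn="arn:aws:iam::111122223333:role/VPCFlowLogsRole",  # placeholder
        LogFormat=log_format,
    )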
  14. AWS IoT Core announces General Availability of the capability to send device logs from Internet of Things (IoT) devices to Amazon CloudWatch Logs in batches, enabling you to optimize the cost of using CloudWatch Log Action in IoT Rules. View the full article
  15. Starting today, Amazon CloudWatch Logs is removing the 5 requests per second per log stream quota when calling the Amazon CloudWatch Logs PutLogEvents API. There will be no new per log stream quota. With this change, you no longer need to split your log ingestion across multiple log streams to prevent log stream throttling. View the full article
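In practice this means a producer no longer has to shard writes across many log streams just to stay under a per-stream rate limit; a single stream can absorb the calls. A minimal boto3 sketch of writing events to one stream is shown below, with placeholder names.

    import boto3, time

    logs = boto3.client("logs", region_name="us-east-1")

    group, stream = "/example/app-logs", "single-writer-stream"   # placeholder names
    logs.create_log_group(logGroupName=group)
    logs.create_log_stream(logGroupName=group, logStreamName=stream)

    # With the per-stream quota removed, all producers can target the same stream.
    logs.put_log_events(
        logGroupName=group,
        logStreamName=stream,
        logEvents=[
            {"timestamp": int(time.time() * 1000), "message": "request handled in 42 ms"},
            {"timestamp": int(time.time() * 1000), "message": "request handled in 17 ms"},
        ],
    )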
  16. Today we are announcing Amazon CloudWatch Logs data protection, a new set of capabilities for Amazon CloudWatch Logs that leverage pattern matching and machine learning (ML) to detect and protect sensitive log data in transit.

While developers try to prevent logging sensitive information such as Social Security numbers, credit card details, email addresses, and passwords, sometimes it gets logged. Until today, customers relied on manual investigation or third-party solutions to detect and mitigate sensitive information from being logged. If sensitive data is not redacted during ingestion, it will be visible in plain text in the logs and in any downstream system that consumes those logs. Enforcing prevention across the organization is challenging, which is why quick detection and prevention of access to sensitive data in the logs is important from a security and compliance perspective.

Starting today, you can enable Amazon CloudWatch Logs data protection to detect and mask sensitive log data as it is ingested into CloudWatch Logs or as it is in transit. Customers from all industries that want to take advantage of native data protection capabilities can benefit from this feature. But in particular, it is useful for industries under strict regulations that need to make sure that no personal information gets exposed. Also, customers building payment or authentication services where personal and sensitive information may be captured can use this new feature to detect and mask sensitive information as it's logged.

Getting Started

You can enable a data protection policy for new or existing log groups from the AWS Management Console, AWS Command Line Interface (CLI), or AWS CloudFormation. From the console, select any log group and create a data protection policy in the Data protection tab. When you create the policy, you can specify the data you want to protect. Choose from over 100 managed data identifiers, which are a repository of common sensitive data patterns spanning financial, health, and personal information. This feature provides you with complete flexibility in choosing from a wide variety of data identifiers that are specific to your use cases or geographical region.

You can also enable audit reports and send them to another log group, an Amazon Simple Storage Service (Amazon S3) bucket, or Amazon Kinesis Data Firehose. These reports contain a detailed log of data protection findings. If you want to monitor and get notified when sensitive data is detected, you can create an alarm around the metric LogEventsWithFindings. This metric shows how many findings there are in a particular log group. This allows you to quickly understand which application is logging sensitive data.

When sensitive information is logged, CloudWatch Logs data protection will automatically mask it per your configured policy. This is designed so that none of the downstream services that consume these logs can see the unmasked data. From the AWS Management Console, AWS CLI, or any third party, the sensitive information in the logs will appear masked. Only users with elevated privileges in their IAM policy (add the logs:Unmask action in the user policy) can view unmasked data in CloudWatch Logs Insights, log stream search, or via the FilterLogEvents and GetLogEvents APIs. You can use the following query in CloudWatch Logs Insights to unmask data for a particular log group:

fields @timestamp, @message, unmask(@message) | sort @timestamp desc | limit 20

Available Now

Data protection is available in US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Jakarta), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Milan), Europe (Paris), Europe (Stockholm), Middle East (Bahrain), and South America (São Paulo) AWS Regions.

Amazon CloudWatch Logs data protection pricing is based on the amount of data that is scanned for masking. You can check the CloudWatch Logs pricing page to learn more about the pricing of this feature in your Region. Learn more about data protection in the CloudWatch Logs User Guide.

— Marcia

View the full article
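As a rough sketch of the alarm mentioned above, the following boto3 example alarms when any findings are reported for a log group. The namespace and dimension shown (AWS/Logs with LogGroupName) are my assumption of where LogEventsWithFindings is published and should be confirmed in the CloudWatch console; the log group name and SNS topic are placeholders.

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="sensitive-data-detected-example-app",
        Namespace="AWS/Logs",                       # assumed namespace for this metric
        MetricName="LogEventsWithFindings",
        Dimensions=[{"Name": "LogGroupName", "Value": "/example/app-logs"}],  # placeholder
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-alerts"],  # placeholder
    )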
  17. We are excited to announce data protection in Amazon CloudWatch Logs, a new set of capabilities that leverage pattern matching and machine learning to detect and protect sensitive log data in transit. Amazon CloudWatch Logs enables you to centralize the logs from all of your systems, applications, and AWS services in a single, highly scalable service. With log data protection in Amazon CloudWatch Logs, you can now detect and protect sensitive log data in transit logged by your systems and applications. View the full article
  18. Amazon CloudWatch Logs now supports exporting logs to Amazon Simple Storage Service (S3) buckets encrypted using Server side encryption with KMS (SSE-KMS) keys. View the full article
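A minimal boto3 sketch of such an export might look like the following; the bucket (already configured with SSE-KMS and a key policy that allows CloudWatch Logs to use the key), log group, and time range are placeholders.

    import boto3, time

    logs = boto3.client("logs", region_name="us-east-1")

    now_ms = int(time.time() * 1000)

    # Export the last 24 hours of a log group to an SSE-KMS encrypted bucket.
    task = logs.create_export_task(
        taskName="daily-export-example",
        logGroupName="/example/app-logs",                 # placeholder
        fromTime=now_ms - 24 * 60 * 60 * 1000,
        to=now_ms,
        destination="example-archive-bucket-sse-kms",     # placeholder bucket name
        destinationPrefix="cloudwatch-exports/app-logs",
    )
    print(task["taskId"])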
  19. Amazon Managed Service for Prometheus now provides Alert Manager & Ruler logs to help customers troubleshoot their alerting pipeline and configuration in Amazon CloudWatch Logs. Amazon Managed Service for Prometheus is a fully managed Prometheus-compatible monitoring service that makes it easy to monitor and alarm on operational metrics at scale. Prometheus is a popular Cloud Native Computing Foundation open source project for monitoring and alerting that is optimized for container environments. The Alert Manager allows customers to group, route, deduplicate, and silence alarms before routing them to end users via Amazon Simple Notification Service (Amazon SNS). The Ruler allows customers to define recording and alerting rules, which are queries that are evaluated at regular intervals. With Alert Manager and Ruler logs, customers can troubleshoot issues in their alerting pipelines including missing Amazon SNS topic permissions, misconfigured alert manager routes, and rules that fail to execute. View the full article
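For reference, enabling these logs for a workspace can also be done through the Amazon Managed Service for Prometheus API. The following is a brief boto3 sketch, assuming the create_logging_configuration operation and its parameters as shown; the workspace ID and log group ARN are placeholders.

    import boto3

    amp = boto3.client("amp", region_name="us-east-1")

    # Send Alert Manager and Ruler logs for a workspace to a CloudWatch Logs log group.
    amp.create_logging_configuration(
        workspaceId="ws-0123456789abcdef0",   # placeholder workspace ID
        logGroupArn="arn:aws:logs:us-east-1:111122223333:log-group:/amp/alerting:*",  # placeholder
    )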
  20. Amazon DocumentDB (with MongoDB compatibility) is a database service that is purpose-built for JSON data management at scale, fully managed and integrated with AWS, and enterprise-ready with high durability. View the full article
  21. Developers can now access Amazon CloudWatch Logs within Visual Studio using the AWS Toolkit for Visual Studio. Directly from the IDE, it is now possible to search and filter log groups, log streams, and events. Additionally, log groups can be accessed from their associated resources, and log events can be downloaded to a file. View the full article
  22. Amazon CloudWatch Logs announces Dimension support for Metric Filters. CloudWatch Logs Metric Filters allow you to create filter patterns to search for and match terms, phrases, or values in your CloudWatch Logs log events, and turn these into metrics that you can graph in CloudWatch Metrics or use to create a CloudWatch Alarm. Now with Dimension support for Metric Filters you can create metrics from JSON or space-delimited logs with up to 3 dimensions, where a dimension is a key-value pair that is part of the identity of a metric. View the full article
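For example, a metric filter over JSON logs that publishes a metric with dimensions might be created with boto3 as sketched below; the log group, namespace, and field names are placeholders, and the three-dimension limit from the announcement applies.

    import boto3

    logs = boto3.client("logs", region_name="us-east-1")

    # Count 5xx responses, dimensioned by service and HTTP method (up to 3 dimensions).
    logs.put_metric_filter(
        logGroupName="/example/app-logs",                       # placeholder
        filterName="server-errors-by-service",
        filterPattern="{ $.statusCode >= 500 }",
        metricTransformations=[
            {
                "metricName": "ServerErrors",
                "metricNamespace": "ExampleApp",                # placeholder namespace
                "metricValue": "1",
                "unit": "Count",
                # Dimension values reference JSON fields from the matched log events.
                "dimensions": {
                    "Service": "$.service",
                    "Method": "$.httpMethod",
                },
            }
        ],
    )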
  23. You can now view and manage all Amazon CloudWatch Logs transactional API service quotas with Service Quotas. Service Quotas consolidates the default values and your account specific quotas for CloudWatch Logs in one single view with the Service Quotas console. With the CloudWatch Logs and Service Quotas integration you can now easily view and adjust your quotas. View the full article
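To see these quotas programmatically, you can query the Service Quotas API for the CloudWatch Logs service code, as in the brief boto3 sketch below.

    import boto3

    quotas = boto3.client("service-quotas", region_name="us-east-1")

    # List the applied quotas for CloudWatch Logs (service code "logs").
    paginator = quotas.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="logs"):
        for quota in page["Quotas"]:
            print(f'{quota["QuotaName"]}: {quota["Value"]}')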
  24. You can now publish the Redis slow log from your Amazon ElastiCache for Redis clusters to Amazon CloudWatch Logs and Amazon Kinesis Data Firehose. The Redis slow log provides visibility into the execution time of commands in your Redis cluster, enabling you to continuously monitor the performance of these operations. You can choose to send these logs in either JSON or text format to Amazon CloudWatch Logs and Amazon Kinesis Data Firehose. View the full article
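A rough boto3 sketch of turning on slow log delivery for an existing cluster is shown below. The LogDeliveryConfigurations structure shown here is my best understanding of the ElastiCache API and should be verified against the current ModifyReplicationGroup documentation; the replication group ID and log group name are placeholders.

    import boto3

    elasticache = boto3.client("elasticache", region_name="us-east-1")

    # Deliver the Redis slow log to CloudWatch Logs in JSON format.
    elasticache.modify_replication_group(
        ReplicationGroupId="example-redis",                 # placeholder
        ApplyImmediately=True,
        LogDeliveryConfigurations=[
            {
                "LogType": "slow-log",
                "DestinationType": "cloudwatch-logs",
                "DestinationDetails": {
                    "CloudWatchLogsDetails": {"LogGroup": "/elasticache/example-redis/slow-log"}
                },
                "LogFormat": "json",
                "Enabled": True,
            }
        ],
    )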
  25. Amazon CloudWatch Logs now supports two subscription filters per log group, enabling you to deliver a real-time feed of log events from CloudWatch Logs to an Amazon Kinesis Data Stream, Amazon Kinesis Data Firehose, or AWS Lambda for custom processing, analysis, or delivery to other systems. View the full article
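For example, the two filters on one log group could be created with boto3 as sketched below; the log group, destination ARNs, and role are placeholders. Note that a Lambda destination is authorized through the function's resource policy, so no role ARN is passed for the second filter.

    import boto3

    logs = boto3.client("logs", region_name="us-east-1")
    group = "/example/app-logs"   # placeholder

    # First filter: stream everything to Kinesis Data Firehose for archiving.
    logs.put_subscription_filter(
        logGroupName=group,
        filterName="archive-to-firehose",
        filterPattern="",
        destinationArn="arn:aws:firehose:us-east-1:111122223333:deliverystream/log-archive",
        roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",
    )

    # Second filter on the same log group: send only errors to Lambda for custom processing.
    logs.put_subscription_filter(
        logGroupName=group,
        filterName="errors-to-lambda",
        filterPattern="?ERROR ?Exception",
        destinationArn="arn:aws:lambda:us-east-1:111122223333:function:process-errors",
    )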