Search the Community
Showing results for tags 'troubleshooting'.
-
Consider this scenario: You fire up your Docker containers, hit an API endpoint, and … bam! It fails. Now what? The usual drill involves diving into container logs, scrolling through them to understand the error messages, and spending time looking for clues that will help you understand what’s wrong. But what if you could get a summary of what’s happening in your containers, along with the potential issues and proposed solutions?

In this article, we’ll dive into a solution that addresses this problem using AI. AI can already help developers write code, so why not help developers understand their system, too?

Signal0ne is a Docker Desktop extension that scans Docker containers’ state and logs in search of problems, analyzes the discovered issues, and outputs insights to help developers debug. We first learned about Signal0ne as the winning submission in the 2023 Docker AI/ML Hackathon, and we’re excited to show you how to use it to debug more efficiently.

Introducing Signal0ne Docker extension: Streamlined debugging for Docker

The magic of the Signal0ne Docker extension is its ability to shorten feedback loops for working with and developing containerized applications. Forget endless log diving: the extension offers a clear and concise summary of what’s happening inside your containers after logs and states are analyzed by an AI agent, pinpointing potential issues and even suggesting solutions.

Developing applications these days involves more than a block of code executed in a vacuum. It is a complex system of dependencies and different user flows that need debugging from time to time. AI can help filter out the system noise and focus on providing data about specific issues in the system so that developers can debug faster and better.
Docker Desktop is one of the most popular tools used for local development with a huge community, and Docker features like Docker Debug enhance the community’s ability to quickly debug and resolve issues with their containerized apps. Signal0ne Docker extension’s suggested solutions and summaries can help you while debugging your container or editing your code so that you can focus on bringing value as a software engineer.

The term “developer experience” is often used, but this extension focuses on one crucial aspect: shortening development time. This translates directly to increased productivity, letting you build containerized applications faster and more efficiently.

How does the Docker Desktop extension work?

Between AI co-pilots that are highly integrated in IDEs and help write code, and browser AI chats that help explain software development concepts in a Q&A way, there is one piece missing: logs and runtime system data.

The Signal0ne Docker Desktop extension consists of three components: two hosted on the user’s local system (UI and agent) and one in the Signal0ne cloud backend service. The agent scans the user’s local environment in search of containers with invalid states, runtime issues, or warnings or errors in the logs. After issue discovery, it collects additional data from the container definition for enhanced analysis.

After the Signal0ne agent discovery, data is sent to the backend service, where a combination of a pre-trained LLM and a solution search retrieval service performs the analysis.

The analysis of the issues can be seen from the Signal0ne extension UI, including:

- Short log summary: Outlines what is happening within a particular container. The logs on which the analysis was based can be accessed from the sources dropdown.
- Solutions summary: One paragraph about possible solutions to the analyzed issue, with some explanation of why this particular solution was proposed.
- Solution sources: Links to websites where the issue has already been solved by the community (for example, a GitHub issue or StackOverflow post).

Technical details

The issue analysis pipeline is powered by the LangChain tooling framework, which utilizes open source Mistral 7b LLM with LLM-optimized web search that browses sources of existing solutions on the open web (Figure 1).

Figure 1: Signal0ne architecture overview.

The following section shows sample issue analysis pipeline output:

Logs from Docker container:

“$ irb\n>> \"12341:asdf\"[/\\d+/]\n# => \"12341\" \n>> \"12341:asdf\"[/\\d*/]\n# => \"12341\" \n>> \"12341:asdf\"[0..5]\n# => \"12341:\" \n>> \"12341:asdf\"[0...5]\n# => \"12341\" \n>> \"12341:asdf\"[0, ':']\nTypeError: can't convert String into Integer\n from (irb):5:in”

Summary:

The user is trying to extract a substring from the string 12341:asdf using different Ruby methods, such as [/\\\\d+/], [/\\\\d\\\\*/], [0..5], [0...5], and [0, ':']. However, when attempting to use the indexing method [0, ':'], an error occurs because the method expects an integer argument but receives a string instead.

Predicted solutions:

“To extract a substring from the given string 12341:asdf without encountering an error when using indexing methods, consider the following options:

1. Using regular expressions: To extract a substring using regular expressions, you can use the scan method with a regular expression pattern. For instance, to extract the digits before the colon, you can use the following code snippet:

    str = "12341:asdf"
    substring = str.scan(/[\\\\d]+/)
    puts substring.first # Output: ["12341"]
    puts substring.first[0] # Output: 12341

2. Slicing: Another alternative to indexing is slicing, which allows you to extract a specified range of characters from a string.
To extract the digits before the colon using slicing, you can use the following code snippet:

    str = "12341:asdf"
    substring = str[0...str.index(":")]
    puts substring # Output: "12341"
    puts substring.to_i # Output: 12341

These two methods should help you extract the desired substring from the given string without encountering an error.”

Getting started

Getting started with the Signal0ne Docker Desktop extension is a straightforward process that allows developers to leverage the benefits of unified development. Here are the steps for installing the Signal0ne Docker extension:

1. Install Docker Desktop.
2. Choose Add Extensions in the left sidebar. The Browse tab will appear by default (Figure 2).
   Figure 2: Signal0ne extension installation from the marketplace.
3. In the Filters drop-down, select the Utility tools category.
4. Find Signal0ne and then select Install (Figure 3).
   Figure 3: Extension installation process.
5. Log in after the extension is installed (Figure 4).
   Figure 4: Signal0ne extension login screen.
6. Start developing your apps, and, if you face issues while debugging, have a look at the Signal0ne extension UI. The issue analysis will be there to help you with debugging.

Make sure the Signal0ne agent is enabled by toggling it on (Figure 5).

Figure 5: Agent settings tab.

Figure 6 shows the summary and sources.

Figure 6: Overview of the inspected issue.

Proposed solutions and sources are shown in Figures 7 and 8. Solution sources will redirect you to a webpage with the predicted solution.

Figure 7: Overview of proposed solutions to the encountered issue.

Figure 8: Overview of the list of helpful links.

If you want to contribute to the project, you can leave feedback via the Like or Dislike button in the issue analysis output (Figure 9).

Figure 9: You can leave feedback about analysis output for further improvements.
To explore the Signal0ne Docker Desktop extension without using your own containers, consider experimenting with dummy containers using this Docker Compose file to observe how logs are analyzed and how helpful the resulting insights are:

    services:
      broken_bulb: # C# application that cannot start properly
        image: 'signal0neai/broken_bulb:dev'
      faulty_roger: # Python API server that cannot reach its database
        image: 'signal0neai/faulty_roger:dev'
      smoked_server: # nginx server hosting the website with the misconfiguration
        image: 'signal0neai/smoked_server:dev'
        ports:
          - '8082:8082'
      invalid_api_call: # Python web server with a bug
        image: 'signal0neai/invalid_api_call:dev'
        ports:
          - '5000:5000'

- broken_bulb: This service uses the image signal0neai/broken_bulb:dev. It’s a C# application that throws System.NullReferenceException during startup. With this application, you can observe how Signal0ne discovers the failed container, extracts the error logs, and analyzes them.
- faulty_roger: This service uses the image signal0neai/faulty_roger:dev. It is a Python API server that is trying to connect to an unreachable database on localhost.
- smoked_server: This service uses the image signal0neai/smoked_server:dev. It is an Nginx instance that returns 403 Forbidden when the user tries to access the root path (http://127.0.0.1:8082/). Signal0ne can help you debug that.
- invalid_api_call: This is an API service with a bug in one of the endpoints. To generate an error, call http://127.0.0.1:5000/create-table after running the container. Follow the analysis of Signal0ne and try to debug the issue.

Conclusion

Debugging containerized applications can be time-consuming and tedious, often involving endless scrolling through logs and searching for clues to understand the issue. However, with the introduction of the Signal0ne Docker extension, developers can now streamline this process and boost their productivity significantly.
By leveraging the power of AI and language models, the extension provides clear and concise summaries of what’s happening inside your containers, pinpoints potential issues, and even suggests solutions. With its user-friendly interface and seamless integration with Docker Desktop, the Signal0ne Docker extension is set to transform how developers debug and develop containerized applications.

Whether you’re a seasoned Docker user or just starting your journey with containerized development, this extension offers a valuable tool that can save you countless hours of debugging and help you focus on what matters most: building high-quality applications efficiently.

Try the extension in Docker Desktop today, and check out the documentation on GitHub.

Learn more

- Subscribe to the Docker Newsletter.
- Get the latest release of Docker Desktop.
- Vote on what’s next! Check out our public roadmap.
- Have questions? The Docker community is here to help.
- New to Docker? Get started.
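As a quick way to trigger the failures in the sample containers, the following Python sketch requests the two URLs mentioned in the service descriptions above. It assumes the Compose services are already running locally with the published ports shown in the Compose file. The names `SAMPLE_URLS` and `probe` are made up for this illustration and are not part of Signal0ne.

```python
import urllib.error
import urllib.request

# URLs taken from the sample service descriptions; the dictionary name is
# hypothetical and only used by this sketch.
SAMPLE_URLS = {
    "smoked_server": "http://127.0.0.1:8082/",
    "invalid_api_call": "http://127.0.0.1:5000/create-table",
}


def probe(url: str, timeout: float = 5.0) -> str:
    """Return the HTTP status code as text, or a short error description."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 403).
        return str(err.code)
    except (urllib.error.URLError, OSError) as err:
        # No HTTP answer at all: container not running, port not published, etc.
        return f"unreachable ({err})"
```

Calling `probe(url)` for each entry in `SAMPLE_URLS` after `docker compose up` should produce fresh error logs for the agent to pick up. Based on the description above, smoked_server is expected to answer the root path with 403.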
-
Private Service Connect is a Cloud Networking offering that creates a private and secure connection from your VPC networks to a service producer, and is designed to help you consume services faster, protect your data, and simplify service management. However, as with all complex networking setups, sometimes things don’t work as planned. In this post, you will find useful tips that can help you tackle issues related to Private Service Connect, even before reaching out to Cloud Support.

Introduction to Private Service Connect

Before we get into the troubleshooting bits, let’s briefly discuss the basics of Private Service Connect. Understanding your setup is key for isolating the problem.

Private Service Connect is similar to private services access, except that the service producer VPC network doesn't connect to your (consumer) network using VPC network peering. A Private Service Connect service producer can be Google, a third party, or even yourself.

When we talk about consumers and producers, it's important to understand what type of Private Service Connect is configured on the consumer side and what kind of managed service it intends to connect with on the producer side. Consumers are the ones who want the services, while producers are the ones who provide them. The various types of Private Service Connect configurations are:

- Private Service Connect endpoints are configured as forwarding rules that are allocated an IP address and mapped to a managed service by targeting a Google APIs bundle or a service attachment. These managed services can be diverse, ranging from global Google APIs to Google Managed Services, third-party services, and even in-house, intra-organization services.
  - When a consumer creates an endpoint that references a Google APIs bundle, the endpoint's IP address is a global internal IP address: the consumer picks an internal IP address that's outside all subnets of the consumer's VPC network and connected networks.
  - When a consumer creates an endpoint that references a service attachment, the endpoint's IP address is a regional internal IP address in the consumer's VPC network, taken from a subnet in the same region as the service attachment.
- Private Service Connect backends are configured with a special network endpoint group (NEG) of the type Private Service Connect, which refers to a locational Google API or to a published service's service attachment. A service attachment is your link to a compatible producer load balancer.
- Private Service Connect interfaces are a special type of network interface that allows service producers to initiate connections to service consumers.

How Private Service Connect works

Network Address Translation (NAT) is the underlying network technology that powers Private Service Connect, using Google Cloud’s software-defined networking stack called Andromeda.

Let's break down how Private Service Connect works to access a published service based on an internal network-passthrough load balancer using a Private Service Connect endpoint. In this scenario, you set up a Private Service Connect endpoint on the consumer side by configuring a forwarding rule that targets a service attachment. This endpoint has an IP address within your VPC network.
- When a VM instance in the VPC network sends traffic to this endpoint, the host’s networking stack applies client-side load balancing to send the traffic to a destination host based on location, load, and health.
- The packets are encapsulated and routed through Google Cloud’s network fabric.
- At the destination host, the packet processor applies Source Network Address Translation (SNAT) and Destination Network Address Translation (DNAT) using the configured NAT subnet and the producer IP address of the service, respectively.
- The packet is delivered to the VM instance serving as the load balancer’s backend.

All of this is orchestrated by Andromeda’s control plane; with a few exceptions, there are no middleboxes or intermediaries involved in this process, enabling you to achieve line rate performance. For additional details, see Private Service Connect architecture and performance.

With this background, you should already be able to identify the main components where issues could occur: the source host, the network fabric, the destination host, and the control plane.

Know your troubleshooting tools

The Google Cloud console provides you with the following tools to troubleshoot most of the Private Service Connect issues that you might encounter.

Connectivity Tests

Connectivity Tests is a diagnostics tool that lets you check connectivity between network endpoints. It analyzes your configuration and, in some cases, performs live data-plane analysis between the endpoints.

- Configuration Analysis supports Private Service Connect: Consumers can check connectivity from their source systems to PSC endpoints (or consumer load balancers using PSC NEG backends), while producers can verify that their service is operational for consumers.
- Live Data Plane Analysis supports both Private Service Connect endpoints for published services and Google APIs: Verify reachability and latency between hosts by sending probe packets over the data plane.
This feature provides baseline diagnostics of latency and packet loss. In cases where Live Data Plane Analysis is not available, consumers can coordinate with a service producer to collect simultaneous packet captures at the source and destination using tcpdump.

Cloud Logging

Cloud Logging is a fully managed service that allows you to store, search, analyze, monitor, and alert on logging data and events.

Audit logs allow you to monitor Private Service Connect activity. Use them to track intentional or unintentional changes to Private Service Connect resources, find any errors or warnings, and monitor changes in connection status for the endpoint. They are mostly useful when troubleshooting issues during setup or configuration updates. In this example, you can track endpoint connection status changes (pscConnectionStatus) by examining audit logs for your GCE forwarding rule resource:

    resource.type="gce_forwarding_rule"
    protoPayload.methodName="LogPscConnectionStatusUpdate"

Use VPC Flow Logs to monitor Private Service Connect traffic. Consumers can enable VPC Flow Logs at the client subnet to monitor traffic flow directed to the Private Service Connect endpoint. This allows the consumer to validate traffic egressing the VM instance. Producers can enable VPC Flow Logs at the target load balancer subnet to monitor traffic ingressing their VM instance backends. Keep in mind that VPC Flow Logs are sampled and may not capture short-lived connections. To get more detailed information, run a packet capture using tcpdump.

Cloud Monitoring

Another member of the observability stack, Cloud Monitoring can help you gain visibility into the performance of Private Service Connect.
Use producer metrics to monitor published services. Take a look at the utilization of service attachment resources like NAT ports, connected forwarding rules, and connections by service attachment ID to correlate with connectivity and performance issues. See if there are any dropped packets at the producer side (Preview feature). The received packets dropped count is related to NAT resource exhaustion. The sent packets dropped count indicates that a service backend is sending packets to a consumer after the NAT translation state has expired. When this occurs, make sure you are following the NAT subnets recommendations. A packet capture could bring more insight into the nature of the dropped packets. Using this MQL query, producers can monitor NAT subnet capacity for a specific service attachment:

    fetch gce_service_attachment
    | metric
        'compute.googleapis.com/private_service_connect/producer/used_nat_ip_addresses'
    | filter (resource.region == "us-central1"
        && resource.service_attachment_id == "[SERVICE_ATTACHMENT_ID]")
    | group_by 1m,
        [value_used_nat_ip_addresses_mean: mean(value.used_nat_ip_addresses)]
    | every 1m
    | group_by [resource.service_attachment_id],
        [value_used_nat_ip_addresses_mean_mean:
        mean(value_used_nat_ip_addresses_mean)]

Use consumer metrics to monitor endpoints. You can track the number of connections created, opened, and closed from clients to the Private Service Connect endpoint. If you see packet drops, take a look at the producer metrics as well. For more information, see Monitor Private Service Connect connections.

TIP: Be proactive and set alerts to inform you when you are close to exhausting a known limit (including Private Service Connect quotas). In this example, you can use this MQL query to track PSC Internal LB Forwarding Rules quota usage.
    fetch compute.googleapis.com/VpcNetwork
    | metric
        'compute.googleapis.com/quota/psc_ilb_consumer_forwarding_rules_per_producer_vpc_network/usage'
    | group_by 1m, [value_usage_mean: mean(value.usage)]
    | every 1m
    | group_by [], [value_usage_mean_aggregate: aggregate(value_usage_mean)]

Read the manual

Consult the Google Cloud documentation to learn about the limitations and supported configurations.

- Follow the Private Service Connect guides. Especially for new deployments, it is common to misconfigure a component or find that it is not compatible or supported yet. Ensure that you have gone through the right configuration steps, and go through the limitations and compatibility matrix.
- Take a look at the VPC Release notes. See if there are any known issues related to Private Service Connect, and look for any new features that could have introduced unwanted behavior.

Common issues

Selecting the right tool depends on the specific situation you encounter and where you are in the life cycle of your Private Service Connect journey. Before you start, gather consumer and producer project details, and confirm that this is in fact a Private Service Connect issue and not a private services access problem.

Generally, you can face issues during the setup or update of any related component or additional capability, or the issues could be present during runtime, when everything is configured but you run into connectivity or performance issues.

Issues during setup

- Make sure that you are following the configuration guide and that you understand the scope and limitations.
- Check for any error message or warning in the Logs Explorer.
- Verify that the setup is compatible and supported as per the configuration guides.
- See if any related quota has been exceeded, such as the Private Service Connect forwarding rules quota.
- Confirm whether there is an organization policy that could prevent the configuration of Private Service Connect components.

Issues during runtime

- Isolate the issue to the consumer or the producer side of the connection. If you are on the consumer side, check whether your endpoint or backend is accepted in the connection status on the Private Service Connect page. Otherwise, on the producer side, review the accept/reject connection list and the connection reconciliation setup.
- If your endpoint is unreachable, try bypassing DNS resolution and run a Connectivity Test to validate routes and firewalls from the source endpoint IP address to the PSC endpoint as destination. On the service producer side, check if the producer service is reachable within the producer VPC network, and from an IP address in the Private Service Connect NAT subnet.
- If there is a performance issue like network latency or packet drops, check if Live Data Plane Analysis is available to determine a baseline and isolate an issue with the application or service. Also, check the Metrics Explorer for any connection or port exhaustion and packet drops.

Working with Cloud Support

Once you have pinpointed the issue and analyzed the problem, you may need to reach out to Cloud Support for further assistance. To facilitate a smooth experience, be sure to explain your needs, clearly describe the business impact, and give enough context with all the information collected.
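For the unreachable-endpoint case under runtime issues, the "bypass DNS resolution" check can be approximated from a client VM with a short Python sketch. It compares what DNS returns for the service name with a direct TCP handshake to the endpoint's IP address. The function names are invented for this illustration, and the hostname, IP address, and port you pass in are placeholders for your own setup. It is a rough first check, not a replacement for Connectivity Tests.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses that name resolution gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def tcp_reachable(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP handshake straight to ip:port, skipping DNS entirely."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(hostname: str, endpoint_ip: str, port: int) -> str:
    """Separate a DNS problem from a routing or firewall problem."""
    try:
        resolved = resolve(hostname)
    except socket.gaierror:
        resolved = []  # the name does not resolve at all
    if not tcp_reachable(endpoint_ip, port):
        return "endpoint IP unreachable: check routes, firewalls, connection status"
    if endpoint_ip not in resolved:
        return "endpoint reachable by IP, but DNS does not point at it: check DNS"
    return "DNS and TCP connectivity both look fine"
```

A call such as `diagnose("my-service.example.internal", "10.0.0.5", 443)` uses made-up values. If the direct connection succeeds but the name resolves elsewhere, the DNS configuration is the more likely culprit; if the direct connection fails, continue with routes, firewall rules, and the connection status as described above.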
-
In today’s rapidly evolving digital landscape, organizations heavily rely on their applications and systems to deliver optimal performance. As such, driving down the key metric of Mean Time to Resolution (MTTR) is clearly one of the biggest challenges facing observability practitioners today. To watch the webinar, please fill out the form below: View the full article
-
It’s every on-call’s nightmare—awakened by a text at 3 a.m. from your alert system that says there’s a problem with the cluster. You need to quickly determine if the issue is with the Amazon EKS managed control plane or the new custom application you just rolled out last week. Even though you installed the default dashboards the blogs recommended, you’re still having difficulty understanding the meaning of the metrics you are looking at. If only you had a dashboard that was focused on the most common problems seen in the field—one where you understood what everything means right away, letting you quickly scan for even obscure issues efficiently… View the full article
-
Valencia, Spain, May 16, 2022 — Sosivio, a predictive troubleshooting platform built specifically for Kubernetes, is sponsoring this year’s KubeCon Europe both in-person and virtually from May 16th-20th in Valencia, Spain. We can all agree Kubernetes is the de facto container orchestrator, but Kubernetes environments are difficult to manage at scale. It’s not a question of […] The post Sosivio’s Predictive Troubleshooting for Kubernetes Gives Answers, Not Data appeared first on DevOps.com. View the full article
-
Single view of performance and events accelerates Kubernetes troubleshooting by up to 10x VALENCIA, SPAIN, (KubeCon + CloudNativeCon Europe), May 16, 2022 — Sysdig, the unified container and cloud security leader, announced the availability of Sysdig Advisor, a Kubernetes troubleshooting feature that consolidates and prioritizes relevant performance details in Sysdig Monitor. By providing a single view […] The post Sysdig Introduces Sysdig Advisor to Drastically Simplify Kubernetes Troubleshooting appeared first on DevOps.com. View the full article