As modern software systems become increasingly distributed, interconnected, and complex, ensuring production reliability and performance is becoming harder and more stressful. Seemingly innocuous changes to our infrastructure or applications can have massive impacts on system uptime, health, and performance, all while the cost of production incidents continues to grow. While we need telemetry data – […]