GitLab: How we reduced 502 errors by caring about PID 1 in Kubernetes


James

Quote

Our SRE on call was getting paged daily that one of our SLIs was burning through our SLOs for the GitLab Pages service. It was intermittent and short-lived, but enough to cause user-facing impact which we weren't comfortable with. This turned into alert fatigue because there wasn't enough time for the SRE on call to investigate the issue and it wasn't actionable since it recovered on its own.

We decided to open up an investigation issue for these alerts. We had to find out what the issue was since we were showing 502 errors to our users and we needed a DRI that wasn't on call to investigate.

Here: https://about.gitlab.com/blog/2022/05/17/how-we-removed-all-502-errors-by-caring-about-pid-1-in-kubernetes/?utm_id=FAUN_Kaptain321_Link_title

  • James changed the title to GitLab: How we reduced 502 errors by caring about PID 1 in Kubernetes
