GKE at 65,000 nodes: Evaluating performance for simulated mixed AI workloads

At Google Cloud, we’re continuously working on Google Kubernetes Engine (GKE) scalability so it can run increasingly demanding workloads. Recently, we announced that GKE can support a 65,000-node cluster, up from 15,000 nodes. This opens up new possibilities, especially for AI workloads and their ever-increasing demand for large-scale infrastructure.

In this blog post, we explore a benchmark that simulates these massive AI workloads on a 65,000-node GKE cluster. As we look to develop and deploy even larger LLMs on GKE, we regularly run this benchmark against our infrastructure as a continuous integration (CI) test. We look at its results in detail, as well as the challenges we faced and ways to mitigate them...
