Showing results for tags 'ray'.
-
Patent-search platform provider IPRally is growing quickly, servicing global enterprises, IP law firms, and multiple national patent and trademark offices. As the company grows, so do its technology needs. It continues to train its models for greater accuracy, adding 200,000 searchable records for customer access weekly, and mapping new patents.

With millions of patent documents published annually, and the technical complexity of those documents increasing, it can take even the most seasoned patent professional several hours of research to resolve a case with traditional patent search tools. In 2018, Finnish firm IPRally set out to tackle this problem with a graph-based approach. “Search engines for patents were mostly complicated boolean ones, where you needed to spend hours building a complicated query,” says Juho Kallio, CTO and co-founder of the 50-person firm. “I wanted to build something important and challenging.”

Using machine learning (ML) and natural language processing (NLP), the company has transformed the text from over 120 million global patent documents into document-level knowledge graphs embedded into a searchable vector space. Now, patent researchers can receive relevant results in seconds with AI-selected highlights of key information and explainable results.

To meet those needs, IPRally built a customized ML platform using Google Kubernetes Engine (GKE) and Ray, an open-source ML framework, balancing efficiency and performance while streamlining machine learning operations (MLOps). The company uses open-source KubeRay to deploy and manage Ray on GKE, which enables them to leverage cost-efficient NVIDIA GPU Spot instances for exploratory ML research and development. It also uses Google Cloud data building blocks, including Cloud Storage and Compute Engine persistent disks. Next on the horizon is expanding to big data solutions with Ray Data and BigQuery.
“Ray on GKE has the ability to support us in the future with any scale and any kind of distributed complex deep learning,” says Kallio.

A custom ML platform built for performance and efficiency

The IPRally engineering team’s primary focus is on R&D and how it can continue to improve its Graph AI to make technical knowledge more accessible. With just two DevOps engineers and one MLOps engineer, IPRally was able to build its own customized ML platform with GKE and Ray as key components.

A big proponent of open source, IPRally transitioned everything to Kubernetes when its compute needs grew. However, the team didn’t want to have to manage Kubernetes themselves. That led them to GKE, with its scalability, flexibility, open ecosystem, and its support for a diverse set of accelerators. All told, this provides IPRally the right balance of performance and cost, as well as easy management of compute resources and the ability to efficiently scale down capacity when they don’t need it.

“GKE provides the scalability and performance we need for these complex training and serving needs and we get the right granularity of control over data and compute,” says Kallio.

One particular GKE capability that Kallio highlights is container image streaming, which has significantly accelerated their start-up time. “We have seen that container image streaming in GKE has a significant impact on expediting our application startup time. Image streaming helps us accelerate our start-up time for a training job after submission by 20%,” he shares. “And, when we are able to reuse an existing pod, we can start up in a few seconds instead of minutes.”

The next layer is Ray, which the company uses to scale the distributed, parallelized Python and Clojure applications it uses for machine learning. To more easily manage Ray, IPRally uses KubeRay, a specialized tool that simplifies Ray cluster management on Kubernetes.
IPRally uses Ray for the most advanced tasks like massive preprocessing of data and exploratory deep learning in R&D. “Interoperability between Ray and GKE autoscaling is smooth and robust. We can combine computational resources without any constraints,” says Kallio.

The heaviest ML loads are mainly deployed on G2 VMs featuring up to eight NVIDIA L4 Tensor Core GPUs, which deliver cutting-edge performance-per-dollar for AI inference workloads. By leveraging them within GKE, IPRally creates nodes on demand and scales GPU resources as needed, thus optimizing its operational costs. There is a single Terraform-provisioned Kubernetes cluster in each of the regions that IPRally searches for inexpensive Spot instances. GKE and Ray then step in for compute orchestration and automated scaling.

To further ease MLOps, IPRally built its own thin orchestration layer, IPRay, atop KubeRay and Ray. This layer provides a command line tool for data scientists to easily provision a templated Ray cluster that scales efficiently up and down and that can run jobs in Ray without needing to know Terraform. This self-service layer reduces friction and allows both engineers and data scientists to focus on their higher-value work.

Technology paves the way for strong growth

Through this selection of Google Cloud and open-source frameworks, IPRally has shown that a startup can build an enterprise-grade ML platform without spending millions of dollars. Focusing on providing a powerful MLOps and automation foundation from its earliest days has paid dividends in efficiency and the team’s ability to focus on R&D.

“Crafting a flexible ML infrastructure from the best parts has been more than worth it,” shares Jari Rosti, an ML engineer at IPRally. “Now, we’re seeing the benefits of that investment multiply as we adapt the infrastructure to the constantly evolving ideas of modern ML. That’s something other young companies can achieve as well by leveraging Google Cloud and Ray.”

Further, the company has been saving 70% of ML R&D costs by using Spot instances. These affordable instances offer the same quality VMs as on-demand instances but are subject to interruption. Because IPRally’s R&D workloads are fault-tolerant, they are a good fit for Spot instances.

IPRally closed a €10m A round investment last year, and it’s forging on with ingesting and processing IP documentation from around the globe, with a focus on improving its graph neural network models and building the best AI platform for patent searching. With 3.4 million patents filed in 2022, the third consecutive year of growth, data will keep flowing and IPRally can continue helping intellectual property professionals find every relevant bit of information.

“With Ray on GKE, we’ve built an ML foundation that is a testament to how powerful Google Cloud is with AI,” says Kallio. “And now, we’re prepared to explore far more advanced deep learning and to keep growing.”
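The article says Spot instances suit IPRally's R&D workloads because those workloads are fault-tolerant, but it does not describe how. One common way to get that property is to checkpoint progress and resume after an interruption. The sketch below is a minimal illustration under that assumption, not IPRally's implementation: the `Preempted` exception, the JSON checkpoint format, and all function names are hypothetical, and the "training step" is a stand-in counter.

```python
import json
import os
import tempfile


class Preempted(Exception):
    """Raised by the simulation when the (hypothetical) Spot VM is reclaimed."""


def load_checkpoint(path):
    """Return the last completed step, or 0 when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


def save_checkpoint(path, step):
    """Write to a temp file, then rename, so a crash cannot leave a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)


def train(path, total_steps, preempt_at=None):
    """Resume from the last checkpoint; optionally simulate one preemption."""
    step = load_checkpoint(path)
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            raise Preempted(f"interrupted at step {step}")
        step += 1  # stand-in for one unit of real training work
        save_checkpoint(path, step)
    return step


def run_until_done(path, total_steps, preemptions):
    """Restart the job after every simulated preemption; return restart count."""
    restarts = 0
    pending = list(preemptions)
    while True:
        try:
            train(path, total_steps, pending.pop(0) if pending else None)
            return restarts
        except Preempted:
            restarts += 1


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    restarts = run_until_done(ckpt, total_steps=10, preemptions=[3, 7])
    print(f"finished at step {load_checkpoint(ckpt)} after {restarts} restarts")
```

Because each restart picks up from the saved step rather than from zero, an interruption costs at most the work done since the last checkpoint, which is what makes interruptible capacity acceptable for this kind of job.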
-
When developers are innovating quickly, security can be an afterthought. That’s even true for AI/ML workloads, where the stakes are high for organizations trying to protect valuable models and data. When you deploy an AI workload on Google Kubernetes Engine (GKE), you can benefit from the many security tools available in Google Cloud infrastructure. In this blog, we share security insights and hardening techniques for training AI/ML workloads on one framework in particular: Ray.

Ray needs security hardening

As a distributed compute framework for AI applications, Ray has grown in popularity in recent years, and deploying it on GKE is a popular choice that provides flexibility and configurable orchestration. You can read more on why we recommend GKE for Ray.

However, Ray lacks built-in authentication and authorization, which means that if you can successfully send a request to the Ray cluster head, it will execute arbitrary code on your behalf. So how do you secure Ray? Ray’s authors state that security should be enforced outside of the Ray cluster, but how do you actually harden it? Running Ray on GKE can help you achieve a more secure, scalable, and reliable Ray deployment by taking advantage of existing global Google infrastructure components including Identity-Aware Proxy (IAP).

We’re also making strides in the Ray community to make safer defaults for running Ray with Kubernetes using KubeRay. One focus area has been improving Ray component compliance with the restricted Pod Security Standards profile and adding security best practices, such as running the operator as non-root to help prevent privilege escalation.

Security separation supports multi-cluster operation

One key advantage of running Ray inside Kubernetes is the ability to run multiple Ray clusters, with diverse workloads, managed by multiple teams, inside a single Kubernetes cluster.
This gives you better resource sharing and utilization because nodes with accelerators can be used by several teams, and spinning up Ray on an existing GKE cluster saves waiting on VM provisioning time before workloads can begin execution.

Security plays a supporting role in landing those multi-cluster advantages by using Kubernetes security features to help keep Ray clusters separate. The goal is to avoid accidental denial of service or accidental cross-tenant access. Note that the security separation here is not “hard” multitenancy: it is only sufficient for clusters running trusted code and teams that trust each other with their data. If further isolation is required, consider using separate GKE clusters.

The architecture is shown in the following diagram. Different Ray clusters are separated by namespaces within the GKE cluster, allowing authorized users to make calls to their assigned Ray cluster, without accessing others.

Diagram: Ray on GKE Architecture

How to secure Ray on GKE

At Google Cloud, we’ve been working on improving the security of KubeRay components, and making it easier to spin up a multi-team environment with the help of Terraform templates including sample security configurations that you can reuse.

Below, we’ve summarized fundamental security best practices included in our sample templates:

- Namespaces: Separate Ray clusters into distinct namespaces by placing one Ray cluster per namespace to take advantage of Kubernetes policies based on the namespace boundary.
- Role-based access control (RBAC): Practice least privilege by creating a Kubernetes Service Account (KSA) per Ray cluster namespace, avoiding the default KSA associated with each Ray cluster namespace, and minimizing permissions down to no RoleBindings until deemed necessary. Optionally, consider setting automountServiceAccountToken:false on the KSA to ensure the KSA’s token is not available to the Ray cluster Pods, since Ray jobs are not expected to call the Kubernetes API.
- Resource quotas: Harden against denial of service due to resource exhaustion by setting limits for resource quotas (especially for CPUs, GPUs, TPUs, and memory) on your Ray cluster namespace.
- NetworkPolicy: Protecting the Ray API is critical to Ray security, since there is no authentication or authorization for submitting jobs. Use Kubernetes NetworkPolicy with GKE Dataplane V2 to control which traffic reaches the Ray components.
- Security context: Comply with Kubernetes Pod Security Standards by configuring Pods to run with hardened settings that prevent privilege escalation, prevent running as root, and restrict potentially dangerous syscalls.
- Workload identity federation: If necessary, secure access from your Ray deployment Pods to other Google Cloud services, such as Cloud Storage, with workload identity federation by leveraging your KSA in a Google Cloud IAM policy.

Additional security tools

The following tools and references can provide additional security for your Ray clusters on GKE:

- Identity-Aware Proxy (IAP): Control access to your Ray cluster through Google’s distributed global endpoint, with IAP providing user and group authorization and Ray deployed as a Kubernetes Ingress or Gateway service.
- Pod Security Standards (PSS): Turn Pod Security Standards on for each of your namespaces in order to prevent common insecure misconfigurations such as hostPath volume mounts. If you need more policy customization, you can also use Policy Controller.
- GKE Sandbox: Leverage GKE Sandbox Pods based on gVisor to add a second security layer around Pods, further reducing the possibility of breakouts for your Ray clusters. Currently available for CPUs (also GPUs with some limitations).
- Cluster hardening: By default, GKE Autopilot already applies a lot of cluster hardening best practices, but there are some additional ways to lock down the cluster. The Ray API can be further secured by removing access from the Internet by using private nodes.
- Organization policies: Ensure your organization’s clusters meet security and hardening standards by setting custom organization policies; for example, guarantee that all GKE clusters are Autopilot.

Google continues to contribute to the community through our efforts to ensure safe and scalable deployments. We look forward to continued collaboration to ensure Ray runs safely on Kubernetes clusters. Please drop us a line with any feedback at ray-on-gke@google.com or comment on our GitHub repo.

To learn more, check out the following resources:

- Terraform templates for hardening Ray on GKE
- GKE cluster hardening guide
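To make the per-namespace practices listed above more concrete, here is a minimal sketch that builds the corresponding Kubernetes objects as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. It is not taken from the Terraform templates the article mentions. The helper name, the `ray-<team>` naming scheme, and the quota values are all assumptions for illustration; the field names are standard Kubernetes API fields.

```python
import json


def ray_namespace_manifests(team, cpu="32", memory="128Gi", gpus="4",
                            pss_level="restricted"):
    """Build per-team hardening objects as dicts (hypothetical helper)."""
    ns = f"ray-{team}"        # naming scheme is an assumption
    ksa = f"ray-{team}-ksa"   # dedicated KSA instead of the default one
    return [
        {   # One namespace per Ray cluster, with Pod Security Standards on.
            # With "restricted", Pods must carry a compliant securityContext.
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {
                "name": ns,
                "labels": {"pod-security.kubernetes.io/enforce": pss_level},
            },
        },
        {   # No RoleBindings are created, and the token is not mounted.
            "apiVersion": "v1",
            "kind": "ServiceAccount",
            "metadata": {"name": ksa, "namespace": ns},
            "automountServiceAccountToken": False,
        },
        {   # Cap what one team can request, to limit resource exhaustion.
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "ray-quota", "namespace": ns},
            "spec": {"hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "requests.nvidia.com/gpu": gpus,
            }},
        },
        {   # Only Pods in the same namespace may reach the Ray Pods.
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "same-namespace-only", "namespace": ns},
            "spec": {
                "podSelector": {},
                "policyTypes": ["Ingress"],
                "ingress": [{"from": [{"podSelector": {}}]}],
            },
        },
    ]


if __name__ == "__main__":
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": ray_namespace_manifests("nlp")}
    print(json.dumps(bundle, indent=2))
```

The NetworkPolicy here is deliberately the simplest possible rule. A real deployment would likely also need to admit traffic from whatever Ingress or Gateway fronts the Ray dashboard and job API.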
-
The revolution in generative AI (gen AI) and large language models (LLMs) is leading to larger model sizes and increased demands on the compute infrastructure. Organizations looking to integrate these advancements into their applications increasingly require distributed computing solutions that offer minimal scheduling overhead. As the need for scalable gen AI solutions grows, Ray, an open-source Python framework designed for scaling and distributing AI workloads, has become increasingly popular.

Traditional Ray deployments on virtual machines (VMs) have limitations when it comes to scalability, resource efficiency, and infrastructure manageability. One alternative is to leverage the power and flexibility of Kubernetes and deploy Ray on Google Kubernetes Engine (GKE) with KubeRay, an open-source Kubernetes operator that simplifies Ray deployment and management.

“With the help of Ray on GKE, our AI practitioners are able to get easy orchestration of infrastructure resources, flexibility and scalability that their applications need without the headache of understanding and managing the intricacies of the underlying platform.” - Nacef Labidi, Head of Infrastructure, InstaDeep

In this blog, we discuss the numerous benefits that running Ray on GKE brings to the table (scalability, cost-efficiency, fault tolerance and isolation, and portability, to name a few) and resources on how to get started.

Easy scalability and node auto-provisioning

On VMs, Ray's scalability is inherently limited by the number of VMs in the cluster. Autoscaling and node provisioning, configured for specific clouds, require detailed knowledge of machine types and network configurations. In contrast, Kubernetes orchestrates infrastructure resources using containers, pods, and VMs as scheduling units, while Ray distributes data-parallel processes within applications, employing actors and tasks for scheduling.
KubeRay introduces cloud-agnostic autoscaling to the mix, allowing you to define minimum and maximum replicas within the workerGroupSpec. Based on this configuration, the Ray autoscaler schedules more Kubernetes pods as required by its tasks. And if you choose the GKE Autopilot mode of operation, node provisioning happens automatically, eliminating the need for manual configuration.

Greater efficiency and improved startup latency

GKE offers discount-based savings such as committed use discounts, a new pricing model, and reservations for GPUs in Autopilot mode. In addition, GKE makes it easy to take advantage of cost-saving measures like Spot nodes via YAML configuration.

Low startup latency is critical to optimal resource usage, ensuring quick recovery, faster iterations and elasticity. GKE image streaming lets you initialize and run eligible container images from Artifact Registry, without waiting for the full image to download. Testing demonstrated the `ray-ml` container image going from `ContainerCreating` to `Running` state in 8.82s, compared to 5m17s without image streaming, which is roughly 35x faster! Image streaming is automatically enabled on Autopilot clusters and available on Standard clusters.

Automated infrastructure management for fault tolerance and isolation

Managing a Ray cluster on VMs offers control over fault tolerance and isolation via detailed VM configuration. However, it lacks the automated, portable self-healing capabilities that Kubernetes provides.

Kubernetes excels at repeatable automation that is expressed with clear declarative and idempotent desired state configuration. It provides automatic self-healing capabilities, which with KubeRay and Ray 2.0 or later extend to preventing the Ray cluster from crashing when the head node goes down.
In fact, Ray Serve docs specifically recommend Kubernetes for production workloads, using the RayService custom resource to automatically handle health checking, status reporting, failure recovery and upgrades.

On GKE, the declarative YAML-based approach not only simplifies deployment and management but can also be used to provision security and isolation. This is achieved by integrating Kubernetes' RBAC with Google Cloud's Identity and Access Management (IAM), allowing administrators to finely tune the permissions granted to each Ray cluster. For instance, a Ray cluster that requires access to a Google Cloud Storage bucket for data ingestion or model storage can be assigned specific roles that limit its actions to reading and writing to that bucket only. This is configured by specifying the Kubernetes service account (KSA) as part of the pod template for Ray cluster `workerGroupSpec` and then linking a Google service account with appropriate permissions to the KSA using the workload identity annotation.

Easy multi-team sharing with Kubernetes namespaces

Out of the box, Ray does not have any security separation between Ray clusters. With Kubernetes you can leverage namespaces to create a Ray cluster per team, and use Kubernetes Role-Based Access Control (RBAC), Resource Quotas and Network Policies. This creates a namespace-based trust boundary to allow multiple teams to each manage their Ray clusters within a larger shared Kubernetes cluster.

Flexibility and portability

You can use Kubernetes for more than just data and AI. As a general-purpose platform, Kubernetes is portable across clouds and on-premises, and has a rich ecosystem. With Kubernetes, you can mix Ray and non-Ray workloads on the same infrastructure, allowing the central platform team to manage a single common compute layer, while leaving infrastructure and resource management to GKE. Think of it as your own personal SRE.
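The configuration points described in the sections above (minimum and maximum replicas per worker group, Spot nodes, and a KSA linked to a Google service account) can be sketched as follows. This is an illustration, not a manifest from the article or its repo: the helper names, resource sizes, and example values are assumptions. It also assumes the `ray.io/v1` API version of the RayCluster resource, where worker groups are listed under `workerGroupSpecs`. The dictionaries are printed as JSON, which `kubectl apply -f` accepts.

```python
import json


def ray_cluster_manifest(name, namespace, ksa, image,
                         min_workers=0, max_workers=4, use_spot=True):
    """Return a RayCluster custom resource as a dict (illustrative values)."""
    worker_pod_spec = {
        "serviceAccountName": ksa,  # KSA carries the workload identity link
        "containers": [{
            "name": "ray-worker",
            "image": image,
            "resources": {"limits": {"cpu": "4", "memory": "16Gi"}},
        }],
    }
    if use_spot:
        # GKE label that schedules Pods onto Spot capacity.
        worker_pod_spec["nodeSelector"] = {"cloud.google.com/gke-spot": "true"}
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "enableInTreeAutoscaling": True,  # turn on the Ray autoscaler
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {"spec": {
                    "serviceAccountName": ksa,
                    "containers": [{"name": "ray-head", "image": image}],
                }},
            },
            "workerGroupSpecs": [{
                "groupName": "workers",
                "replicas": min_workers,
                "minReplicas": min_workers,  # autoscaler floor
                "maxReplicas": max_workers,  # autoscaler ceiling
                "rayStartParams": {},
                "template": {"spec": worker_pod_spec},
            }],
        },
    }


def workload_identity_ksa(ksa, namespace, gsa_email):
    """KSA annotated with the Google service account it may act as."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


if __name__ == "__main__":
    # All names below are placeholders.
    print(json.dumps(workload_identity_ksa(
        "ray-nlp-ksa", "ray-nlp",
        "ray-nlp@example-project.iam.gserviceaccount.com"), indent=2))
    print(json.dumps(ray_cluster_manifest(
        "demo", "ray-nlp", "ray-nlp-ksa", "rayproject/ray:2.9.0"), indent=2))
```

The annotation only declares the link on the Kubernetes side. The Google service account also needs an IAM binding that allows the KSA to impersonate it, plus the bucket-level roles the article describes.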
Get started with KubeRay on GKE

In conclusion, running Ray on GKE is a straightforward way to achieve scalability, cost-efficiency, fault tolerance and isolation for your production workloads, all while ensuring cloud portability. You get the flexibility to adapt quickly to changing demands, making it an ideal choice for forward-thinking organizations in an ever-evolving generative AI landscape.

To get started with KubeRay on GKE, follow these instructions. This repo has Terraform templates to run KubeRay on GPUs and TPUs, and examples for training and serving. You can also find more tutorials and code samples on the AI/ML on GKE page.
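As a quick check of the image-streaming figures quoted in the startup latency section (8.82s with streaming versus 5m17s without), the arithmetic can be reproduced in a few lines. The `parse_duration` helper is a hypothetical convenience written for this sketch and only handles the two formats that appear in the article.

```python
def parse_duration(text):
    """Convert strings such as '5m17s' or '8.82s' into seconds."""
    minutes, _, rest = text.rpartition("m")
    seconds = float(rest.rstrip("s"))
    return (int(minutes) if minutes else 0) * 60 + seconds


without_streaming = parse_duration("5m17s")   # 317.0 seconds
with_streaming = parse_duration("8.82s")      # 8.82 seconds
speedup = without_streaming / with_streaming

print(f"{without_streaming:.0f}s / {with_streaming}s = {speedup:.1f}x")
# prints "317s / 8.82s = 35.9x"
```

The ratio comes out just under 36, so the article's "35x faster" is a slightly conservative rounding of the same measurement.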
-
We are thrilled to introduce Data on EKS (DoEKS), a new open-source project aimed at streamlining and accelerating the process of building, deploying, and scaling data workloads on Amazon Elastic Kubernetes Service (Amazon EKS). With DoEKS, customers get access to a comprehensive range of resources including Infrastructure as Code (IaC) templates, performance benchmark reports, deployment examples, and architectures optimized for data-centric workloads aligned with AWS best practices and industry expertise. This means that customers can quickly and easily provision popular open-source data frameworks (e.g., Apache Spark, Ray, Apache Airflow, Argo Workflows, and Kubeflow) to run on Amazon EKS. Additionally, DoEKS areas of focus include distributed streaming platforms, query engines, and databases to meet the growing demands of data processing. DoEKS blueprints are made with managed AWS services and popular open-source tools to provide customers flexibility to choose the right combination of managed and self-managed components to suit their needs. For example, DoEKS includes several blueprints with Amazon EMR on EKS so customers can take advantage of optimized features like automated provisioning, scaling, faster runtimes, and debugging tools that Amazon EMR provides for running Spark applications...