Search the Community
Showing results for tags 'nvidia'.
-
MSI is focusing pretty much on Nvidia's graphics cards, it would seem, and has more or less stopped making AMD GPUs. Or at least the number of MSI Radeon graphics cards on shelves has dwindled considerably, to the point where it's been noticed by several sources – one of which contacted MSI and got an official response from the board maker.

Hardware Unboxed flagged this up on X (formerly Twitter) on April 26, 2024: "Did I miss this story? MSI has been completely removed from AMD's Radeon 7000 series, all existing products have been discontinued and they never released a 7700 XT/7800 XT. This all seems to have happened very quietly."

It was German tech site Hardware Luxx (via VideoCardz) that contacted and heard back from MSI. The manufacturer stated: "When it comes to graphics cards, our focus at the moment is actually more on RTX cards. Nevertheless, the collaboration with AMD is essential and extremely relevant for us. We see a very positive development, particularly in the area of mainboards."

So, while that is hardly a declaration of abandoning Radeon GPUs – the wording is about 'focusing' more on Nvidia graphics cards, rather than wholly on Team Green's GeForce range – it's pretty obvious where MSI's priorities lie. That's made clear enough if you look at the listed cards on MSI's page for its Radeon RX 7000 models, of which only three are now present (an RX 7600 model, plus a 7900 XT and 7900 XTX). As Hardware Luxx points out, a German price comparison site only lists an RX 7900 XT Gaming Trio as on sale from MSI's line-up of RDNA 3 GPUs.

We popped onto Newegg to see what MSI Radeon products the retailer is currently offering in the US, and there's just one – again an MSI Gaming model, but this time an RX 7900 XTX. Still, the point remains that this is the sole model on sale.

Analysis: Being diplomatic – but don't worry about mobos

It seems clear enough that MSI is running down production of RX 7000 models, and the statement from the firm backs that up. To state that its focus is "more on RTX cards" feels like a diplomatic stance on the matter, when clearly MSI's plans are now heavily skewed towards Team Green. Indeed, we saw clues to this when AMD launched its RX 7800 XT and 7700 XT GPUs to fill out the mid-range of RDNA 3: MSI was noticeably absent from the list of launch partners, and never did produce boards based on those graphics chips. MSI was also slow off the mark with RX 7900 launches way back, too.

MSI was once a substantial backer of Radeon GPUs, but that no longer appears to be the case, and its attention has all but entirely shifted to GeForce RTX graphics cards. Can we expect that RTX 5000 models will be the only ones made by MSI, and RX 8000 (RDNA 4) may be entirely left by the wayside? It looks like a possibility that can't be discounted.

For those worrying about MSI's motherboard support on the AMD side, there's no need to fret, as this doesn't affect the hardware maker's efforts in that respect. That's clarified in the above statement and the reference to mainboards (another term for motherboards), with MSI talking about this in a wholly different way, mentioning a "very positive development" in that particular area. Which we'd read as PR-speak for 'business as usual', of course.

View the full article
-
Ampere Computing unveiled its AmpereOne Family of processors last year, boasting up to 192 single-threaded Ampere cores – the highest in the industry at the time. These chips, designed for cloud efficiency and performance, were Ampere's first product based on its new custom core leveraging internal IP, signalling a shift in the sector, according to CEO Renée James.

At the time of the launch, James said, "Every few decades of compute there has emerged a driving application or use of performance that sets a new bar of what is required of performance. The current driving uses are AI and connected everything combined with our continued use and desire for streaming media. We cannot continue to use power as a proxy for performance in the data center. At Ampere, we design our products to maximize performance at a sustainable power, so we can continue to drive the future of the industry."

AmpereOne-3 on its way

Jeff Wittich, chief product officer at Ampere, recently spoke with The Next Platform about future generations of AmpereOne. He told the site that an updated chip, with 12 memory channels and an A2 core with improved performance, would be out later this year, in keeping with the company's roadmap. This chip, which The Next Platform calls AmpereOne-2, will reportedly have a 33 percent increase in DDR5 memory controllers and up to 50 percent more memory bandwidth.

However, what's coming beyond that, at some point in 2025, sounds the most exciting. The Next Platform says the third-generation chip – AmpereOne-3, as it is calling it – will have 256 cores and be "etched in 3 nanometer (3N to be precise) processes from TSMC". It will use a modified A2+ core with a "two-chiplet design on the cores, with 128 cores per chiplet. It could be a four-chiplet design with 64 cores per chiplet." The site expects the AmpereOne-3 will support PCI-Express 6.0 I/O controllers and maybe have a dozen DDR5 memory controllers, although there's some speculation here.

"We have been moving pretty fast on the compute side," Wittich told the site. "This design has got a lot of other cloud features in it – things around performance management to get the most out of all of those cores. In each of the chip releases, we are going to be making what would generally be considered generational changes in the CPU core. We are adding a lot in every single generation. So you are going to see more performance, a lot more efficiency, a lot more features like security enhancements, which all happen at the microarchitecture level. But we have done a lot to ensure that you get great performance consistency across all of the AmpereOnes. We are also taking a chiplet approach with this 256-core design, which is another step as well. Chiplets are a pretty big part of our overall strategy."

The AmpereOne-3 is reportedly being etched at TSMC right now, prior to its launch next year.

View the full article
-
- chipmakers
- cpus
- (and 5 more)
-
Nvidia's Blackwell architecture is on the horizon, set to power the RTX 50-series graphics cards. We've gathered what we know, as well as rumors of what to expect from the next-generation GPUs, including potential specs and release dates. Here's everything we know about Nvidia Blackwell GPUs.

View the full article
-
Once you've configured your Proxmox VE 8 server and NVIDIA GPU for PCI/PCIE passthrough and created a Windows 11 virtual machine (VM) on your Proxmox VE 8 server, you need to add your NVIDIA GPU to the Windows 11 Proxmox VE virtual machine. You also need to install the NVIDIA GPU drivers on the Windows 11 virtual machine and connect a monitor, a keyboard, and a mouse to use the Windows 11 Proxmox VE 8 virtual machine as a normal PC.

Table of Contents

Preparing Proxmox VE 8 for NVIDIA GPU Passthrough
Creating a Windows 11 Virtual Machine on Proxmox VE 8
Adding Your NVIDIA GPU to the Windows 11 Proxmox VE 8 Virtual Machine (VM)
Adding a Keyboard and Mouse to the Windows 11 Proxmox VE 8 Virtual Machine (VM)
Checking if NVIDIA GPU Passthrough is Working on the Windows 11 Proxmox VE Virtual Machine
Downloading and Installing NVIDIA GPU Drivers on the Windows 11 Proxmox VE Virtual Machine
Removing the Virtual Graphics Adapter of the Windows 11 Proxmox VE Virtual Machine (VM)
Conclusion

Preparing Proxmox VE 8 for NVIDIA GPU Passthrough

Before you can passthrough your NVIDIA GPU to Proxmox VE virtual machines (VMs), you must configure your NVIDIA GPU for PCI/PCIE passthrough on your Proxmox VE 8 server. For detailed information on how to configure NVIDIA GPUs for PCI/PCIE passthrough on Proxmox VE 8, read this article.

Creating a Windows 11 Virtual Machine on Proxmox VE 8

To passthrough your NVIDIA GPU to a Windows 11 Proxmox VE 8 virtual machine, you will of course need a Windows 11 virtual machine on your Proxmox VE 8 server. For detailed information on how to create a Windows 11 virtual machine on Proxmox VE 8, read this article.

Adding Your NVIDIA GPU to the Windows 11 Proxmox VE 8 Virtual Machine (VM)

To add your NVIDIA GPU to a Windows 11 Proxmox VE virtual machine, open the Windows 11 virtual machine in your Proxmox VE dashboard, navigate to the Hardware section of the Windows 11 virtual machine, and click on Add > PCI Device as marked in the screenshot below. From the Add: PCI Device window, select Raw Device[1] and select your NVIDIA GPU (not the NVIDIA Audio Device of the GPU) from the Device dropdown menu[2]. Check All Functions[1], check PCI-Express[2], and click on Add[3]. Your NVIDIA GPU should be added to your selected Windows 11 Proxmox VE virtual machine (VM).

Adding a Keyboard and Mouse to the Windows 11 Proxmox VE 8 Virtual Machine (VM)

To use the Windows 11 Proxmox VE virtual machine as a normal PC, you need to add a keyboard and mouse to the virtual machine. First, connect a USB keyboard and a USB mouse to the USB ports of your Proxmox VE 8 server. Then, open the Windows 11 virtual machine on the Proxmox VE dashboard, navigate to the Hardware section, and click on Add > USB Device. From the Add: USB Device window, select Use USB Vendor/Device ID[1], select your mouse from the Choose Device dropdown menu[2], and click on Add. The USB mouse should be added to your Windows 11 Proxmox VE virtual machine. In the same way, add your USB keyboard to the Windows 11 Proxmox VE virtual machine.

Checking if NVIDIA GPU Passthrough is Working on the Windows 11 Proxmox VE Virtual Machine

To check if the NVIDIA GPU passthrough is working on the Windows 11 Proxmox VE virtual machine (VM), you need to start the Windows 11 virtual machine and see if it starts without any errors. If the NVIDIA GPU passthrough fails, the Windows 11 virtual machine won't start.
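If you prefer the command line to the dashboard steps above, the same devices can also be attached with Proxmox VE's qm tool. Here is a minimal sketch, assuming the VM ID is 100, the GPU sits at PCI address 0000:01:00 (check yours with lspci), and 046d:c52b is a placeholder vendor/device ID for your own USB keyboard or mouse; note that pcie=1 requires the VM to use the q35 machine type:

$ qm set 100 -hostpci0 0000:01:00,pcie=1    # passthrough the whole GPU (all functions)
$ qm set 100 -usb0 host=046d:c52b           # attach a USB device by vendor:device ID

Both commands edit the VM configuration in /etc/pve/qemu-server/100.conf, so the end result is the same as the dashboard steps described above.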
To start the Windows 11 virtual machine, open it on the Proxmox VE dashboard and click on Start. The Windows 11 virtual machine should start without any issues. If the NVIDIA GPU passthrough is successful, you will see two display adapters in the Device Manager of the Windows 11 virtual machine.

NOTE: To open Device Manager on Windows 11, right-click (RMB) on the Start Menu and click on Device Manager.

Downloading and Installing NVIDIA GPU Drivers on the Windows 11 Proxmox VE Virtual Machine

Once you've added your NVIDIA GPU to the Windows 11 Proxmox VE virtual machine (VM), you need to install the NVIDIA GPU drivers on the Windows 11 virtual machine for it to work properly. The process of downloading and installing the NVIDIA GPU drivers on the Windows 11 virtual machine is the same as on a real Windows 11 PC.

To download the latest version of the NVIDIA GPU drivers, visit the official NVIDIA GPU Drivers downloads page from a web browser on the Windows 11 virtual machine. Once the page loads, select your NVIDIA GPU from the Product Type, Product Series, and Product dropdown menus[1]. Then, select Windows 11 from the Operating System dropdown menu[2], select the type of driver (GRD – Game Ready Driver or SD – Studio Driver) you want to download from the Download Type dropdown menu[3], select your language from the Language dropdown menu[4], and click on Search[5]. Click on Download, then click on Download again. Your browser should start downloading the NVIDIA GPU drivers installer file. It will take a while to complete. Once the NVIDIA GPU drivers installer file is downloaded, you will find it in the Downloads folder of your Windows 11 virtual machine.

To install the NVIDIA GPU drivers on the Windows 11 Proxmox VE virtual machine, double-click (LMB) on the NVIDIA GPU drivers installer file. Click on Yes, then click on OK. The NVIDIA drivers installer will be extracted to your computer. Once the extraction is complete, the NVIDIA drivers installer window should be displayed. To install only the NVIDIA GPU drivers (not the GeForce Experience), select NVIDIA Graphics Driver[1] and click on AGREE AND CONTINUE[2]. Select Custom (Advanced)[1] and click on NEXT[2]. Check the Graphics Driver, HD Audio Driver, and PhysX System Software components from the list[1], check Perform a clean installation[2], and click on NEXT[3]. The NVIDIA GPU drivers will now be installed on the Windows 11 Proxmox VE virtual machine. It will take a while to complete. Once the NVIDIA GPU drivers installation is complete, click on CLOSE.

You can confirm that the NVIDIA GPU (that you've added to the Windows 11 Proxmox VE virtual machine) is recognized from the Device Manager app of Windows 11. You can also confirm that your NVIDIA GPU is working correctly (on the Windows 11 Proxmox VE virtual machine) from the Performance section of the Task Manager app of Windows 11.

NOTE: The Task Manager app can be opened on Windows 11 using the keyboard shortcut <Ctrl> + <Shift> + <Esc>. You can also right-click (RMB) on the start menu and click on Task Manager to open the Task Manager app on Windows 11. For more information on opening the Task Manager app on Windows 10/11, read this article.
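As an extra check beyond Device Manager and Task Manager, you can query the driver from a terminal inside the virtual machine. This is optional and assumes the driver installation put nvidia-smi on the PATH (the NVIDIA driver normally installs it under C:\Windows\System32). Open PowerShell in the Windows 11 virtual machine and run:

nvidia-smi

If passthrough and the drivers are working, this prints the driver version and the GPU model, along with its current utilization, temperature, and memory usage.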
Removing the Virtual Graphics Adapter of the Windows 11 Proxmox VE Virtual Machine (VM)

Once the NVIDIA GPU is added to the Windows 11 Proxmox VE virtual machine and the NVIDIA GPU drivers are installed on the Windows 11 virtual machine, you can remove the virtual graphics adapter of the Windows 11 virtual machine so that you only get video output on the monitor/monitors directly connected to your NVIDIA GPU and get the best performance from the NVIDIA GPU, just like a real computer. You can use it as you would a real Windows PC; you won't notice any difference.

To remove the virtual graphics adapter from the Windows 11 Proxmox VE virtual machine (VM), first click on Shutdown from the top-right corner of the Proxmox VE dashboard to shut down the Windows 11 virtual machine, and click on Yes to confirm. Once the Windows 11 virtual machine is shut down, navigate to the Hardware section, select Display, and click on Edit. Select none from the Graphic card dropdown menu[1] and click on OK[2]. The virtual graphics adapter should be removed from the Windows 11 Proxmox VE virtual machine (VM).

As you can see, the screen of the Windows 11 Proxmox VE virtual machine (VM) is displayed on the monitor connected to the NVIDIA GPU via an HDMI cable, just like a real computer. The virtual graphics adapter is removed from the Windows 11 virtual machine, and only the NVIDIA GPU of the virtual machine is used for displaying the screen of the virtual machine. I am running the UNIGINE Heaven benchmark on the Windows 11 Proxmox VE virtual machine and I am getting good framerates, as you can see in the screenshot below.

Conclusion

In this article, I have shown you how to passthrough an NVIDIA GPU, a keyboard, and a mouse to a Windows 11 Proxmox VE 8 virtual machine. I have also shown you how to install the NVIDIA GPU drivers on the Windows 11 Proxmox VE virtual machine and configure it to work just like any other Windows PC.

View the full article
-
Nvidia has unveiled new GPUs that it says will be able to bring the power of generative AI to a wider audience than ever before. The new Nvidia RTX A400 and A1000 GPUs will give creatives and professionals alike access to some of the most useful AI tools in their fields, without demanding the huge amounts of computing power and resources that are currently required. Built on the company's Ampere architecture, the new GPUs will bring tools such as real-time ray tracing to a wider array of desktops and workstations, allowing generative AI tools to reach a bigger audience.

AI for all

"AI integration across design and productivity applications is becoming the new standard, fueling demand for advanced computing performance," Nvidia's senior product marketing manager for enterprise platforms Stacy Ozorio noted in a blog post announcing the launch. "This means professionals and creatives will need to tap into increased compute power, regardless of the scale, complexity or scope of their projects."

The RTX A400 includes 24 Tensor Cores for AI processing, taking it far beyond traditional CPU-based machines, which Nvidia says allows for running cutting-edge AI services such as chatbots and copilots directly on the desktop. In a first for the RTX 400 series, the A400 also includes four display outputs, making it a good fit in industries such as retail, transportation and financial services, which can benefit from high-density display environments showing off detailed 3D renders.

The A1000 is the first in the RTX 1000 series to bring Tensor Cores and RT Cores to users, allowing them to utilize ray-tracing performance and accelerated AI tools, while boasting a sleek, single-slot design that consumes just 50W of power. With the power of 72 Tensor Cores, it offers 3x faster generative AI processing for tools like Stable Diffusion over the previous generation, as well as faster video processing, with its 18 RT Cores speeding up graphics and rendering tasks by up to 3x, making it ideal for tasks such as 4K video editing, CAD and architectural designs.

"These new GPUs empower users with cutting-edge AI, graphics and compute capabilities to boost productivity and unlock creative possibilities," Ozorio added. "Advanced workflows involving ray-traced renders and AI are now within reach, allowing professionals to push the boundaries of their work and achieve stunning levels of realism."

The A1000 GPU is available now, with the A400 set to go on sale later in the summer of 2024.

View the full article
-
Proxmox VE 8 is one of the best open-source and free Type-1 hypervisors out there for running QEMU/KVM virtual machines (VMs) and LXC containers. It has a nice web management interface and a lot of features. One of the most amazing features of Proxmox VE is that it can passthrough PCI/PCIE devices (i.e. an NVIDIA GPU) from your computer to Proxmox VE virtual machines (VMs). The PCI/PCIE passthrough is getting better and better with newer Proxmox VE releases. At the time of this writing, the latest version of Proxmox VE is Proxmox VE v8.1 and it has great PCI/PCIE passthrough support.

In this article, I am going to show you how to configure your Proxmox VE 8 host/server for PCI/PCIE passthrough and configure your NVIDIA GPU for PCIE passthrough on Proxmox VE 8 virtual machines (VMs).

Table of Contents

Enabling Virtualization from the BIOS/UEFI Firmware of Your Motherboard
Installing Proxmox VE 8
Enabling Proxmox VE 8 Community Repositories
Installing Updates on Proxmox VE 8
Enabling IOMMU from the BIOS/UEFI Firmware of Your Motherboard
Enabling IOMMU on Proxmox VE 8
Verifying if IOMMU is Enabled on Proxmox VE 8
Loading VFIO Kernel Modules on Proxmox VE 8
Listing IOMMU Groups on Proxmox VE 8
Checking if Your NVIDIA GPU Can Be Passthrough to a Proxmox VE 8 Virtual Machine (VM)
Checking for the Kernel Modules to Blacklist for PCI/PCIE Passthrough on Proxmox VE 8
Blacklisting Required Kernel Modules for PCI/PCIE Passthrough on Proxmox VE 8
Configuring Your NVIDIA GPU to Use the VFIO Kernel Module on Proxmox VE 8
Passthrough the NVIDIA GPU to a Proxmox VE 8 Virtual Machine (VM)
Still Having Problems with PCI/PCIE Passthrough on Proxmox VE 8 Virtual Machines (VMs)?
Conclusion
References

Enabling Virtualization from the BIOS/UEFI Firmware of Your Motherboard

Before you can install Proxmox VE 8 on your computer/server, you must enable the hardware virtualization feature of your processor from the BIOS/UEFI firmware of your motherboard. The process is different for different motherboards. So, if you need any assistance in enabling hardware virtualization on your motherboard, read this article.

Installing Proxmox VE 8

Proxmox VE 8 is free to download, install, and use. Before you get started, make sure to install Proxmox VE 8 on your computer. If you need any assistance on that, read this article.

Enabling Proxmox VE 8 Community Repositories

Once you have Proxmox VE 8 installed on your computer/server, make sure to enable the Proxmox VE 8 community package repositories. By default, Proxmox VE 8 enterprise package repositories are enabled, and you won't be able to get/install updates and bug fixes from the enterprise repositories unless you have bought Proxmox VE 8 enterprise licenses. So, if you want to use Proxmox VE 8 for free, make sure to enable the Proxmox VE 8 community package repositories to get the latest updates and bug fixes from Proxmox for free.

Installing Updates on Proxmox VE 8

Once you've enabled the Proxmox VE 8 community package repositories, make sure to install all the available updates on your Proxmox VE 8 server.

Enabling IOMMU from the BIOS/UEFI Firmware of Your Motherboard

The IOMMU configuration is found in different locations on different motherboards. To enable IOMMU on your motherboard, read this article.

Enabling IOMMU on Proxmox VE 8

Once IOMMU is enabled on the hardware side, you also need to enable IOMMU from the software side (from Proxmox VE 8).
To enable IOMMU from Proxmox VE 8, you have to add the following kernel boot parameters, depending on your processor vendor:

Intel: intel_iommu=on iommu=pt
AMD: iommu=pt

To modify the kernel boot parameters of Proxmox VE 8, open the /etc/default/grub file with the nano text editor as follows:

$ nano /etc/default/grub

At the end of the GRUB_CMDLINE_LINUX_DEFAULT line, add the required kernel boot parameters for enabling IOMMU depending on the processor you're using. As I am using an AMD processor, I have added only the kernel boot parameter iommu=pt at the end of the GRUB_CMDLINE_LINUX_DEFAULT line in the /etc/default/grub file. Once you're done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/default/grub file.

Now, update the GRUB boot configurations with the following command:

$ update-grub2

Once the GRUB boot configurations are updated, click on Reboot to restart your Proxmox VE 8 server for the changes to take effect.

Verifying if IOMMU is Enabled on Proxmox VE 8

To verify whether IOMMU is enabled on Proxmox VE 8, run the following command:

$ dmesg | grep -e DMAR -e IOMMU

If IOMMU is enabled, you will see some outputs confirming that IOMMU is enabled. If IOMMU is not enabled, you may not see any outputs.

You also need to have IOMMU Interrupt Remapping enabled for PCI/PCIE passthrough to work. To check if IOMMU Interrupt Remapping is enabled on your Proxmox VE 8 server, run the following command:

$ dmesg | grep 'remapping'

As you can see, IOMMU Interrupt Remapping is enabled on my Proxmox VE 8 server.

NOTE: Most modern AMD and Intel processors will have IOMMU Interrupt Remapping enabled. If for any reason you don't have IOMMU Interrupt Remapping enabled, there's a workaround: you have to enable Unsafe Interrupts for VFIO. Read this article for more information on enabling Unsafe Interrupts on your Proxmox VE 8 server.

Loading VFIO Kernel Modules on Proxmox VE 8

The PCI/PCIE passthrough is done mainly by the VFIO (Virtual Function I/O) kernel modules on Proxmox VE 8. The VFIO kernel modules are not loaded at boot time by default on Proxmox VE 8, but it's easy to load them at boot time.

First, open the /etc/modules-load.d/vfio.conf file with the nano text editor as follows:

$ nano /etc/modules-load.d/vfio.conf

Type in the following lines in the /etc/modules-load.d/vfio.conf file:

vfio
vfio_iommu_type1
vfio_pci

Once you're done, press <Ctrl> + X followed by Y and <Enter> to save the changes.

Now, update the initramfs of your Proxmox VE 8 installation with the following command:

$ update-initramfs -u -k all

Once the initramfs is updated, click on Reboot to restart your Proxmox VE 8 server for the changes to take effect. Once your Proxmox VE 8 server boots, you should see that all the required VFIO kernel modules are loaded:

$ lsmod | grep vfio

Listing IOMMU Groups on Proxmox VE 8

To passthrough PCI/PCIE devices on Proxmox VE 8 virtual machines (VMs), you will need to check the IOMMU groups of your PCI/PCIE devices quite frequently. To make checking for IOMMU groups easier, I decided to write a shell script (I got it from GitHub, but I can't remember the name of the original poster) in the path /usr/local/bin/print-iommu-groups so that I can just run the print-iommu-groups command and it will print the IOMMU groups on the Proxmox VE 8 shell.
First, create a new file print-iommu-groups in the path /usr/local/bin and open it with the nano text editor as follows:

$ nano /usr/local/bin/print-iommu-groups

Type in the following lines in the print-iommu-groups file:

#!/bin/bash
shopt -s nullglob
for g in `find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V`; do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done
done

Once you're done, press <Ctrl> + X followed by Y and <Enter> to save the changes to the print-iommu-groups file.

Make the print-iommu-groups script file executable with the following command:

$ chmod +x /usr/local/bin/print-iommu-groups

Now, you can run the print-iommu-groups command as follows to print the IOMMU groups of the PCI/PCIE devices installed on your Proxmox VE 8 server:

$ print-iommu-groups

As you can see, the IOMMU groups of the PCI/PCIE devices installed on my Proxmox VE 8 server are printed.

Checking if Your NVIDIA GPU Can Be Passthrough to a Proxmox VE 8 Virtual Machine (VM)

To passthrough a PCI/PCIE device to a Proxmox VE 8 virtual machine (VM), it must be in its own IOMMU group. If two or more PCI/PCIE devices share an IOMMU group, you can't passthrough any of the PCI/PCIE devices of that IOMMU group to any Proxmox VE 8 virtual machines (VMs). So, if your NVIDIA GPU and its audio device are in their own IOMMU group, you can passthrough the NVIDIA GPU to any Proxmox VE 8 virtual machine (VM).

On my Proxmox VE 8 server, I am using an MSI X570 ACE motherboard paired with a Ryzen 3900X processor and a Gigabyte RTX 4070 NVIDIA GPU. According to the IOMMU groups of my system, I can passthrough the NVIDIA RTX 4070 GPU (IOMMU Group 21), the RTL8125 2.5GbE Ethernet Controller (IOMMU Group 20), the Intel I211 Gigabit Ethernet Controller (IOMMU Group 19), a USB 3.0 controller (IOMMU Group 24), and the Onboard HD Audio Controller (IOMMU Group 25):

$ print-iommu-groups

As the main focus of this article is configuring Proxmox VE 8 for passing through the NVIDIA GPU to Proxmox VE 8 virtual machines, the NVIDIA GPU and its audio device must be in their own IOMMU group.

Checking for the Kernel Modules to Blacklist for PCI/PCIE Passthrough on Proxmox VE 8

To passthrough a PCI/PCIE device on a Proxmox VE 8 virtual machine (VM), you must make sure that Proxmox VE forces it to use the VFIO kernel module instead of its original kernel module. To find out the kernel module your PCI/PCIE devices are using, you will need to know the vendor ID and device ID of these PCI/PCIE devices. You can find the vendor ID and device ID of the PCI/PCIE devices using the print-iommu-groups command:

$ print-iommu-groups

For example, the vendor ID and device ID of my NVIDIA RTX 4070 GPU is 10de:2786, and its audio device is 10de:22bc.

To find the kernel module a PCI/PCIE device 10de:2786 (my NVIDIA RTX 4070 GPU) is using, run the lspci command as follows:

$ lspci -v -d 10de:2786

As you can see, my NVIDIA RTX 4070 GPU is using the nvidiafb and nouveau kernel modules by default. So, it can't be passed to a Proxmox VE 8 virtual machine (VM) at this point.

The audio device of my NVIDIA RTX 4070 GPU is using the snd_hda_intel kernel module. So, it can't be passed to a Proxmox VE 8 virtual machine at this point either:

$ lspci -v -d 10de:22bc

So, to passthrough my NVIDIA RTX 4070 GPU and its audio device on a Proxmox VE 8 virtual machine (VM), I must blacklist the nvidiafb, nouveau, and snd_hda_intel kernel modules and configure my NVIDIA RTX 4070 GPU and its audio device to use the vfio-pci kernel module.
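As a quick alternative to the print-iommu-groups script for looking up vendor and device IDs, lspci can print them directly. A one-liner sketch, assuming an NVIDIA card:

$ lspci -nn | grep -i nvidia

The bracketed [vendor:device] pair at the end of each line (for example, [10de:2786] for the GPU and [10de:22bc] for its audio function on my system) is exactly the ID you need for the blacklisting and VFIO configuration steps below.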
Blacklisting Required Kernel Modules for PCI/PCIE Passthrough on Proxmox VE 8

To blacklist kernel modules on Proxmox VE 8, open the /etc/modprobe.d/blacklist.conf file with the nano text editor as follows:

$ nano /etc/modprobe.d/blacklist.conf

To blacklist the nouveau, nvidiafb, and snd_hda_intel kernel modules (to passthrough an NVIDIA GPU), add the following lines in the /etc/modprobe.d/blacklist.conf file:

blacklist nouveau
blacklist nvidiafb
blacklist snd_hda_intel

Once you're done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/modprobe.d/blacklist.conf file.

Configuring Your NVIDIA GPU to Use the VFIO Kernel Module on Proxmox VE 8

To configure a PCI/PCIE device (i.e. your NVIDIA GPU) to use the VFIO kernel module, you need to know its vendor ID and device ID. In this case, the vendor ID and device ID of my NVIDIA RTX 4070 GPU and its audio device are 10de:2786 and 10de:22bc.

To configure your NVIDIA GPU to use the VFIO kernel module, open the /etc/modprobe.d/vfio.conf file with the nano text editor as follows:

$ nano /etc/modprobe.d/vfio.conf

To configure your NVIDIA GPU and its audio device with the <vendor-id>:<device-id> 10de:2786 and 10de:22bc (let's say) respectively to use the VFIO kernel module, add the following line to the /etc/modprobe.d/vfio.conf file:

options vfio-pci ids=10de:2786,10de:22bc

Once you're done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/modprobe.d/vfio.conf file.

Now, update the initramfs of Proxmox VE 8 with the following command:

$ update-initramfs -u -k all

Once the initramfs is updated, click on Reboot to restart your Proxmox VE 8 server for the changes to take effect. Once your Proxmox VE 8 server boots, you should see that your NVIDIA GPU and its audio device (10de:2786 and 10de:22bc in my case) are using the vfio-pci kernel module. Now, your NVIDIA GPU is ready to be passed to a Proxmox VE 8 virtual machine:

$ lspci -v -d 10de:2786
$ lspci -v -d 10de:22bc

Passthrough the NVIDIA GPU to a Proxmox VE 8 Virtual Machine (VM)

Now that your NVIDIA GPU is ready for passthrough on Proxmox VE 8 virtual machines (VMs), you can passthrough your NVIDIA GPU on your desired Proxmox VE 8 virtual machine and install the NVIDIA GPU drivers depending on the operating system that you're using on that virtual machine as usual. For detailed information on how to passthrough your NVIDIA GPU on a Proxmox VE 8 virtual machine (VM) with different operating systems installed, read one of the following articles:

How to Passthrough an NVIDIA GPU to a Windows 11 Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU to a Ubuntu 24.04 LTS Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU to a LinuxMint 21 Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU to a Debian 12 Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU to an Elementary OS 8 Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU to a Fedora 39+ Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU on an Arch Linux Proxmox VE 8 Virtual Machine (VM)
How to Passthrough an NVIDIA GPU on a Red Hat Enterprise Linux 9 (RHEL 9) Proxmox VE 8 Virtual Machine (VM)

Still Having Problems with PCI/PCIE Passthrough on Proxmox VE 8 Virtual Machines (VMs)?
Even after trying everything listed in this article correctly, if PCI/PCIE passthrough still does not work for you, be sure to try out some of the Proxmox VE PCI/PCIE passthrough tricks and/or workarounds that you can use to get PCI/PCIE passthrough working on your hardware.

Conclusion

In this article, I have shown you how to configure your Proxmox VE 8 server for PCI/PCIE passthrough so that you can passthrough PCI/PCIE devices (i.e. your NVIDIA GPU) to your Proxmox VE 8 virtual machines (VMs). I have also shown you how to find out the kernel modules that you need to blacklist, and how to blacklist them, for a successful passthrough of your desired PCI/PCIE devices. Finally, I have shown you how to configure your desired PCI/PCIE devices to use the VFIO kernel module, which is also an essential step for a successful passthrough.

References

PCI(e) Passthrough – Proxmox VE
PCI Passthrough – Proxmox VE
The ultimate gaming virtual machine on proxmox – YouTube

View the full article
-
- proxmox ve
- pci
-
(and 3 more)
-
Microsoft and OpenAI are reportedly in the process of planning a groundbreaking data center project which would include an AI supercomputer named "Stargate". A report by Anissa Gardizy and Amir Efrati in The Information claims the goal of the project – which would be financed by Microsoft to the tune of over $100 billion, and which reportedly has a launch date set for 2028 – is to reduce the two companies' reliance on Nvidia, something that a lot of the tech giants involved in AI are increasingly looking to do. Microsoft and OpenAI's plan reportedly involves five phases, with Stargate being the fifth and most ambitious one.

The data center will be the supercomputer

The cost of the project is attributed to the age-old "sources familiar with the plans" (The Information says these are "a person who spoke to OpenAI CEO Sam Altman about it and a person who has viewed some of Microsoft's initial cost estimates"), but neither Microsoft nor OpenAI have yet commented on the specifics of the project. The new data center project is expected to push the boundaries of AI capability and could potentially exceed $115 billion in expenses. This is more than triple the amount Microsoft spent on capital expenditures for servers and equipment last year. Microsoft is currently working on a smaller, fourth-phase supercomputer for OpenAI that is expected to launch around 2026, The Information claims.

Shedding more light on the report, The Next Platform says, "The first thing to note about the rumored "Stargate" system that Microsoft is planning to build to support the computational needs of its large language model partner, OpenAI, is that the people doing the talking – reportedly OpenAI chief executive officer Sam Altman – are talking about a data center, not a supercomputer. And that is because the data center – and perhaps multiple data centers within a region with perhaps as many as 1 million XPU computational devices – will be the supercomputer."

The Next Platform also says that if Stargate does come to fruition, it will be "based on future generations of Cobalt Arm server processors and Maia XPUs, with Ethernet scaling to hundreds of thousands to 1 million XPUs in a single machine," and it definitely won't be based on Nvidia GPUs and interconnects – which seems like a safe bet if the rumors are to be believed.

View the full article
-
At Intel Vision 2024, Intel launched its Gaudi 3 AI accelerator, which the company is positioning as a direct competitor to Nvidia's H100, claiming that it offers faster training and inference performance on leading GenAI models. The Gaudi 3 is projected to outperform the H100 by up to 50% in various tasks, including training time, inference throughput, and power efficiency. Building on the performance and efficiency of the Gaudi 2 AI accelerator, Gaudi 3 reportedly boasts 4x AI compute for BF16, a 1.5x increase in memory bandwidth, and 2x networking bandwidth for massive system scale-out, compared with its predecessor.

Superior performance

Manufactured on a 5nm process, Gaudi 3 features 64 AI-custom and programmable TPCs and eight MMEs capable of 64,000 parallel operations. It offers 128GB of memory (HBM2e, not HBM3E), 3.7TB/s of memory bandwidth, and 96MB of on-board SRAM for processing large datasets efficiently. With 24 integrated 200Gb Ethernet ports, it allows for flexible system scaling and open-standard networking.

Intel claims Gaudi 3 is superior to the H100 across various models, including 50% faster training time on Llama 7B and 13B parameter models, as well as GPT-3 175B models. Additionally, there is a 50% increase in inference throughput and 40% greater power efficiency on Llama 7B and 70B parameter models, and Falcon 180B models. Intel says Gaudi 3 also outperforms the H200 in inferencing speed on Llama 7B and 70B parameter models, and Falcon 180B parameter models, by 30%. As these are Intel benchmarks, feel free to take them with a pinch of salt.

Tom's Hardware notes, "At the end of the day, the key to dominating today's AI training and inference workloads resides in the ability to scale accelerators out into larger clusters. Intel's Gaudi takes a different approach than Nvidia's looming B200 NVL72 systems, using fast 200 Gbps Ethernet connections between the Gaudi 3 accelerators and pairing the servers with leaf and spine switches to create clusters."

Justin Hotard, Intel executive vice president and general manager of the Data Center and AI Group, said, "In the ever-evolving landscape of the AI market, a significant gap persists in the current offerings. Feedback from our customers and the broader market underscores a desire for increased choice. Enterprises weigh considerations such as availability, scalability, performance, cost, and energy efficiency. Intel Gaudi 3 stands out as the GenAI alternative presenting a compelling combination of price performance, system scalability, and time-to-value advantage."

Gaudi 3 will be available to OEMs in the second quarter of 2024, with general availability expected in the third quarter.

View the full article
-
There's no shortage of startups pushing technology that could one day prove pivotal in AI computing and memory infrastructure. Celestial AI, which recently secured $175 million in Series C funding, is looking to commercialize its Photonic Fabric technology, which aims to redefine optical interconnects. Celestial AI's foundational technology is designed to disaggregate AI compute from memory to offer a "transformative leap in AI system performance that is ten years more advanced than existing technologies."

Lower energy overhead and latency

The company has reportedly been in talks with several hyperscale customers and a major processor manufacturer about integrating its technology. Though specific details remain under wraps, that manufacturer is quite likely to be AMD, since AMD Ventures is one of Celestial AI's backers.

As reported by The Next Platform, the core of Celestial AI's strategy lies in its chiplets, interposers, and optical interconnect technology. By combining DDR5 and HBM memory, the company aims to significantly reduce power consumption while maintaining high performance levels. The chiplets can be used for additional memory capacity or as interconnects between chips, offering speeds comparable to NVLink or Infinity Fabric.

"The surge in demand for our Photonic Fabric is the product of having the right technology, the right team and the right customer engagement model," said Dave Lazovsky, Co-Founder and CEO of Celestial AI. "We are experiencing broad customer adoption resulting from our full-stack technology offerings, providing electrical-optical-electrical links that deliver data at the bandwidth, latency, bit error rate (BER) and power required, compatible with the logical protocols of our customer's AI accelerators and GPUs. Deep strategic collaborations with hyperscale data center customers focused on optimizing system-level Accelerated Computing architectures are a prerequisite for these solutions. We're excited to be working with the giants of our industry to propel commercialization of the Photonic Fabric."

While Celestial AI faces challenges in timing and competition from other startups in the silicon photonics space, the potential impact of its technology on the AI processing landscape makes it a promising contender. As the industry moves towards co-packaged optics and silicon photonic interposers, Celestial AI's Photonic Fabric could play a key role in shaping the future of AI computing.

View the full article
-
- nvidia
- celestial ai
-
(and 3 more)
-
Nvidia's next generation of graphics cards, reportedly known as Nvidia Blackwell, has been the subject of plenty of rumors since at least 2023. The most recent one seems to confirm that the RTX 5080 and RTX 5090 will be launching in Q4 2024, according to Twitter leaker Kopite7kimi, as reported by PC Gamer. While finally getting a sign of the cards' imminent release sounds like great news to anyone hoping to get hold of what could easily be the best graphics cards, they're honestly coming out way too soon.

There have been reports that the Nvidia RTX 5080 would have better ray tracing capabilities and pricing than the 4080, and that the RTX 5090 would be up to 70% faster than the 4090, which sounds great on paper. But when you consider how powerful the 4000-series cards already are, it seems a bit pointless. It's wonderful that ray tracing, speed, and performance might be improved, but to what benefit? We aren't even taking full advantage of current-gen cards, so what's the value of buying an Nvidia 5000-series GPU later this year?

PC games haven't even reached the point that they truly challenge the mid-range and high-end 3000-series of graphics cards (hell, an argument can be made that they can't fully challenge the RTX 2070 Super in ways that truly matter), let alone the current gen. One could argue that the only reason why the current gen was so necessary was the crypto mining craze completely ruining the 3000-series market and making it nigh impossible to get your hands on one for a remotely affordable price.

And right now, the 4000-series is quite excellent performance-wise, as it's able to handle ray tracing and other advanced effects and tools like no other. The RTX 4090 in particular is nearly perfect in what it can offer, and the lower cards are still absolute performance beasts. This isn't even mentioning the Super series refreshes, which added even more oomph to your favorite graphics cards while keeping the prices either the same or lowering them.

There's also the fact that some cards, like the RTX 4070, didn't sell nearly as well as Nvidia wanted and in fact were rumored to be a "disaster." While that doesn't reflect the sales numbers for the rest of the graphics cards, it's not a good look to see the base versions of your mid-range cards doing poorly. And while the RTX 4080 Super seems to be out of stock at many retailers, that could just as well be due to low stock in the first place.

With all these issues cropping up, releasing the RTX 5080 and 5090 doesn't seem to be such a smart move on Nvidia's part. Though the Q4 2024 launch date is still just a rumor, it would be wise for the tech giant to release these cards in 2025 at the earliest, to give its 4000-series some proper breathing room.

View the full article
-
- nanotronics
- nvidia
-
(and 2 more)
-
New figures claim Intel CEO Pat Gelsinger saw a significant increase in his earnings in 2023, but his total was still only just over half of what Lisa Su, CEO of rival company AMD, made. According to MarketWatch, Gelsinger's total compensation in 2023 was $16.86 million, a 45% increase from his 2022 earnings of $11.61 million, as revealed in Intel's 2024 proxy statement. In contrast, Su's total compensation for 2023 was $30.35 million, barely above her 2022 compensation of $30.22 million.

The increase in Gelsinger's pay came despite an 18% drop in his salary to $1.07 million. However, the value of his stock awards increased to $12.43 million, and his non-equity incentive plan compensation jumped to $2.89 million. On the other hand, Su's salary was $1.2 million, and she received $21.85 million in stock awards and $5.84 million in options.

What about Nvidia's CEO?

Intel's stock soared 90.1% in 2023, while AMD's stock shot up 127.6%. However, as MarketWatch points out, since Gelsinger became CEO, Intel's stock has tumbled 28.5%, while AMD shares have charged up 92.5%. Interestingly, Gelsinger's 2023 compensation was significantly lower than his 2021 total compensation of $178.59 million, which included "new-hire equity awards of a significant magnitude" of $140.43 million. This was done to ensure Intel could hire "the best leader possible."

Meanwhile, Nvidia CEO Jensen Huang's total compensation for the fiscal year that ended in January 2023 was $21.36 million, down from $23.74 million the previous year. With Nvidia's stock skyrocketing by 238.9% in 2023, it will be interesting to see his compensation for the latest fiscal year. As Tom's Hardware points out, "Impressive stock gains have already propelled the leader of the green team into becoming the world's 21st richest person."

View the full article
-
Introduction

Quora is a leading Q&A platform with a mission to share and grow the world's knowledge, serving hundreds of millions of users worldwide every month. Quora uses machine learning (ML) to generate a custom feed of questions, answers, and content recommendations based on each user's activity, interests, and preferences. ML drives targeted advertising on the platform, where advertisers use Quora's vast user data and sophisticated targeting capabilities to deliver highly personalized ads to the audience. Moreover, ML plays a pivotal role in maintaining high-quality content for users by effectively filtering spam and moderating content. Quora launched Poe, a generative artificial intelligence (AI) based chatbot app, by leveraging different Large Language Models (LLMs) and offering fast and accurate responses. Poe aims to simplify the user experience and provide continuous back-and-forth dialogue while integrating with the major LLMs and other generative AI models.

Quora successfully modernized its model serving with NVIDIA Triton Inference Server (Triton) on Amazon Elastic Kubernetes Service (Amazon EKS). This move enabled a small team of ML engineers to manage, operate, and enhance model serving efficiently. This post delves into the design decisions, the benefits of running NVIDIA Triton Server on Amazon EKS, and how Quora reduced model serving latency by three times and model serving cost by 23%.

Previous model serving architecture

Quora was running its model serving in hybrid mode, where around half of the models were hosted on TensorFlow Serving (TFS) and the other half were hosted on a Custom Python Engine. The Custom Python Engine supported different model frameworks, such as PyTorch, XGBoost, Microsoft LightGBM, and sklearn, whereas TFS was used only for the TensorFlow model framework.

Figure 1: Previous model serving architecture

Challenges with previous model serving architecture

The Custom Python Engine uses Apache Thrift, whereas TFS uses the gRPC framework. Maintaining different frameworks for implementing and managing remote procedure calls (RPC) added significant complexity to the model serving architecture. The existing system faced challenges with using GPUs effectively for serving, which led to unnecessary resource waste and increased costs. Furthermore, both had limited support for GPU optimization techniques, which restricted model performance and efficiency. There was also a pressing need at Quora to serve recommendation models with large embeddings on GPUs instead of CPUs to improve cost.

Limitations of Custom Python Engine

Performance: Models deployed on the Custom Python Engine, which used Apache Thrift for RPC communication, encountered high latency issues that impacted model performance. On certain occasions, response time could soar up to 1500 milliseconds (ms), in stark contrast to the anticipated latency of 50 ms.

Service Mesh Integration: Quora uses the Istio service mesh. gRPC natively supports HTTP/2 and integrates seamlessly with service mesh technologies, which provide features such as traffic mirroring and rate limiting. Apache Thrift does not support HTTP/2 and is not natively integrated with the Istio service mesh.

High Traffic management: Custom Python Engine models faced challenges in handling high-traffic scenarios due to limitations in their client-side rate limiting mechanism. gRPC integrates seamlessly with server-side mesh-based rate limiting solutions, providing a much more robust and scalable solution to manage surges in traffic and maintain system stability.
This method has been particularly effective in ensuring smooth operation during spikes in queries per second (QPS). The significant disparity in response times across different models underscored the need for an optimized solution to enhance overall model serving performance and to meet specific latency and throughput requirements, particularly in critical use cases such as ads ranking and user feed. Quora was looking for a new model serving solution that resolves the preceding challenges and also supports multiple ML frameworks, such as ONNX and TensorRT.

Solution overview

Overview of NVIDIA Triton Inference Server

NVIDIA Triton Inference Server is an open-source software solution purpose-built for serving ML models. It optimizes the deployment of models in production by maximizing hardware use, supporting multiple frameworks, and providing a range of flexible serving options.

Why did Quora select NVIDIA Triton Inference Server on Amazon EKS?

To improve performance and optimize the cost of its model serving, Quora investigated various software and hardware, aiming to reduce latency and increase model throughput. Quora eventually selected NVIDIA Triton Inference Server due to its potential to meet the challenges in its model serving infrastructure. Triton is designed to effectively utilize GPUs for serving a wide variety of models, and its flexible deployment options made it an optimal choice for modernizing Quora's model serving. The reasons for choosing Triton include:

Multi-ML frameworks: Triton supports multiple ML frameworks, such as TensorFlow, PyTorch, ONNX, TensorRT, OpenVINO, HugeCTR, and FIL (Forest Inference Library). The broad framework support facilitates the migration of all models from the current Custom Python Engine to Triton.

HTTP/GRPC endpoints: Triton provides HTTP/gRPC endpoints for model serving, which simplifies integration with Quora's existing Istio service mesh.

High performance: Triton quickly and efficiently processes requests, making it perfect for applications requiring low latency. It includes essential features such as rate limiting status, health checks, dynamic batching, and concurrent model execution capabilities.

Scalability: It can easily scale up to handle large workloads and is designed to handle multiple models and data sources. Additionally, it supports a wide range of hardware (such as GPUs and CPUs), multi-node deployment, model versioning, and ensemble model handling. This makes it easy to deploy models on different hardware configurations.

Managed observability: Integration with Prometheus and Grafana for metrics, tools that are already in use at Quora for monitoring ML systems.

Recommendation models serving on GPUs: NVIDIA Merlin HugeCTR (Huge Click-Through-Rate) is a GPU-accelerated deep neural network (DNN) training and inference framework designed for efficiently serving recommendation models with large embeddings on NVIDIA GPUs.

Auto-tuning tools for model optimization:
Model Analyzer: Assesses runtime performance and suggests optimized configurations (batch size, instance group, CPU, memory, etc.)
Model Navigator: Automates the transition of models from source to optimal format and configuration for Triton deployment

Walkthrough

The following walkthrough guides you through this solution.

Architecture of running NVIDIA Triton server on Amazon EKS

Quora chose gRPC as the standard client communication framework and Triton as the model serving engine for all ML models.
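Since gRPC is the standard client communication framework here, a client needs stubs for Triton's gRPC API (this is step 1 of the migration below). As a hedged sketch of what that can look like in practice: the protobuf definitions are published in the triton-inference-server/common repository under protobuf/, and the output directory gen/ is an arbitrary choice:

$ pip install grpcio-tools
$ python -m grpc_tools.protoc -I protobuf --python_out=gen --grpc_python_out=gen protobuf/grpc_service.proto protobuf/model_config.proto

For most use cases, installing the ready-made client (pip install tritonclient[grpc]) avoids maintaining generated stubs by hand, since it ships with pre-generated ones.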
There is a separate namespace for training and model serving in the Amazon EKS cluster. Within model serving, separate node groups are used for the CPU-based models and the GPU-based models. Quora decided to move all new ML models onto the following architecture:

Figure 2: Modernized model serving

Migration to NVIDIA Triton Server on Amazon EKS

The existing ML model serving architecture was designed to accommodate multiple ML serving engines, such as the Custom Python Engine and TFS. The following steps were performed to add Triton Server to the model serving architecture and migrate GPU models to Triton:

Generate stubs for gRPC service: Quora chose to use the gRPC framework with Triton. To generate the stubs necessary for RPC communication between the server and client sides, we followed the HTTP/REST and gRPC protocol documentation and used Triton's protobuf specification to generate these stubs.

Set up NVIDIA Triton on Amazon EKS as the serving server:
Customize NVIDIA's base image with the ONNX framework: NVIDIA provides pre-built Docker containers for the NVIDIA Triton Inference Server, which are available in the NGC Catalog. However, to tailor the Triton container to our specific environment, we followed the instructions detailed in Triton's customization guide. This process included selecting the particular framework that our environment needs (for example, ONNX) and installing any additional libraries required by our models. To accommodate a variety of our models based on different frameworks, we built multiple Triton packages.
Add Triton-specific model configurations: Triton requires specific configuration details, such as the model's name, version, and procedures for preprocessing inputs and post-processing outputs. Triton was added as the third engine in the model serving architecture to incorporate Triton-specific settings within the existing model configuration. These configurations are serialized into the pbtxt file, which serves as the required model configuration in the model repository for Triton deployment.
Prepare the model to deploy on Triton: We took an existing PyTorch model, converted it to the ONNX format, and uploaded it to an Amazon Simple Storage Service (Amazon S3) model repository. We used the MLflow model registry for model versioning and incorporated Triton packages into our Continuous Integration/Continuous Deployment (CI/CD) pipeline. With these steps, we successfully integrated the NVIDIA Triton Inference Server into the model serving architecture.

Migrate models to NVIDIA Triton Server: In the initial phase, we successfully migrated four PyTorch models running on the Python engine and two TensorFlow models running on the TFS engine to the Triton server running with the ONNX framework. This led to substantial improvements in model availability, reducing latency and cost by at least 50%. After the initial success, three new PyTorch GPU models were added directly to the Triton server.

Benefits of modernized architecture

The modernized model serving platform enables Quora to achieve performance enhancements, cost savings, and substantial feature enrichment. Some significant wins observed after the migration include:

Performance enhancement: Latency of the PyTorch GPU model was slashed by seven times (from 230ms to 30ms) and latency for the TensorFlow GPU model was reduced by two times (from 20ms to 8ms).
Notably, significant enhancements have been observed in Transformer and BERT-based models, such as DeBERTa, RoBERTa, XLM-RoBERTa, and E5 Text Embedding, with latency reductions exceeding seven times. The improved performance comes from conversion to the ONNX format and model quantization from FP32 to FP16, which reduces model size and memory usage, from using ONNX Runtime as the inference backend engine, and from using gRPC as the communication framework.

Cost savings: The GPU model serving cost was reduced by 52%, which leads to 23% overall savings in model serving. The primary contributors to cost savings are conversion to ONNX and model quantization. The model size gets smaller, and Quora could enhance throughput by two times and GPU utilization by three times. Ultimately, this improves efficiency and cuts down cost.

GPU use: The adoption of the ONNX framework improved GPU use from 40% to 80%, leading to two times serving efficiency.

Unified RPC framework: The new setup promotes a unified framework by migrating all models to use gRPC and service mesh functionalities. This unification simplifies client-side RPC support and streamlines operations.

More time to focus on innovation: With Amazon EKS, engineers don't need to spend time on undifferentiated infrastructure management. It helps reduce operational burden, such as on-call pages. This allows ML engineers to dedicate more time to experimentation, training, and serving new models for an improved customer experience.

Lessons learned

Adopting new technologies can be a challenging journey, often fraught with unexpected obstacles and setbacks. Here are some of the lessons we learned:

ONNX as a preferred exchange format: Quora found ONNX to be an ideal open standard format for model serving. It is designed for interoperability, making it a perfect choice when working with models trained in various frameworks. After training an ML model in PyTorch or TensorFlow, we could easily convert it to ONNX and apply post-training quantization. This process led to significant improvements in latency and efficiency.

gRPC as the communication framework: Quora's experience has shown gRPC to be a reliable RPC framework offering improved performance and reliability.

Remote model repository feature in Triton: Although Triton supports a remote model repository in Amazon S3, our testing indicated that this feature did not function as anticipated. We recommend incorporating a step to fetch the model files from Amazon S3 and place them into a predefined local path, such as /mnt/models/. This method guarantees the availability of model files at a recognized location, a critical need for Triton backends such as the python_backend, which requires a Python runtime and libraries, or the hugectr_backend, which requires access to embedding files.

Support of multi-ML frameworks: NVIDIA Triton Inference Server supports multiple frameworks, such as PyTorch, TensorFlow, TensorRT, or ONNX Runtime, with different hardware.

Amazon EKS as an ML service: Quora needed an extensible, self-serving ML service based on a microservice architecture that helps ML engineers iterate quickly before deploying the model. Ideally, this service should support various training and serving environments, essentially being a truly framework-agnostic training and model serving service. We found Amazon EKS to be the most suitable service for this.
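To make the pbtxt configuration step from the walkthrough concrete, here is a minimal, hypothetical config.pbtxt sketch for one ONNX model served on GPU with dynamic batching enabled; the model name, tensor names, data types, and shapes are illustrative assumptions, not Quora's actual configuration:

name: "feed_ranker"
platform: "onnxruntime_onnx"
max_batch_size: 64
input [
  {
    name: "features"        # hypothetical input tensor
    data_type: TYPE_FP32
    dims: [ 256 ]
  }
]
output [
  {
    name: "score"           # hypothetical output tensor
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [ { kind: KIND_GPU, count: 1 } ]
dynamic_batching { max_queue_delay_microseconds: 100 }

In Triton's standard model repository layout, this file would sit at <model-repository>/feed_ranker/config.pbtxt, with the exported model itself at <model-repository>/feed_ranker/1/model.onnx.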
Conclusion

In this post, we showed how Quora modernized its model serving with NVIDIA Triton Inference Server on Amazon EKS, which provided a strong foundation for flexible, reliable, and efficient model serving. This service reduced model serving complexity, which enabled Quora to quickly adapt to changing business requirements. The key factors that drove the modernization decisions were the ability to support multiple ML frameworks, scale the model serving with effective compute resource management, increase system reliability, and reduce the cost of operations. The modernized model serving on Amazon EKS also decreased the ongoing operational support burden for engineers, and the scalability of the design improved customer experience and opened up opportunities for innovation and growth. We're excited to share our learnings with the wider community through this post, and to support other organizations that are starting their model serving journey or looking to improve their existing model serving pipelines. As part of our experience, we highly recommend modernizing your model serving with NVIDIA Triton on Amazon EKS.

View the full article
-
Forum Statistics
63.6k Total Topics
61.7k Total Posts