
GPU data processing: speeds up the deep learning curve

Artificial intelligence (AI) may be what everyone is talking about, but getting involved is not straightforward. You need a more than decent understanding of maths and theoretical computer science, plus a grasp of neural networks and deep learning fundamentals – not to mention a good working knowledge of the tools required to turn these theories into practical models and applications. You also need an abundance of processing power – beyond what is required by even the most demanding of standard applications. One way to get this is through the cloud, but because deep learning models can take days or weeks to come up with the goods, that can prove hugely expensive. In this article we look at the alternatives, and why the once humble graphics controller is now a must-have for the would-be AI developer.

Enter the GPU

If you're reading this, it's safe to assume you know what a CPU (Central Processing Unit) is and how powerful the latest Intel and AMD chips are. But if you're an AI developer, CPUs alone are not enough. They can do the processing, but the vast volume of unstructured data that needs to be analysed to build and train deep learning models can max them out for weeks on end. Even multi-core CPUs struggle with deep learning, which is where the Graphics Processing Unit (GPU) comes in.

Again, you're probably well aware of GPUs. But just to recap, we're talking about specialized processors that were originally developed to handle complex imagery – for example, to watch high-definition movies, play 3D multiplayer games or enjoy virtual-reality simulations. GPUs are especially adept at processing matrices – something CPUs struggle to cope with – which also makes them well suited to specialized applications such as deep learning. Moreover, far more specialized GPU cores can be packed onto the processor die than with a CPU: with an Intel Xeon you might expect up to 28 cores per socket, whereas a GPU can have thousands, every one of which can process AI data at the same time.
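To see why matrix throughput matters so much for deep learning, note that the forward pass of a fully-connected neural-network layer is essentially one big matrix multiply. A minimal NumPy sketch (the batch size, layer widths and ReLU activation here are illustrative assumptions, not taken from any particular model):

```python
import numpy as np

# Hypothetical sizes: a batch of 64 inputs, a 1,024-wide layer mapping to 512 outputs.
batch, n_in, n_out = 64, 1024, 512

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, n_in))   # input activations
w = rng.standard_normal((n_in, n_out))   # layer weights
b = np.zeros(n_out)                      # layer biases

# The layer's forward pass is a single matrix multiply plus a bias. Every one of
# the batch * n_out output values can be computed independently, which is exactly
# the kind of work a GPU's thousands of cores can parallelize.
y = np.maximum(x @ w + b, 0.0)           # ReLU activation

print(y.shape)  # (64, 512)
```

Training repeats this (and the matching backward pass) millions of times over huge datasets, which is why a chip built for matrix arithmetic pays off.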

Because all of these cores are highly specialized, they can't run an operating system or handle mainstream program logic, so you still need one or more CPUs. What these systems can do, however, is massively accelerate processes like deep learning training, by offloading the work involved from the CPUs to all of those cores in the GPU subsystem.
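In practice that offloading is a one-liner in the mainstream frameworks. A minimal sketch using PyTorch (one such framework; the layer sizes are arbitrary assumptions), which falls back to the CPU when no GPU is present:

```python
import torch

# Pick the GPU if one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 512).to(device)  # move the model's weights to the device
x = torch.randn(64, 1024, device=device)       # allocate the data on the same device
y = model(x)                                   # the compute now runs on the chosen device

print(y.device.type)  # "cuda" on a GPU machine, "cpu" otherwise
```

The CPU still orchestrates the program; only the heavy tensor arithmetic is handed to the GPU cores.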

GPU in practice

So much for the theory. In practice, there are a number of GPU vendors, with products ranging from gaming hardware to the specialized HPC (High Performance Computing) and AI markets. This market was pioneered by Nvidia with its Pascal GPU architecture, which has long been the benchmark for others to aim at.

In terms of actual products, you can get into AI for very little outlay using a cheap GPU. For example, an Nvidia GeForce GTX 1060, available for just £270, delivers 1,280 CUDA cores – Nvidia's GPU core technology. That sounds like a lot, but in reality it's nowhere near enough to satisfy the needs of serious AI developers.

For professional AI use, therefore, Nvidia has much more powerful and scalable GPUs based both on Pascal technology and on its newer architecture, Volta, which combines CUDA cores with Nvidia's new Tensor core technology, designed specifically to accommodate deep learning. Tensor cores can deliver up to 12 times the peak teraflops (TFLOPS) performance of their CUDA equivalents for deep learning training, and 6 times the inference throughput – when deep learning models are actually put to use.

The first product to be based on Volta is the Tesla V100, which has 640 of the new AI-specific Tensor cores as well as 5,120 general-purpose HPC CUDA cores, all supported by either 16GB or 32GB of second-generation High Bandwidth Memory (HBM2).


In addition to a PCIe adapter, the Tesla V100 is available as an SXM module to connect to Nvidia's high-speed NVLink bus.

Image: Nvidia

The V100 is available as a standard plug-in PCIe adapter (these start at around £7,500) or as a smaller SXM module designed to fit into a special motherboard connector that, in addition to PCIe connectivity, enables the V100 to be paired with Nvidia's own high-speed NVLink bus technology. NVLink was originally designed to support the first-generation (Pascal-based) Tesla GPU products; since then it has been enhanced to support up to six links per GPU with a combined bandwidth of 300GB/sec. NVLink is also available for use with a new Quadro adapter and others based on the Volta architecture. Moreover, such is the pace of change in this market, there is now a switched interconnect – NVSwitch – which allows up to 16 GPUs to be connected with a bandwidth of 2.4TB/sec.
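The bandwidth figures quoted above are easy to sanity-check with a little arithmetic (the naive per-GPU share of the NVSwitch fabric is our own back-of-envelope calculation, not an Nvidia figure):

```python
# Figures quoted in the article: up to six NVLink links per GPU,
# 300GB/sec combined bandwidth per GPU.
links_per_gpu = 6
combined_gb_per_sec = 300
per_link = combined_gb_per_sec / links_per_gpu
print(per_link)  # 50.0 GB/sec per link

# NVSwitch: up to 16 GPUs sharing 2.4TB/sec of aggregate bandwidth.
# A naive even split gives the per-GPU share of the fabric:
nvswitch_aggregate_gb = 2.4 * 1000
print(nvswitch_aggregate_gb / 16)  # 150.0 GB/sec per GPU
```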

Off-the-shelf AI

Of course, GPUs are little use on their own, and in terms of serious AI and other HPC applications there are several ways of putting them to work. One is to purchase individual GPUs plus all the other components required to build a complete system, and assemble it yourself. But few business buyers will want to go down the DIY route, most preferring a ready-made – and, more importantly, vendor-supported – solution, either from Nvidia or one of its partners.

These ready-made solutions, of course, all use the same GPU technology, but package it in different ways. So, to get an idea of what's on offer, we took a look at what Nvidia sells plus a Supermicro-based option from Boston Limited.


Take your pick of AI: Nvidia (bottom) and Boston (top) deep learning servers together in the same rack.

Image: Alan Stevens / ZDNet

The Nvidia AI family

Nvidia likes to be known as 'the AI Computing Company' and, under its DGX brand, sells a pair of servers (the DGX-1 and the newer, more powerful DGX-2) plus an AI workstation (the DGX Station), all built around Tesla V100 GPUs.


The elegant Nvidia DGX family of turnkey AI platforms, all powered by Tesla V100 GPUs.

Image: Nvidia

Delivered in distinctive gold chassis, the DGX servers and workstation are turnkey solutions comprising both a standard hardware configuration and an integrated DGX software stack – a pre-installed Ubuntu Linux OS plus a mix of the leading frameworks and development tools required to build AI models.

The first one we saw was the DGX-1 (recommended price $149,000), which comes in a 3U rack-mount chassis. Unfortunately, in the Boston lab it was busy building real models, so we couldn't take any photos ourselves other than an outside shot. From others we've seen, however, we know that the DGX-1 is a fairly conventional rackmount server with four redundant power supplies. It's standard on the inside too, with a conventional dual-socket server motherboard equipped with a pair of 20-core Intel Xeon E5-2698 v4 processors plus 512GB of DDR4 RAM.

A 480GB SSD accommodates the operating system and DGX software stack, with a storage array of four 1.92TB SSDs for data. Additional storage can be added if necessary, while network connectivity is handled by four Mellanox InfiniBand EDR adapters plus a pair of 10GbE NICs. There's also a dedicated Gigabit Ethernet interface for IPMI remote management.


We couldn't open up the DGX-1 as it was busy training, but here it is hard at work in Boston Limited's labs.

Image: Alan Stevens / ZDNet

The all-important GPUs have their own home on an NVLink board with eight sockets, fully populated with Tesla V100 SXM2 modules. The first release had only 16GB of dedicated HBM2 per GPU, but the DGX-1 can now be specified with 32GB modules.

Regardless of the memory configuration, with eight GPUs available the DGX-1 has a massive 40,960 CUDA cores for standard HPC work plus 5,120 of the AI-specific Tensor cores. According to Nvidia, that's equivalent to 960 teraflops of AI computing power; the company claims that a DGX-1 equals 25 racks of conventional servers equipped with CPUs alone.
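The arithmetic behind those headline numbers is easy to check against the per-GPU Tesla V100 figures quoted earlier:

```python
# Per-GPU figures for the Tesla V100, as quoted earlier in the article:
cuda_cores_per_gpu = 5_120
tensor_cores_per_gpu = 640
gpus_in_dgx1 = 8

print(cuda_cores_per_gpu * gpus_in_dgx1)    # 40960 CUDA cores in total
print(tensor_cores_per_gpu * gpus_in_dgx1)  # 5120 Tensor cores in total

# Working back from Nvidia's 960-teraflops claim gives the implied
# per-GPU deep-learning figure:
print(960 / gpus_in_dgx1)                   # 120.0 TFLOPS per V100
```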

It's also worth noting that the leading deep learning frameworks all support Nvidia GPU technology. Furthermore, these run up to 3 times faster on Tesla V100 GPUs than on Pascal-based P100 products, which have CUDA cores alone.

Buyers of the DGX-1 can also benefit from 24/7 support, updates and on-site maintenance direct from Nvidia, although this comes at a price: $23,300 a year, or $66,500 for three years. However, given the complex demands of AI, many will see this as good value, and in the UK customers should expect to pay around £123,000 (ex VAT) for a fully equipped DGX-1 with a year's support.

AI gets personal


The elegant DGX Station on a bench in Boston Limited's labs.

Image: Alan Stevens / ZDNet

Unfortunately, the newer DGX-2, with its 16 GPUs and new NVSwitch interconnect, wasn't available in time for our review, but we did look at the DGX Station, which is designed to provide a less expensive platform for developing, testing and iterating deep neural networks. This HPC workstation will also appeal to companies looking for a platform for AI development before scaling up to on-premises DGX servers or the cloud.

Housed in a floor-standing tower chassis, the DGX Station is based on an Asus motherboard with a single 20-core Xeon E5-2698 v4 rather than the two in the DGX-1 server. System memory is likewise halved, to 256GB, and instead of eight GPUs the DGX Station has four Tesla V100 modules, implemented as PCIe adapters but with full NVLink interconnects joining them.

Storage is split between a 1.92TB system SSD and an array of three similar drives for data. Dual 10GbE ports provide the necessary network connectivity, and there are three DisplayPort interfaces for local monitors with up to 4K resolution. Water cooling comes as standard, and the end result is a very quiet as well as a hugely impressive workstation.


We got inside the DGX Station: just one Xeon processor, 256GB of RAM, four Tesla V100 GPUs and a lot of plumbing for the water cooling.

Image: Alan Stevens / ZDNet

With half the complement of GPUs, the DGX Station delivers a claimed 480 teraflops of AI computing power. Unsurprisingly, that's half what you get with the DGX-1 server, but it's still far more than using CPUs alone. It's also much cheaper, with a list price of $69,000 plus $10,800 for a year's 24/7 support, or $30,800 for three years.

British buyers will need to find around £59,000 (ex VAT) for the hardware from an Nvidia partner with a one-year support contract, although we have seen a number of promotions – including a 'buy four, get one free' offer – that are worth looking out for. Educational discounts are also available.

Boston Anna Volta XL

The third product we looked at was the recently launched Anna Volta XL from Boston. This is effectively the equivalent of the Nvidia DGX-1, and is likewise powered by two Xeons plus eight Tesla V100 SXM2 modules. Here, though, they're configured in a Supermicro rackmount server with many more customization options than the DGX-1.


The Boston Volta XL has two Xeon processors and eight Tesla V100 GPUs in a customizable Supermicro server platform.

Image: Supermicro

A little larger than the Nvidia server, the Anna Volta XL is a 4U platform with redundant (2+2) power supplies and separate slide-out trays for the regular CPU server and its GPU subsystem. Any Xeon with a TDP of 205W or less can be specified – including the latest Skylake processors, which Nvidia has yet to offer in its DGX-1 product.


There are 24 DIMM slots available for the Xeons, taking up to 3TB of DDR4 system memory, and for storage there are 16 2.5-inch drive bays, which can house 16 SATA/SAS or 8 NVMe drives. Network attachment is via two 10GbE ports, with a dedicated port for IPMI remote management. You also get six PCIe slots (four in the GPU tray and two in the CPU tray), so it's possible to add InfiniBand or Omni-Path connectivity if required.

The GPU tray is quite spartan, being dominated by a Supermicro NVLink motherboard with sockets for the Tesla V100 SXM2 modules, each topped by a large heatsink. GPU performance is, of course, the same as for the DGX-1, although overall system throughput will depend on the Xeon CPU/RAM configuration.


The all-important Tesla V100 modules are mounted on an NVLink board in the top tray of the Boston Anna Volta server (one of the heatsinks has been removed for the photo).

Image: Alan Stevens / ZDNet

The Anna Volta is priced much lower than the Nvidia server: Boston quotes $119,000 for a specification similar to the DGX-1 (a $30,000 saving on list price), which for British buyers translates to around £91,000 (ex VAT). The AI software bundle isn't included in the Boston price, but most of what's required is open source; Boston also offers a range of competitively priced maintenance and support services.

And that's about it, for now, in this fast-growing market. As far as the GPU hardware is concerned, there's little difference between the products we looked at, so it comes down purely to preference and budget. And with other vendors preparing to join in, prices are already falling as demand for these specialized AI platforms grows.


Nvidia reveals special 32GB Titan V
Nvidia creates a special 32GB edition of its most powerful PC graphics card, the Titan V.

Google Cloud expands GPU portfolio with Nvidia Tesla V100
The Nvidia Tesla V100 GPUs are now publicly available in beta on Google Compute Engine and Kubernetes Engine.

Nvidia expands new GPU cloud for HPC applications
With more than 500 high-performance applications featuring GPU acceleration, Nvidia aims to make them easier to access.

NVIDIA HGX-2 GPU blends AI and HPC for next-gen business computing (TechRepublic)
NVIDIA's new GPU computing platform is touted as being able to replace 300 CPU-only servers.

NVIDIA brings its fastest GPU accelerator to IBM Cloud to boost AI, HPC workloads (TechRepublic)
The combination can help businesses and data scientists create cloud-native apps that generate new business value.
