As expected, Nvidia revealed three new GeForce RTX graphics cards at its Gamescom event. We have covered plenty of rumors and speculation, but we now know the prices, features, and performance details.
Nvidia's previous GeForce architecture was Pascal, which powered everything from the best graphics cards like the GTX 1080 and GTX 1080 Ti down to the entry-level GTX 1050 and GT 1030. Last year, Nvidia released the Volta architecture, which apparently remains confined to the supercomputer and deep learning fields, because the new Turing architecture appears to beat it in almost every meaningful way. If you just splurged on a Titan V, that's bad news, but for gamers who have been holding out for new graphics cards, your patience has paid off.
Key Specifications and Prices for the RTX 20 Series Graphics Cards
There was a ton of speculation and, yes, plenty of blatantly wrong guesses about what the Turing architecture would contain. Every single "leak" until last week was wrong. Chew on that for a moment. We can make educated guesses about what Nvidia and AMD might do with a future architecture, but such guesses are bound to be wrong. Nvidia revealed many core details of the Turing architecture at SIGGRAPH, and with the official announcement of the GeForce RTX 20 Series we can finally put all the rumors to rest.
A quick note: I have used the reference specifications for all GPUs in the following table. The 20 Series Founders Edition cards carry a higher price, but come with a 90MHz higher boost clock, putting them in the same range where factory overclocked models are likely to land. As for the true reference cards, we don't know what they will look like or how available they will be, especially at launch. I suspect we won't see the lower end of the price ranges mentioned above until at least a month or two after the graphics cards begin shipping.
Here are the specifications (with a few areas still unknown, like die size and transistor count for the smaller Turing chip):
For traditional graphics work – what games have used up to now – CUDA core counts are moderately improved across the line. The 2080 Ti has 21 percent more cores than the GTX 1080 Ti, the RTX 2080 has 15 percent more cores than the GTX 1080, and the RTX 2070 has 20 percent more cores than the GTX 1070. The resulting theoretical TFLOPS show a similar 13.5 to 18.6 percent improvement, call it 15 percent on average. Here is the important bit: these theoretical numbers represent more of a worst-case scenario for Turing.
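Those theoretical TFLOPS figures come from a standard formula: CUDA cores times 2 FLOPs per clock (one fused multiply-add) times the boost clock. Here's a quick sketch reproducing the percentages; the boost clock values are my assumption of the reference clocks, so treat them as placeholders until cards ship.

```python
def theoretical_tflops(cuda_cores, boost_mhz):
    """Peak FP32 throughput: cores x 2 FLOPs per clock (one FMA) x clock."""
    return cuda_cores * 2 * boost_mhz / 1e6  # MHz -> TFLOPS

# (CUDA cores, assumed reference boost clock in MHz)
cards = {
    "GTX 1080 Ti": (3584, 1582), "RTX 2080 Ti": (4352, 1545),
    "GTX 1080":    (2560, 1733), "RTX 2080":    (2944, 1710),
    "GTX 1070":    (1920, 1683), "RTX 2070":    (2304, 1620),
}

for old, new in [("GTX 1080 Ti", "RTX 2080 Ti"),
                 ("GTX 1080", "RTX 2080"),
                 ("GTX 1070", "RTX 2070")]:
    t_old = theoretical_tflops(*cards[old])
    t_new = theoretical_tflops(*cards[new])
    print(f"{new}: {t_new:.2f} TFLOPS ({(t_new / t_old - 1) * 100:.1f}% over {old})")
```

Run it and you get the 13.5 to 18.6 percent spread quoted above, which is why higher clocks on factory overclocked cards shift the numbers.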
Architecturally, Nvidia has improved the CUDA cores this round. An important change is that the CUDA cores can perform FP32 and INT calculations simultaneously. Most graphics work depends on floating-point calculations (e.g., 3.14159 * 2.71828), but integer calculations for memory addresses are also important. It's not clear exactly how this ultimately affects graphics performance, but during its GeForce RTX presentation, Nvidia CEO Jensen Huang stated that the Turing cores are "1.5 times faster" than the Pascal cores. If that figure is anywhere close to reality, the new RTX 20 Series GPUs will be significantly faster than the current 10 Series.
Performance improvements don't stop with more and faster CUDA cores. Turing will use 14 GT/s GDDR6 memory in the three parts revealed so far. That gives the 2080 Ti a modest 27 percent improvement in bandwidth, the 2080 sees a 40 percent boost, and the 2070 gets catapulted to bandwidth parity with the 2080, a 75 percent increase. Each GPU has a certain amount of memory bandwidth it needs, beyond which faster memory doesn't help much. Nvidia has traditionally kept its top GPUs pretty well balanced, but the move to GDDR6 has changed things. I suspect the 2070 doesn't really need all that bandwidth, but having extra in reserve won't hurt.
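Memory bandwidth is simply bus width times transfer rate. A quick sketch reproducing the percentages above, using the Pascal cards' known memory speeds and the 352-bit/256-bit bus widths Nvidia has listed for the new cards (discussed further below):

```python
def bandwidth_gb_s(bus_width_bits, transfer_rate_gt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * transfer_rate_gt_s

# Announced Turing cards: 14 GT/s GDDR6
rtx_2080_ti = bandwidth_gb_s(352, 14)   # 616 GB/s
rtx_2080    = bandwidth_gb_s(256, 14)   # 448 GB/s (the 2070 matches this)

# Pascal predecessors for comparison
gtx_1080_ti = bandwidth_gb_s(352, 11)   # 11 GT/s GDDR5X: 484 GB/s
gtx_1080    = bandwidth_gb_s(256, 10)   # 10 GT/s GDDR5X: 320 GB/s
gtx_1070    = bandwidth_gb_s(256, 8)    # 8 GT/s GDDR5: 256 GB/s

print(f"2080 Ti: +{(rtx_2080_ti / gtx_1080_ti - 1) * 100:.0f}%")  # +27%
print(f"2080:    +{(rtx_2080 / gtx_1080 - 1) * 100:.0f}%")        # +40%
print(f"2070:    +{(rtx_2080 / gtx_1070 - 1) * 100:.0f}%")        # +75%
```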
Everything so far represents updates to Nvidia's traditional GPU architecture. What comes next are the new additions, the RT and Tensor cores. RT stands for ray tracing, a technique first introduced in 1979 by Turner Whitted. It's probably no coincidence that Whitted joined Nvidia in 2014 and works in its research department. The timing fits perfectly with Nvidia beginning serious efforts to implement real-time ray tracing in hardware, and Turing is the first clear fruit of those efforts. In a recent blog post, Whitted discussed some of his history with ray tracing and global illumination.
I'll come back to what ray tracing is in a bit, but the new information from Nvidia is that the RT cores do about 10 TFLOPS of calculations for each Giga Ray per second. It's important to note that these are not general-purpose TFLOPS; instead, they are specialized operations designed to accelerate ray tracing computations. Nvidia says the RT cores are used to calculate intersections (where a ray hits a polygon), as well as BVH traversal. That second bit requires a longer explanation.
BVH stands for "bounding volume hierarchy" and is a method for optimizing intersection calculations. Instead of checking rays against individual polygons, objects are encapsulated in larger, simpler volumes. If a ray doesn't intersect the large volume, there's no need to spend extra effort checking the object inside it. Conversely, if a ray does intersect the bounding volume, the next level of the hierarchy is checked, with each level becoming more detailed. In essence, Nvidia provides hardware that accelerates common functions used in ray tracing, which can speed up the calculations by an order of magnitude (or more).
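To make the idea concrete, here's a minimal illustrative sketch of BVH traversal, not Nvidia's implementation, using axis-aligned bounding boxes: a ray is tested against a node's box first, and the children are only visited on a hit, so a whole subtree (and every polygon inside it) can be culled with a single cheap test.

```python
import math

def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the ray origin + t*direction (t >= 0) hit this AABB?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            if not lo <= o <= hi:
                return False  # parallel to this slab and outside it
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near, t_far = max(t_near, min(t1, t2)), min(t_far, max(t1, t2))
    return t_near <= t_far

class Node:
    """BVH node: an AABB plus either child nodes or a leaf's objects."""
    def __init__(self, box_min, box_max, children=(), objects=()):
        self.box_min, self.box_max = box_min, box_max
        self.children, self.objects = children, objects

def traverse(node, origin, direction, hits, stats):
    """Collect leaf objects whose enclosing boxes the ray passes through."""
    stats["box_tests"] += 1
    if not ray_hits_box(origin, direction, node.box_min, node.box_max):
        return  # one cheap test culls this entire subtree
    hits.extend(node.objects)  # leaf payload still needs exact ray/polygon tests
    for child in node.children:
        traverse(child, origin, direction, hits, stats)

# Two small leaf boxes under one root box
leaf_a = Node((0, 0, 0), (1, 1, 1), objects=["tri_a"])
leaf_b = Node((10, 0, 0), (11, 1, 1), objects=["tri_b"])
root = Node((0, 0, 0), (11, 1, 1), children=(leaf_a, leaf_b))

hits, stats = [], {"box_tests": 0}
traverse(root, (-1, 0.5, 0.5), (1, 0, 0), hits, stats)  # ray along +x
print(hits, stats)  # both leaves survive culling: 3 box tests total
```

A ray that misses the root box is rejected after a single test, which is where the order-of-magnitude savings come from in real scenes with millions of polygons.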
The last major architectural feature in Turing is the inclusion of Tensor cores. Normally used for machine learning, these might leave you wondering why they're even useful for games. There's future potential for games to use such cores to improve AI, but that seems unlikely for now, especially since for the next five or more years there won't be a large installed base of players with Tensor cores available. In the near term, these cores can be used in more practical ways.
Nvidia showed some examples of improved image quality, where machine learning trained on millions of images can provide a better result with less blockiness and fewer other artifacts. Imagine rendering a game at 1080p with a high framerate, then using the Tensor cores to scale it up to a pseudo-4K image without the massive performance hit we currently experience. It wouldn't necessarily be perfect, but suddenly the thought of 4K monitors running at 144Hz with "native" 4K content isn't so far-fetched.
Nvidia also discussed a new DLSS algorithm that provides a better anti-aliasing experience than TAA (temporal AA). It's not clear whether the Infiltrator demo uses DLSS, the Tensor cores, or something else, but Nvidia says the demo runs at 78fps on an RTX 2080 Ti, compared to just "30-something" fps on a GTX 1080 Ti, both at 4K.
Turing Will Be Produced Using TSMC 12nm
One detail that isn't surprising at all is that the Turing GPUs are produced using TSMC's 12nm FinFET process. Later Turing models could potentially be manufactured by Samsung, as was the case with the GTX 1050/1050 Ti and GT 1030 Pascal parts, but the first round of Turing GPUs comes from TSMC.
What does the move from 16nm to 12nm mean in practice? Various sources indicate that TSMC's 12nm is a refinement and optimization of the existing 16nm process rather than a true reduction in feature sizes. In that sense, 12nm is more of a marketing term than a real die shrink, but optimizations to the process technology over the last two years should help improve clock speeds, chip density and power use – the holy trinity of faster, smaller and cooler-running chips. TSMC's 12nm FinFET process is also mature at this point, with good yields, which allows Nvidia to create a very large GPU design.
The top TU102 Turing design has 18.6 billion transistors and measures 754mm2. (Note that TU102 is what everyone is calling it – Nvidia hasn't officially named the chips as far as I'm aware. "A rose by any other name" and all that.) It's a big chip, far larger than the GP102 used in the GTX 1080 Ti (471mm2 and 11.8 billion transistors). It's almost as big as the GV100 used in the Tesla V100 and Titan V (815mm2), which is basically as big as Nvidia can go with TSMC's current production line.
The TU102 supports a maximum of 4,608 CUDA cores, 576 Tensor cores and 10 Giga Rays/s spread over 36 streaming multiprocessors (SMs), with 128 CUDA cores and 16 Tensor cores per SM. As usual, Nvidia can partially disable chips to create lower-tier models – or more likely, it can harvest chips that are partially defective. The RTX 2080 Ti uses 34 SMs, which gives it 4,352 CUDA cores and 544 Tensor cores as far as we can tell. Nvidia hasn't provided specific details about the RT core counts, but the RTX 2080 Ti is rated at the same top 10 Giga Rays/s figure Nvidia uses for the Quadro RTX 6000, so it doesn't appear to have any disabled RT cores.
The other Turing chip for now is a step down in size, but Nvidia hasn't given any specific numbers for the TU104 yet. It has a maximum of 24 SMs and will be used in the RTX 2080 and RTX 2070. The 2080 disables only one SM, which gives it 2,944 CUDA cores and 368 Tensor cores from what we can tell. It's also rated at 8 Giga Rays/s, which suggests the RT cores may not be integrated directly into the SMs. The RTX 2070 meanwhile disables six SMs, for 2,304 CUDA cores, 288 Tensor cores and 6 Giga Rays/s. The die size is likely in the 500-550mm2 range, with around 12-14 billion transistors. More importantly, the TU104 will cost less to produce, so it can more easily go into $500 parts.
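The per-SM arithmetic behind these counts is simple: 128 CUDA cores and 16 Tensor cores per active SM, with salvaged chips created by disabling SMs. A quick sketch (the disabled-SM counts are inferred from the core counts above):

```python
def harvested_counts(total_sms, disabled_sms):
    """Core counts after disabling SMs: 128 CUDA and 16 Tensor cores per SM."""
    active = total_sms - disabled_sms
    return {"sms": active, "cuda": active * 128, "tensor": active * 16}

print(harvested_counts(36, 2))  # RTX 2080 Ti: 34 SMs -> 4352 CUDA, 544 Tensor
print(harvested_counts(24, 1))  # RTX 2080:    23 SMs -> 2944 CUDA, 368 Tensor
print(harvested_counts(24, 6))  # RTX 2070:    18 SMs -> 2304 CUDA, 288 Tensor
```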
Wrapping up the Turing and GeForce RTX hardware, all of the new GPUs will use GDDR6 memory, and based on the VRAM capacities, Nvidia is using 8Gb chips (while the Quadro RTX cards use 16Gb chips). The TU102 has up to a 384-bit interface, with the 2080 Ti disabling one 32-bit channel to end up with a 352-bit interface, while the TU104 has up to a 256-bit interface. Using 14 GT/s GDDR6 for both the 2070 and 2080 means they end up with the same memory bandwidth, which probably means the 2070 has more bandwidth than it strictly needs. GDDR6 officially supports speeds of 14-16 GT/s, and Micron has demonstrated 18 GT/s modules, so Nvidia is going with the lower end of the spectrum for now. We could see faster memory in the future, or on partner cards.
What Is Ray Tracing, and Is It Really Such a Big Deal?
That's the architecture (for now, at least), but I promised to come back to those RT cores and why they're important. Nvidia is investing a lot of money in ray tracing with Turing, which it often refers to as the "holy grail" of computer graphics. That's because ray tracing can have a profound impact on the way games are made. It's a big enough change that Nvidia has dropped the GTX branding on the new 20 Series (at least for the 2070 and above), switching to RTX. You could argue that it's only marketing, but doing anything close to real-time ray tracing is incredible, and in 10 years we may look back at the introduction of RTX the way we currently look back at the introduction of programmable shaders.
Explaining what ray tracing is, how it works, and why it's better than alternative rendering models is a big topic, and Nvidia and many others have published lengthy explanations. Here's a good starting point if you want to know more, or check out this series of seven videos on RTX and games. In basic terms, ray tracing requires much more computational work than rasterization, but the resulting images are generally far more accurate than the approximations we're used to seeing. Ray tracing is particularly effective at simulating lighting, including global illumination, reflections, shadows, ambient occlusion and more. With RTX, Nvidia lets developers come closer to simulating accurate lighting and shadows.
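As the simplest possible illustration of the per-ray math involved, here's a classic ray-sphere intersection test. (The RT cores actually accelerate ray-triangle tests plus BVH traversal, but the flavor of the work, solving a small geometric equation for every ray, is the same; this toy function is mine, not Nvidia's.)

```python
import math

def ray_sphere_t(origin, direction, center, radius):
    """Distance t to the first hit of the ray origin + t*direction with a
    sphere, or None on a miss. direction is assumed to be normalized."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * c          # quadratic discriminant (a == 1 when normalized)
    if disc < 0:
        return None               # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2
    return t if t >= 0 else None  # a hit behind the origin doesn't count

# A ray fired along +z hits a unit sphere centered 5 units away at t = 4
print(ray_sphere_t((0, 0, 0), (0, 0, 1), (0, 0, 5), 1))  # 4.0
```

A renderer repeats tests like this for millions of rays per frame, which is why dedicated hardware matters so much.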
Instead of explaining exactly how ray tracing works, it's better to look at some examples of how it's used in games. There are currently 11 announced games in development that use Nvidia's RTX ray tracing (and probably others that haven't been announced). There are 21 games in total that use some of the new RTX enhancements provided by Nvidia's Turing architecture, and here are a few specific examples of games that use ray tracing.
This clip from Shadow of the Tomb Raider shows how RTX ray tracing can enhance the lighting model. The key elements to notice are the spotlights in the foreground and the shadows they cast. Adding dynamic point lights drastically reduces performance with traditional rasterization, and the more lights you have, the worse it gets. Developers and artists spend a great deal of time coming up with approximations that look pretty good, but there are limits to what can be done. Ray tracing provides a far more accurate rendering of how light interacts with the environment.
Here's another clip showing how ray tracing improves the lighting in Shadow of the Tomb Raider, this time with two cone lights and two rectangular area lights. Everything looks good in traditional mode, with shadows that change based on the lights, but the way those shadows blend doesn't reflect the real world. The RTX lighting, in contrast, uses physically based modeling of the environment, showing green and red spotlights that blend together, color bleed around the edges of shadows, and more.
Another ray tracing example, this one showing global illumination, is Metro Exodus. Here the traditional model lights the entire room much more evenly, while the ray traced version has deep shadows in the corners, bright areas lit by direct illumination, and indirect lighting that helps keep some areas clearly visible while others are not. The possibilities this gives artists and designers are interesting, though I have to note that "realistic" shadows aren't always more fun.
I was given the chance to play the Metro Exodus demo, which allowed me to toggle RTX on and off dynamically. Wandering around some run-down buildings with RTX lighting enabled, the rooms are much darker. That can create a sense of fear, but it also makes it harder to see objects and figure out where to go and what you can do. In any case, the look of the Metro world was excellent, and the RTX lighting delivers a completely different experience. This isn't just some minor tweak to the graphics that produces slightly different shadows; RTX lighting clearly changes the environment and affects the gameplay.
There is a downside, however: RTX carries higher performance requirements. All the games shown are in alpha or beta states, so much can still change, but it's clear that enabling all the fancy RTX effects causes a performance hit. I saw periodic stutters in Shadow of the Tomb Raider, Metro Exodus and Battlefield V, the three biggest names for RTX right now. The visual difference may be impressive, but if performance drops by half compared to traditional rendering techniques, many players will likely end up disabling the effects. There's work to be done, and hopefully that work comes in the form of software updates that improve performance without sacrificing quality, rather than waiting a few more generations of hardware before this becomes practical.
Nvidia's RTX Is the Shape of the Future
If you've followed the graphics industry at all, it has always been clear that the goal was to reach real-time ray tracing, or at least use some elements of ray tracing in real-time graphics engines. Our graphics chips have come a long way in the past 30 years, with milestones such as the 3dfx Voodoo as the first mainstream consumer card capable of high-performance 3D graphics, the GeForce 256 as the first GPU with hardware transform and lighting, and AMD's Radeon 9700 Pro as the first fully programmable DirectX 9 GPU. Nvidia's Turing architecture appears to be just as great a change over its predecessors as any of those products.
Like all transitions, this won't necessarily be a clean break with the old and the beginning of something new. As cool as real-time ray tracing may be, it requires new hardware. It's the classic chicken-and-egg problem, where software doesn't support a new feature without the hardware, but building hardware to accelerate something that isn't currently used is a big investment. Nvidia has made that investment with RTX and Turing, and only time will tell if it pays off.
Unfortunately, for the next five years we'll have a messy situation where most players don't have a card that can do RTX or even Microsoft's generic DirectX Raytracing. I'm going to talk to some developers using RTX for ray tracing to find out how hard it is to add support to a game. Hopefully it's not too difficult, because most developers must continue to support older products and rasterization technologies.
Even in the long run, the RTX extensions may not win out. It's proprietary Nvidia technology, so AMD is completely locked out right now. Ideally, standards will evolve, just as they did with Direct3D, and eventually games will be able to support a single API that runs on whatever GPU or processor is in a system. We're in a pretty good place with DirectX 11/12 these days, so maybe DirectX RT 5.0 will become the standard. But regardless of how we get there, real-time ray tracing, or some variant of it, is set to be the next big thing in PC games. Now we just have to wait for the consoles and the software to catch up with the hardware.
But how does the hardware actually perform? Stay tuned for our full review of the GeForce RTX 2080 Ti and RTX 2080, around September 20th.