Last week I wrote a summary of Nvidia's Turing RTX series of graphics cards, including their specifications, launch dates and prices. For a high-level overview, check out the article here. Now that we are a week removed from the keynote, however, we have had the opportunity to digest more of the information that was discussed, as well as (try to) understand the new technology that was introduced.
There is still plenty we do not know about the RTX 2070, 2080 and 2080 Ti. From the make-up of the GPUs to the technologies they aim to support, this is simply unparalleled in consumer GPUs. This article is my attempt to deconstruct much of what was discussed, for a better understanding of these technologies.
It should be said that I am not an expert on these subjects in any way. There will be some speculation here. As such, I encourage everyone to wait for independent benchmarks and not simply take Nvidia at their word. However, I hope that after reading this article, you'll come away with a better understanding of the core concepts and technologies at stake in Turing RTX.
If you remember from Nvidia's keynote, Turing is built to handle ray tracing (RT core), AI (Tensor core) and traditional compute operations (Turing SM). But before we dive deep into these technologies, it's important that we establish a common vocabulary as our baseline. The RTX GPUs are all about ray tracing. This is completely different from how games have been rendered for decades. The technique that has been the workhorse all these years is called rasterization.
Rasterization effectively takes the 3D vertices of a triangle and projects them onto a 2D pixel plane, otherwise known as screen space. This projection leaves a silhouette of a triangle. You then test which pixels fall inside that triangle.
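That per-pixel "inside" test is the heart of rasterization. Here is a minimal sketch in Python using edge functions; the names and structure are my own illustration, not any real graphics API:

```python
# Minimal sketch of rasterization's point-in-triangle test using edge
# functions. A pixel lies inside the 2D projection of a triangle if it
# is on the same side of all three edges.

def edge(ax, ay, bx, by, px, py):
    # Signed area spanned by AB and AP; positive if P is to the
    # left of the edge running from A to B.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def pixel_in_triangle(px, py, v0, v1, v2):
    w0 = edge(*v1, *v2, px, py)
    w1 = edge(*v2, *v0, px, py)
    w2 = edge(*v0, *v1, px, py)
    # Inside if all three edge functions agree in sign
    # (accepting either triangle winding order).
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
           (w0 <= 0 and w1 <= 0 and w2 <= 0)

print(pixel_in_triangle(1.0, 1.0, (0, 0), (4, 0), (0, 4)))  # True
print(pixel_in_triangle(5.0, 5.0, (0, 0), (4, 0), (0, 4)))  # False
```

A GPU runs this test (and the shading that follows) for huge batches of pixels at once, which is exactly the parallelism discussed below.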
Having been used for decades, rasterization has both advantages and limitations. The biggest advantage is undoubtedly the phenomenal parallelization that rasterization allows – parallelization that GPUs are extremely effective at.
Rasterization is not without its limitations, nowhere more apparent than in lighting. Every light you want in the scene requires projecting light from that light source. When rasterizing, handling two or three lights is fine. However, handling an area light (a single large light source such as the sun, for example) is a problem, because an area light is effectively an infinite number of light sources interacting with everything in the scene. Everything in the scene becomes an illuminating surface, i.e. global illumination. The number of lights you have to cast into the scene therefore increases dramatically. This is simply unfeasible, as it becomes extremely computationally expensive.
This is where ray tracing comes in. Ray tracing is simply tracing the path of each of these light particles and calculating how they interact with the objects in your scene. Because this tracing is mathematically and physically grounded, it naturally captures the material qualities of the objects it hits, the shadows it casts, and the reflections it produces. For example, you can easily imagine that a wooden surface will look more diffuse than a more reflective surface like metal. You expect this because this is how wood and metal behave in reality.
In short, shadows, reflections and lights look and behave as they do in reality. You cast a ray through a pixel on the screen and into the world, looking for a triangle it would intersect.
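That ray/triangle intersection can be written down compactly. Below is a toy version using the well-known Möller–Trumbore algorithm – a common textbook formulation, not necessarily what any particular GPU implements:

```python
# Moller-Trumbore ray/triangle intersection. Returns the distance t
# along the ray to the hit point, or None on a miss.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray parallel to triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv                # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv        # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None      # only count hits in front of us

tri = ((-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), *tri))   # 1.0 (hit)
print(ray_triangle((0.0, 0.0, -1.0), (0.0, 0.0, -1.0), *tri))  # None (miss)
```

Multiply this test by millions of rays and triangles per frame and it becomes clear why dedicated hardware is attractive.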
Ray tracing has been all but impossible on consumer-class hardware, so developers have had to come up with clever tricks to imitate the behavior of light. The three most common (but not exhaustive) methods are reflection probes, cube maps and screen space reflections (SSR).
A reflection probe resembles a camera that captures views of the surroundings in all directions. This captured data is stored as a cube map: essentially a 2D capture of a 3D environment, saved as a collection of six textures (hence "cube") representing the environment in the up, down, forward, back, left and right directions.
Cube maps are then applied to traditionally reflective surfaces like windows to simulate reflection. The problem is that because this data is essentially baked – remember, it's effectively a snapshot – it's not real-time and therefore cannot accurately reflect the current state of the environment. For example, an explosion on a street will not be reflected in a cube-mapped surface like the windows of a bus stop.
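To make the "six textures" idea concrete, here is a rough sketch of how a cube map lookup chooses a face from a reflection direction: the dominant axis wins. The face names follow the article's wording; real graphics APIs label the faces +X/-X, +Y/-Y, +Z/-Z:

```python
# Picks which of a cube map's six baked textures a reflection
# direction (x, y, z) would sample from: the axis with the largest
# absolute component decides the face, its sign decides which side.

def cubemap_face(x, y, z):
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "right" if x > 0 else "left"
    if ay >= ax and ay >= az:
        return "up" if y > 0 else "down"
    return "forward" if z > 0 else "back"

print(cubemap_face(1.0, 0.2, 0.3))    # right
print(cubemap_face(0.0, -1.0, 0.0))   # down
print(cubemap_face(0.1, 0.2, -0.9))   # back
```

Note that nothing here consults the live scene – the lookup only ever reaches the baked snapshot, which is exactly why a fresh explosion never shows up in the reflection.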
Developers can mitigate this somewhat by using screen space reflections. However, SSR has its own limitations. Because SSR only generates reflections for what is in the current view, no occluded objects are reflected. Objects can be occluded by other objects, or simply by turning the camera. For example, an explosion that goes off behind you, and is thus off-screen, will not be reflected in the bus stop window in front of you.
Finally, you've probably heard the term "TFLOPS," or "teraFLOPS." In computing, this is used to measure the performance of hardware. FLOPS stands for "floating point operations per second." TFLOPS is therefore trillions of floating point operations per second.
But what is a floating point number? Very simply (and I mean very simply), a floating point value is a representation of a real number in bits. The more bits, the greater the precision. This is where you see terms like FP32 and FP16. FP32 uses 32 bits (or 4 bytes) of memory versus FP16's 16 bits (or 2 bytes). FP32 is most common for computer graphics and is also known as single precision, while FP16 is known as half precision. The more precision, the fewer operations you can perform per second. In other words, FP32 throughput is roughly half of FP16 throughput.
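You can see the precision gap for yourself with Python's standard `struct` module, which supports packing values as half precision (`'e'`) and single precision (`'f'`):

```python
# Shows where FP16 and FP32 run out of precision by packing a value
# into each format and reading it back.
import struct

def roundtrip(fmt, value):
    # Any rounding the format had to perform shows up in the result.
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

print(roundtrip('<e', 2049.0))      # FP16 cannot hold 2049 -> 2048.0
print(roundtrip('<f', 2049.0))      # FP32 holds it exactly -> 2049.0
print(roundtrip('<f', 16777217.0))  # FP32 gives out at 2^24+1 -> 16777216.0
```

This is why half precision is acceptable for neural-network inference (which tolerates noise) but single precision remains the default for graphics.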
With this in hand, let's take a closer look at the actual Turing GPU. Nvidia breaks the GPU down into three different core types: the Turing SM, the RT core and the Tensor core. These cores perform different functions, and this is where Turing completely departs from previous architectures, which only had SMs.
So … what is an SM? An SM is a streaming multiprocessor. The Turing SM can execute independent floating point and integer operations simultaneously. You use floating point to calculate color, for example, while integer operations calculate memory addresses – so a shader program can do both kinds of work at the same time.
This ability to handle floating point and integer operations concurrently is completely different from previous architectures, which could only issue floating point operations. Turing also has a new unified cache architecture with double the bandwidth of the previous generation (Pascal).
Because of this, a Turing SM is simply more efficient and powerful than a Pascal SM. Nvidia says the Turing SM is 1.5x more efficient than previous generations, a claim that independent benchmarks will have to verify. As a result of all this, the Turing SM delivers 14 TFLOPS of FP32 plus 14 TIPS (trillion integer operations per second). In addition, the Turing SM supports Variable Rate Shading to enable foveated rendering. Foveated rendering is a way to better spend GPU power by rendering more accurately where the eyes are looking, which is very useful for VR applications.
The RT core is the ray-tracing core. Per Nvidia, it is built to handle up to 10 gigarays/second. The 1080 Ti is capable of 1 gigaray/second. So when you see Nvidia touting Turing as having 10x the performance of a 1080 Ti, this is what they mean. It does not mean the Turing GPU is 10x more powerful than a 1080 Ti overall; Turing is simply 10x as fast in ray-tracing workloads. The RT core also handles BVH tree traversal (more on this below) and ray-triangle intersection testing.
The Tensor core handles AI processing, deep learning and the like. Per Nvidia, it can perform 110 TFLOPS of FP16, 220 TOPS of FP8 and 440 TOPS of FP4.
As I mentioned in my article last week, Nvidia has created a new metric to measure the performance of these Turing RTX GPUs: RTX-OPS. This is simply an aggregate of the weighted performance contributions of the Tensor cores, RT cores and SMs. The RT cores handle ray tracing, the SMs handle INT32 and FP32 shading, and the Tensor cores handle DNN (deep neural network) processing. Turing can also utilize AI (the Tensor cores) to generate missing pixels to fill out a frame.
Nvidia says that when they aggregate all this, they get 78T RTX-OPS. It should be noted that right now I have no idea how they weight each of these cores, and I do not believe that information has been published as of this writing. Jen-Hsun also claimed that, by comparison, the Titan X can perform 12T RTX-OPS. This is misleading, because the calculation of an RTX-OP involves Tensor cores and RT cores – things Pascal simply does not have. It's not the one-to-one comparison Jen-Hsun would have you believe.
I mentioned BVH above when discussing the RT core. BVH stands for bounding volume hierarchy, and it works like this. In ray tracing, you cast a ray and check where it intersects a triangle in the scene. You could test every triangle one by one, "walking" across the projection as you cast your rays, but this is inefficient.
This is where BVH comes in. Instead of testing every object in the scene against every ray, you take all the objects in a scene and wrap them in volumes (think of boxes). If the ray you cast does not hit any of these boxes, you can ignore that region. You no longer have to waste time and computational power casting rays in that direction, because you know nothing is there.
If the cast ray does intersect one of these boxes, you know to continue in that direction. So you move into the next set of smaller boxes inside the first big box and repeat the process, testing each level of the hierarchy.
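The box-pruning logic above can be sketched in a few lines. This is purely illustrative – the node layout here is invented for clarity, and real BVH implementations use packed memory layouts (and, in Turing's case, hardware traversal):

```python
# Toy BVH traversal: test the ray against a node's bounding box and
# only descend into children (or collect leaf triangles) on a hit.
# inv_dir is the precomputed per-axis reciprocal of the ray direction;
# this sketch assumes all direction components are nonzero.

def ray_hits_aabb(origin, inv_dir, box_min, box_max):
    # Classic "slab" test, performed per axis.
    tmin, tmax = -float("inf"), float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t1, t2 = (lo - o) * inv, (hi - o) * inv
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmax >= max(tmin, 0.0)

def traverse(node, origin, inv_dir, hits):
    if not ray_hits_aabb(origin, inv_dir, node["min"], node["max"]):
        return                    # prune: nothing in this box can be hit
    if "tri" in node:
        hits.append(node["tri"])  # leaf: candidate for the exact triangle test
        return
    for child in node["children"]:
        traverse(child, origin, inv_dir, hits)

leaf0 = {"min": (-1.0, -1.0, 5.0), "max": (0.0, 0.0, 5.1), "tri": 0}
leaf1 = {"min": (0.5, 0.5, 5.0), "max": (1.0, 1.0, 5.1), "tri": 1}
root = {"min": (-1.0, -1.0, 0.0), "max": (1.0, 1.0, 10.0),
        "children": [leaf0, leaf1]}

origin, direction = (0.7, 0.7, -1.0), (0.0001, 0.0001, 1.0)
inv_dir = tuple(1.0 / d for d in direction)
hits = []
traverse(root, origin, inv_dir, hits)
print(hits)  # [1] -- only the box the ray actually enters is descended into
```

The payoff is that most of the scene is rejected with a handful of cheap box tests instead of millions of exact triangle tests.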
The RT core is what accelerates all of this math. Compared to Volta, Turing is 6x more effective at these calculations.
Nvidia also discussed DLSS, or deep learning super sampling. DLSS utilizes deep learning to take a lower resolution image and produce a higher resolution image. DLSS can also take an incomplete image and fill in the missing pieces.
Meaning, theoretically, you could run a game at 1080p but use DLSS to present it in pseudo-4K at very high frame rates. I can easily see Nvidia deploying DLSS in its cloud service, GeForce Now.
Compared to temporal anti-aliasing (TAA), DLSS produces a much cleaner, more accurate image. Epic's Infiltrator demo ran on a single Turing card at 4K at 60fps, compared to a 1080 Ti that ran it at 30-something fps. Digital Foundry confirmed that in a behind-closed-doors demo, the Infiltrator DLSS demo was running side by side at 4K on a GTX 1080 Ti and an RTX 2080 Ti.
Frame rates for the 1080 Ti were around 35fps, with the 2080 Ti delivering around double that throughout the demo. Keep in mind this is a canned demo, meaning it runs along the same path every time, which makes it easy to optimize for. I'm curious to see real-world performance in games that utilize DLSS. Currently, this is the list of games that have been confirmed to support DLSS.
Of course, because of how nuanced and new all this technology is, people will needlessly freak out. Articles like this play up the performance problems of these technologies and get people worried.
I think the people who are upset about sub-60fps at 1080p with ray tracing enabled in games like Tomb Raider fundamentally fail to grasp the magnitude of what's happening under the hood.
Bear in mind that film CGI utilizes ray tracing, and it can take hours or days of rendering per frame. The fact that we now have a single graphics card that can do this in real time at even 30fps is a monumental milestone in PC graphics.
This is just the first generation of this technology. Games will be further optimized before launch. Drivers will mature. People just need to understand the bigger picture. Their concern stems from an unfamiliarity that it is ultimately Nvidia's and the media's job to address through education.
However, this still leaves concerns about plain "normal" rasterization performance. Normally when we measure performance, we do so by comparing the relative gain of the new product over the old product.
Nvidia deliberately showed no metrics and did not talk about such performance, so until we see actual benchmarks, it is difficult for us to gauge this. Cynically, you could say they avoided the subject because maybe Turing is not as big an increase as people expect.
On the other hand, the SMs in Turing can perform independent FP and integer operations – something previous generations could not do. The SM also has a new unified cache architecture with double the bandwidth of Pascal's. In this sense, a Turing SM is more capable and efficient than a Pascal SM. But at the end of the day, we simply have to wait for independent benchmarks and real-world tests.
To that end, Nvidia tweeted this link a few days after their keynote, citing real-world performance gains in various games:
But straight out of the box, it gives you a huge performance upgrade for the games you are playing now. The GeForce RTX GPUs deliver 4K HDR gaming on modern AAA titles at 60 frames per second – a feat that even the GeForce GTX 1080 Ti GPU is unable to manage.
The clips above were captured at 4K HDR on an RTX 2080. Anecdotally, I achieve 45fps at 4K HDR in Final Fantasy XV on a GTX 1080 Ti overclocked to 2GHz, so it's great to see a 60fps figure for that game on the RTX 2080 (not even the top-end 2080 Ti).
Of course, take this with a grain of salt. This is Nvidia itself providing the numbers, and we have no indication of whether the 2080 was overclocked. In addition, we have no idea what quality settings these games were tested at.
This next chart is the most interesting. In essence, Nvidia normalizes GTX 1080 performance per game at 4K to show the relative performance of the RTX 2080 in the same game.
Some games see a 2x increase when using DLSS, which tells me that DLSS data is delivered through driver updates. This is genuinely massive. In other games, meanwhile, we see a 50% increase in performance at 4K without DLSS – still a huge increase. In that scenario, if you experience 45fps on a GTX 1080, you can expect around 67fps on an RTX 2080 without DLSS. With DLSS, that climbs to roughly 90fps.
With this in mind, we can do some simple math to estimate the potential further gain from a 2080 Ti. Given that the 2080 Ti has roughly 30% more compute than the 2080, we can estimate a roughly 30% boost in performance. So if you get 60fps at 4K on a 2080, you can expect around 78fps on a 2080 Ti.
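The back-of-the-envelope math above, spelled out (the multipliers come from Nvidia's own chart and my 30% compute estimate, not from independent benchmarks):

```python
# Projects a frame rate from a baseline fps and a claimed percentage
# boost. Purely illustrative arithmetic, not a performance model.

def projected_fps(base_fps, boost_pct):
    return base_fps * (1 + boost_pct / 100)

print(projected_fps(45, 50))  # GTX 1080 -> RTX 2080 (no DLSS): 67.5
print(projected_fps(60, 30))  # RTX 2080 -> RTX 2080 Ti estimate: ~78
```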
Once again, take this with a grain of salt. We need to wait for independent benchmarks for all these cards. But even if Nvidia's performance claims here are gross exaggerations and we can realistically expect only a 30% boost over a 1080 sans DLSS, that is still a big leap in performance all the same. Suddenly 4K 144Hz monitors make sense … if independent real-world testing confirms Nvidia's performance claims.
This begs the question: why did Nvidia not present this during their keynote, when they knew they had everyone's attention and that people would want this information? They wanted to talk about ray tracing. That's the simple answer. Ray tracing markets better than "50% increase in performance." Still, this shows me a dissonance between what Nvidia wants to discuss and what ordinary people need to hear.
To bring this full circle: PC gamers are used to high resolutions and high frame rates. We are used to having our high-resolution cake and eating it at beautifully high frame rates too. PC gamers must accept the fact that ray tracing is simply on a completely different level, demanding computation like nothing before it.
This means PC gamers must also accept lower frame rates. Remember that PC gamers will always have the option to disable it if they want their frame rates high. But in my opinion, this hit is absolutely worth it if it means more physically accurate and immersive games.
This is how the envelope gets pushed. This is how we keep driving technology forward. This is how we achieve progress. Some people will always say that graphics do not matter. But these are the same people who keep buying graphics cards and consoles generation after generation. Moreover, this is such a tiresome, recurring mentality. Without progress you remain stagnant. And in the real world, when you're stagnant, you move backwards.
When it comes to ray tracing, this is just the beginning. I'm extremely curious to see the new gameplay effects this may enable. One could imagine spotting an enemy soldier running toward you in a reflection in Battlefield V and reacting to it. One can also imagine how developers might utilize the Tensor cores to create better AI.
The reality is that only a fraction of consumers will have this technology available to them. Meaning, developers will be catering to a small subgroup. They still need to build the existing baked global illumination methods in addition to implementing ray tracing. How and whether developers will exploit this is up to them, but I certainly hope to see widespread adoption.
Additionally, this uses Nvidia's proprietary platform, which means AMD users will not be able to take advantage of it. Even if they could, their GPUs are simply not designed for ray tracing. Nvidia had to create dedicated hardware to enable this.
And right now, AMD is nowhere to be seen. As I mentioned in my previous article, it is one thing to play to your strengths (Ryzen CPUs), but to, at least outwardly, apparently not compete in the GPU space begs the question: why is AMD still in it? Will they even stay in it? I do not ask these questions lightly, but as someone who cares about this industry, I cannot escape them.
As for consoles, there is absolutely no way the next-generation Xbox and PlayStation will be able to perform ray tracing. The so-called "enhanced" consoles, the Xbox One X and PS4 Pro, can hardly produce native 4K, never mind 60fps at that resolution. With the inevitable hits to performance that ray tracing brings, there is simply no chance the next consoles will be able to do this. In fact, I do not expect consoles to be capable of ray tracing for at least two generations. In short, PC gaming will be on a completely different level for the foreseeable future, provided we see developer adoption of RTX.
Everything that came out of Gamescom from Microsoft and Sony looks like child's play compared to the technology Nvidia showed. Tweets from Microsoft said Battlefield V will be the "most intense, immersive Battlefield" that deserves an Xbox One. I just have to sit there and smile as these console manufacturers pat themselves on the head in such a condescending manner.
Not 24 hours earlier, groundbreaking technology was on display from Nvidia, yet Microsoft claims Battlefield V is so immersive that it deserves an Xbox One – a console that consistently struggles to hit 1080p, with its bigger brother not consistently hitting 4K in most games, never mind 60fps. It just shows me the absolute chasm between today's consoles and modern PC technology, a gap that continues to widen and widen.
Finally, I fundamentally believe we are only at the beginning of a whole new era of PC gaming. This sounds terribly like hyperbole, and I absolutely do not blame you for thinking so. After all, how many times have we heard that before?
The difference here is that ray tracing, unlike 3D and VR, has long been the holy grail of PC graphics. The desire to imitate the real world has been there since the beginning of PC gaming. All these methods, such as physically based rendering, ambient occlusion and the like, are evidence of this desire to imitate the real world. I think ray tracing in PC graphics is still in its infancy, but the truly revolutionary leap in hardware and software technology achieved by Nvidia cannot be understated. This is truly a remarkable achievement and an inflection point in PC graphics and real-time rendering. I'm very excited to see what comes next.