Revolutionizing Graphics: A Deep Dive into DLSS Technology by NVIDIA (Part Two)

DLSS Explained: AI-Powered Upscaling in Gaming

What a traditional upscaler essentially does is average out the neighboring pixels, and that’s what our computer already does. For instance, imagine you’re playing on a 4K monitor, but the game isn’t running smoothly, so you decide to lower the resolution to 1080p. The problem is that the graphics card is now generating four times fewer pixels than the screen actually has, and the screen needs to know what to display. It could either show each rendered pixel as a tiny square in the middle of the screen, or it has to use some technique to stretch the image and fill in the missing pixels. This is where upscaling comes in, and games already do this. The problem is that the image ends up looking blurry.
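To make this concrete, here’s a minimal sketch (Python with NumPy, purely illustrative) of the two naive options just described: copying each pixel into a small block, or stretching the image by averaging neighbors, which is exactly what produces the blur.

```python
import numpy as np

def upscale_nearest(img, factor=2):
    """Copy each pixel into a factor x factor block (blocky result)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def upscale_bilinear(img, factor=2):
    """Fill in new pixels by averaging their neighbors (blurry result)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

tiny_frame = np.array([[0.0, 1.0],
                       [1.0, 0.0]])          # tiny stand-in for a low-res render
print(upscale_nearest(tiny_frame).shape)     # (4, 4)
print(upscale_bilinear(tiny_frame).shape)    # (4, 4)
```

Neither method adds any information; they only redistribute the pixels that are already there, which is why the result looks soft.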

There are parts where we have jaggies, which now appear even larger, and overall, it doesn’t look right. That’s why upscaling with artificial intelligence is such a great solution. To understand it from a different perspective, imagine you’re an artist and you’re asked to redraw a 1080p image at 4K by hand. It would probably take you several days, but you’d do it much better than these algorithms because, for example, when you encounter a tree, in your mind you know what a tree looks like. There are many situations that you, as a human being with experience of the real world and of video games, already know how to handle. You know the best way to complete that drawing because you know thousands of different cases, and it’s almost impossible to explain all of this to a computer using only code.

It would take years and years to write a program that could identify each specific case and draw each area in a specific way with a specific level of detail. This is where the convolutional neural network helps us a lot: we gain a pretty big advantage when artificial intelligence essentially programs itself. Now, as I mentioned before, convolution and the other operations a neural network performs basically boil down to multiplying two lists of numbers, and the graphics card does this very quickly, but it can still be made faster. We could build specific, super-fast hardware to perform these tasks.
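As a tiny illustration of that "multiplying two lists of numbers" idea, here is a bare-bones 2D convolution in Python/NumPy: every output value is just the dot product of a kernel and an image patch. This is the naive loop version; GPUs run these same multiplications massively in parallel.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and take a
    dot product at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)  # just multiplying two lists of numbers
    return out

edge_kernel = np.array([[-1.0, 1.0]])        # responds to horizontal brightness changes
image = np.array([[0.0, 0.0, 1.0, 1.0]])     # dark-to-bright edge in the middle
print(conv2d(image, edge_kernel))            # [[0. 1. 0.]] — it fires at the edge
```

A convolutional network stacks thousands of learned kernels like this one; the only thing training changes is the numbers inside them.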

That’s why, years ago, Google introduced the idea of dedicated tensor hardware with its TPU, a chip specialized in neural network operations. It’s designed to do one thing very well, and that’s why it’s faster at that task than a typical processor core or graphics card. NVIDIA took that idea and implemented it in its graphics cards: along with DLSS, they introduced tensor cores, a series of cores specialized in exactly these kinds of multiplications, which are what the neural network needs. This way, the processing can be done for each frame in real time, I mean, in milliseconds, which I honestly find crazy when you think about it. But here we have one of the first key points in all of this DLSS stuff: we need specific hardware in some way. Could we do this without specific hardware, using only the CUDA cores of the graphics card to perform the same operations? Probably, but having special cores not only makes it faster because they’re optimized for that; it also frees up the rest of the graphics card to focus solely on rendering the graphics, so both parts work in parallel.
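What a tensor core actually computes is a small fused matrix multiply-accumulate. The sketch below (Python/NumPy; the 4x4 tile size and the FP16/FP32 precisions are illustrative, since exact tile shapes vary by GPU generation) shows the arithmetic, not the speed: the hardware’s advantage is doing this whole tile operation as one step.

```python
import numpy as np

def tensor_core_mma(A, B, C):
    """One tensor-core style tile operation: D = A @ B + C."""
    # Tensor cores typically take low-precision inputs (e.g. FP16)...
    A16, B16 = A.astype(np.float16), B.astype(np.float16)
    # ...but accumulate the products in higher precision (FP32).
    return A16.astype(np.float32) @ B16.astype(np.float32) + C

A = np.ones((4, 4))
B = np.ones((4, 4))
C = np.zeros((4, 4), dtype=np.float32)
D = tensor_core_mma(A, B, C)
print(D[0, 0])  # 4.0 — each output element is a 4-term dot product plus C
```

A CUDA core would issue these multiplies and adds one by one; the specialized unit does the whole tile at once, which is where the speedup comes from.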

The first version of DLSS was trained separately for each game, I mean, each game had a different model optimized for its specific graphics. They did this by generating images of the game at very high resolution, let’s say 16K, and at low resolution, and feeding all of this to a supercomputer. The resulting model was then included in the graphics card drivers and downloaded to your computer without you even realizing it; it simply arrived with the NVIDIA drivers. The result was okay, better than a regular upscale, but nothing to write home about. What really caught everyone’s attention was DLSS 2.0. Here the story changed a bit: instead of just upscaling the image, this time the idea was to do something more sophisticated. First of all, we’re going to need more data, such as the game’s depth map or Z-buffer, which tells the neural network what’s closer and what’s farther away in the image. This makes perfect sense, because if we combine the information from the different patterns with the distance they’re at, we can use different reconstruction patterns for things that are closer and things that are farther away, and that makes the result better quality. We don’t draw everything the same way; depending on the distance, we draw it differently. We also have the motion vectors, I mean, where things in the image are moving, which also give us a lot of information about how certain things are supposed to look.
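Conceptually, the network no longer receives just an RGB image; it receives all of these signals stacked together. A hedged sketch of what that input might look like (the resolution, channel count, and layout here are made up for illustration, not NVIDIA’s actual format):

```python
import numpy as np

h, w = 540, 960  # hypothetical low-resolution render size (half of 1080p)

color  = np.zeros((h, w, 3), dtype=np.float32)  # the rendered frame (RGB)
depth  = np.zeros((h, w, 1), dtype=np.float32)  # Z-buffer: what's near, what's far
motion = np.zeros((h, w, 2), dtype=np.float32)  # per-pixel motion vectors (x, y)

# The network sees everything stacked as one multi-channel image.
network_input = np.concatenate([color, depth, motion], axis=-1)
print(network_input.shape)  # (540, 960, 6)
```

The point is simply that depth and motion are extra channels alongside color, so the network can condition its reconstruction on them.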

And finally, we have a temporal filter. Temporal means that instead of looking at a single frame, it uses several frames one after another. This is very smart, because there are certain details we only gain by looking at two different frames: things that aren’t very clear in one image, while in the other we have an extra pixel that helps us understand them, so we get more information. And there’s one more point: NVIDIA recommends that developers shake the camera a bit, I mean, shift it by a tiny sub-pixel amount each frame. That way, DLSS ends up seeing slightly different pixels across frames, different angles of the same object, because we change the perspective a bit. So, in the end, we get more information, and what this version achieved was totally mind-blowing: basically, clearer and more accurate images than the originals. I mean, take for example the game rendered natively at 1080p versus the game rendered at 720p and upscaled to 1080p: the DLSS 2.0 version looks better, with fewer jaggies and more definition.
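That per-frame "shake" is a sub-pixel jitter, usually drawn from a low-discrepancy sequence so the samples cover each pixel evenly over time. A plausible sketch (the Halton sequence is a common choice for this kind of jitter, though I’m not asserting any particular game uses exactly this):

```python
def halton(index, base):
    """Low-discrepancy Halton sequence value in [0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def jitter_offset(frame_index):
    """Sub-pixel camera offset for this frame, in pixels, in [-0.5, 0.5)."""
    return (halton(frame_index + 1, 2) - 0.5,   # x offset
            halton(frame_index + 1, 3) - 0.5)   # y offset

# Each frame samples a different spot inside the pixel, so the temporal
# filter accumulates more detail than any single frame contains.
for i in range(4):
    print(jitter_offset(i))
```

Over many frames these offsets fill the pixel area uniformly, which is exactly why the temporal filter can recover detail that a single render misses.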

This is probably because the network was trained with very high-resolution images. I mean, it wasn’t trained to reconstruct the image at 1080p; it was trained to reconstruct it at 16K, and that makes the image very detailed. Which is why they thought: since we’re at it, let’s also use it as anti-aliasing. Anti-aliasing is a technique used to remove those famous jaggies that you see in the image. Nowadays, there are very good techniques like temporal anti-aliasing or multi-sample anti-aliasing, but they are expensive for the graphics card, and they run on the regular graphics pipeline, not on the tensor cores we talked about before, so they add to the graphics processor’s load on every frame. With DLSS 2.0, they realized they could also use it as anti-aliasing, because the resulting image ends up being quite clean, as I mentioned before, and all of this runs in about 1.5 milliseconds on an RTX 2080 Ti. So, is DLSS 2.0 now perfect? Are there no defects? Okay, there are quite a few defects. Don’t forget that, in the end, we’re making up information that isn’t in the original image. I mean, there are details that the artificial intelligence is inventing, things it literally imagines could go there, and that’s why there are things it struggles to reconstruct.
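For intuition, here is what every anti-aliasing technique is ultimately trying to approximate: the supersampled average. A toy sketch in Python/NumPy (real MSAA and TAA are much cheaper ways of approximating this; rendering at 2x and averaging down is the brute-force version):

```python
import numpy as np

def downsample_2x(img):
    """Average each 2x2 block down to one pixel (supersampling AA)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

hi_res = np.zeros((4, 4))
hi_res[:, 2:] = 1.0   # a hard vertical edge, rendered at 2x resolution
hi_res[0, 1] = 1.0    # the edge is slightly slanted
print(downsample_2x(hi_res))
# [[0.25 1.  ]
#  [0.   1.  ]] — the slanted step becomes a gray value instead of a hard jaggy
```

Edges that would be hard stair-steps at 1x become intermediate shades, which the eye reads as a smooth line.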

For example, when there’s a sudden change in the image, for one or two frames it invents things, what is known as hallucinations: spots that don’t make sense. Sometimes it also produces ghosting, I mean, it leaves trails behind characters. In specific cases there are flaws, and all these flaws show up especially if you start analyzing the footage frame by frame. But let me give my personal opinion after playing a few hours; the game I spent the most hours playing with DLSS was Atomic Heart.

And really, when you’re playing, you don’t notice the flaws. I mean, it’s really hard for them to jump out at you. Maybe in some situations an artifact might catch your attention, but when you’re immersed in the game, I do think it’s better to have that extra resolution DLSS gives you, and even that anti-aliasing, which, at least in the case of Atomic Heart, looks even better than temporal anti-aliasing: more definition, fewer jaggies. Is it worth it in exchange for occasionally seeing some artifact? At least from my point of view, yes; I don’t notice them with the naked eye, I mean, while playing I just don’t see them. The big drawback, I repeat, is that it needs specific hardware. There are other technologies like AMD’s FidelityFX Super Resolution, which is open-source and can work on any graphics card because it uses the card’s regular shader cores. But it doesn’t use machine learning, it doesn’t use a neural network; it’s an algorithm designed and written by humans, and I have to say it’s impressive, it’s very good.

But we’ll never really reach the quality that a neural network achieves, one that detects and learns all those patterns by itself, because we simply aren’t capable of considering as many parameters and factors as a computer can. And that’s where our limit lies.

Really, a well-equipped computer with a good algorithm can create much better software than we could ever write, and DLSS is proof of that. There’s another very important and interesting point about this technology. Nowadays, for example, we listen to music in MP3 format. MP3 has several levels of compression, and one of them is psychoacoustic: it removes certain information from the song that, because of how our ears work, we don’t notice or perceive, since our brain is capable of reconstructing that information even if it’s not there. It also lowers the quality in certain frequency bands and certain parts of the song where we don’t have much auditory sensitivity to begin with, so it takes advantage of that and cuts wherever it can. Those who are more knowledgeable will appreciate a format like FLAC with very high bitrates, but for most people MP3 is the standard, especially with wireless headphones, where there’s even more compression, and services like Spotify, where we’re directly listening to music from the cloud. This has literally been a revolution in the world of music, and in reality there’s a great loss of data in all of it, right? I think the same could happen with images thanks to all these super-resolution technologies. I mean, for example, we could take a photo or shoot a video of something, send it to the cloud to be stored on a server, but in a low resolution like 720p. Then, when we want to see it, an artificial intelligence, a service similar to DLSS, converts it back to 4K and shows it to us, and we might not even realize it.
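The core psychoacoustic idea, keep what we can hear and discard what we can’t, can be sketched in a few lines. This is only the intuition in Python/NumPy; real MP3 encoding uses filter banks and a full perceptual model, not a plain FFT threshold:

```python
import numpy as np

def lossy_compress(signal, keep_ratio=0.1):
    """Toy 'psychoacoustic' step: keep only the strongest frequency
    components, zero out the rest, then reconstruct the waveform."""
    spectrum = np.fft.rfft(signal)
    threshold = np.quantile(np.abs(spectrum), 1 - keep_ratio)
    spectrum[np.abs(spectrum) < threshold] = 0  # drop what we barely hear
    return np.fft.irfft(spectrum, n=len(signal))

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t)                   # the part we clearly hear
signal = tone + 0.01 * rng.standard_normal(1000)   # plus faint background noise
restored = lossy_compress(signal)
# 90% of the frequency data is thrown away, yet the audible tone survives.
print(np.max(np.abs(restored - tone)) < 0.1)  # True
```

Most of the data is gone, but the part our perception relies on is intact, which is exactly the bet super-resolution would be making with images.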

We would be watching a video that we think we recorded in 4K, but in reality that information has been lost and reconstructed from scratch by artificial intelligence. It seems real, but it’s not the real information, and this could be done for many things; it would solve many of the problems we have with storing information. I mean, part of the information we capture from the real world would be lost, and we would consume it as an illusion generated by artificial intelligence. I don’t know, it seems very impressive to me, right? But the story doesn’t end here, because in 2022 NVIDIA introduced DLSS 3.0, a system that’s not perfect, but it’s mind-blowing.

Now, in addition to image upscaling and anti-aliasing, we also have frame generation. We’re talking about the neural network not only being used to increase the size of the frames, I mean, adding detail to the image on the resolution side, but directly inventing entire frames out of thin air, so it’s also helping with the frames per second. To do this we need some more data. We start from everything we had before, but we’re going to have to add another piece of data: the optical flow, which means taking several consecutive images of the game and calculating where the pixels are moving at that moment.
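A minimal way to get a feel for "calculating where the pixels are moving" is block matching: for each block of the current frame, search for where it was in the previous one. This is a deliberately tiny sketch; real optical flow (and the dedicated hardware unit NVIDIA uses for it) is far more sophisticated.

```python
import numpy as np

def block_motion(prev, curr, block=4, search=2):
    """Tiny block-matching motion estimator: for each block of the current
    frame, find the offset into the previous frame that matches it best."""
    h, w = curr.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            target = curr[y:y + block, x:x + block]
            best_err, best_off = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        err = np.sum((prev[yy:yy + block, xx:xx + block] - target) ** 2)
                        if err < best_err:
                            best_err, best_off = err, (dy, dx)
            flow[by, bx] = best_off
    return flow

# A bright 4x4 square moves 2 pixels to the right between two frames.
prev = np.zeros((12, 12)); prev[4:8, 2:6] = 1.0
curr = np.zeros((12, 12)); curr[4:8, 4:8] = 1.0
print(block_motion(prev, curr)[1, 1])  # [ 0 -2] — this block came from 2 px to the left
```

The output is exactly the kind of per-region motion field that a frame generator can then exploit.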

Obviously, what this is going to do is help the artificial intelligence generate those intermediate frames.
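Once the motion is known, generating an in-between frame can be as simple, conceptually, as moving the pixels halfway along their motion. A deliberately crude sketch (DLSS 3’s actual frame generation network is vastly more complex and also handles occlusions, which this toy ignores):

```python
import numpy as np

def generate_midframe(prev, motion_dx):
    """Invent an intermediate frame by shifting the previous frame halfway
    along its (known, purely horizontal) motion. Toy model only."""
    return np.roll(prev, motion_dx // 2, axis=1)

prev = np.zeros((4, 8)); prev[:, 1] = 1.0  # a vertical line at column 1...
curr = np.zeros((4, 8)); curr[:, 5] = 1.0  # ...that has moved to column 5
mid = generate_midframe(prev, motion_dx=4)
print(np.argmax(mid[0]))  # 3 — in the generated frame, the line sits halfway
```

The generated frame was never rendered by the game: it is interpolated from motion, which is why it adds frames per second at almost no rendering cost.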


In the ever-evolving landscape of gaming technology, NVIDIA’s DLSS stands as a beacon of innovation, reshaping the way we perceive and experience video games. Through the seamless integration of artificial intelligence and cutting-edge graphics hardware, DLSS has unlocked new realms of visual fidelity and performance optimization, elevating gaming experiences to unprecedented heights. As we navigate the thrilling terrain of gaming advancements, let us continue to embrace the transformative power of DLSS, forging ahead into a future where every pixel tells a story of unparalleled immersion and excitement. Welcome to a realm where technology and imagination converge, propelling us towards boundless horizons of gaming excellence. 
