One of the most notable announcements during the RTX 30 Series launch was the technology that helps the new Ampere cards achieve a 2x performance advantage over Turing. In a series of articles, we’ll take a look at these technologies. In this article, we’ll cover RTX IO.
Perhaps the most unexpected feature of the RTX 30 series cards, RTX IO aims to alleviate CPU bottlenecks by allowing the GPU to make IO requests directly to the SSD, helping reduce load times for textures and for game assets in general. But first, a bit of background. Solid-state drives have revolutionized storage by providing super-fast, responsive access to data. As storage capacity has grown, games have evolved to provide higher-resolution graphics, which are stored as gigantic resource files. Games today have reached install sizes of more than 200GB and are still growing. SSDs, on the other hand, are reaching speeds of up to 7GB/s for Gen4 NVMe drives.
The truth is that faster SSDs don’t do the work by themselves. According to NVIDIA’s internal testing, an uncompressed 7GB/s transfer that saturates a Gen4 SSD will fully utilize 2 CPU cores.
This is because storage IO goes through the CPU: the PC architecture relies heavily on the CPU to manage and process IO requests to and from storage devices. On modern many-core CPUs this load is spread across cores. During gaming, however, a game's assets are packed into gigantic compressed resource files that the CPU streams in, and because of the compression the resulting data rate can exceed double the bandwidth of PCIe Gen4. In this scenario, the CPU receives a flood of IO requests and has to fetch the data, pass it along, and decompress each asset. The slide above shows a 24-core/48-thread CPU being saturated.
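To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The 7GB/s figure comes from the numbers above; the 2:1 compression ratio and the 28GB asset size are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
def effective_throughput_gbps(raw_gbps: float, compression_ratio: float) -> float:
    """Decompressed data rate when compressed data is read at raw_gbps."""
    return raw_gbps * compression_ratio


def load_time_seconds(asset_gb: float, raw_gbps: float,
                      compression_ratio: float = 1.0) -> float:
    """Seconds to deliver asset_gb of decompressed assets from the drive."""
    return asset_gb / effective_throughput_gbps(raw_gbps, compression_ratio)


GEN4_SSD_GBPS = 7.0      # drive speed quoted in the article
COMPRESSION_RATIO = 2.0  # assumption: assets compress roughly 2:1
ASSET_GB = 28.0          # assumption: size of the assets once decompressed

# Data rate that whatever does the decompression must keep up with.
print(effective_throughput_gbps(GEN4_SSD_GBPS, COMPRESSION_RATIO))  # 14.0

# Load time without and with compression, ignoring all other overheads.
print(load_time_seconds(ASSET_GB, GEN4_SSD_GBPS))                     # 4.0
print(load_time_seconds(ASSET_GB, GEN4_SSD_GBPS, COMPRESSION_RATIO))  # 2.0
```

The point of the sketch is that compression roughly doubles the amount of data that has to be processed per second, which is why the decompression step, not the drive, becomes the bottleneck.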
The chart above shows NVIDIA’s test comparing load times on a 24-core Threadripper with a traditional hard drive versus RTX IO.
With that in mind, Microsoft developed the DirectStorage API to resolve this challenge. It was an obvious direction for the industry as a whole; recall that Sony detailed a similar solution for the PS5 console in a tech seminar earlier in the year. Going back to DirectStorage, it’s an API that allows a GPU to access compressed data directly from a storage device, with unpacking, decompression and use of the data all handled by the GPU. NVIDIA developed RTX IO in conjunction with DirectStorage, wrapping it for optimized use with NVIDIA’s GPUs in gaming scenarios. This gives their GPUs lossless data decompression, which improves IO performance almost twofold, and lets the massive number of CUDA cores take over IO work that would otherwise occupy dozens of CPU cores, pushing decompression throughput well beyond the compressed data rate that PCIe Gen4 can deliver.
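The idea of storing assets compressed and decompressing them losslessly at load time can be sketched in a few lines of Python. This is only a CPU-side illustration of the concept: it does not use DirectStorage or RTX IO, zlib is a stand-in for whatever codec a real game would use, and the helper names are hypothetical.

```python
import zlib


def pack_asset(raw: bytes) -> bytes:
    """Compress an asset before writing it to storage (stand-in codec: zlib)."""
    return zlib.compress(raw, level=6)


def load_asset(stored: bytes) -> bytes:
    """Decompress an asset after reading it; RTX IO moves this step to the GPU."""
    return zlib.decompress(stored)


# A highly repetitive dummy "texture" so the size difference is easy to see.
asset = b"texture-block " * 4096
stored = pack_asset(asset)
restored = load_asset(stored)

# Lossless: the decompressed bytes match the original exactly.
assert restored == asset
print(len(asset), len(stored))
```

Less data crosses the storage bus because only the compressed bytes are read, but someone has to pay for the decompression step. In the traditional path that is the CPU; the approach described above hands that work to the GPU instead.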
RTX IO will require developers to enable the feature through an SDK, alongside NVIDIA’s other APIs. RTX IO also depends on Microsoft DirectStorage, which will ship next year, so any games being developed right now with NVIDIA’s SDK will be able to use RTX IO’s GPU-accelerated storage access.
When asked about the performance hit of RTX IO on the GPU itself, an NVIDIA representative responded that RTX IO utilizes only a tiny fraction of the GPU, “probably not measurable”. Developers will have full freedom in how they use RTX IO. This matters especially for GPU-intensive games, where the developer understands the game’s needs best and is in the best position to decide which approach to take.