At Unite 2024, Unity’s development team introduced a series of advanced GPU optimization techniques aimed at improving rendering performance across various platforms. Whether you are developing for high-end PCs, consoles, or mobile devices, these methods are critical for delivering visually rich, smooth-running applications.
Frame Rate Optimization
One of the fundamental challenges in real-time rendering is reducing GPU latency to improve frame rate. GPU latency occurs when the CPU must wait for the GPU to finish rendering previous frames before it can start processing new ones. This bottleneck can drastically affect performance, especially in visually demanding applications.
To address this, Unity offers several profiling and debugging tools. The Visual Effect Graph profiling tool helps developers track GPU overhead related to particle systems, allowing them to optimize effects more effectively. Another essential tool, the Shader Graph heatmap, provides a visual estimate of the cost associated with different Shader Graph nodes. Developers can leverage this feedback to make informed decisions when optimizing shader programs.
In addition, RenderDoc, which is integrated into Unity’s editor, allows for frame capture and detailed analysis of how frames are composed, down to native graphics commands and GPU resources. Combined with hardware GPU profilers (such as NVIDIA’s Nsight), developers can gain deeper insights into GPU timings and identify bottlenecks more accurately.
Overdraw Optimization
In cases where the GPU is pixel-bound, a common cause is overdraw, where the same pixel is shaded multiple times in a frame. Overdraw can lead to excessive GPU workload, particularly in scenes with transparent materials, which typically don’t write to the depth buffer. As a result, overlapping transparent surfaces cannot reject one another through the depth test, so every layer covering a pixel is shaded and blended, taxing the GPU even more.
Unity’s Rendering Debugger offers an overdraw visualization tool, enabling developers to identify which parts of their scene suffer from high overdraw. To reduce overdraw, developers can use techniques like enabling depth pre-pass or depth priming, which pre-renders the scene to a depth buffer. This process helps to avoid redundant pixel shading by using early depth testing to discard hidden pixels before they are shaded.
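To make the effect of a depth pre-pass concrete, here is a minimal Python sketch that counts pixel shader invocations for a single pixel covered by several opaque fragments. It is a toy model rather than Unity code: the function names and depth values are invented for illustration, and hardware details such as hierarchical depth testing are ignored.

```python
def shaded_without_prepass(depths):
    """Count fragments shaded when opaque fragments arrive in the given order.

    A fragment is shaded if it passes the depth test against the nearest
    depth seen so far (smaller value = closer to the camera).
    """
    nearest = float("inf")
    shaded = 0
    for d in depths:
        if d < nearest:
            nearest = d
            shaded += 1
    return shaded


def shaded_with_prepass(depths):
    """Count fragments shaded after a depth-only pre-pass.

    The pre-pass stores the nearest depth first, so in the color pass only
    fragments matching that depth survive the early depth test.
    """
    if not depths:
        return 0
    nearest = min(depths)
    return sum(1 for d in depths if d == nearest)


# Worst case: fragments arrive back to front, so each one passes the test.
back_to_front = [0.9, 0.7, 0.5, 0.3]
print(shaded_without_prepass(back_to_front))  # 4
print(shaded_with_prepass(back_to_front))     # 1
```

In this toy case, drawing the fragments front to back would also shade only one of them, which suggests the pre-pass helps most when sorting is imperfect and the pixel shaders are expensive enough to outweigh the extra depth-only pass.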
Another important consideration is transparency. Because transparent materials often contribute significantly to overdraw, minimizing their use—or optimizing how they are rendered—can yield substantial performance improvements.
Managing Memory Bandwidth
On mobile devices, GPU performance is often limited by memory bandwidth, which impacts both energy consumption and thermal performance. External bandwidth usage—such as reading and writing to the frame buffer—drains battery life and generates heat, causing mobile devices to throttle the GPU. This can lead to performance drops, especially on devices without active cooling systems.
Unity introduced several solutions to address this issue. For instance, the Render Graph viewer tool allows developers to inspect frame resources and monitor read and write operations across render passes. By merging compatible render passes into native render passes (on graphics APIs such as Vulkan, Metal, and DX12), Unity reduces the amount of data transferred between the GPU and system memory, significantly cutting down on memory bandwidth usage.
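A rough back-of-the-envelope model shows why merging passes matters. The Python sketch below is an idealized estimate, not a measurement from the talk: it assumes a chain of full-screen passes over a single color target on a tile-based GPU, and the function name and parameters are hypothetical.

```python
def framebuffer_traffic_bytes(width, height, bytes_per_pixel, passes, merged):
    """Estimate external memory traffic for a chain of full-screen passes.

    Unmerged: every pass stores its output to memory and every pass after
    the first loads the previous result back.
    Merged: intermediate results stay in on-chip tile memory, so only the
    final store reaches external memory.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    target_bytes = width * height * bytes_per_pixel
    if merged:
        return target_bytes  # single final store
    stores = passes
    loads = passes - 1
    return (stores + loads) * target_bytes


unmerged = framebuffer_traffic_bytes(1920, 1080, 4, passes=3, merged=False)
merged = framebuffer_traffic_bytes(1920, 1080, 4, passes=3, merged=True)
print(unmerged // merged)  # 5
```

Real frames carry several attachments (depth, normals, and so on) and not every pass can be merged, so actual savings will differ from this idealized ratio.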
An example highlighted during the talk showed that using On-Tile Deferred Rendering for mobile platforms led to a 30% reduction in memory writes in test environments, demonstrating a meaningful improvement in performance.
Enhancing GPU Performance
Post-processing effects are notorious for consuming significant GPU resources, especially when applied at full resolution. Unity’s new Spatial-Temporal Post-Processing (STP) provides a solution by enabling applications to render intermediate frame buffers at lower resolutions, which reduces GPU overhead while maintaining near-native visual fidelity.
STP is a software-based upscaling method that works across platforms in both the Universal Render Pipeline (URP) and the High Definition Render Pipeline (HDRP), and delivers high-quality results on compute-enabled devices. The technique is particularly effective on mobile platforms, where GPU performance is often constrained by hardware limitations. In Unity’s internal testing, STP significantly reduced the time required for computationally expensive effects like Screen Space Ambient Occlusion (SSAO).
In combination with Hardware Dynamic Resolution, which adjusts rendering resolution based on the GPU workload, developers can maintain consistent frame rates while reducing the amount of work the GPU must perform. This technique has been successfully tested on high-end mobile devices like the Samsung Galaxy S22, where dynamic resolution scaling helped maintain performance across varying thermal states.
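The savings from rendering at a lower resolution grow with the square of the render scale, which is why even modest scaling helps. The Python sketch below illustrates that relationship together with a very simple feedback rule for choosing the next scale. It is an assumption-laden illustration, not Unity’s dynamic resolution controller: the function names, the clamping range, and the proportional-cost assumption are all hypothetical.

```python
def pixel_fraction(render_scale):
    """Fraction of native pixels shaded when both axes are scaled."""
    return render_scale * render_scale


def next_render_scale(scale, gpu_ms, target_ms, min_scale=0.5, max_scale=1.0):
    """Pick a new render scale from the last GPU frame time.

    Assumes GPU cost is roughly proportional to pixel count, so the scale
    is corrected by the square root of the time ratio, then clamped.
    """
    if gpu_ms <= 0 or target_ms <= 0:
        raise ValueError("frame times must be positive")
    proposed = scale * (target_ms / gpu_ms) ** 0.5
    return max(min_scale, min(max_scale, proposed))


print(pixel_fraction(0.5))  # 0.25
print(round(next_render_scale(1.0, gpu_ms=20.0, target_ms=16.0), 3))  # 0.894
```

A production controller would normally smooth the measured frame time over several frames to avoid visible oscillation in resolution.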
Geometry Processing
Geometry processing, specifically vertex shading, is another critical area where performance can suffer. Unlike pixel shading, which scales with screen resolution, vertex processing is independent of resolution. This means that increasing scene complexity (for example, by adding more vertices) can overwhelm the GPU, leading to performance bottlenecks.
Frustum culling and depth rejection help reduce the amount of geometry that reaches later pipeline stages by discarding primitives outside the camera’s view or hidden behind others. However, when these tests are performed by the GPU’s fixed-function hardware, they occur after the vertex shading stage, meaning the GPU still has to process all submitted vertices, even for objects that won’t appear on the screen.
To further optimize geometry processing, Unity introduced Indirect Draw Calls, which allow the GPU to handle draw parameters directly without CPU intervention. GPU Occlusion Culling adds another layer of optimization by skipping objects that are occluded by other elements in the scene, thus reducing unnecessary vertex processing.
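The general idea of culling instances before any vertex work and then filling draw arguments from the survivors can be sketched as follows. On a GPU this logic would run in a compute shader; Python is used here only to show the data flow. The `Instance` type, the function names, and the two-plane test frustum are hypothetical, and the five-value argument layout follows the common indexed indirect draw convention rather than anything specific shown in the talk.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    center: tuple  # (x, y, z) of the bounding sphere
    radius: float


def sphere_in_frustum(center, radius, planes):
    """Return True unless the sphere is fully behind any frustum plane.

    Each plane is (nx, ny, nz, d) with a unit normal pointing inward, so a
    point p is inside when dot(n, p) + d >= 0.
    """
    x, y, z = center
    for nx, ny, nz, d in planes:
        if nx * x + ny * y + nz * z + d < -radius:
            return False
    return True


def build_indirect_args(instances, planes, index_count):
    """Compact visible instance indices and fill indexed-draw arguments.

    The five-value layout mirrors the common indexed indirect draw
    arguments: index count, instance count, first index, base vertex,
    first instance.
    """
    visible = [i for i, inst in enumerate(instances)
               if sphere_in_frustum(inst.center, inst.radius, planes)]
    args = [index_count, len(visible), 0, 0, 0]
    return visible, args


# Simplified frustum with only a near plane (z >= 1) and a far plane (z <= 100).
planes = [(0.0, 0.0, 1.0, -1.0), (0.0, 0.0, -1.0, 100.0)]
instances = [
    Instance((0.0, 0.0, 50.0), 1.0),   # inside
    Instance((0.0, 0.0, -10.0), 1.0),  # behind the camera
    Instance((0.0, 0.0, 0.5), 1.0),    # straddles the near plane
]
visible, args = build_indirect_args(instances, planes, index_count=36)
print(visible, args)  # [0, 2] [36, 2, 0, 0, 0]
```

Because the instance count is written by the culling step itself, the CPU never needs to read back how many objects survived before the draw is issued.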
Ray Tracing Optimizations
Ray tracing, while providing stunning visual realism, is computationally expensive and can quickly consume a significant amount of GPU resources. One major source of overhead in ray tracing is the process of building the ray tracing acceleration structure (RTAS), which is required to traverse the scene’s geometry and perform intersection tests. Each additional object added to the acceleration structure increases the CPU overhead, as seen in some game projects where RTAS processing can take nearly 10 milliseconds per frame.
Unity introduced Solid Angle Culling as an optimization to mitigate this overhead. By estimating the solid angle an object subtends as seen from the camera, this technique discards objects that are too distant or too small to matter from the acceleration structure, significantly reducing the number of instances processed. In some cases, Solid Angle Culling reduced RTAS processing time by 60%.
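A solid-angle test of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Unity’s implementation: each object is approximated by a bounding sphere, the threshold value is arbitrary, and the function names are hypothetical.

```python
import math


def sphere_solid_angle(radius, distance):
    """Solid angle (steradians) of a sphere seen from a point.

    Uses 2*pi*(1 - cos(theta)) with sin(theta) = radius / distance.
    A viewer inside the sphere sees it in every direction (4*pi).
    """
    if distance <= radius:
        return 4.0 * math.pi
    sin_theta = radius / distance
    return 2.0 * math.pi * (1.0 - math.sqrt(1.0 - sin_theta * sin_theta))


def cull_by_solid_angle(instances, camera_pos, min_solid_angle):
    """Keep instances whose bounding sphere subtends at least the threshold.

    instances: iterable of (name, center, radius) tuples.
    """
    kept = []
    for name, center, radius in instances:
        distance = math.dist(center, camera_pos)
        if sphere_solid_angle(radius, distance) >= min_solid_angle:
            kept.append(name)
    return kept


scene = [
    ("rock", (0.0, 0.0, 10.0), 1.0),      # about 0.031 sr
    ("pebble", (0.0, 0.0, 100.0), 0.1),   # about 3e-6 sr
]
print(cull_by_solid_angle(scene, (0.0, 0.0, 0.0), min_solid_angle=1e-3))  # ['rock']
```

Because the measure combines size and distance, a large distant building can be kept while a small nearby prop is dropped, which a plain distance cutoff cannot express.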
To optimize memory usage, Unity also introduced compaction of bottom-level acceleration structures (BLAS Compaction) and a GPU Memory Allocator for smaller structures. These techniques reduce the memory footprint of static meshes and improve memory efficiency for dynamic scenes. For example, in tests with a large scene containing 6.7 million triangles, Unity’s memory optimizations reduced memory usage from 450 MB to just 100 MB, a reduction of roughly 78%.