Is there a good explanation of why the modern Unreal Engine does things the way it does?
I can explain Nanite, its reason for existing, what it solves, and what he doesn't actually understand about it.
Do you remember the MegaTexture that id Software was pushing way back when? MegaTexture is a form of virtualized texturing. The idea is that you can have absolutely massive, high-detail textures for everything, but you can't send all of that to the GPU without running out of memory fast, so the engine automatically sends the GPU only the parts of the giant texture that are needed for the current frame. The full texture lives outside VRAM and only the parts that are needed get loaded when they are needed, so VRAM is not bloated. This later got adapted into most major engines as catch-all texture streaming (though that term can also apply to a different type of streaming, where the smallest, lowest-resolution mip levels of all textures stay loaded and the higher-resolution mips of visible textures are streamed to the GPU as needed). Megatexturing/atlasing also gives the benefit of not using up texture sample slots, since everything is on one giant texture and you just index the coordinates needed. Atlasing has largely been superseded by bindless textures, which is a whole other can of worms that I'm not equipped to explain. This is why late 7th gen and 8th gen games had texture pop-in problems, btw.
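To make the "only send the parts you need" idea concrete, here is a rough Python sketch of working out which tiles of a giant texture a frame touches. The tile size, function names, and sample format are all made up for illustration and are not how id Tech or any real engine lays things out.

```python
TILE = 128  # texels per tile side (an assumed value, real engines vary)

def tiles_per_side(texture_size, mip):
    """Number of tiles along one side of the texture at a given mip level."""
    return max((texture_size >> mip) // TILE, 1)

def tiles_needed(uv_samples, texture_size, mip):
    """Return the set of (mip, tile_x, tile_y) touched by the given UV samples."""
    per_side = tiles_per_side(texture_size, mip)
    needed = set()
    for u, v in uv_samples:
        # Clamp so that u == 1.0 or v == 1.0 still lands in the last tile.
        tx = min(int(u * per_side), per_side - 1)
        ty = min(int(v * per_side), per_side - 1)
        needed.add((mip, tx, ty))
    return needed

def total_tiles(texture_size, mip):
    """How many tiles the whole mip level would take if fully resident."""
    per_side = tiles_per_side(texture_size, mip)
    return per_side * per_side

# Hypothetical frame: four UV lookups into a 16384x16384 texture at mip 0.
frame_samples = [(0.0, 0.0), (0.001, 0.001), (0.5, 0.5), (1.0, 1.0)]
resident = tiles_needed(frame_samples, 16384, 0)
```

In this toy case the frame touches 3 tiles out of 16384, which is the whole point: VRAM only has to hold what the camera can actually see.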
Nanite is this concept applied to geometry: only send the GPU the level of detail you need at the moment, and stream it in and out like mipmaps for 3D models, where each mip level is a lower-quality version of the mesh. This means any geometry can be as complex as you want, since only the level necessary for your current camera is loaded instead of the entire 200 million tri model. Of course this wouldn't work with just geometry streaming, since the way GPUs render means they dislike tiny triangles because of overshading, a tangent I will now go on.
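As a rough illustration of "mipmaps for 3D models", here is a sketch that picks the coarsest LOD whose simplification error stays under one pixel on screen. It picks one LOD for a whole mesh to keep it short (the function names, error values, and the one pixel threshold are my assumptions), so treat it as the general idea and not Nanite's actual selection code.

```python
import math

def projected_error_px(world_error, distance, fov_y_deg, screen_height_px):
    """Approximate on-screen size, in pixels, of a world-space error at a distance."""
    pixels_per_unit = screen_height_px / (
        2.0 * distance * math.tan(math.radians(fov_y_deg) / 2.0)
    )
    return world_error * pixels_per_unit

def pick_lod(lod_errors, distance, fov_y_deg=90.0, screen_height_px=1080,
             max_error_px=1.0):
    """lod_errors[i] is the world-space error of LOD i; LOD 0 is full detail.

    Returns the coarsest LOD whose error projects to at most max_error_px.
    """
    chosen = 0
    for i, err in enumerate(lod_errors):
        if projected_error_px(err, distance, fov_y_deg, screen_height_px) <= max_error_px:
            chosen = i
    return chosen

# Hypothetical mesh with four LODs, each simplification 4x sloppier than the last.
errors = [0.0, 0.01, 0.04, 0.16]
near_lod = pick_lod(errors, distance=1.0)
mid_lod = pick_lod(errors, distance=10.0)
far_lod = pick_lod(errors, distance=100.0)
```

Up close the full mesh is needed, but by 100 units away the coarsest version is already indistinguishable at 1080p, so the 200 million tri version never has to be resident.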
GPUs run programs called shaders. Shaders run in parallel and are basically just math that colors pixels. GPUs run pixel shaders in 2x2 pixel blocks (quads), since each pixel needs information from its neighbors for things such as mipmap selection or pixel derivatives. The issue is that you can have cases where the GPU runs the shader for every pixel in the quad but only one of those pixels actually draws. This happens a lot with very small or very thin geometry. MSAA (Multi Sample Anti-Aliasing) suffers from the same problem and will decrease your performance the more thin or tiny geometry there is, since pixels on geometry edges get shaded once for every triangle that covers one of their samples (2x, 4x, 8x). This is fine if the math to draw your pixels is not super complicated, but modern PBR rendering is a LOT of math compared to non-PBR, and this overshading (not overdraw, different thing) causes a lot of math to run more than it needs to.
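Here is a small sketch of the quad problem: a toy rasterizer that counts how many pixels a triangle really covers versus how many shader invocations the 2x2 quads cost. The helper names are mine and real hardware has extra rules (fill conventions, wave packing) that this ignores.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered_pixels(tri, width, height):
    """Pixels whose centers fall inside the triangle (either winding order)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:
                covered.append((x, y))
    return covered

def quad_invocations(covered):
    """Every touched 2x2 quad runs the shader for all 4 of its pixels."""
    quads = {(x // 2, y // 2) for x, y in covered}
    return 4 * len(quads)

# A pixel-sized triangle: 1 visible pixel, but a full quad gets shaded.
tiny = covered_pixels([(1.1, 1.1), (1.9, 1.1), (1.5, 1.9)], 4, 4)
# A big triangle: most of the quad work is actually useful.
big = covered_pixels([(0, 0), (8, 0), (0, 8)], 8, 8)
```

The tiny triangle draws 1 pixel for 4 invocations (25% useful work), while the big one draws 36 pixels for 40 invocations (90%). Fill a scene with pixel-sized triangles and most of your expensive PBR math is thrown away.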
Overdraw is when geometry is drawn over already-drawn geometry, meaning the time spent drawing the covered pixels was wasted, which can get really bad when pixels start costing a lot with modern rendering. Games used to be drawn (and some still are) with forward rendering, which means geometry is shaded and lit as it is drawn. Because what gets drawn is the finished surface, you are able to use hardware multisampling (MSAA). The problem with traditional forward rendering was that to render lights you had to loop over a list of all lights for every surface being drawn, so shading cost multiplied with every light added to the scene. Forward rendering also means any overdraw (worst when rendering back to front) pays the full lighting cost for pixels that end up hidden. The solution to this (though it was only really an issue on console hardware at the time) was deferred rendering. Instead of rendering the final shaded surface when you render the geometry, you render a series of buffers: depth, normal, diffuse, specular (or roughness, metalness, etc. for PBR; I won't go into the full detail of bitmasking for shader types and stuff). Then you render the lights using those textures that make up the scene, only where each light affects the surfaces. Because the textures are just 2D, this removes a lot of wasted lighting work and lets you break light limits. But because you are passing multiple buffers around, MSAA stops being practical: you would need to multisample every buffer and keep the per-sample data unresolved until lighting is done, which is excessively expensive.
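A back-of-the-envelope way to see the difference is to count light evaluations. The numbers below (overdraw of 3, 32 lights, each light touching 5% of the screen) are invented for the example, and the model ignores the cost of writing and reading the buffers.

```python
def forward_light_evals(pixels, overdraw, num_lights):
    """Naive forward: every drawn fragment loops over every light."""
    return pixels * overdraw * num_lights

def deferred_light_evals(pixels, num_lights, coverage_percent):
    """Deferred: each light is evaluated only on the screen pixels it touches."""
    return pixels * coverage_percent // 100 * num_lights

pixels_1080p = 1920 * 1080
forward = forward_light_evals(pixels_1080p, overdraw=3, num_lights=32)
deferred = deferred_light_evals(pixels_1080p, num_lights=32, coverage_percent=5)
# forward == 199_065_600, deferred == 3_317_760
```

Under those assumptions the naive forward path does 60 times more light math, which is why deferred took over once scenes started having lots of small lights.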
Deferred turned into clustered deferred or clustered forward (or Forward+, where a couple of thin buffers are also stored to allow some effects). Lights are now assigned to a froxel (frustum voxel) grid built from the camera, and each pixel only shades the lights in its own grid cell. This essentially removes the light limit issues of traditional forward rendering and optimizes deferred lights further, and it is what essentially all modern games use now. Examples of modern games that use clustered forward rendering are Doom 2016, Doom Eternal, Detroit: Become Human, and all Source 2 games except Dota 2 and Deadlock. Most games still use deferred, or the more recent visibility buffer rendering, which gets us back to the Nanite explanation.
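For a feel of how the froxel grid works, here is a simplified sketch that bins point lights into clusters and then looks up the lights for one pixel. The grid size, near/far planes, logarithmic depth slicing, and the 90 degree square frustum (so NDC is just x / z) are all assumptions chosen to keep the math short; real engines differ in the details.

```python
import math

NEAR, FAR = 0.1, 100.0
TILES_X, TILES_Y, SLICES = 16, 8, 24  # assumed grid resolution

def depth_slice(z):
    """Logarithmic depth slicing, so clusters get thicker further away."""
    z = min(max(z, NEAR), FAR)
    s = int(math.log(z / NEAR) / math.log(FAR / NEAR) * SLICES)
    return min(s, SLICES - 1)

def tile_of(ndc, tiles):
    """Map an NDC coordinate in [-1, 1] to a tile index (may be out of range)."""
    return math.floor((ndc * 0.5 + 0.5) * tiles)

def tile_range(center, radius, z_lo, z_hi, tiles):
    """Conservative tile range covered by [center - radius, center + radius]."""
    lo, hi = center - radius, center + radius
    ndc_lo = lo / z_lo if lo < 0 else lo / z_hi
    ndc_hi = hi / z_lo if hi > 0 else hi / z_hi
    return max(tile_of(max(ndc_lo, -1.0), tiles), 0), \
           min(tile_of(min(ndc_hi, 1.0), tiles), tiles - 1)

def build_clusters(lights):
    """lights: list of (x, y, z, radius) in view space, +z pointing forward."""
    clusters = {}
    for index, (x, y, z, r) in enumerate(lights):
        if z + r < NEAR or z - r > FAR:
            continue  # entirely in front of the near plane or past the far plane
        z_lo, z_hi = max(z - r, NEAR), min(z + r, FAR)
        x0, x1 = tile_range(x, r, z_lo, z_hi, TILES_X)
        y0, y1 = tile_range(y, r, z_lo, z_hi, TILES_Y)
        for s in range(depth_slice(z_lo), depth_slice(z_hi) + 1):
            for ty in range(y0, y1 + 1):
                for tx in range(x0, x1 + 1):
                    clusters.setdefault((tx, ty, s), []).append(index)
    return clusters

def lights_at(clusters, ndc_x, ndc_y, z):
    """Lights a pixel has to shade: only those binned into its own cluster."""
    key = (tile_of(ndc_x, TILES_X), tile_of(ndc_y, TILES_Y), depth_slice(z))
    return clusters.get(key, [])

# Light 0 sits in the middle of the view; light 1 is far off to the side.
clusters = build_clusters([(0.0, 0.0, 10.0, 1.0), (50.0, 0.0, 5.0, 1.0)])
```

A pixel in the middle of the screen at that depth finds light 0 in its cluster, everything else finds an empty list, and the off-screen light is never binned at all. That per-cell list is what replaces the "loop over every light" problem.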
Visibility buffer rendering is the hot new thing and I'm not fully equipped to explain it properly, so check out this blog post for a breakdown with visuals. The gist is that a visibility buffer is created first, which stores only which triangle is visible at each pixel, and then the deferred buffers are created from that, meaning shading has essentially zero overdraw and overshading is reduced heavily. Nanite combines this with a custom software rasterizer for when triangles get small enough, because of the aforementioned overshading, and does some 2D impostor stuff. It's all very complicated, but essentially it means Nanite reduces overdraw as much as possible, renders geometry very fast, streams geometry in and out of VRAM like mipmaps (allowing for extremely high-detail meshes without the cost), and automatically reduces meshes down with clustered LODs.
Of course this is not a magic bullet, and Nanite incurs a heavy overhead cost just for turning it on, but if your scene is complex enough to perform better with Nanite on, then further additions add comparatively little cost. It's also VRAM heavy, since it's sending a lot of geometry in and out of VRAM. Combine that with all the other Unreal 5 features being virtualized, meaning on the GPU, so Lumen, Virtual Shadow Maps, Nanite, and it all hammers the VRAM a lot. If you are VRAM starved, open an Unreal 5 game with Nanite, stop moving and look at the framerate, then move around and see how the framerate drops quite a bit until you stop moving again. This is because Nanite is moving a bunch of data in and out of VRAM as you move around.
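To show the visibility buffer idea in miniature, here is a sketch with two passes: the first only records which primitive wins the depth test at each pixel, the second shades each pixel exactly once. I use flat screen-space rectangles as stand-ins for triangles and invented names throughout, so this is the concept only and nothing like Nanite's real rasterizer.

```python
def render_visibility_buffer(prims, width, height):
    """prims: list of (x0, y0, x1, y1, depth) rectangles, smaller depth = closer.

    Returns (vis, fragments): vis[y][x] is the id of the closest primitive
    (or None), fragments counts every rasterized fragment including hidden ones.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    vis = [[None] * width for _ in range(height)]
    fragments = 0
    for prim_id, (x0, y0, x1, y1, z) in enumerate(prims):
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                fragments += 1
                if z < depth[y][x]:  # cheap pass: depth test and an id write
                    depth[y][x] = z
                    vis[y][x] = prim_id
    return vis, fragments

def shade(vis, material_of):
    """Expensive pass: runs once per visible pixel, never for hidden fragments."""
    shaded = 0
    image = [[None] * len(row) for row in vis]
    for y, row in enumerate(vis):
        for x, prim_id in enumerate(row):
            if prim_id is not None:
                image[y][x] = material_of[prim_id]
                shaded += 1
    return image, shaded

# Three overlapping layers on an 8x8 screen: background, mid, and near.
scene = [(0, 0, 8, 8, 10.0), (2, 2, 6, 6, 5.0), (3, 3, 5, 5, 1.0)]
vis, fragments = render_visibility_buffer(scene, 8, 8)
image, shaded = shade(vis, material_of=["sky", "wall", "crate"])
```

Here 84 fragments get rasterized but only 64 pixels get shaded. A forward renderer drawing back to front would have paid full shading cost for all 84.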
Epic's solution to their very heavy shaders, the cost of Nanite, and especially Lumen, is mandatory upscaling, since rendering fewer pixels means you can run faster. They also use temporal upscaling as a denoiser for Lumen (which uses software or hardware ray tracing, depending on what is enabled) and Virtual Shadow Maps. Unreal is doing a lot, which is why it runs like a chunky turd, but for the majority of games it's complete overkill.
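The "fewer pixels" math is worth spelling out, because the render scale applies per axis and the savings are squared. A quick sketch (the function name and the 50% figure are just for the example):

```python
def pixels_shaded(out_w, out_h, scale_percent):
    """Pixels actually rendered when the internal resolution is scaled per axis."""
    w = out_w * scale_percent // 100
    h = out_h * scale_percent // 100
    return w * h

native_4k = pixels_shaded(3840, 2160, 100)  # 8,294,400 pixels
half_scale = pixels_shaded(3840, 2160, 50)  # 2,073,600 pixels
```

So a 4K output at 50% render scale is really shading a 1080p image, a quarter of the pixels, and the upscaler fills in the rest.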
I probably explained things poorly or incorrectly, so if anyone wants to correct me go ahead. The Unreal docs explain a lot about Nanite and the other systems, and Epic has a lot of GDC talks on the subjects as well.