Is it the wall of sound produced by metal that makes it extra difficult to record and mix?
Speaking as someone who has actually mixed stuff (not really professionally, but I've done it and worked at it quite a bit): yes, somewhat.
When it comes to recording, it's not going to be a lot harder or easier than other genres, as long as you have the right space and enough mics.
However, metal is characterized by dense compositions, extremely high harmonic content (distorted guitars wall to wall), and high tempo.
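To make the "high harmonic content" bit concrete, here's a rough sketch (plain Python, nothing from a real audio library) of what distortion does to a single clean note. I'm assuming a tanh soft clipper as a stand-in for a guitar distortion stage, which is a common textbook approximation and not how any particular amp or pedal works. The names `distort`, `dft_magnitudes`, and `active_bins`, plus the gain and threshold values, are made up for illustration.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive DFT. Returns magnitudes for bins 0..N/2, scaled so a
    full-scale sine reads roughly 1.0 in its own bin."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(2 * abs(acc) / n)
    return mags

def distort(signal, gain):
    """Hypothetical distortion stage: tanh soft clipping (an assumption)."""
    return [math.tanh(gain * x) for x in signal]

def active_bins(signal, threshold=0.01):
    """Frequency bins carrying more than `threshold` of energy."""
    return [k for k, m in enumerate(dft_magnitudes(signal)) if m > threshold]

N = 256                # samples in the analysis window
FUNDAMENTAL_BIN = 5    # the note completes 5 cycles in the window

clean = [math.sin(2 * math.pi * FUNDAMENTAL_BIN * i / N) for i in range(N)]
driven = distort(clean, gain=10.0)

clean_bins = active_bins(clean)
driven_bins = active_bins(driven)

print("clean note occupies bins: ", clean_bins)
print("driven note occupies bins:", driven_bins)
```

The clean sine sits in one bin, while the driven version spreads into odd multiples of the fundamental (15, 25, and so on). Multiply that by every string of every guitar and you get a feel for how much of the spectrum a distorted rhythm section claims.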
Slow, sparse songs (e.g. Jazz standards) are easier to mix in one sense because you have fewer signals clashing and fighting for bandwidth. On the other hand, you have to take a light touch with those types of recordings, as the style is not suitable for heavy compression, EQ, or other effects. You have to get Jazz tracks to sound rich and full, but also relatively dry, which can be tricky. You have less wiggle room to fix subpar recordings compared to Metal.
With Metal you can lean into the tools more (more compression, EQ, and other effects to hide flaws), but you have a lot more energy across frequency ranges to manage and force to co-exist. For example, you may have low toms, bass drums, and detuned bass guitar all rumbling along very fast at the same time. This is a hell of a challenge to mix compared to a jazz trio playing at 1/3 the tempo and letting each other solo.
Metal and tonally similar genres like alt-rock are considered (by some, who know what they're doing) good styles for auditioning speakers, because they have a lot of harmonic content, i.e. you are forcing the speaker to play many frequencies across the whole spectrum. The "wall of sound", if you will. This same quality makes tracks like that more challenging to mix than a sparse genre like jazz or minimal techno, for example.
and so is mixing (ignoring effects/processing). You can accurately capture/record a metal band just as easily as an orchestra. Or, you could... Modern music is more "produced" than "recorded"...
A lot of mixing isn't linear (ignoring effects and processing in mixing is like ignoring salt and pepper in cooking), and you still need to make decisions on what gets heard and what doesn't. I think your point about Metal being more produced than recorded is probably correct. Nothing except the drums really makes sense unamplified and unprocessed in that genre... even at a live show, and I think the drums get some processing as well in almost all cases. So linearity is completely forgotten before the mixing even really starts.
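If "not linear" sounds abstract, here's a toy illustration. It assumes a bare-bones static compressor (hard knee, no attack or release), which is far simpler than any real plugin. The function name `compress`, the threshold and ratio, and the sample values are all made up for the example. The point is just that compressing the sum of two signals is not the same as summing two compressed signals, so the order and placement of processing actually changes the result.

```python
import math

def compress(sample, threshold=0.5, ratio=4.0):
    """Hypothetical static compressor applied to one sample.
    Anything above `threshold` is scaled down by `ratio`."""
    level = abs(sample)
    if level <= threshold:
        return sample
    squashed = threshold + (level - threshold) / ratio
    return math.copysign(squashed, sample)

# Two instruments hitting at the same instant (made-up sample values).
kick, bass = 0.4, 0.4

# Compressor on the bus: the summed peak crosses the threshold and gets squashed.
mixed_then_compressed = compress(kick + bass)

# Compressor on each channel: neither crosses the threshold, so nothing happens.
compressed_then_mixed = compress(kick) + compress(bass)

print("compress(kick + bass)          =", round(mixed_then_compressed, 3))
print("compress(kick) + compress(bass) =", round(compressed_then_mixed, 3))
```

With these numbers the bus version comes out around 0.575 and the per-channel version stays at 0.8. A linear process (a plain fader, say) would give the same answer both ways.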
Or, to put it another way, I don't think capturing the performance is really where the action is in a metal mix.
With Jazz or Classical, capturing the performance in free air can be 80% of the mixing job, and you can't afford to mess it up. With Metal, half of the band can be synthetic and the other half can have 5 FX boxes before the recording, and nobody would bat an eye.