Are you a hardware/RF/analog guy? Because that's where I think the issues are (if any). On the digital (interpreted) side, let alone at the protocol level, everything is settled: 0's are 0's and 1's are 1's, and it's either correct or not, no twilight zone.
To my knowledge there are zero investigations of, for example, the following kind of thing (just proposed by a member in a German forum): what if we replace the main clock oscillator of a good (or even "audiophile") switch with the worst one we can imagine with respect to stability and jitter, just good enough that the thing keeps working with no or few (recoverable) digital errors? Does it have any audible impact when the signal is fed to, say, a given streamer+DAC unit? The expectation is: no (for all the well-known reasons like multiple levels of buffering, etc.), but do we know for sure?
Sadly, I can't, as I have zero expertise in this field. From what I've read, though, it seems people are reporting sound changes depending on how many other units are connected to a switch, what their power-supply specifics are, etc. Of course, no blind testing and all, and there is a big chance that if the perceptions are real, they could be induced by secondary effects... notably EMI, which is IMHO the most likely candidate for all sorts of otherwise "unexplained" audio phenomena, both with direct connection/cabling and indirect paths via the mains grid. We have people reporting that their sound changes if they replace the power supply of their WLAN router, a use case that can certainly only have an indirect mechanism via mains-grid disturbance.
I would bet a substantial portion of my life's earnings that anyone claiming to hear a difference by changing the PSU of their WLAN router is experiencing pure placebo. Think of it in the same terms as power cables that cost hundreds, or even thousands. How does a passive cable improve anything when the wiring in the wall costs a few dollars, maybe even pennies, per foot?
All audio delivery protocols that I've seen are built on TCP. Spotify, Tidal, as well as NAS protocols like SMB or (rarely) NFS - all TCP. That matters because TCP has built-in error detection and recovery: when a packet arrives, its checksum is calculated and compared to the checksum that was embedded in the packet upon transmission. If it is NOT a match, TCP automatically triggers a retransmission and waits for it to arrive so the data stream can be reassembled and presented in order.
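To make the checksum idea concrete, here's a minimal Python sketch of the RFC 1071 Internet checksum that TCP uses. It's deliberately simplified (the real computation also covers a pseudo-header with IP addresses and ports), and the payload bytes are just made-up examples:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length payloads
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

sent = internet_checksum(b"some audio payload")
# The receiver recomputes over what actually arrived; one flipped bit
# and the values no longer match, which triggers a retransmission.
assert internet_checksum(b"some audio pbyload") != sent
```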
So, in your example of mains-grid disturbance, we would experience this in potentially a few different ways. The two most likely outcomes would be Ethernet negotiating at a slower speed (wired 1Gb may negotiate down to 100Mb half duplex, a problem I see often with faulty hardware), or it would negotiate at full speed but TCP retransmits would be off the chart. That would look like a high percentage of failed checksums compared to a functional connection. "High" is relative, but in a healthy network they should be well below 1%.
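If you want to sanity-check that number on your own network, the kernel keeps cumulative counters you can read directly. A rough sketch, assuming a Linux box (the RetransSegs and OutSegs counters live in /proc/net/snmp):

```python
# Rough Linux-only sketch: estimate the TCP retransmit percentage from
# the kernel's cumulative counters in /proc/net/snmp.
def tcp_retransmit_percent() -> float:
    with open("/proc/net/snmp") as f:
        header, values = [line.split() for line in f if line.startswith("Tcp:")]
    stats = dict(zip(header[1:], (int(v) for v in values[1:])))
    return 100.0 * stats["RetransSegs"] / max(stats["OutSegs"], 1)

print(f"TCP retransmits since boot: {tcp_retransmit_percent():.3f}%")
```

Anything persistently above that ~1% mark would point at a genuinely sick link, not a subtle "audiophile" effect.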
Now, if we put all this together for even some rare streaming protocol that is NOT based on TCP but rather on UDP (the most common alternative), we would experience this as stuttering audio. Think of a poor-quality VoIP call - there are very tiny gaps in the conversation because UDP doesn't re-request those lost packets. What would be the point? It would actually make LESS sense if a lost packet were re-requested during a live interaction. The sentence would come out like:
"I'm on my way to my girl....... House.
..................
Friend's."
You can see how that would quickly become a problem in music. The sample/packet sizes are much smaller, of course, but hopefully my example illustrates the point.
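Here's a toy simulation of that behavior - not any real streaming protocol, just an illustration of how a UDP-style receiver substitutes silence for whatever never arrived rather than stalling playback (the 20% loss rate is deliberately exaggerated):

```python
import random

# Toy illustration only: a UDP-style receiver fills gaps with silence
# instead of waiting, so lost packets become dropouts rather than delays.
stream = {seq: f"[chunk {seq}]" for seq in range(10)}   # what was sent
arrived = {seq: pkt for seq, pkt in stream.items()
           if random.random() > 0.2}                    # ~20% loss, exaggerated

playback = [arrived.get(seq, "[silence]") for seq in range(10)]
print(" ".join(playback))
# e.g. [chunk 0] [chunk 1] [silence] [chunk 3] ... the "my girl... house" effect
```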
So, in short, IETF standards were designed with your use case in mind - engineers knew that as data networks scaled, transmission signals would be subject to alllllllll kinds of abuse: poor power, poor application design, misconfiguration by entry-level techs, etc. That's why they built error detection and recovery into the data stream wherever they could (TCP).
Thus, we can be highly certain that changing a clock will not produce distortion in the way we tend to think of it here on ASR. Impossible? I suppose not, but only to the extent that walking through a wall is impossible: quantum theory suggests that if you try walking through a wall over and over, sometime in the next few trillion years it will actually work.
The exception being a modification that takes the network switch OUTside IETF standards, in which case I guess anything can happen, although I would bet on degraded or outright failed data streams over an increase in THD, IMD, etc. Jitter, sure, but again not necessarily in an audible manner (more like sporadic buffering).
Hopefully that's helpful? I didn't really address your RF point, I suppose. That is actually a known issue, and the reason shielded cabling was invented. However, again, that scenario would be perceived as slow throughput, i.e. more buffering. It's a little more complex than that, but not much, and it certainly wouldn't affect sound in the analog domain in any way once the buffer fills.
Edit: I completely forgot - UDP also uses checksums, but because the whole point of UDP is generally lower latency/higher throughput, bad packets are typically discarded unless the application itself re-requests them.
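In code terms, that receive path looks something like this sketch (handle_datagram and the trivial checksum are hypothetical stand-ins, not a real UDP implementation):

```python
# Hypothetical receive path: verify a checksum, but on a mismatch just
# drop the datagram and move on - nothing is ever re-requested.
def handle_datagram(payload: bytes, claimed: int) -> bytes | None:
    computed = sum(payload) & 0xFFFF  # toy stand-in for the real UDP checksum
    return payload if computed == claimed else None

good = b"[chunk 7]"
print(handle_datagram(good, sum(good) & 0xFFFF))  # b'[chunk 7]'
print(handle_datagram(good, 0xBEEF))              # None -> a brief gap
```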