How accurate this is I can't say, but some online research into the time frame returned this:
"A reflection from a speaker is generally considered part of the first arrival (or "direct sound") when it arrives within roughly 20–40 milliseconds (ms) of the initial direct sound. This phenomenon is part of the Precedence Effect (or Haas Effect), where the brain fuses early reflections with the direct sound into a single perceptual event, increasing the perceived loudness and clarity rather than being heard as a distinct echo."
Was what you quoted generated by AI? I ask because it's misleading. What it says about the Haas Effect is correct, but the assertion that everything within the Haas Effect interval is "considered part of the first arrival (or "direct sound")" is incorrect, at least in a home audio setting.
The Haas Effect suppresses directional cues from reflections arriving within 20-40 milliseconds of the direct sound so that we can tell the direction a sound came from in a reverberant environment. After that 20-40 milliseconds (the timespan varies with the specifics), reflections can be heard as distinct echoes, assuming they are still loud enough.
The Haas Effect kicks in at about .68 milliseconds, which roughly corresponds to the path length around your head from one ear to the other. Before .68 milliseconds, the "window" is open for first-arrival sounds (no Haas Effect suppression yet). A first-arrival sound coming from directly to your left will wrap around your head and reach your right ear about .68 milliseconds after it reaches your left ear. The ear/brain system can tell the arrival direction from the length of the interval between arrival at the first ear and arrival at the second ear. That time interval will be less than .68 milliseconds for any other arrival direction. The "window" then closes after .68 milliseconds, because after that reflections would be giving your brain contradictory directional cues. The Haas Effect wears off after about 20-40 milliseconds.
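For anyone who wants to put rough numbers on the ear-to-ear time gap, here is a minimal sketch using the Woodworth spherical-head approximation. The formula choice, the head radius, and the speed of sound are my assumptions for illustration, and the function name is made up; a real head is not a sphere, so treat the output as ballpark only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly room temperature)
HEAD_RADIUS = 0.0875    # m, assumed "typical" head radius

def itd_ms(azimuth_deg: float) -> float:
    """Interaural time difference in ms for a distant source.

    Woodworth spherical-head approximation (an assumption, not a measurement):
    0 degrees = straight ahead, 90 degrees = directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return 1000.0 * HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {itd_ms(az):.2f} ms")
```

With these assumed values the gap for a sound from directly to one side comes out at roughly two-thirds of a millisecond, in the same neighborhood as the .68 figure, and it shrinks toward zero as the source moves toward straight ahead.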
But psychoacoustically there is a great deal happening due to reflections that arrive between the initial arrival of the sound and the end of the Haas Effect 20-40 milliseconds later!
If reflections off the front face of the speaker (or its edges) occur, they almost always arrive within that initial .68 milliseconds when the "window" is open to receiving new first-arrival sounds. And the ear/brain system can misinterpret those extremely early reflections! They can function as false azimuth cues: The ear/brain system computes the horizontal arrival angle (azimuth) of a sound from the less-than-.68-milliseconds time gap between its arrival at one ear and then the other ear. Super-early baffle reflections arriving before .68 milliseconds can interfere with this and result in degraded sound image localization. So these very early, pre-Haas-effect reflections are disproportionately detrimental to spatial quality, and you'll often see manufacturers use rounded-over edges or very narrow baffles for the sake of better imaging.
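To see why baffle-edge effects land inside that .68 millisecond window, here is a rough sketch that converts extra path length into delay. The geometry is deliberately simplified (an on-axis listener, a single edge at a given distance from the driver), and the function names, the 15 cm edge distance, and the 3 m listening distance are hypothetical values chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
WINDOW_MS = 0.68        # the "window" figure used in this post

def max_extra_path_m(window_ms: float = WINDOW_MS) -> float:
    """Longest extra path (m) that still arrives inside the window."""
    return SPEED_OF_SOUND * window_ms / 1000.0

def edge_delay_ms(driver_to_edge_m: float, listener_dist_m: float) -> float:
    """Delay (ms) of sound that travels driver -> baffle edge -> listener,
    relative to the direct sound, for an on-axis listener (simplified model)."""
    via_edge = driver_to_edge_m + math.hypot(listener_dist_m, driver_to_edge_m)
    return 1000.0 * (via_edge - listener_dist_m) / SPEED_OF_SOUND

print(f"window allows up to {max_extra_path_m():.3f} m of extra path")
print(f"edge 0.15 m from driver, listener at 3 m: {edge_delay_ms(0.15, 3.0):.2f} ms")
```

Under these assumptions the window corresponds to a bit over 23 cm of extra path, so an edge 15 cm from the driver falls comfortably inside it.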
Reflections off of room surfaces all arrive later than .68 milliseconds so the Haas Effect suppresses - but does not necessarily eliminate - their influence on the perceived arrival direction. More specifically, a speaker's strong first reflection off the same-side-wall will tend to widen the soundstage in the direction of that reflection. There are arguably some trade-offs from having strong early same-side-wall reflections, but in general most listeners find the soundstage-widening effect to be worthwhile and enjoyable.
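As a quick sanity check on the claim that room reflections arrive later than .68 milliseconds, here is a minimal sketch that estimates a first side-wall reflection's delay by mirroring the speaker across the wall. The room layout, the coordinates, and the function name are hypothetical; it ignores speaker directivity and wall absorption and only looks at path length.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def sidewall_delay_ms(speaker, listener, wall_x=0.0):
    """Delay (ms) of the first side-wall reflection relative to the direct sound.

    speaker and listener are (x, y) positions in metres on a floor plan;
    the side wall is the line x = wall_x. The reflected path length equals
    the distance from the speaker's mirror image to the listener.
    """
    sx, sy = speaker
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    image_x = 2.0 * wall_x - sx  # speaker mirrored across the wall
    reflected = math.hypot(lx - image_x, ly - sy)
    return 1000.0 * (reflected - direct) / SPEED_OF_SOUND

# Hypothetical layout: speaker 1 m from the side wall, listener 3 m further back.
delay = sidewall_delay_ms(speaker=(1.0, 0.5), listener=(2.0, 3.5))
print(f"side-wall reflection arrives {delay:.2f} ms after the direct sound")
```

For this made-up layout the reflection arrives a few milliseconds after the direct sound: well past .68 milliseconds, but still well inside the 20-40 millisecond Haas Effect interval.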
Pretty much all of the in-room reflections affect sound quality. If these reflections are spectrally similar to the first-arrival sound, their effect on sound quality is generally beneficial. If they are spectrally significantly different from the first-arrival sound, they may be detrimental to sound quality. How loud they are also matters, as does how quickly they decay.
The longer a sound lasts, the louder it seems to be, even if the measured SPL is unchanged. So to the extent that some frequencies take longer to decay in-room, those frequencies can be perceived as louder, even if they do not MEASURE as louder.
The in-room reflections also convey information about the acoustic space of the playback room, with earlier initial reflection arrival times obviously corresponding to shorter reflection paths and therefore a smaller playback room size. Speaker placement and orientation, listener location, and room acoustic treatment can be used to reduce the "small room signature" of the playback room, which in turn can make the "sense of space" on the recording itself more perceptually dominant.
Getting back on point, I would probably consider reflections off the front baffle (and/or diffraction) arriving within .68 milliseconds of the first wavefront to be effectively part of the "first arrival or direct sound", but everything after that is, imo, clearly NOT part of the "first arrival or direct sound", from the standpoint of playback in a home audio setting.
This is just scratching the surface; people far smarter than me have written chapters if not books on this topic.
And I welcome correction and/or elaboration from any such people far smarter than me (or even just a little smarter than me) who read this.