That has not been my experience; quite the contrary. But then I'm using speakers specifically designed for this kind of set-up, which may make a difference.
The idea is NOT for the first contralateral reflection to be perceived as a secondary sound source; the idea is to delay its arrival and de-correlate it relative to the first-arrival sound, which will have arrived at the other ear. This way it contributes to spaciousness without the arguably negative side-effects of an early-arriving ipsilateral reflection.
Earl Geddes on the subject:
"The earlier and the greater in level the first room reflections are, the worse they are. This aspect of sound perception is controversial. Some believe that all reflections are good because they increase the listeners' feeling of space – they increase the spaciousness of the sound. While it is certainly true that all reflections add to spaciousness, the very early ones (< 10 ms.) do so at the expense of imaging and coloration. There is no contention that reflections > 20 ms are positive and perceived as early reverberation and acoustic spaciousness within the space. In small rooms, the first reflections from an arbitrary source, mainly omnidirectional, will never occur later than 10-20 ms (basically this is the definition of a small room), hence the first reflections in small rooms must be thought of as a serious problem that causes coloration and image blurring. These reflections must be considered in the [loudspeaker] design and should also be considered in the room as well.
"Reflections become less of a problem, in terms of coloration and image shift, at lower frequencies. Below about 500 Hz, early reflections are not as much of an issue. The ear has a longer integration time at lower frequencies and it has a poorer ability to localize, resulting in a lower sensitivity to early reflections. Image localization is strongly weighted towards the higher frequencies.
"A reflected signal that arrives at the opposite ear from the direct sound is less perceptible as coloration and image shift than if both signals arrive at the same ear. This is because of head shadowing above about 500 Hz and the fact that our ears can process signals between them. When the two signals arrive at the same ear, the signals are physically merged in space even before they enter the ear and no amount of auditory processing can separate them. When these signals arrive at different ears, the auditory processing system can diminish the adverse effects of these early reflections through cognitive processing between the ears."
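To put rough numbers on the 10-20 ms figure quoted above, here is a minimal sketch (my own illustration, not anything from Geddes) that estimates how much later a single side-wall reflection arrives than the direct sound. It uses the standard image-source idea: the reflected path is as long as the path from a mirror image of the speaker behind the wall. The speed of sound (343 m/s), the 2D geometry, the example positions and the function names are all assumptions for illustration only.

```python
import math

# Assumed value for air at roughly 20 degrees C
SPEED_OF_SOUND = 343.0  # m/s


def first_reflection_delay_ms(speaker, listener, wall_y=0.0, c=SPEED_OF_SOUND):
    """Extra delay (ms) of one flat-wall reflection relative to the direct sound.

    Hypothetical helper. The wall is the line y = wall_y in a 2D plan view;
    the reflection is modelled as sound from a mirror-image source.
    """
    sx, sy = speaker
    image = (sx, 2.0 * wall_y - sy)  # mirror the speaker across the wall
    direct = math.dist(speaker, listener)
    reflected = math.dist(image, listener)
    return 1000.0 * (reflected - direct) / c


def classify(delay_ms):
    # Thresholds taken from the quoted text: < 10 ms harmful, > 20 ms benign
    if delay_ms < 10.0:
        return "very early (< 10 ms)"
    if delay_ms > 20.0:
        return "late (> 20 ms)"
    return "intermediate (10-20 ms)"


if __name__ == "__main__":
    # Hypothetical layout: speaker 1 m from the side wall,
    # listener 2 m from that wall and 2.5 m further along the room
    d = first_reflection_delay_ms(speaker=(0.0, 1.0), listener=(2.5, 2.0))
    print(f"{d:.2f} ms -> {classify(d)}")
```

With that made-up but fairly typical layout the side-wall reflection lands at roughly 3.5 ms, well inside the "very early" region, which is consistent with the point that first reflections in small rooms can't be pushed past 10-20 ms by room size alone.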
And a short video clip: