I wouldn't be at all surprised if the process is not bit perfect; the question is: can you hear a difference on your own? Can you hear the difference if it is pointed out ahead of time (through waveform comparison)?
I would be absolutely surprised if it weren't bit perfect.
We're dealing with 7-bit MIDI values (0 to 127) controlling audio that typically runs at 48,000 samples per second at most. These numbers are chump change for modern computers. Bit depth can add some more complexity to that in terms of dynamic range (I'm usually working at 32-bit float), but that doesn't apply as much to the situation here.
Yes, real-time performance is a tough task (as is always stated, "the audio thread waits for nothing"), but OP here is talking about recording MIDI data, which -- once recorded -- acts as a static input controlling either a synthesizer or sampler. So let's break that down.
A digital synthesizer, unless designed with some amount of randomness (typically for "analog-like" behavior purposes), by definition produces the same output every time it is given the same input. The MIDI data in this case is going to be something like CC7, controlling output volume; CC1, controlling some pre-defined parameter (e.g. opening/closing a filter); etc. A solid representation of OP's example in this case would be "CC7 controlling output volume of a synth over a 4 second period, linearly from totally silent upwards to 0 dB." I fail to see how that could possibly change from one playback to the next, unless, again, some amount of randomness-with-same-MIDI-data is a feature of the synth's programming.
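To make that concrete, here's a rough sketch of the CC7 example: a toy sine "synth" rendered twice from the same recorded controller data, then compared sample for sample. Everything in it is my own assumption for illustration (the names `cc7_ramp` and `render`, the 440 Hz tone, and especially the straight `value / 127` gain mapping; real synths usually apply their own volume curve). It's not any particular DAW's or plugin's code.

```python
import math

SAMPLE_RATE = 48_000  # samples per second, the figure used in this post


def cc7_ramp(duration_s, n_events=128):
    """Hypothetical recorded MIDI: CC7 values 0..127 spread evenly over the duration."""
    return [(i * duration_s / n_events, i) for i in range(n_events)]


def render(events, duration_s, freq_hz=440.0):
    """Render a sine wave whose gain follows the most recent CC7 value.

    Assumption: gain is simply value / 127 (linear), with no smoothing.
    """
    out = []
    idx = 0
    gain = 0.0
    total = int(duration_s * SAMPLE_RATE)
    for n in range(total):
        t = n / SAMPLE_RATE
        # Apply every controller event whose timestamp has been reached.
        while idx < len(events) and events[idx][0] <= t:
            gain = events[idx][1] / 127.0
            idx += 1
        out.append(gain * math.sin(2 * math.pi * freq_hz * t))
    return out


events = cc7_ramp(4.0)
first = render(events, 4.0)
second = render(events, 4.0)

# Same static input, no randomness anywhere: the two passes match exactly.
identical = first == second
print("bit-identical renders:", identical)
```

There's nothing in `render` that depends on anything but the event list and the sample index, which is the whole point: once the MIDI is recorded, the render is a pure function of it.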
A sampler, in its most basic form, is just playing back audio. The audio is a static file; MIDI controls which audio plays back. Round-robin sampling could account for "it not sounding the same": when, say, C4 is played N times in a row, the sampler picks among X similar samples at random to avoid the "machine gun effect" of literally the same sample firing every time. But like the "analog-like" programming of the synth above, that's on purpose, not a flaw.
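A quick sketch of that distinction, assuming a made-up pool of three C4 takes (the file names and the `play_*` helpers are hypothetical, not any real sampler's API). The only version that can differ between playbacks is the one that deliberately draws from a random source, and even that one repeats exactly if you hand it the same seed.

```python
import random

# Hypothetical sample pool: three slightly different recordings of C4.
C4_SAMPLES = ["C4_take1.wav", "C4_take2.wav", "C4_take3.wav"]


def play_fixed(note_count):
    """No round robin: the same sample every time (the 'machine gun effect')."""
    return [C4_SAMPLES[0] for _ in range(note_count)]


def play_cycled(note_count):
    """Round robin by strict rotation: varied, but identical on every playback."""
    return [C4_SAMPLES[i % len(C4_SAMPLES)] for i in range(note_count)]


def play_random(note_count, rng):
    """Round robin by random choice: the variation is an intentional design feature."""
    return [rng.choice(C4_SAMPLES) for _ in range(note_count)]


print(play_fixed(4))
print(play_cycled(4))
print(play_random(4, random.Random()))  # unseeded: may differ run to run, by design
```

So if two bounces of a sampler track don't match, the first thing I'd check is whether the patch uses randomized round robin, not whether the MIDI playback is somehow lossy.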
I routinely deal with situations where phase cancellation null tests would reveal the kind of behavior that OP is talking about, and I simply have never come across it. And that's not even going into sensitivity in listening, which, while subjective and impossible to prove, is something I put a lot of faith in.
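For anyone who hasn't run one, a null test is just this: flip the polarity of one render, sum it with the other, and look at what's left. A minimal sketch, assuming both renders are equal-length lists of float samples (`null_test` is my own name for it, not a function from any audio library):

```python
def null_test(a, b):
    """Sum a with polarity-inverted b and return the peak of the residual.

    A result of exactly 0.0 means the two renders cancel completely.
    """
    if len(a) != len(b):
        raise ValueError("renders must be the same length to null against each other")
    # a + (-b) is the same as a - b, sample by sample.
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


render_a = [0.0, 0.5, -0.25, 0.125]
render_b = [0.0, 0.5, -0.25, 0.125]   # identical second pass
render_c = [0.0, 0.5, -0.125, 0.125]  # one sample deliberately changed

print(null_test(render_a, render_b))  # complete cancellation
print(null_test(render_a, render_c))  # residual shows up at the changed sample
```

In a DAW you'd do the same thing by bouncing twice, inverting polarity on one bounce, and checking the meter on the summed output.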
Sorry, unless OP can point me to a solid source laying out a further explanation, I call horse shit.