I believe you are actually "right": the article states "we were given ...", but WHY didn't they ASK for per-train activity data? This data is normally available in any railway, as it is used for several things, including scheduled maintenance.
In other words, the authors did IMHO a very interesting analysis of the data they were given, but had they asked for the "right" kind of data, it would have been much simpler and faster.
If you prefer, they approached the problem from the viewpoint of data analysts (which is what they are, and evidently very good ones) rather than from the viewpoint of an accident investigator.
It was done in a single day, OK, but train usage data is not "extra data"; it is "ordinary data" that you typically have two to four weeks in advance (as a plan) and daily, or at most weekly, afterwards. And by November the actual usage data for August to October would certainly have been available.
Admittedly, I know nothing about the Singapore train system specifically, but the basics are "trans-national" and "well established": a metro is an extremely complex structure to manage, but its principles are built on the experience of the early railways.
You have to manage a timetable, the personnel, and the maintenance of both the rails (plus stations, power lines, etc., i.e. the "static" parts of the infrastructure) and the "moving" parts (the trains).
To do so you have "allocation" tables, usually drawn up at least two to four weeks in advance, stating "who" (personnel) and "what" (train) is "where".
These "programs" are monitored and modified as needed to allow for unpredictable events (think of people not showing up for work, a train malfunctioning, etc.).
So at the end of each day you had a printed sheet of paper with any modifications scribbled on it.
Nowadays the same thing is done on computers, possibly in much more detail, but the basic data, "which train was in service on which line at what time", has been available on paper since day one (or two) of any railway in the world.
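For those who think in code: once an allocation table exists, "which train was in service on which line at what time" is a trivial lookup. A minimal Python sketch, assuming a hypothetical table with one row per allocation and times stored as minutes after midnight (the train IDs, line codes, field names and times are all made up for illustration, not taken from any real system):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """One row of a hypothetical allocation table.

    Train `train_id` is allocated to `line` during [start, end),
    with times expressed as minutes after midnight.
    """
    train_id: str
    line: str
    start: int
    end: int


def trains_in_service(table, line, minute):
    """Return the sorted IDs of trains allocated to `line` at `minute`."""
    return sorted(
        a.train_id
        for a in table
        if a.line == line and a.start <= minute < a.end
    )


# Toy allocation table (made-up IDs, line codes and times).
table = [
    Allocation("T01", "CCL", 330, 600),
    Allocation("T02", "CCL", 420, 1380),
    Allocation("T03", "NSL", 330, 1380),
]

print(trains_in_service(table, "CCL", 500))  # ['T01', 'T02']
```

Intervals are half-open, so a train whose allocation ends at minute 600 is no longer counted at minute 600; that avoids double-counting a train handed over from one shift to the next.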
While I can (barely) understand how this basic data for the 5th of November was not available on the night of the 5th, there is no real reason why the data from August until, say, the 4th of November was not available.
As I said, the analysis of the data provided by SMRT was done in a clever way, since finding the culprit in that data set with other, simpler methods would not have been at all easy; but somehow SMRT (and LTA) provided the "wrong" set of data.
I don't think they have minute-by-minute timetables on the MRT as such, just first and last train times and the number of trains operating per line during peak and off-peak periods. On the Circle Line the trains are driverless and operate with a moving-block system (Communications-Based Train Control, CBTC), so the headway between trains is fairly constant, but the absolute timings depend on the train ahead travelling at the ideal speed, not stopping for too long, etc. (e.g. if someone holds the doors to jump onto the train at the last minute, you'll be delayed as the doors re-open and attempt to close again). So I think you would have to extract logs from the control system or from the trains themselves to establish exactly where they were at any particular time. And bear in mind that this information is probably rarely used (at least for this purpose) and may not be in a format suitable for the kind of analysis they needed to do. Funnily enough, it was the use (and failure) of the CBTC system that caused this problem.
Also remember that until they began the analysis it wasn't clear that a single train was the cause. I imagine they were investigating trackside issues as well as issues with the trains that stopped, rather than an unrelated train that wasn't experiencing any problems itself.
TL;DR: It's not a case of looking at predefined timetables.
Sure, but minute-by-minute is not needed.
If you look at the last graph on the page, the one after "The pattern was especially clear on certain days, like September 1. You can easily see that interference incidents happened during or around the time belts when PV46 was in service.", it shows the simplest possible data: points (the incidents) plotted over a service timetable.
That data (when that single train was or was not in service) must have been available.
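To show how little is needed for that kind of overlay, here is a minimal Python sketch using made-up toy numbers (not the real PV46 data): it counts how many incidents fall inside a suspect train's in-service windows and compares that with the share of the operating day those windows cover. If the incidents had nothing to do with the train, the two figures should be roughly similar. Function names, times and the operating-day bounds are assumptions for illustration.

```python
def fraction_during_service(incidents, intervals):
    """Share of incident times (minutes) inside any [start, end) in-service interval."""
    if not incidents:
        return 0.0
    hits = sum(1 for t in incidents if any(s <= t < e for s, e in intervals))
    return hits / len(incidents)


def service_coverage(intervals, day_start, day_end):
    """Share of the operating day [day_start, day_end) covered by the intervals.

    Assumes the intervals do not overlap each other.
    """
    covered = sum(
        min(e, day_end) - max(s, day_start)
        for s, e in intervals
        if e > day_start and s < day_end
    )
    return covered / (day_end - day_start)


# Made-up toy data: suspect train in service 06:00-08:00 and 17:00-19:00.
service = [(360, 480), (1020, 1140)]
incidents = [365, 400, 700, 1030, 1100]

print(fraction_during_service(incidents, service))  # 0.8
print(service_coverage(service, 360, 1320))         # 0.25
```

With these toy numbers, 80% of the incidents fall inside windows that cover only 25% of a 06:00 to 22:00 operating day, which is exactly the kind of imbalance the graph makes visible at a glance.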
I believe the tricky part was the "delay" between PV46 passing and the "wrong braking" of another train some time later.
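One way to cope with that delay, sketched in Python under loud assumptions: treat an incident as "explained" if the suspect train passed within `max_lag` minutes before it, and try several values of that hypothetical tolerance. Pass times and incident times are made-up toy values, and a real analysis would also need the positions of the trains, which this sketch ignores entirely.

```python
from bisect import bisect_right


def incidents_explained(pass_times, incident_times, max_lag):
    """Count incidents preceded by a suspect-train pass within max_lag minutes.

    An incident at time t is counted if some pass p satisfies
    p <= t <= p + max_lag. Positions are ignored in this toy version.
    """
    passes = sorted(pass_times)
    count = 0
    for t in incident_times:
        i = bisect_right(passes, t)  # passes[:i] are all <= t
        if i and t - passes[i - 1] <= max_lag:
            count += 1
    return count


# Made-up toy times (minutes): when the suspect train passed, when incidents occurred.
passes = [100, 130, 160]
incidents = [101, 134, 170, 250]

for lag in (0, 2, 5, 10):
    print(lag, incidents_explained(passes, incidents, lag))
```

With zero tolerance nothing matches; widening the window progressively picks up the delayed incidents (1, then 2, then 3 of the 4), which is why a strict "same minute" comparison could miss the link altogether.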