Totally agree. As someone who does scientific modeling, I find that fitting models is the easy part. Finding a model that correctly describes my problem and can be trained with existing or easily acquired input data is hard.
In many cases this is true, but it's important to note that for deep learning, training was the hard part that was missing for a long time. Better weight initialization and momentum methods were what really made deep networks trainable (not just GPUs, as some people tend to believe).
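To make those two ingredients concrete, here's a minimal NumPy sketch (my own illustration, not anything from the parent comment): He-style initialization scales weights by sqrt(2/fan_in) so variance doesn't blow up or vanish through deep ReLU layers, and classical momentum accumulates a velocity term on top of plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# He initialization: draw weights with std sqrt(2 / fan_in) so that
# activations keep roughly constant variance through deep ReLU layers.
def he_init(fan_in, fan_out):
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# SGD with classical momentum: the velocity term accumulates past
# gradients, damping oscillations and speeding up progress along
# directions where the gradient is consistent.
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    v = beta * v - lr * grad
    return w + v, v

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is simply w.
w = he_init(4, 3)
v = np.zeros_like(w)
for _ in range(100):
    w, v = momentum_step(w, v, grad=w)
```

On this toy quadratic the iterates spiral toward zero; the same update rule is what frameworks call "SGD with momentum."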
GPUs were necessary but not sufficient, true. But the real change (at least for supervised learning) was not the polishing of training methods (an incremental improvement); it was the availability of big enough data sets. No amount of modeling can make up for insufficient data.