Essentially, all I needed was a way to upload a data set, run tests against it, and spit out a pass/fail percentage.
Braintrust makes this pretty easy, but if I were to do it again I would vibecode the same functionality.
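
For a sense of scale, here is a minimal sketch of that kind of homegrown harness. It is not Braintrust's API. The CSV column names (`input`, `expected`), the function names, and the exact-match scorer are all assumptions made for illustration.

```python
import csv
import io
from typing import Callable


def load_dataset(csv_text: str) -> list[dict[str, str]]:
    """Parse uploaded CSV text into rows (assumed columns: 'input', 'expected')."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def exact_match(output: str, expected: str) -> bool:
    """Hypothetical scorer: a row passes only if output equals expected."""
    return output.strip() == expected.strip()


def pass_rate(
    rows: list[dict[str, str]],
    task: Callable[[str], str],
    scorer: Callable[[str, str], bool] = exact_match,
) -> float:
    """Run `task` on every row and return the percentage of rows that pass."""
    if not rows:
        return 0.0
    passed = sum(scorer(task(row["input"]), row["expected"]) for row in rows)
    return 100.0 * passed / len(rows)


if __name__ == "__main__":
    # Toy data set: str.upper stands in for the system under test.
    data = "input,expected\nhello,HELLO\nworld,WORLD\nfoo,bar\n"
    rows = load_dataset(data)
    print(f"{pass_rate(rows, str.upper):.1f}% passed")  # prints "66.7% passed"
```

In practice `task` would wrap whatever is being tested (a model call, a prompt, a function), and the scorer could be swapped for something fuzzier than exact match.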