It's tempting to be a tool vendor, but unfortunately there are many established tool people. You don't even know the names, most likely, of all the little neural network toolkit companies that flopped in the early-mid 90s.
It's better to use a new technology to create a complete solution for people.
The problem with building a complete solution is that it usually requires a lot of investment and domain-specific knowledge.
Having a better algorithm offsets a lot of that, but I am not sure I could write better CAD (computer-assisted diagnosis) software on my own, in a reasonable time frame, that would outperform the market leaders, even if using their data I can improve their results by 50%.
There is just too much science involved in feature selection and the like.
Subject matter expertise can be hired, as employees or as advisors to your company. For this type of startup, you not only need a team to build out the product end users will ultimately use (an expensive endeavor in and of itself), but you also need to gain credibility from experts and get published before doctors will look at your stuff.
I was hoping to avoid all that hassle by sticking to my core competency, artificial intelligence: doing a co-venture with some established players in the field and helping them improve their results.
I know that the Ralph Waldo Emerson quote "Build a better mousetrap and the world will beat a path to your door" is not true, but I was hoping that at least the world would meet me halfway...
I see. How about creating a public API (like someone else suggested) so we can see what it does, and then have a couple guys shop it around for you as you get validation studies going?
Then you give people enough to get enticed, but don't give everything away. You can then pursue multiple applications with a parallel effort.
Most AI applications aren't real-time (although some of them, in the chemical industry for example, are).
A doctor doesn't care if his cancer diagnosis takes another 20 milliseconds so long as it's a bit more accurate. And a bank trying to analyze whether a credit card transaction is fraudulent doesn't care about short delays either.
Financial fraud detection bottlenecks are typically between RAM and the processor. Thousands of simultaneously incoming transactions have to be examined together, because they correlate heavily.
It is not "here is one transaction, is it fraud?" but "here are 2^20 transactions, which of them are frauds?".
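To make that concrete, here's a toy Python sketch (my own illustration, not how any bank actually does it) of batch scoring, where a cross-transaction feature, how often the same card shows up within the batch, is only visible when you look at all the transactions at once:

```python
from collections import Counter

def score_batch(transactions):
    """Score a whole batch at once, using cross-transaction features.

    Each transaction is a dict with 'card' and 'amount'. The per-card
    frequency within the batch is exactly the kind of correlation
    feature a one-transaction-at-a-time API could never see.
    """
    per_card = Counter(t["card"] for t in transactions)
    scores = []
    for t in transactions:
        # Toy heuristic: many hits on one card in a single batch,
        # plus a large amount, pushes the score up.
        score = 0.0
        if per_card[t["card"]] > 3:
            score += 0.5
        if t["amount"] > 1000:
            score += 0.3
        scores.append(score)
    return scores

batch = [{"card": "A", "amount": 1500}] + \
        [{"card": "B", "amount": 10}] * 5
print(score_batch(batch))  # card B is flagged only because of the batch view
```

Scoring transaction by transaction, card B's five small purchases would each look harmless; only the batch view reveals the burst.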
You could do this by pipelining, but I guess banks want a zero-downtime system, and I personally would not trust an API in terms of reliability.
Another point is that banks will not give you the original data. They will have to pseudonymize several fields, such as credit card numbers, names, and so on.
This forces them to preprocess the data, which adds a small cost per transaction (O(n) overall) and might decrease the speed even more.
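For illustration, pseudonymization is often done with a keyed hash, so the same card always maps to the same token (correlations survive) but the raw number is hidden. A toy sketch, with a made-up key and field names:

```python
import hmac
import hashlib

SECRET = b"bank-side key, never leaves the bank"  # hypothetical

def pseudonymize(txn, fields=("card_number", "name")):
    """Replace identifying fields with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so cross-transaction
    correlations are preserved, but the original value cannot be
    recovered without the bank's key. This is the O(n) preprocessing
    cost: a constant amount of work per transaction.
    """
    out = dict(txn)
    for f in fields:
        if f in out:
            token = hmac.new(SECRET, str(out[f]).encode(), hashlib.sha256)
            out[f] = token.hexdigest()[:16]
    return out

t1 = pseudonymize({"card_number": "4111111111111111", "amount": 42})
t2 = pseudonymize({"card_number": "4111111111111111", "amount": 7})
assert t1["card_number"] == t2["card_number"]   # correlations preserved
assert t1["card_number"] != "4111111111111111"  # identity hidden
```

So the algorithm vendor can still link transactions from the same card without ever seeing the card number.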
(I'm not saying it's technically impossible, but I'd say there are better ways, such as releasing it closed-source, or just using it to predict financial data, which as we all know is possible and being done by hedge funds; so that should be the best way IF you have that algorithm ;)
I've taken classes from people who worked on fraud detection for banks (Fair Isaac), and they were working on legacy hardware (shitty old mainframes) with absurdly limited floating-point precision.
Performance is of the essence in these situations; any clever trick you can think of to speed things up should be used (but keep it fairly simple: lookup tables and so forth).
Applying the algorithm to financial markets would be highly latency-sensitive, though.
I'm not sure I buy the web service idea, either: wouldn't typical applications require a lot of input data (training set + test set) in order to be effective? Uploading all that data could be annoying, compared with just running the algorithm locally at the customer's site.
Training always takes a ton of time, even more so with my algorithm, which is a bit more complex. But training is usually only done once, and afterwards results can be generated very quickly.
I wasn't talking about training time -- I was talking about data set size. Frequently uploading a few GB of data over the public Net to do effective training is going to be an annoyance. You may also need to perform the training multiple times, especially if your algorithm takes any parameters.
It does take a few parameters, but there is no reason the same dataset would have to be uploaded every time you tweak a parameter; I can just store it and let you play with it until you are satisfied with the results.
And again, training usually isn't done that frequently.
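Something like this toy in-memory sketch (all names entirely hypothetical; a real service would sit behind an HTTP API and persist datasets to disk) shows the "upload once, retrain with new parameters" flow:

```python
import uuid

class TrainingService:
    """Toy sketch of a service that stores a dataset server-side
    so the client can retrain with different parameters without
    re-uploading the data each time."""

    def __init__(self):
        self._datasets = {}

    def upload(self, data):
        ds_id = str(uuid.uuid4())
        self._datasets[ds_id] = data
        return ds_id                  # client keeps this handle

    def train(self, ds_id, **params):
        data = self._datasets[ds_id]  # no re-upload needed
        # Stand-in for the real training run.
        return {"dataset": ds_id, "params": params, "n_rows": len(data)}

svc = TrainingService()
handle = svc.upload([[1, 0], [0, 1], [1, 1]])     # upload once
run_a = svc.train(handle, learning_rate=0.1)
run_b = svc.train(handle, learning_rate=0.01)     # same data, new parameter
```

The dataset crosses the wire once; every subsequent parameter tweak only sends the handle and the new settings.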