Still digesting this, but it doesn't pass the smell test. There are too many assumptions that are just plain... wrong.
For example:
There seems to be an implicit assumption that all of the REST endpoints just map to CRUD operations. Perhaps for the most barebones, basic API this is true, but service APIs are more than just CRUD ops. They're abstracted interfaces to a complex series of tasks (for the sake of argument, we'll call them 'orchestrations'). Just because you're doing a "POST" to /user/foo doesn't mean that the service is only doing a simple insert. It might be looking up similar users, sending out notifications, validating data, de-duping, etc.
The more I think about this the less sense it makes. Perhaps I'm missing something.
CouchDB lets you listen to changes[1]; if you want to do anything in addition to the basic CRUD, you can do it there. This does limit you to performing asynchronous operations, but most of the time that's good enough. You do still end up with a scaling issue, but because the users aren't hitting that part of your system directly, it's easier to spread the load over time.
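To make that concrete, here's a minimal sketch of the pattern: CRUD writes go straight to the database, and a separate worker reacts to each change row asynchronously. The feed here is a plain list and the handler registry is made up for illustration; a real worker would long-poll `/db/_changes?feed=continuous` instead.

```python
def process_changes(changes, handlers, last_seq=0):
    """Dispatch each change row to handlers registered by doc type.

    Returns the last sequence number processed, so a worker could
    resume from there after a restart (hypothetical checkpointing).
    """
    for row in changes:
        if row["seq"] <= last_seq:
            continue  # already handled before a restart
        doc = row["doc"]
        for handle in handlers.get(doc.get("type"), []):
            handle(doc)  # e.g. send a notification, de-dupe, validate
        last_seq = row["seq"]
    return last_seq

# Example: queue a welcome email whenever a user document appears.
sent = []
handlers = {"user": [lambda doc: sent.append(doc["name"])]}
feed = [
    {"seq": 1, "doc": {"type": "user", "name": "alice"}},
    {"seq": 2, "doc": {"type": "invoice", "total": 42}},
]
last = process_changes(feed, handlers)
```

The asynchrony is the point: the user's POST returns as soon as the write lands, and the "orchestration" work happens off the request path.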
It's clear the author doesn't understand REST - he thinks that it is a dumb wrapper over a relational database. It's not surprising that straw man burns so well.
REST is a set of constraints for building a system. A RESTful interface is one that cleanly abstracts the underlying data store into one that is simple to use for third party systems.
It's important to understand that a RESTful system doesn't necessarily expose a true model of the underlying data store. You could have an invoicing system that contains Customer and Invoice models. I see developers confusing that to mean "you have /customer/ and /invoice/ endpoints and never the twain shall meet". If you need to embed customer details into invoices and/or invoice details into customers, then feel free to do that if it makes sense for your application.
I'd recommend reading it, as it helps clarify a few things. Also read up on the term HATEOAS, which details how information should be interlinked where possible, so your system can be (theoretically) crawled for more information.
I think the best definition is in Fielding's thesis where he lists the 4 REST interface constraints: 1) Identification of resources 2) Manipulations through representations 3) Self-descriptive messages and 4) HATEOAS.
1 & 2 mean that a resource's name exists for identification only, and that the same name should be used no matter what you are doing to the resource. In HTTP you use a combination of verbs and request body data to tell the server what to do with it. HTTP implements #3 through status codes and verbose descriptions in the response body. HATEOAS (Hypermedia as the Engine of Application State) means that clients discover functionality by following links embedded in responses, rather than constructing URLs from out-of-band knowledge.
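Constraints 1 & 2 can be sketched in a few lines: the path only identifies the resource, while the verb (plus any body) carries the intent, and the status code makes the message self-descriptive. The dispatch table below is purely illustrative, not from any framework.

```python
def dispatch(method, path, body=None):
    """Tiny illustration: one resource name, many verbs."""
    resource = path.strip("/")  # the path is identification only
    handlers = {
        "GET": lambda r, b: ("200", f"representation of {r}"),
        "PUT": lambda r, b: ("200", f"replaced {r} with {b}"),
        "DELETE": lambda r, b: ("204", ""),
    }
    if method not in handlers:
        # constraint 3: the status code itself describes what happened
        return ("405", "method not allowed")
    return handlers[method](resource, body)

ok = dispatch("GET", "/user/foo")
bad = dispatch("PATCH", "/user/foo")
```

Note the same `/user/foo` name is used whether you read, replace, or delete the resource; only the verb changes.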
For the sake of brevity and callback, I'll say it's a SMART wrapper over a backend service or services (commonly at this point, a relational database).
His post does focus mostly on CRUD. Personally I usually proxy CouchDB to a path in my REST API for handling CRUD and then proxy in the rest of my services at other paths. In the case of doing a write and then some other operation, the CouchDB folks usually write a process that listens to the changes API which then triggers the desired other operations.
The main benefit of having an actual API layer is decoupling - between your front end clients and your back end service(s). This article seems to forget this major point.
Decoupling has loads of benefits - for example, it gives separate development cycles, and in this case provides a point of scalability too - not all REST dbs cache brilliantly or aggressively enough for all use cases.
The point about using delayed syncing between client and server is separate. I can see how it can help in certain use cases (I suspect Google Docs doesn't auto-save every single letter you type, for example, but bulks changes up into a save every x seconds), but at other times this kind of strategy sounds like it will add unnecessary complication.
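The "bulk up edits, save every x seconds" guess above is easy to sketch: edits accumulate locally and one flush sends them all in a single request. The class, its parameters, and the injected clock are all hypothetical, chosen so the example is deterministic.

```python
import time

class BatchedSaver:
    """Accumulate edits locally; flush them in one save per interval."""

    def __init__(self, save_fn, interval=5.0, clock=time.monotonic):
        self.save_fn = save_fn      # hypothetical "send to server" call
        self.interval = interval
        self.clock = clock
        self.pending = []
        self.last_flush = clock()

    def edit(self, change):
        self.pending.append(change)
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.pending:
            self.save_fn(list(self.pending))  # one request for many edits
            self.pending.clear()
        self.last_flush = self.clock()

# Fake clock so the behaviour is reproducible.
now = [0.0]
saved = []
saver = BatchedSaver(saved.append, interval=5.0, clock=lambda: now[0])
saver.edit("a")
saver.edit("b")
now[0] = 6.0
saver.edit("c")   # interval elapsed: all three edits go out in one save
```

Whether this is a win or "unnecessary complication" depends on exactly the trade-off the comment describes: fewer requests versus a wider window of unsaved work.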
The syncing adds unnecessary complication for desktop apps. For apps that need to be accessible via mobile devices, it means the difference between whether or not your app is usable at all for a lot of people. About 30%-50% of my commute is "dead zones" where I'm lucky to get GPRS speeds, and often have no signal at all... This is in one of the busiest commuting corridors in London...
Getting syncing right is tricky, though, because you do need to think through your possible conflict resolution issues in case the user moves from device to device, potentially while one or both devices lack a connection.
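One simple (and deliberately lossy) conflict policy for the multi-device case above is last-write-wins per field, using a version counter carried with each edit. The field names and the `(version, value)` scheme are made up for illustration; real systems (CouchDB included) use richer revision trees.

```python
def merge(doc_a, doc_b):
    """Merge two divergent copies of a document field-by-field,
    keeping whichever value carries the higher version counter."""
    merged = {}
    for key in set(doc_a) | set(doc_b):
        a = doc_a.get(key, (0, None))
        b = doc_b.get(key, (0, None))
        merged[key] = max(a, b)  # (version, value) tuples compare by version
    return merged

# Two devices edited different fields while offline.
copy_a = {"phone": (2, "555-1234"), "email": (1, "old@example.com")}
copy_b = {"phone": (1, "555-0000"), "email": (3, "new@example.com")}
result = merge(copy_a, copy_b)
```

Last-write-wins silently discards one side of a true conflict, which is exactly why "thinking through your conflict resolution issues" up front matters.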
Explicitly linking database operations to API calls is idiotic for multiple reasons. First, it creates tight coupling; second, if you want to do multiple operations at once you have to make multiple calls to the API. That doesn't scale at all.
Everyone should be writing their REST back end and the API should be as specialized as possible. One HTTP call should give you back everything you need for a page/view w/e.
That mega-JOIN that brings back the entire client context does far more for db throughput than a plethora of single-object calls. THIS IS WHY you build an API, rather than exposing CRUD. CRUD is not an API.
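The "one call per view" idea reduces to an endpoint that aggregates everything the page needs server-side, instead of the client making one round trip per object. The `fetch_*` helpers below are hypothetical stand-ins for that mega-JOIN (or a few queries run close to the data).

```python
def dashboard_view(user_id, fetch_user, fetch_orders, fetch_alerts):
    """One endpoint assembles the whole view; the client makes one call."""
    return {
        "user": fetch_user(user_id),      # in reality: one JOIN-heavy query
        "orders": fetch_orders(user_id),
        "alerts": fetch_alerts(user_id),
    }

# Fake data sources so the example runs standalone.
view = dashboard_view(
    7,
    fetch_user=lambda uid: {"id": uid, "name": "alice"},
    fetch_orders=lambda uid: [{"order": 1}],
    fetch_alerts=lambda uid: [],
)
```

The client gets one response shaped for the view, rather than stitching together N generic CRUD responses over a slow link.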
I agree the article misses the abstraction all public APIs need in order to offer stable, non-changing services.
Behind that public front everything can change and routes can be added; if things change majorly, you version it. But it is never a good idea to map the db operations directly to the REST URLs - so much more has to happen in production.
The public API layer is often mapped out more manually, since auto-generated REST APIs are rarely anything but leaky abstractions. ORMs, data layers, authentication etc. behind the scenes may be generated or flexible to change.
Public APIs might even expose objects that aren't objects from the database in full, or at all - e.g. with meta/tracking fields and private info stripped out.
Public APIs offer multiple levels of service: direct data/JSON, MVVM views, tools, syncing, per-view state (multiple objects), authentication etc. The term REST has muddied this a bit, but ultimately REST is representational state transfer; it doesn't have to be web to db without an application layer.
An API is a public definition that should be as concrete as possible for the service's use cases, and public APIs provide a flexible abstraction to expose a service to other consumers. So I will keep writing my public APIs; I'll generate boilerplate behind them, but I won't be locked to any implementation other than an API uniquely designed for my service, the features needed, and the use case.
> it doesn't have to be web to db without an application layer
I don't even think it's possible to do that if you have a slightly more complicated application, because every API that does more than the very basic CRUD needs something else besides interfacing with the database.
Yes, the only time I could see that being useful might be in a private API used by an admin application or similar. But for a public API, prepare for some pain.
This article is really making a lot of broad assumptions about what people are making APIs for and how people write code.
"Now you hit an API that has to do some async behaviour and now your screwed. There are some solutions for that but they make the code really complex, even Node.js."
I don't know what he meant, but we have a situation where our REST API needs to reply with "I see you want to get X. Come back and ask for X again when it's ready." In our backend, we just add the task to a message queue.
Actually it doesn't. The "hard" problem here isn't telling the client to come back later, it is providing the client with a way to get the actual results later. So providing a request identifier that gets populated in a database when the task is complete may be one way to do that.
The protocol I wrote just sent a job-ID to the client and the client started polling for the result.
Since AJAX is already asynchronous I could write a JavaScript library for the browser, which made the whole thing transparent for the client.
The only problem I ran into was multiple long-running jobs, which caused parallel polling. But since the communication between server and client was hidden behind the client lib, I could change the protocol later to merge multiple polling requests into one, etc.
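The job-ID protocol described in this sub-thread is small enough to sketch end to end: the server accepts a task, hands back an identifier, and the client polls until a worker has stored a result. Everything here is in-memory and hypothetical; over HTTP the submit step would typically be a 202 response carrying a job URL.

```python
import itertools

class JobServer:
    """Toy async-job protocol: submit, finish (worker side), poll."""

    def __init__(self):
        self.results = {}
        self.ids = itertools.count(1)

    def submit(self, task):
        job_id = next(self.ids)
        self.results[job_id] = None      # pending; a worker fills this in
        return job_id                    # HTTP analogue: 202 + job URL

    def finish(self, job_id, result):
        self.results[job_id] = result    # called by the background worker

    def poll(self, job_id):
        result = self.results.get(job_id)
        return ("done", result) if result is not None else ("pending", None)

server = JobServer()
jid = server.submit("resize-image")
first = server.poll(jid)                 # still pending
server.finish(jid, "thumbnail.png")      # worker completes the task
second = server.poll(jid)                # result is ready
```

Because polling goes through one `poll` call, the client library can later batch several job IDs into one request without touching calling code, which is the flexibility described above.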
> A lot of the Internet is going over unreliable wireless
> technologies. So your beautiful REST calls are now riddled
> with exception handling, because there are so many ways of
> things going wrong.
Let me get this straight: you want to do representational state transfer (REST[1]), that is, keep all long-lived state on the server and confine every state change to a single request - and you're worried about your client's intermittent internet connection?
You might have a problem with your architecture, but your architecture doesn't appear to be a REST architecture.
There is a core of truth to this article in that we're all reinventing the wheel trying to sync a client-side model with a server-side model (assuming your (web) app has client-side state at all, of course).
Indeed, CouchDB/PouchDB already solved this problem, and redoing it can be considered waste. I think the author cut a bit too many corners for the argument to stick, but in principle it's entirely true.
The "sad" thing is that if I used Couch for this, I'd get more than what I asked for: I'd also get a database that is document-oriented, with a big map-reduce component, where it's difficult to add custom queries, difficult to extract management information, and the admin tooling is limited. It might fit certain problems well, but it also fits many problems badly.
I'd love for a Couch/Pouch kind of tool for replication/sync that somehow allows me to plug my own datastore on the backend and my own model structure on the frontend. I'm not sure what that would look like - there's a good reason data storage and sync are so tied together in Couch. But still, "you shouldn't design your own API and the only way not to is to use this particular database here" just doesn't feel like we're done with this debate.
I've got a low-traffic project I'm working on at the moment that uses PouchDB as an offline data store for a tablet app. I'm currently playing with implementing the CouchDB protocol in Rails. It seemed like a crazy idea to start with, but with a little Redis sprinkled in to handle information that wouldn't normally fit in ActiveRecord models, it seems to be coming together better than expected.
Aren't you severely limited in storage with Pouch? I think the best case with browsers is like 50MB, with the average being about 10MB. I can't envision a situation where you could ever index Pouch.
This is a key point. HTTP states that URLs are opaque. Understanding that is important. Trying to leverage URL conventions is just asking for trouble. SEO complicates this a bit for crawled sites, but for web services it is a very different situation.
Beautiful URLs are convenient though. Not every API call is going to follow a succession of links from the main entry point. Do you really design opaque URLs on purpose? Does it work well for your clients? This is a genuine question, I'm quite seduced by the hypermedia story, but still wondering how it works in practice. It seems that for the most part, we don't really know yet.
I think REST and "nice" URLs are orthogonal - HATEOAS definitely doesn't require you to be assembling URLs but human readable URLs make life a bit easier as a developer so why not have them?
I think "beautiful" is the wrong word. It should be logical and should try to convey what it does. I don't care what a URL looks like, as long as I can understand what it's supposed to do.
APIs are for developers, not for search engines and definitely not for the end user.
Performance and simplicity of client code come to mind. If the API is consumed by a UI, it sounds good, particularly if the API provides hypermedia controls that can help generate the UI. But for automated consumption, issuing one call to a known endpoint still seems so much more straightforward than walking down the links graph until the expected link relation is found. I guess I just have to go and try it myself, because at this point it seems we don't have much documented experience to rely on.
Yes, but you shouldn't hardcode the known end point, you should cache it after a walk. At some point in the future, it may 404 or 410 on you, in which case you rewalk the API from the entry point, following the link relations that got you to the thing you are after. Then you cache it again.
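The cache-then-rewalk pattern described above fits in a few lines: remember the URL you found by following link relations, and only rewalk from the entry point when the cached URL starts answering 404/410. `fetch` and `walk` are hypothetical stand-ins for a real HTTP client and a link-relation traversal.

```python
cache = {}

def resolve(rel, entry_point, fetch, walk):
    """Return the body for a link relation, rewalking on 404/410."""
    url = cache.get(rel)
    if url is not None:
        status, body = fetch(url)
        if status not in (404, 410):
            return body                  # cached link is still valid
    url = walk(entry_point, rel)         # follow link relations again
    cache[rel] = url                     # re-cache the fresh URL
    return fetch(url)[1]

# Fake API: the "orders" link has moved from /v1/orders to /v2/orders.
urls = {"/v2/orders": (200, "order list")}
fetch = lambda u: urls.get(u, (404, None))
walk = lambda entry, rel: "/v2/orders"

cache["orders"] = "/v1/orders"           # stale cached link
body = resolve("orders", "/", fetch, walk)
```

Most calls hit the cached URL directly, so you keep the directness of a known endpoint while remaining robust to the server reorganising its URL space.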
If you keep the data on client side for longer than trivial amount of time, wouldn't you have to deal with conflict resolution? This could be a big problem for many use cases.
Yes, it is incredibly hard. But on the other hand, the problem of clients losing connection does not go away just because syncing is hard. If you're targeting mobile users, you have the choice between your app being unavailable a lot of the time to a lot of users, or handling syncing.
I write my own REST APIs and I don't really run into problems like this. Removing REST isn't really the answer to the problem "what happens if clients disconnect?" You can write a layer in the client to deal with this gracefully and still use REST. I don't really get how the two concepts relate.
For anyone interested, I recently wrote a library called Hustle (https://github.com/orthecreedence/hustle) which implements a beanstalk-like queuing system on top of indexeddb, allowing you to queue up writes to your local data to be synced remotely.
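The core idea behind a local write queue like that can be sketched without IndexedDB: writes land in a local queue immediately, and a drain step pushes them upstream whenever a connection is available. `push_fn` is a hypothetical remote call that raises `ConnectionError` while offline; none of this is Hustle's actual API.

```python
from collections import deque

class WriteQueue:
    """Queue local writes; sync them upstream when a drain succeeds."""

    def __init__(self, push_fn):
        self.push_fn = push_fn
        self.queue = deque()

    def put(self, write):
        self.queue.append(write)         # instant local "success"

    def drain(self):
        synced = 0
        while self.queue:
            try:
                self.push_fn(self.queue[0])
            except ConnectionError:
                break                    # offline again; keep the rest queued
            self.queue.popleft()         # only drop a write once it's pushed
            synced += 1
        return synced

remote = []
q = WriteQueue(remote.append)
q.put({"doc": 1})
q.put({"doc": 2})
count = q.drain()
```

Popping only after a successful push means a connection drop mid-drain leaves the unsent writes queued for the next attempt.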
I also don't understand why any of this (OP's problems with REST) is the fault of REST or a database. To me this is 100% architecture woes. Lots of people use REST successfully.
What if someday, you decide that you want to change the database because CouchDB was bought by Oracle? You'll have to clone Couch's REST API or you may break your clients.
As others have noted, as an Apache project, the CouchDB project can't be "purchased by Oracle". Obviously you meant that somewhat metaphorically (and as a reference to MySQL), but it's worth noting:
MySQL was the product of a private company. It was always possible for it to be purchased, and that eventually happened. The same is true for many modern databases, including most of the major NoSQL databases. CouchDB cannot be. Yes, there's a risk the project could die in a variety of ways, but there is an entire class of risks that CouchDB is not subject to when most of its competitors are. If you really want to avoid having your DB purchased and killed by Oracle, you'd actually prefer CouchDB over, e.g., MongoDB.
Beyond that, you say: "You'll have to clone Couch's REST API". That's actually the other half of what makes CouchDB a safer bet: the CouchDB HTTP protocol is itself open, well documented, and has already been cloned repeatedly. It's used by quite a few different projects in different ways, including PouchDB, CouchDB, TouchDB, Hoodie.io, BigCouch, and Couchbase Lite.
Yep. There is a reason people abstract things. Sometimes people get a new toy and just go a bit overboard, want that one tool to solve all their problems.
Don't get me wrong, CouchDB is awesome, but it isn't the answer to everything.
CouchDB can’t be bought by Oracle or anyone, since it is run as an independent Open Source project and under an organisation (the Apache Software Foundation) that allows and supports commercial entities contributing, but single vendors can never control the project in a way that a single vendor product could.
You are still vulnerable to CouchDB as a project folding, but since it is all Open Source and open discussions, that one is easy to find early on.
And finally, you could just use one of the other implementations of the CouchDB API available :)
Of course the "Oracle buys CouchDB" scenario wasn't meant seriously :). But the point still stands. There may be much more trivial cases which can drive a user to switch from CouchDB and CouchDB derivatives to another DB. E.g. the user might figure that a relational DB suits his use case better after all.
I don't know about everyone else, but I encounter the problems described in the article every day, and so far I've managed to beat them with a solid Typesafe stack. Main components of my stack: Scala/Play/Akka/Slick/PostgreSQL. By changing a few configs I can scale from 1 server to 10 just like that, thanks to Akka.
Do you know of some tutorials or even better some examples for getting started with this stack? I know of this JHipster [1], but it's Java and doesn't use Akka AFAIK.
Oh, that Scala School looks great, thanks! I also found this: http://typesafe.com/activator/templates <- some templates to get started with the Typesafe stack. I guess it's another weekend lost for me =)
90% of the time I end up just proxying connections to couch and doing sanitizing/filtering on the data on the fly.
The views are a bit annoying, but I just plug in Elasticsearch once it becomes even moderately complex. It just automatically indexes the data by listening to the _changes feed.
This is a meandering tour of one developer's trials, not a solid discussion of REST at all.
I withdrew after "Whether that it's tried and tested of MySQL or shiny and new of MongoDB. This choice is probably going to affect how you scale." It was not worth continuing after that.
If you have already written a competent relational schema, written a SQL API on top of that, and mapped that API to a RESTful service successfully, this article is not for you.
> There are problems with your APIs losing data because of errors or downtime. A lot of the Internet is going over unreliable wireless technologies. So your beautiful REST calls are now riddled with exception handling, because there are so many ways of things going wrong.
Couldn't this be easily solved by:
a) putting the server behind nginx to terminate slow connections after several seconds, and
b) adding retry logic to the client?
This isn't a rhetorical question - would this approach actually work if a client was part way through sending data when it lost network connectivity, or am I missing something?
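For part (b), the usual shape is a retry wrapper with exponential backoff, and the partial-upload worry is real: retrying is only safe if the request is idempotent or the server deduplicates, e.g. via an idempotency key. The wrapper below is a generic sketch, not tied to any HTTP library.

```python
import time

def with_retries(call, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Retry `call` on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                    # out of retries: surface the error
            sleep(base_delay * (2 ** attempt))

# Simulated flaky network: the call fails twice, then succeeds.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("dropped")
    return "ok"

result = with_retries(flaky, sleep=lambda s: None)  # no real sleeping in the demo
```

So yes, nginx timeouts plus client retries cover the transient-failure case, but only if a repeated request can't be applied twice; otherwise the retry itself becomes the bug.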
Despite the slightly ranty tone, this was an interesting read. The sync-first approach presented here may certainly have value in some contexts. Also interesting to hear of someone implementing backends with just CouchDB. Again one size does not fit all but it's always interesting to read another developer articulating his thoughts on the way he works.
What if you want to plug in another DB for testing? When I use Peewee ORM with Python, I use Postgres in production and SQLite in-memory for testing. Is there a good way to do this when you put logic in the DB?
I'm a huge fan of CouchDB, but I think that CouchDB is the answer to his problem just as much as using some other database like Postgres is the problem to begin with. The problem with "REST" (oh yeah, baby, I'm getting the air quotes out) APIs is that most of the time we heavily couple them to the fact that they will be consumed by a browser client. Design your REST APIs to stand by themselves and to be consumed by any client. There is no doubt that web browsers have special needs; address those special needs with a (hopefully) thin layer between your REST API and the browser.
So how does using CouchDB solve some of the pain points? By providing a user-authenticated REST API out of the box? What if you need to write some server-side business logic?
I read the article, it definitely sounds exactly like what I've been through. What I didn't get from the article is, what to do next? What is the right step to take here?