Don't design your own REST back end (plus.google.com)
135 points by rograndom on April 7, 2014 | 75 comments


Still digesting this, but it doesn't pass the smell test. There are too many assumptions that are just plain... wrong.

For example:

There seems to be an implicit assumption that all of the REST endpoints are just maps to CRUD operations. Perhaps for the most barebones, basic API this is true, but service APIs are more than just CRUD ops. They're abstracted interfaces to a complex series of tasks (for the sake of argument, we'll call them 'orchestrations'). Just because you're doing a "POST" to /user/foo doesn't mean that the service is only doing a simple insert. It might be looking up similar users, sending out notifications, validating data, de-duping, etc.

The more I think about this the less sense it makes. Perhaps I'm missing something.


CouchDB lets you listen to changes[1]; if you want to do anything in addition to the basic CRUD, you can do it there. This does limit you to performing asynchronous operations, but most of the time that's good enough. You do still end up with a scaling issue, but because users aren't hitting that part of your system directly, it's easier to spread the load over time.

[1] http://guide.couchdb.org/draft/notifications.html
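The continuous _changes feed is line-delimited JSON, so the "do the rest asynchronously" pattern boils down to parsing each line and dispatching a side effect. A minimal sketch, with the handler table and side effect invented for illustration (this is not a prescribed CouchDB client API):

```python
import json

def handle_change_line(line, handlers):
    """Parse one line of CouchDB's continuous _changes feed and
    dispatch to a handler. Heartbeat lines are empty; skip them."""
    line = line.strip()
    if not line:
        return None  # heartbeat
    change = json.loads(line)
    doc_id = change["id"]
    # Deleted documents carry a "deleted": true flag in the change row.
    kind = "deleted" if change.get("deleted") else "updated"
    handler = handlers.get(kind)
    return handler(doc_id, change) if handler else None

# Hypothetical side effect: record a notification on every update.
notified = []
handlers = {"updated": lambda doc_id, change: notified.append(doc_id)}

sample = '{"seq":12,"id":"user-foo","changes":[{"rev":"2-abc"}]}'
handle_change_line(sample, handlers)
```

A real consumer would read these lines from a long-lived HTTP connection to `/db/_changes?feed=continuous` and would persist the last seen `seq` so it can resume after a crash.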


It's clear the author doesn't understand REST - he thinks that it is a dumb wrapper over a relational database. It's not surprising that straw man burns so well.


I have yet to see a crystal-clear definition of REST. What's the best one you have found?


REST is a set of constraints for building a system. A RESTful interface is one that cleanly abstracts the underlying data store into one that is simple to use for third party systems.

It's important to understand that a RESTful system doesn't necessarily expose a true model of the underlying data store. You could have an invoicing system that contains Customer and Invoice models. I see developers confusing that to mean "you have /customer/ and /invoice/ endpoints and never the twain shall meet". If you need to embed customer details into invoices and/or invoice details into customers, then feel free to do that if it makes sense for your application.

There's more to it than that, and I've mostly just described how common HTTP-based RESTful systems work. Fielding gave a lot more detail in his thesis: https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arc...

I'd recommend reading it, as it helps clarify a few things. Also read up on the term HATEOAS, which details how information should be interlinked where possible, so your system can be (theoretically) crawled for more information.


I think the best definition is in Fielding's thesis where he lists the 4 REST interface constraints: 1) Identification of resources 2) Manipulations through representations 3) Self-descriptive messages and 4) HATEOAS.

1 & 2 mean that you should name a resource for identification only, and that name should be used no matter what you are doing to it. In HTTP you use a combination of verbs and request body data to tell the server what to do with a resource. HTTP implements #3 through status codes and verbose descriptions in the response body. HATEOAS (hypermedia as the engine of application state) means that clients discover all functionality by following links embedded in responses, rather than relying on out-of-band knowledge of URLs.
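As a rough illustration of constraints 2 and 4, a representation can carry the state of a resource together with the links that drive the next transitions. The field names below (`_links`, `href`) are just one common convention, not part of any standard cited in this thread:

```python
def represent_invoice(invoice_id, total, customer_id):
    """Build a hypermedia representation: resource state plus the
    links a client needs to drive the next transitions (HATEOAS).
    The link relation names here are illustrative."""
    return {
        "total": total,
        "_links": {
            "self": {"href": f"/invoices/{invoice_id}"},
            "customer": {"href": f"/customers/{customer_id}"},
            "payment": {"href": f"/invoices/{invoice_id}/payments"},
        },
    }

doc = represent_invoice("42", 99.5, "7")
# A client follows doc["_links"]["payment"]["href"] rather than
# constructing the URL from out-of-band knowledge.
```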


It's a style, not an implementation. Highly opinionated definitions are the norm for REST.


It is defined very clearly. The fact that most developers don't know what it is does not mean REST is a subjective style.


For the sake of brevity and callback, I'll say it's a SMART wrapper over a backend service or services (commonly at this point, a relational database).


You can do this by adding all the logic to the database. The author is a CouchDB fan. There's pros and cons to this approach. It's not a new idea - see http://stackoverflow.com/questions/1473624/business-logic-in...


His post does focus mostly on CRUD. Personally I usually proxy CouchDB to a path in my REST API for handling CRUD and then proxy in the rest of my services at other paths. In the case of doing a write and then some other operation, the CouchDB folks usually write a process that listens to the changes API which then triggers the desired other operations.


I'm not sure if the author is even aware of the 202 status code, which seems to solve this and other asynchronous issues.


Perhaps the client would poll for a response if everything was truly async?


The main benefit of having an actual API layer is decoupling between your front-end clients and your back-end service(s). This article seems to forget this major point.

Decoupling has loads of benefits: for example, it allows separate development cycles, and in this case provides a point of scalability too; not all REST dbs cache brilliantly or aggressively enough for all use cases.

The point about using delayed syncing between client and server is separate. I can see how it can help in certain use cases (I suspect Google Docs doesn't auto-save every single letter you type, for example, but batches them into a save every x seconds), but at other times this kind of strategy sounds like it will add unnecessary complication.


The syncing adds unnecessary complication for desktop apps. For apps that need to be accessible via mobile devices, it means the difference between whether or not your app is usable at all for a lot of people. About 30%-50% of my commute is "dead zones" where I'm lucky to get GPRS speeds, and often have no signal at all... This is in one of the busiest commuting corridors in London...

Getting syncing right is tricky, though, because you do need to think through your possible conflict resolution issues in case the user moves from device to device, potentially while one or both devices lack a connection.


Explicitly linking database operations to API calls is idiotic for multiple reasons. First, it creates tight coupling; second, if you want to do multiple operations at once you have to make multiple calls to the API. That doesn't scale at all.

Everyone should be writing their own REST back end, and the API should be as specialized as possible. One HTTP call should give you back everything you need for a page/view/whatever.


+1

That mega-JOIN that brings back the entire client context is far better for db throughput than a plethora of single-object calls. THIS IS WHY you build an API, rather than exposing CRUD. CRUD is not an API.
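To make the contrast concrete, here is a toy sketch of a view-shaped endpoint that composes several lookups server-side, so the client makes one call instead of three round trips. The data stores and field names are invented for illustration; a real implementation would do this as one query:

```python
# Toy in-memory stores standing in for tables that would be JOINed.
CUSTOMERS = {"7": {"name": "Acme"}}
INVOICES = {"42": {"customer_id": "7", "total": 99.5}}
PAYMENTS = {"42": [{"amount": 50.0}]}

def invoice_view(invoice_id):
    """One call returns everything the invoice page needs, instead of
    the client fetching invoice, customer and payments separately."""
    invoice = INVOICES[invoice_id]
    return {
        "invoice": invoice,
        "customer": CUSTOMERS[invoice["customer_id"]],
        "payments": PAYMENTS.get(invoice_id, []),
    }

page = invoice_view("42")
```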


A Remote Facade, as Martin Fowler would put it: http://martinfowler.com/eaaCatalog/remoteFacade.html


I agree; the article is missing the abstraction that all public APIs need in order to offer stable, non-changing services.

Behind that public front everything can change and routes can be added; if things change majorly, you version it. But it is never a good idea to map the db operations directly to the REST URLs: so much more has to happen in production.

The public API layer is sometimes mapped more manually, since auto-generated REST APIs are rarely anything but leaky abstractions. ORMs, data layers, authentication etc. behind the scenes may be generated or flexible to change.

Public APIs might even expose objects that aren't database objects, in full or at all, e.g. meta/tracking fields or private info.

Public APIs offer multiple levels of service: direct data/JSON, MVVM views, tools, syncing, per-view state (multiple objects), authentication etc. The term REST has muddied this a bit, but ultimately REST is representational state transfer; it doesn't have to be web-to-db without an application layer.

An API is a public definition that should be as concrete as possible for the service's use cases, and public APIs provide a flexible abstraction for exposing a service to other consumers. So I will keep writing my public APIs; I'll generate boilerplate behind them, but I won't be locked to any implementation other than an API designed specifically for my service, its features and its use case.


> it doesn't have to be web to db without an application layer

I don't even think it's possible to do that if you have a slightly more complicated application, because every API that does more than very basic CRUD needs something else besides interfacing with the database.


Yes, the only time I could see that being useful might be in a private API used by an admin application or similar. But for a public API, prepare for some pain.


This article is really making a lot of broad assumptions about what people are making APIs for and how people write code.

"Now you hit an API that has to do some async behaviour and now your screwed. There are some solutions for that but they make the code really complex, even Node.js."

What does that even mean?


I don't know what he meant, but we have a situation where our REST API needs to reply with "I see you want to get X. Come back and ask for X again when it's ready." In our backend, we just add the task to a message queue.



Actually it doesn't. The "hard" problem here isn't telling the client to come back later; it is providing the client with a way to get the actual results later. Providing a request identifier that gets populated in a database when the task is complete may be one way to do that.


  → POST /resource
  ← 202 Accepted
  ← Location: /jobs/XXX


Right but you need /jobs/XXX to return a response that is not 404. How you accomplish this is outside the scope of what 202 provides you.


I never had a problem with this.

The protocol I wrote just sent a job ID to the client, and the client started polling for the result.

Since AJAX is already asynchronous I could write a JavaScript library for the browser, which made the whole thing transparent for the client.

The only problem I ran into was multiple long-running jobs, which caused parallel polling. But since the communication between server and client was hidden behind the client lib, I could change the protocol later to merge multiple polling requests into one, etc.
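A minimal sketch of that job-ID protocol, with an in-memory stand-in for the server and a polling client. Everything here is hypothetical; real code would speak HTTP, back off between polls, and persist the job store:

```python
import itertools

class JobServer:
    """Toy in-process stand-in for a backend that accepts work with
    202 Accepted and exposes a /jobs/<id> status resource."""
    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def post_resource(self, payload):
        job_id = str(next(self._ids))
        self._jobs[job_id] = {"state": "pending", "payload": payload}
        return 202, {"Location": f"/jobs/{job_id}"}

    def finish(self, job_id, result):  # called by the background worker
        self._jobs[job_id] = {"state": "done", "result": result}

    def get_job(self, job_id):
        job = self._jobs.get(job_id)
        return (404, None) if job is None else (200, job)

def poll(server, job_id, attempts=10):
    """Client-side polling loop (a real client would sleep with
    backoff between attempts)."""
    for _ in range(attempts):
        code, job = server.get_job(job_id)
        if code == 200 and job["state"] == "done":
            return job["result"]
    raise TimeoutError("job not finished")

server = JobServer()
status, headers = server.post_resource({"name": "foo"})
job_id = headers["Location"].rsplit("/", 1)[-1]
server.finish(job_id, {"ok": True})  # worker completes the task
result = poll(server, job_id)
```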


It means he had a problem and so will you. I'll stop short of calling it a profoundly disturbing assumption, but it pervades the entire article.


   > A lot of the Internet is going over unreliable wireless 
   > technologies. So your beautiful REST calls are now riddled 
   > with exception handling, because there are so many ways of
   > things going wrong.
Let me get this straight: you want to do representational state transfer (REST[1]), that is, keep all long-lived state on the server and confine every state change to a single request, and you're worried about your client's intermittent internet connection?

You might have a problem with your architecture, but your architecture doesn't appear to be a REST architecture.

[1] https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arc...


There is a core of truth to this article in that we're all reinventing the wheel trying to sync a client-side model with a server-side model (assuming your (web) app has client-side state at all, of course).

Indeed, CouchDB/PouchDB already solved this problem, and redoing it can be considered waste. I think the author cut a bit too many corners for the argument to stick, but in principle it's entirely true.

The "sad" thing is that if I'd use Couch for this, I'd get more than what I asked for: I also get a database, document-oriented, with a big map-reduce component, difficult to add custom queries, difficult to extract management information from, limited admin tooling. It might fit certain problems well, but it also fits many problems badly.

I'd love for a Couch/Pouch kind of tool for replication/sync that somehow allows me to plug my own datastore on the backend and my own model structure on the frontend. I'm not sure what that would look like - there's a good reason data storage and sync are so tied together in Couch. But still, "you shouldn't design your own API and the only way not to is to use this particular database here" just doesn't feel like we're done with this debate.


I've got a low-traffic project I'm working on at the moment that uses PouchDB as an offline data store for a tablet app. I'm currently playing with implementing the CouchDB protocol in Rails. It seemed like a crazy idea to start with, but with a little Redis sprinkled in to handle information that wouldn't normally fit in ActiveRecord models, it seems to be coming together better than expected.


I just use Elasticsearch to index Couch, but that's also strictly on the server side.

There's rarely enough data in Pouch for me to have to index it that way, though.


Aren't you severely limited in storage with Pouch? I think the best case with browsers is around 50 MB, with the average being about 10 MB. I can't envision a situation where you could ever index Pouch.


> You hopefully care about how the URL schema looks like

That's not REST, that's something else. Bit of a strawman.

> So then you don't have beautiful URLs right.

You never needed them.


This is a key point. HTTP states that URLs are opaque. Understanding that is important. Trying to leverage URL conventions is just asking for trouble. SEO complicates this a bit for crawled sites, but for web services it is a very different situation.


Beautiful URLs are convenient though. Not every API call is going to follow a succession of links from the main entry point. Do you really design opaque URLs on purpose? Does it work well for your clients? This is a genuine question, I'm quite seduced by the hypermedia story, but still wondering how it works in practice. It seems that for the most part, we don't really know yet.


I think REST and "nice" URLs are orthogonal - HATEOAS definitely doesn't require you to be assembling URLs but human readable URLs make life a bit easier as a developer so why not have them?


> Beautiful URLs are convenient though

I think "beautiful" is the wrong word. It should be logical and should try to convey what it does. I don't care what a URL looks like, as long as I can understand what it's supposed to do.

APIs are for developers, not for search engines and definitely not for the end user.


Indeed, "descriptive URLs" is more what I had in mind.


> Not every API call is going to follow a succession of links from the main entry point.

Why not?


Performance and simplicity of client code come to mind. If the API is consumed by a UI, it sounds good, particularly if the API provides hypermedia controls that can help generate the UI. But for automated consumption, issuing one call to a known endpoint still seems much more straightforward than walking down the link graph until the expected link relation is found. I guess I just have to go and try it myself, because at this point it seems we don't have much documented experience to rely on.


Yes, but you shouldn't hardcode the known endpoint; you should cache it after a walk. At some point in the future it may 404 or 410 on you, in which case you rewalk the API from the entry point, following the link relations that got you to the thing you are after. Then you cache it again.
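That cache-then-rewalk strategy can be sketched with a fake fetcher standing in for HTTP. The link table, relation names, and relocation scenario below are all invented for illustration:

```python
def walk(fetch, relations, root="/"):
    """Follow a chain of link relations from the entry point.
    `fetch` maps a URL to a resource dict with a "_links" table."""
    url = root
    for rel in relations:
        url = fetch(url)["_links"][rel]
    return url

class LinkCache:
    """Cache the URL found by a walk; rewalk from the root when the
    cached URL has gone stale (e.g. started returning 404/410)."""
    def __init__(self, fetch, relations):
        self.fetch, self.relations, self.cached = fetch, relations, None

    def get(self, exists):
        if self.cached is None or not exists(self.cached):
            self.cached = walk(self.fetch, self.relations)
        return self.cached

# Hypothetical API: the server later relocates the orders collection.
site = {"/": {"_links": {"orders": "/orders"}},
        "/orders": {"_links": {}}}
cache = LinkCache(lambda url: site[url], ["orders"])
url1 = cache.get(exists=lambda u: u in site)      # walks, caches "/orders"

site["/"] = {"_links": {"orders": "/v2/orders"}}  # server moves it
del site["/orders"]
site["/v2/orders"] = {"_links": {}}
url2 = cache.get(exists=lambda u: u in site)      # stale -> rewalks
```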


If you keep the data on the client side for longer than a trivial amount of time, wouldn't you have to deal with conflict resolution? This could be a big problem for many use cases.


Syncing and resolving conflicts is one of those things that everyone thinks is initially easy then realizes how difficult it is.


All I saw were the problems when he talked about that. The data will be a mess (if his database is in any way similar to the one I work on).


Then they punt and centralize.


Yes, it is incredibly hard. But on the other hand, the problem of clients losing connection does not go away just because syncing is hard. If you're targeting mobile users, you have the choice between your app being unavailable a lot of the time to a lot of users, or handling syncing.
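One way to make syncing tractable is a deterministic winner-picking rule, roughly in the spirit of CouchDB's revision comparison, so every replica converges on the same winner without any coordination, while the losing revision is kept around as a conflict for the application to merge later. A sketch with a simplified revision format:

```python
def pick_winner(revs):
    """Pick a deterministic winner among conflicting revisions by
    comparing (revision number, revision id). All replicas running
    this rule agree on the winner without talking to each other."""
    def key(rev):
        num, _, rev_id = rev["rev"].partition("-")
        return (int(num), rev_id)
    ordered = sorted(revs, key=key, reverse=True)
    return ordered[0], ordered[1:]   # (winner, conflicts to merge later)

# Two devices edited the same document while offline:
a = {"rev": "3-aaa", "body": "edited on phone"}
b = {"rev": "3-bbb", "body": "edited on laptop"}
winner, conflicts = pick_winner([a, b])
```

The hard part this sidesteps is the application-level merge of the losers; the rule only guarantees convergence, not that the winner is the edit the user wanted.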


I write my own REST APIs and I don't really run into problems like this. Removing REST isn't really the answer to the problem "what happens if clients disconnect?" You can write a layer in the client to deal with this gracefully and still use REST. I don't really get how the two concepts relate.

For anyone interested, I recently wrote a library called Hustle (https://github.com/orthecreedence/hustle) which implements a beanstalk-like queuing system on top of indexeddb, allowing you to queue up writes to your local data to be synced remotely.


I also don't understand why any of this (OP's problems with REST) is the fault of REST or a database. To me this is 100% architecture woes. Lots of people use REST successfully.


What if someday, you decide that you want to change the database because CouchDB was bought by Oracle? You'll have to clone Couch's REST API or you may break your clients.


As others have noted, as an Apache project, the CouchDB project can't be "purchased by Oracle". Obviously you meant that somewhat metaphorically (and as a reference to MySQL), but it's worth noting:

MySQL was the product of a private company. It was always possible for it to be purchased, and that eventually happened. The same is true for many modern databases, including most of the major NoSQL databases. CouchDB cannot be. Yes, there's a risk the project could die in a variety of ways, but there is an entire class of risks that CouchDB is not subject to, when most of its competitors are. If you really want to avoid having your DB purchased and killed by Oracle, you'd actually prefer CouchDB over, e.g., MongoDB.

Beyond that, you say: "You'll have to clone Couch's REST API". That's actually the other half of what makes CouchDB a safer bet: the CouchDB HTTP API is itself open, well documented, and has already been cloned repeatedly. It's used by quite a few different projects in different ways, including PouchDB, CouchDB, TouchDB, Hoodie.io, BigCouch, and Couchbase Lite.

If you're curious about the topic, I recommend this blog post: http://caolanmcmahon.com/posts/couchdb_is_not_a_database/


Thanks for the link. CouchDB really sounds more like a platform/framework. As for the points you mentioned, refer to my answer to janl.


Yep. There is a reason people abstract things. Sometimes people get a new toy and just go a bit overboard, wanting that one tool to solve all their problems.

Don't get me wrong, CouchDB is awesome, but it isn't the answer to everything.


CouchDB can’t be bought by Oracle or anyone, since it is run as an independent Open Source project and under an organisation (the Apache Software Foundation) that allows and supports commercial entities contributing, but single vendors can never control the project in a way that a single vendor product could.

You are still vulnerable to CouchDB as a project folding, but since it is all Open Source and open discussions, that one is easy to find early on.

And finally, you could just use one of the other implementations of the CouchDB API available :)


Of course the "Oracle buys CouchDB" scenario wasn't meant seriously :). But the point still stands. There may be much more trivial cases which can drive a user to switch from CouchDB and CouchDB derivatives to another DB, e.g. the user figures out that a relational DB suits his use case better after all.


I don't know about everyone else, but the problems described in the article are ones I encounter every day, and so far I've managed to beat them with a solid Typesafe stack. Main components of my stack: Scala/Play/Akka/Slick/PostgreSQL. By changing a few configs I can scale from 1 server to 10, just like that, thanks to Akka.


Do you know of some tutorials or even better some examples for getting started with this stack? I know of this JHipster [1], but it's Java and doesn't use Akka AFAIK.

[1]: http://jhipster.github.io/presentation/


On http://typesafe.com you'll find plenty of examples using the whole stack. When I was learning Scala, I found http://twitter.github.io/scala_school/ the best source for breaking into it. http://playframework.com is a good source of tutorials and docs for Play. http://akka.io is the home of Akka, and http://slick.typesafe.com is Slick's website.


Oh, that Scala School looks great, thanks! I also found this: http://typesafe.com/activator/templates <- some templates to get started with the Typesafe stack. I guess it's another weekend lost for me =)


For those of you using .NET I highly recommend ServiceStack:

https://github.com/ServiceStack/ServiceStack

https://servicestack.net/features

Doesn't use ASP.NET. Runs on Mono.

Don't bother with WebAPI.


ServiceStack is great but the recent licensing change is irritating to say the least.


You can still use the v3 version for free.

I agree that for a hobbyist the new license fees are too high. For a company those costs are a no-brainer though.

What Demis has done with ServiceStack (and now having made it his full-time job after leaving Stack Exchange) deserves remuneration in my opinion.


I love CouchDB so much.

90% of the time I end up just proxying connections to couch and doing sanitizing/filtering on the data on the fly.

The views are a bit annoying, but I just plug in Elasticsearch once it becomes even moderately complex. It just automatically indexes the data by listening to the _changes feed.


This is a meandering tour of one developer's trials, not a solid discussion of REST at all.

I withdrew after "Whether that it's tried and tested of MySQL or shiny and new of MongoDB. This choice is probably going to affect how you scale." It was not worth continuing after that.


If you have already written a competent relational schema, written a SQL API on top of that, and mapped that API to a RESTful service successfully, this article is not for you.


> There are problems with your APIs losing data because of errors or downtime. A lot of the Internet is going over unreliable wireless technologies. So your beautiful REST calls are now riddled with exception handling, because there are so many ways of things going wrong.

Couldn't this be easily solved by: a) putting the server behind nginx to terminate slow connections after several seconds, and b) adding retry logic to the client?

This isn't a rhetorical question - would this approach actually work if a client was part way through sending data when it lost network connectivity, or am I missing something?
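Retry logic along those lines is straightforward as long as the request is idempotent; a sketch with a simulated flaky connection (the backoff policy, exception type, and flaky endpoint are all illustrative):

```python
import time

def with_retries(call, attempts=3, backoff=0.0):
    """Retry a request. Safe only because an idempotent PUT/GET with
    the same payload can be repeated without changing the outcome; a
    bare POST would need a client-generated request id to dedupe."""
    last_error = None
    for i in range(attempts):
        try:
            return call()
        except ConnectionError as exc:
            last_error = exc
            time.sleep(backoff * (2 ** i))  # exponential backoff
    raise last_error

# Simulated flaky network: the PUT fails twice, then succeeds.
calls = {"n": 0}
def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection dropped")
    return 201

status = with_retries(flaky_put)
```

This covers the "client lost connectivity mid-request" case only if the server treats a repeated request as a no-op, which is exactly why the idempotency caveat in the docstring matters.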


Despite the slightly ranty tone, this was an interesting read. The sync-first approach presented here may certainly have value in some contexts. Also interesting to hear of someone implementing backends with just CouchDB. Again one size does not fit all but it's always interesting to read another developer articulating his thoughts on the way he works.


Does CouchDB have something that provides fine-grained control over the db models, like flask-restless does (https://flask-restless.readthedocs.org/en/latest/customizing...)? I didn't find anything in their documentation or on Google.


What if you want to plug in another DB for testing? When I use Peewee ORM with Python, I use Postgres in production and SQLite in-memory for testing. Is there a good way to do this when you put logic in the DB?


it’s not quite there yet, but pouchdb.com with pouchdb-server (npm, gh) is the “SQLite to Postgres” in CouchDB terms :)


I'm a huge fan of CouchDB, but CouchDB is no more the answer to his problem than some other database like Postgres is the cause of it. The problem with "REST" (oh yeah, baby, I'm getting the air quotes out) APIs is that most of the time we heavily couple them to the fact that they will be consumed by a browser client. Design your REST APIs to stand by themselves and to be consumed by any client. There is no doubt that web browsers have special needs; address those special needs with a (hopefully) thin layer between your REST API and the browser.


nobackend.org


So how does using CouchDB solve some of the pain points? By providing a user-authenticated REST API out of the box? What if you need to write some server-side business logic?

I read the article, it definitely sounds exactly like what I've been through. What I didn't get from the article is, what to do next? What is the right step to take here?


Can't read easily on an ipad. Back button pressed.



