OP/fanboy here; I'm interested in your thoughts on our products, like Cloud Bigtable: the OSS (HBase) API, but unique Google performance bits under the hood. Valid? How can we best move the state of the art forward?
The problem with Bigtable-as-HBase is that it isn't just a hosted instance -- you've made it mostly work with the same API, but it's not the same beast. That you ship a modified version of the Java client and a page detailing a heap of differences is not a minor detail.
Honestly, this just sounds like Google's M.O. (aka embrace/extend/extinguish 2.0) applied to everything.
Gmail sort of supports IMAP and SMTP, but in several ways it's broken. Users don't care that it's Google's fault; they just want their free email.
Bigtable is not HBase, it's just sort of compatible, but developers don't care; they just want to drink the Kool-Aid.
If Google really wanted to commit to anti-lock-in, they would focus on running the open source software that they use, applying their infinite budget to tune it to the absolute best performance, combined with contributions back to those projects to improve them where tuning alone isn't enough.
A common API, even an imperfect one, is so much better than everybody reinventing the wheel from scratch because some tiny use case doesn't fit the previous API.
>This article is more probably to convince ppl to use Google Cloud rather than how to run multicloud strategy.
Yes. For public cloud, it's AWS vs. everyone else right now. This is why Docker is popular among IT mega-vendors: container portability gives IBM/Microsoft Azure/GCP/Red Hat a fighting chance to stop AWS from gobbling up the entire cloud infrastructure market.
Container portability => less AWS lock-in => They'll come to my platform, perchance.
TF is entirely open source. The same models will run on the open source version as on the hosted version. How is this not the equivalent of saying "If you use Postgres, you are locked in"?
Well, that's odd: they sort of forgot to mention the whole premise of OpenStack (and all the projects it's composed of), which is to provide the same APIs across providers and vendors -- by its very nature escaping lock-in.
Also, sadly (and maybe I'm reading this wrong), calling the holy land 'Google Infrastructure For Everyone Else' seems condescending...
Author here: yeahhhh, we don't know what to call it. The CoreOS folks said this and it sorta made sense; we call out the name's ookyness right in the post. AsKSHBDDVBTFRcsM seemed, well, dumb. Ideas?
Call it 'Google's view of a stack'; at least that doesn't make it sound like an elitist take on one of the many versions of a stack that have worked, and still work, for the wider world.
After all the noise about Firebase at the last Google I/O, it's odd that the article doesn't mention it at all. I guess its proprietary API and undocumented wire protocol with no open source alternatives would be off message...
I wish this were true. I am locked in with Firebase and would love to self-host now that I have the resources. Sadly, I will have to rewrite against a new API.
It's not that Firebase isn't great; I just want to take back control.
To be fair, I think Horizon is one of the more promising Firebase alternatives. However, it has completely different semantics, and is quite immature at this point (look at the limitations: http://horizon.io/docs/limitations/). But if Firebase remains proprietary and Horizon continues to improve I could certainly imagine that rewriting my app will eventually start to look attractive.
I know everybody would love to see this, but for me it's a bit much to ask Google to open-source Firebase, since they invested quite a lot of time and money to get it to the current version. Apart from that, I can definitely recommend http://deepstream.io as a Firebase alternative, although it currently lacks an Android/iOS client (a Java client is in the works).
I doubt Google will open source Firebase, since the latest version is tied more deeply into other Google Cloud stacks. I evaluated Firebase as a backend for an iOS/Android app but had to drop it because the app needs to work in the Chinese market. The Firebase Android SDK depends on Google Play services, and the majority of Android phones in China do not have Google Play services installed. There are even reports that some sites under Firebase Hosting (basic static site hosting) cannot be accessed from China.
If you highly value the hosted PaaS aspects of Firebase then it's good to look at (they are one of the best in that space). If you're looking for an open-source alternative that can be hosted in your private or public cloud, check out Couchbase Mobile (full disclosure, I'm an architect at Couchbase). We're open-source under Apache 2 and have all the database and sync functionality of Firebase. Many of our customers/users evaluate us against Firebase, it really depends on where you place the most value: PaaS, controlling your own data, open-source, etc. http://developer.couchbase.com/mobile
Yeah, I agree that open sourcing Firebase is too much to ask. But there's other things they could be doing: a) document the wire protocol and license the clients for free use with any server, b) make the API definitions public domain and provide a compatibility test suite for alternative implementations, c) lead a standardization effort on real-time JSON databases, etc...
True, maybe https://github.com/firebase could help with some reverse engineering. However, I discovered the Firebase document and the deepstream.io record structure to be fairly similar, so with the help of https://github.com/firebase/firebase-queue the migration from Firebase to deepstream.io + RethinkDB might be possible.
As a more general note, I would expect some of the things you mentioned to happen in the coming months/years, since Firebase is now set to become the universal platform for app devs (analytics, dynamic links, etc.) rather than a standalone database offering.
Sure! We should be thankful they throw us a bone here and there (i.e. a whitepaper)... until then, let's invest more in their cloud services so they can show us how small we are without them. I remember people asking why they don't open source Datastore and App Engine. Their response was that we are too dumb to manage it, so it wouldn't help us anyway.
If you are not willing/able to pay for managed cloud services like Firebase, there are some really cool self-hosted alternatives out there (like deepstream.io and horizon.io). Open-sourcing software should always be a "can", not a "must". We might as well expect Amazon to open source DynamoDB, which is proprietary as well and still has quite a lot of users.
Given that this blog post is about having portable workloads and Firebase is specifically not portable, isn't it obvious that talking about Firebase would be "off message"?
Actually, the article specifically talked about some non-portable services and explained why this wasn't a big deal, so to me leaving out Firebase implies that 1) it is a big deal and 2) they're not interested in making Firebase apps portable. My conclusion is that they're only opening up the services where they're not in the lead and getting people locked into the ones that are. Which is fine and a great business strategy but rather detracts from the "we're all about openness and choice" image they're trying to project.
I personally perceive Google as ahead in the container space and, increasingly, in the machine learning space.
I don't see how one product being closed has any impact on the openness or portability of the products mentioned in this article. It seems unreasonable, to me, to criticize them for omitting an exhaustive list of closed products in a blog post about portable platforms.
I use and love Google Cloud Datastore, but it also falls under this proprietary umbrella.
Though honestly, you should never write your app directly against a database schema; an ORM or FRM (functional relational mapping) is how it should be done.
I think the key thing is data interchange so you can get off of product X when you have business reasons to do so without running into a huge data migration project.
With standard SQL databases, you can use pretty much the same tooling as if you were running them yourself. With cloud storage, you can download/upload to a new provider, but it may take some time and cost some bandwidth. With BigQuery/Redshift, you should be storing your data as .json.gz on S3/GCS as a backup anyway, so an export shouldn't be needed -- you'd just import the cloud storage data into another system.
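The .json.gz approach above works as an interchange format precisely because it's so simple: newline-delimited JSON, gzip-compressed, which both BigQuery and Redshift can load directly. A minimal sketch of the round trip (toy records, stdlib only):

```python
import gzip
import json

# Toy records standing in for exported table rows
records = [
    {"id": 1, "event": "signup"},
    {"id": 2, "event": "purchase"},
]

# "Export": one JSON object per line, gzip-compressed --
# the format BigQuery and Redshift both accept for bulk loads.
with gzip.open("backup.json.gz", "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# "Import" into another system: stream the lines back out
with gzip.open("backup.json.gz", "rt", encoding="utf-8") as f:
    restored = [json.loads(line) for line in f]

assert restored == records
```

Since the backup is the interchange format itself, switching warehouses is just pointing the new system's bulk loader at the same files.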
I did a smallish Bigtable dump a couple months ago -- kind of a PITA -- and I don't recommend it; maybe I was doing it wrong, but I'm not that familiar with HBase and it was a one-off thing.
I also did a rather small data migration out of App Engine (~100GB) a while back -- a total PITA -- Datastore dumps are protobuf-encoded, so you need to build special tooling to import the data into another database.
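Once you've decoded the protobuf records into plain entities (the hard, Datastore-specific part, which is omitted here), loading them into a relational target is the easy half. A sketch with hypothetical already-decoded entities and sqlite3 as the stand-in target:

```python
import sqlite3

# Hypothetical entities, assumed already decoded from the
# protobuf Datastore dump -- the decoding step needs its own tooling.
entities = [
    {"key": "user:1", "name": "alice", "visits": 3},
    {"key": "user:2", "name": "bob", "visits": 7},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (key TEXT PRIMARY KEY, name TEXT, visits INTEGER)"
)
# Named placeholders let us insert the dicts directly
conn.executemany(
    "INSERT INTO users (key, name, visits) VALUES (:key, :name, :visits)",
    entities,
)
rows = conn.execute("SELECT name, visits FROM users ORDER BY key").fetchall()
```

The schema here is invented for illustration; the real pain is that Datastore entities are schemaless, so you have to decide on a relational shape before you can write this insert at all.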
And generally, I don't use the other services, to avoid lock-in: DNS round-robin + nginx for load balancing and SSL termination, Saltstack/Ansible deploying to Ubuntu VMs for config management and deployment. I run the databases myself so I can run them on any cloud I want -- or on localhost; having a full dev environment you can create and destroy on your own machine, using the same tools you deploy with, is really nice. This also enables things like running dev workloads on cheaper clouds (Linode) while running production in a VPC on AWS.
As someone who has worked with Google from a customer perspective, they take pride in the fact that their big data stack (BigTable, BigQuery, Dataflow) is proprietary and superior to the OSS alternatives (and I believe it actually is).
I am not saying they should OSS it; if they built something so awesome that it differentiates them, they should know how to monetize it. But at least don't come out with a post like this.
Working at Google Cloud myself, I find the company philosophy to be to provide standard interfaces to superior implementations. This keeps us honest, because if the door to leave is open, then you have to actually want to stay. (Standard disclaimer: my opinion, not the company's.)
I'm surprised they didn't mention Cloud Foundry, which runs on OpenStack, vSphere, AWS, Azure and ... GCP. It's owned and managed by an independent foundation and is the most mature of the open PaaSes, having a head start of several years.
I guess they see it as a competitor to Kubernetes. Or, plausibly, Google is very large and the engineers who are working on Cloud Foundry integration with GCP are not known to Miles Ward, Head of Global Solutions.
I work for Pivotal, we donate the majority of engineering to Cloud Foundry. We make our money selling a distribution, Pivotal CF.
Cloud Foundry's ability to switch cloud providers, or to take a cloud in-house (and, in the future, to do so seamlessly), is very attractive to our customers. IaaS providers remind them of the bad old days of RDBMS and middleware lock-in, and they do not want to go back.
As for the services, it's good that Google is pointing to open alternatives.
Google's support for a multi-cloud vision would be a lot more meaningful if they were offering the open source alternatives as part of their cloud platform and contributing to their development. Google has done so in a few key cases (Kubernetes and TensorFlow), but in other areas it's hard to get comfortable with the belief that there will be a smooth transition to other infrastructure.
You've got a point. BigTable is an example; they've never open-sourced it and instead other OSS projects (notably the Hadoop ones) were inspired by it.
Other examples are GFS, Colossus, Spanner and F1. The OSS analogue of GFS is HDFS (AFAIK), but I know of nothing equivalent to the others. (CockroachDB is a bit like Spanner, but is much more pragmatic about its clocks.)
Many companies would be better off, however, using cloud services, since the "lock-in" still costs them less than the overhead of running service xyz themselves. Lock-in only matters if you have a cheaper way to run something, not if you're hiding costs in IT salaries for operations. :)
> Lock-in is only important if you have a cheaper way to run something
False. Lock-in is a business risk, and anyone who thinks it isn't is kidding themselves.
If you are dependent on a single vendor for a non-standardised service, you are also at risk if the vendor goes out of business, if the vendor decides to raise prices significantly, if the vendor decides to discontinue that service, or even if the vendor decides to change the mode of operation for that service.
You are also beholden to that vendor's internal procedures and tech decisions, which aren't always in your best interests. You may remember last September, when there was a fucking massive AWS outage where customers couldn't access the console, EC2 instances wouldn't spin up, etc. The fault for all of that was a network disruption which caused DynamoDB to flip its shit and turn "the world's biggest cloud provider" into the world's biggest brick-shitting machine.
They've probably made changes so that type of error is less likely in the future, but that doesn't matter, because you don't get to make that decision if you're a single-vendor customer on AWS. Whatever choices they make, they make for you.
> hiding costs in IT salaries for operations
Here's a pro tip: using AWS/Azure/Google Cloud/etc. doesn't absolve you of the need for operations staff. If you think your single Node.js developer is equally skilled and has enough time to set up and maintain your infrastructure just because it's dynamic and has a browser interface, I have a great investment offer for you in magic bean futures.
All of this is accurate. Also, on the infrastructure deployment and maintenance side, virtualization already obviated the need for any ordinary organization to spend time worrying about hardware. You just rent colo space and slap hypervisors in.
There are still organizations messing around with overly complex internal server systems because they have bad admins or are deep in technical debt, but they are no longer the rule.
For most clients I work with (who often have this "oh but shouldn't we use AWS, they're the best right?" attitude) even a few rented VPS (i.e. on shared hardware) is sufficient.
I'm a big fan of the way some providers like Rimu allow more control, which helps with scaling up: by default most customers just use one or more Xen VMs on shared physical hosts, but you have the option to rent dedicated hardware (either existing stock or custom orders), on which you can then run one or more VMs -- without noisy-neighbour concerns.
The "single VM on a host" setup seems weird to most people at first, until you realise it allows you to migrate the VM between physical hosts.
Sure, the new hotness is servers-as-cattle and we should just provision a new instance, but that doesn't always work for smaller outfits.
The blog entry covers a lot of options, but as a cheapskate who wants to play with cloud infrastructure, I'm wondering if there are any good resources for pricing comparison?
I could use AWS or Google, but I'm just keeping a toy cloud service going; I'm not particularly worried about reliability, and I don't want to spend $30+/month.
Digital Ocean droplets are great as a $5/month cloud for toy projects, with extremely easy setup. I use one for running an instance of https://thelounge.github.io and a few other projects, which essentially pays for itself since it replaces a $5/month IRCCloud membership. AWS, Google Cloud, or Azure might have something similar, but their elastic-pricing options were less appealing for my use case when I was checking out the options.
Most cloud providers bill hourly (if not in smaller increments), so you can spin up some instances of whatever you want to try and will only be billed a small amount.
For Google Cloud, a Compute Engine f1-micro instance is only $4.09/month. Standing one up for just 24 hours costs $0.19.
As well, both Amazon and Google let you set budgets on your account that notify you when you go over thresholds you configure. This can help prevent you from spending too much money.
I also recommend Digital Ocean for quick/cheap personal toy services, but for projects needing enduring duration and reliability, the pricing calculators for AWS[1] and GCP[2] are fully public. I haven't seen any direct comparison calculators, as everyone's particular mileage/needs may vary.
The good news is that at scale, the days of compute, storage, and data management becoming easy to use utilities to plug into are here, or near. The bad news is it still feels a bit too 'big' to plug small lamps into the main AWS or Cloud Platform grids, but I suspect those days are numbered as well.
And they would quickly demonstrate the value Kubernetes brings to the table when the spots are terminated out from underneath your app (yes, provided the Kubernetes API and etcd are running on the survivors ;-) )
Do you know about the AWS free tier? You can get an EC2 micro instance for free for 1 year. When it runs out it might be worth switching to Digital Ocean.
Agreed, but a word of caution if you're working with single VPSes.
When a VPS has problems, don't waste time opening a ticket with their support. They were never able to fix anything on my VPSes the two or three times I had problems since I started using them in December 2010.
Try rebooting the VPS from the web manager. If that doesn't fix the problem, create another VPS immediately and let the original one expire at the end of the period you paid for. I suggest paying per month and having an automated installation script. Make backups of the data and make sure they can be transferred to the new VPS.
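The "backup you can restore on a fresh VPS" idea boils down to: archive the data directory, move the archive, unpack it on the replacement box. A minimal stdlib sketch (paths and file contents are made up for illustration):

```python
import pathlib
import tarfile
import tempfile

# Stand-in for the VPS data directory you'd actually back up
src = pathlib.Path(tempfile.mkdtemp()) / "data"
src.mkdir()
(src / "app.conf").write_text("port=8080\n")

# Create the backup archive (run this on a schedule on the old VPS)
archive = src.parent / "backup.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="data")

# On the replacement VPS: unpack, then run the automated install script
dest = pathlib.Path(tempfile.mkdtemp())
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest)

restored = (dest / "data" / "app.conf").read_text()
assert restored == "port=8080\n"
```

The point is that the backup plus the install script together are the whole recovery plan -- if you can rebuild from those two artifacts, letting a broken VPS expire costs you nothing but the remaining days on the bill.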
OVH has other offerings, such as cloud and private cloud services. I've never used those, so the support there may be better.
I don't really understand why the title was changed to this from my original or the title of the post, since it is less informative and more inflammatory...
Gentlemen, I would like to interrupt this "Google promotes open source, anti-lock-in" PR party and ask for a minute of silence for the recently departed Google Code.
Um, Google Code was comprehensively obsoleted by GitHub. Google moved all their open source code there and provided a truckload of tools to help everybody else migrate to various other repos as well.
There's already an open standard for the core functionality of code hosting sites: Git. Projects migrate between GitHub, GitLab, Bitbucket and other alternatives all the time by simply changing the Git remote. There's some lock-in via extras like issue trackers and wikis but that's not difficult to migrate, especially with Markdown as a common standard so markup can be preserved.
I'm not really sure why you think that's relevant to this. Google Code simply failed to compete with the alternatives and was obsoleted. It has nothing to do with lock-in.
When Google Code shut down, people just needed to add another git remote and push their code up. There wasn't any vendor lock-in there either. The Go community managed to migrate everything to GitHub without any real losses.
This article seems more likely to convince people to use Google Cloud than to show how to run a multi-cloud strategy.