I'm glad this is out, I'm going to deploy this locally and learn as much about it as possible. Oxide is pretty much the company I dream of working at, both for the tech stack and the people working there. Thank you Oxide team!
Can you get me excited? I spent 20 seconds browsing the homepage and walked away with "so the idea is vertical integration for on-premise server purchases? On custom OS? Why? Why would people pay a premium?"
But immediately got myself to "what does a server OS do anyway, doesn't it just launch VMs? You don't need Linux, just the ability to launch Linux VMs"
> so the idea is vertical integration for on-premise server purchases? On custom OS? Why? Why would people pay a premium?
As I understand it, re: vertical integration, the term is actually "hyperconverged". Here, that means it's designed at the level of the rack. Like -- there aren't redundant power supplies per compute unit; there's one AC-to-DC conversion feeding a DC bus bar for the whole rack. There is an integrated switch designed by Oxide. There is one company to blame when anything inside the box isn't working.
In addition, the pitch is that they're using open source Rust-based firmware for many of the core components (the baseboard management controller/service processor, and the root of trust), and the box presents a cloud-like API for provisioning.
If the problem is: "I'm running lots of VMs in the cloud, I'm used to the cloud, I like the way the cloud works, but I need an on-prem cloud", this makes that much easier than the DIY ways of achieving it ("OMG, we need a team of people to build us a cloud...").
> but "hyperconverged" isn't really what we're doing.
This is fair, especially as it follows a discussion of how you're building at the rack level, which is more like a mainframe than current "hyperconverged" offerings. Recommend others read the linked post.
It seems like the folks on HN tend to think the world runs on AWS (I'm not trying to say they don't have a huge market share), but many huge enterprises still run their own datacenters and buy ungodly amounts of hardware.
The products that are on the market for an AWS-like experience on-prem are still fairly horrible. A lot of times the solutions are collaborations between vendors, which makes support a huge pain (finger pointing between companies).
Or, a particular vendor might only have compute and storage but no offering for SDN, or vice-versa. This sucks because then you have two bespoke things to manage and have to hope they work together correctly.
These companies want a full AWS experience in their datacenter, and so far this looks to be the most promising way to get it without dedicating huge amounts of resources to something like OpenStack.
The "(finger pointing between companies)" took me from confusion to 100% understanding, was at Google until recently. It was astonishing to me that it was universally acceptable to fingerpoint if it was outside your immediate group of ~80 people.*
Took me from "why would people go with this over Dell?" to "holy shit, I'm expecting Dell to do software and make nvidia/red hat/etc/etc etc/etc etc etc help out. lol!"
* also, how destructive it is. Never, ever, ever let people talk shit about other people. There's a difference between (A) "ugh, honestly, it seems like they're focused on release 11.0 this year" and (B) "ughh they're useless idk what they're thinking??? stupid product anyway", and for whatever reason, B made you normal and A made you a tryhard pedant.
Is AWS outpost truly a full AWS stack/experience? I thought it wasn't actually meant to be a "data center in a box" experience, but more so a way to run some workloads locally when you are already using AWS for everything else.
Some data products will run successfully on AWS Outposts. Others will not. For example, AWS itself can't run DynamoDB on an Outpost; it recommends that users run ScyllaDB in DynamoDB-compatible mode instead.
With DHH and others promoting a post-SaaS approach (once.com, etc.) we might see hardware refresh as cost-cutting. Astronomical compute bills and lack of granularity bring all things cloudy into sharp focus.
People don't usually throw out their server hardware after 3 years. After 3 years is up they'll probably sell service plans. And with the code being all open source some owners may go the self-supported route, though probably most will buy service plans.
Was actually referring to DHH and the 37signals products.
Whilst I think we will see a trend back towards more on-premise hardware I don't think SaaS is going away anytime soon. And in fact it's arguably better for everyone because the software is being continually maintained.
Where, exactly, are you getting these 3FTEs qualified to touch production OpenStack infra, for more than a year, where their aggregate cost is less than a rack of equipment?
Having a solid on-prem rack product to me is a great thing. I like IaaS services a lot, don't get me wrong, and I think they're the right pick for a bunch of cases, but on-prem servers also have their "place in the sun", so to speak :) I could present any number of justifications that I don't think I'm qualified enough to defend, but the gist is that at the bare minimum, I'm glad the option exists.
As to why I'm personally excited: I enjoy the amount of control having such an on-prem rack would afford me, and there surely could be a great amount of cost-savings and energy-savings in many scenarios. Sometimes, you just need a rack to deploy services for your local business. I like the prospect of decentralizing infrastructure, applying all the things we've learned with IaaSes.
In the last 10 years, across the 6 different clients/employers I've worked with, there was pretty much no way to run production in the cloud. Only 1 of them had any stuff running in the (GCP) cloud at all.
Of the 6 infrastructures I've seen, only 1 was half decent, with 6 dedicated teams working closely together around the datacenter (by dedicated I mean nothing is required of them concerning the core software product the company develops): network, Unix/virtualization, Windows, storage, PC, and datacenter. That's 30+ people just to run a couple of big datacenters and a few more server rooms. The service was actually quite good, with VMs/zones delivered in under an hour and most tech issues solved in half a day. The other infrastructures were either bigger or smaller, with more or fewer people, and were all terrible, sometimes needing weeks of email exchanges with Excel sheets attached to get a single VM.
AWS was the dream everywhere I went, for everybody. Oxide may be coming out with a product that will solve a LOT of issues. SmartOS/illumos has all the tech to be self-sufficient (virtualization, storage, SDN...); add support for networking and storage and you get a complete product that a handful of people can run (well, you still need a Windows team in most cases, but fine).
Dell is a mixed bag depending on how well the individual region you are dealing with is doing overall. Things were great for us, but something changed and now getting good support for hardware failures has been a nightmare of jumping through hoops, time zone handoffs to other teams, and forced on-site techs to replace a stick of ram.
The best elevator pitch I've heard is "AWS APIs for on-prem datacenters". They make turn-key managed racks that behave just like a commercial cloud would with all the APIs for VM, storage, and network provisioning and integration you'd expect from AWS, except made to deploy in your company's datacenter under your control.
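To make the "cloud-like API" point concrete, here's a minimal sketch in Python of what provisioning against a rack in your own datacenter looks like in practice: an authenticated HTTP call instead of a ticket to a datacenter team. The endpoint path, field names, and token handling are hypothetical stand-ins for illustration, not Oxide's actual API.

    import requests  # third-party HTTP client, assumed installed

    # Hypothetical control-plane endpoint exposed by the rack on your own network.
    RACK_API = "https://rack.example.internal/v1"
    TOKEN = "..."  # an API token minted by the rack's own auth system

    def create_instance(name: str, vcpus: int, memory_gib: int, image: str) -> dict:
        """Provision a VM the same way you would against a public cloud API."""
        resp = requests.post(
            f"{RACK_API}/instances",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"name": name, "vcpus": vcpus,
                  "memory_gib": memory_gib, "boot_image": image},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    # e.g. create_instance("web-01", vcpus=4, memory_gib=16, image="debian-12")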
AWS's pricing model kinda works at their OMG eyewatering scale - aka all the custom hardware they design is highly cost-optimized, but just doing custom hardware has a notable cost. That is easily covered by their scale, which makes for their famous margins. [During their low-scale times, they did use a good bit of HP/Dell, etc.]
Oxide seems to be no different (super custom hardware), the only major difference being the "in your datacenter" part. Since you own the cost of your datacenter, Oxide has to come in a lot cheaper to even compete with AWS, but how do you do that with low-volume [and, from the look of it, not cost-optimized but instead fairly tank-like] bespoke hardware? Feels like the pricing / customer fundamentals are going to be pretty rough here outside perhaps a few verticals.
Oxide seems to be a lot more efficient than a rack full of 1U servers, each with 2 PSUs, plus 2 ToR switches and 1 management switch somewhere for all the OOBMs. All those little fans and power conversions eat a lot of power, and the fans and PSUs all cost something too. Also, have fun managing all of that in a secure manner, or debugging anything at all. Once you add the VMware licensing, you might end up with more or less the same cost up front and quite likely a higher overall cost. And I am not even beginning to talk about racking/stacking the whole rack. I haven't seen much support even when Dell/EMC owned VMware and together produced the VxRail lineup, and the company I used to work for was presented as the reference project in Saxony, Germany at that time. All of the boxes would add up to about 2 standard racks, but it was representative of the other biggish customers in that area and time.
I imagine some of the customers will order 1-2 racks half full and possibly add a few sleds over a few years; these will probably demand a great GUI/manual experience and possibly competitive Oracle/SAP/MSSQL benchmarks, and I can imagine Veeam integration. Other customers, such as the DoE or some big enterprise customers, will order whole rows of racks and demand perfect automation options. That is just a guess.
Datacenter costs are weird. The first big cost is having a datacenter at all. But once you have the space, power, and cooling, and that part makes sense, the actual hardware going into it can carry a pretty decent premium and still be highly competitive with AWS. It also depends heavily on what you are doing and producing: if the answer is a large amount of data, and it needs to transit out of AWS, suddenly the cost of a pretty large datacenter is really cheap in comparison. AWS egress fees have a markup that will make your accountants panic. From a hardware standpoint, once you need GPU compute or large amounts of RAM, the prices get pretty dumb as well.
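As a rough back-of-the-envelope illustration of the egress point (assuming the commonly cited ballpark of about $0.09/GB for internet egress; real AWS pricing is tiered and changes over time, so treat the numbers as illustrative only):

    # Back-of-the-envelope egress cost; rate and volume are illustrative assumptions.
    egress_tb_per_month = 500
    rate_per_gb = 0.09
    monthly_cost = egress_tb_per_month * 1000 * rate_per_gb
    print(f"~${monthly_cost:,.0f}/month, ~${monthly_cost * 12:,.0f}/year")
    # -> ~$45,000/month, ~$540,000/year, before any hardware has even been bought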
> Oxide has to come in a lot cheaper to even compete with AWS
Which should be pretty easy. I don't know the exact costs, but in a previous Oxide discussion the number $1M was thrown around. If that's roughly correct, it is comfortably less than a single year of AWS bills at most startups I've been at (except the very tiny ones, < 15 people).
Haven't seen any performance numbers either, so admittedly I'm estimating here, but from what I know about building racks of 1U servers, and knowing that Oxide is more efficient, I can believe an Oxide rack should handily outperform the AWS VMs we (the startups) were paying >>> $100K/mo for.
If these numbers are anywhere in the ballpark, an Oxide rack should easily be saving quite a bit of money already by year two.
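A quick sanity check on that, using only the rough numbers from this thread (about $1M for a rack, AWS spend somewhere above $100K/month) and deliberately ignoring datacenter space, power, and staff, which would push the break-even point out somewhat:

    # Rough break-even sketch; both figures are assumptions taken from the thread above.
    rack_cost = 1_000_000   # ~$1M up front for the rack
    aws_monthly = 150_000   # ">>> $100K/mo"; $150K picked as an illustration
    for year in (1, 2, 3):
        print(f"year {year}: AWS ${aws_monthly * 12 * year:,} vs rack ${rack_cost:,}")
    # year 1: AWS $1,800,000 vs rack $1,000,000
    # year 2: AWS $3,600,000 vs rack $1,000,000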
You are not wrong that OpenStack is sort of similar in a sense, but the difference is that Oxide is a hardware + software product, and OpenStack is purely software.
Except that it uses the same standard CPUs as commodity machines, doesn't have much of the extra reliability stuff, and can go from vertical to horizontal scaling. The OS is an open source Unix. And yeah, it's not like a mainframe at all, really.
I bet it costs a fraction of what a similarly powerful mainframe would cost. However I don't think the customers for each overlap that much. If you need a mainframe, you need one and there is no discussion about possible alternatives because there are none.
I'm excited to see how this compares to SmartOS. I'm pretty heavily invested in SmartOS in my personal infrastructure but its future, post-Joyent acquisition, has been worrying me.
I really wish I worked for an org big enough to use Oxide's gear. Not having to futz around with the bogus IBM PC/AT compatibility edifice, janky BMCs and iDRACs, hardware RAID controllers, etc., would be so unbelievably nice.
illumos needs to attract new developers. To do that, the platform build needs to become a lot more straightforward. It's a pretty huge endeavour in my opinion. I'd be happy to help out in that regard, but in the past Joyent has not been very open to outside support.
I've noticed you make this comment repeatedly when illumos is mentioned on HN. I think you're underestimating the irreducible complexity of the build process for what is essentially a whole UNIX operating system, save for a few external dependencies. It's not just a kernel, but an extensive set of user mode libraries and executables. The build is complex in part because it's a complex body of software.
I also think you're overestimating the extent to which make(1S) is the reason we're not more popular than Linux. There are any number of more relevant factors that make someone choose one operating system or another. Also, certainly for me personally my goal is not world domination, merely the sustainable maintenance of a body of software that helps me solve the problems that I work on, and which I enjoy using and developing as a result.
I agree we need (as do all projects!) new developers, both now, and over the long term. We work as we can to make improvements to the build process, and the documentation. We are a relatively niche project, but we do attract new developers from time to time, and we're making changes at least as rapidly as we ever have in the past. There are a number of actively maintained illumos distributions (OmniOS, SmartOS, Tribblix, OpenIndiana, and now Helios) and there are a variety of commercial interests that ship more proprietary appliances on top of an illumos base. For our part at Oxide we continue to encourage our staff to get involved with illumos development as it makes sense for them, and we try to offer resources and assistance to the broader community as well.
I do, yes, but your comment makes it clear that this is a problem you either don't really think of as a problem, or don't know how to address. Building open source communities is hard work. Telling everyone how amazing your product is (even if it is) is only a small part of it. The lesson to take away from your time at Joyent should be that that way of community building didn't work, and there needs to be some change.
Even in the early 2000s, Linux had a make menuconfig or make xconfig step for building the kernel. And yes, this is different, it's a POSIX distribution. Yocto was a relatively niche project as well, and it also addresses the issue of building a collection of POSIX applications into a big project, as do Gentoo's stages.
I'm sure that at the time of its creation OpenSolaris was ahead of the curve, but how many years ago was that? You know as well as I do that sprinkling LD_LIBRARY_PATHs here and there and then removing undocumented dot files here and there isn't really a sane way to handle such a build process for a curious third party. Most will probably drop it before it gets to that point.
There have been many many projects that have reworked their entire build architecture, some of which took years to flesh out fully.
What needs to happen for illumos to get a boost of development in the long term is:
1. first for you to acknowledge on a political level that there is an issue that needs to be addressed here, and
2. to then work with the community, and it doesn't have to be across the board, but you need to be willing to invest in some experts and some people interested in solving this, so they can grind out something that is more sane in this current world.
"Read our getting started guide" isn't really all that useful, when most of the complex issues happen after that and are often met with "this isn't how we do things".
> sprinkling LD_LIBRARY_PATHs here and there and then removing undocumented dot files here and there
I obviously don't have any context about the issues you were facing at the time, and I can't really figure it out based on the advice you ostensibly received. I'm definitely sorry if we have led you astray in the past, but those are not workarounds I would encourage people to use today. If there's some aspect of the build process that requires workarounds like you're describing, it's definitely a bug and we'll fix it as best we can once we're made aware of it.
As for the rest of it, I think you're putting the cart before the horse on some level. An operating system is a large and complex thing to work on, regardless of whether it's built with make or ninja or bazel or whatever other build tool.
The Rust toolchain is another similarly complex body of software, which also has a large and at times inscrutable build process. I know because I have personally contributed to it, and had to figure out how to get it to work. Rust obviously has more active contributors than illumos, but it also has vastly more active _users_ -- it is a body of software that has broad applicability to many people and the work they do.
For illumos to continue to succeed as an actively maintained project, what we need to do is continue to inspire _users_ to want to use it. Nobody wants to work on an operating system they don't personally need to use at all. We draw contributions today from a mixture of community driven distributions making fixes or adding features, and by people employed by companies like Oxide who have a vested economic interest in the deployment of the software.
None of this is to say that we're perfect, or that we're not trying to improve things. Just that we're trying to put build system improvements in the proper context amongst all the other work there is to do with our limited resources. It's probably more important that we have support for new Intel client NICs like you would find in a modern desktop system, for example, than it is that we replace make. It's important that we continue to add system calls and libc facilities that other platforms have adopted in order to ease software porting. It's important that we continue to maintain modern JDKs and Python and Go and Rust and C/C++ compilers. It's important that we keep up with security issues and the endless stream of mitigations imposed by the sieve-like nature of speculative CPUs.
There's actually quite a lot of stuff going on for us all the time, and we do still find time to improve the build system. If you have more specifics in mind, that's fantastic and we'd love to hear about them concretely! I would encourage you to channel your enthusiasm into writing an illumos project discussion (IPD) describing the issues you see and the work you'd propose to sort them out. You can see some examples of existing IPDs at https://github.com/illumos/ipd
And as ever, if you hit issues in the build as it stands, please file bugs! We can't fix things we haven't heard about.
I had been using SmartOS for a long time but finally had to bite the bullet and give up. I ended up deciding on Proxmox on a ZFS root and am quite happy with it.
I've been running SmartOS at least since 2015, when I co-located my server. There have been times when I felt like giving up, but people like danmcd, jperkin and others always stepped in and fixed what needed to be fixed for LX to be usable and working. (Keeping Java updated and running is a hard, uphill battle. Thanks!)
I always ran a mixture of OS and LX zones, and bcantrill's t-shirt with "Save the whales, kill your VM" made sense. I'd used zones in Solaris 10 even before that, and they just click with me. FreeBSD's jails are nice, but far from it. And Linux's cgroups are a joke. And using KVM/VMs for security containerization is just insane.
At my dayjob, I've implemented multiple Proxmox clusters, because we're a Linux shop and there's no way to "sell" SmartOS or Triton DataCenter to die-hard Debian colleagues, but I've managed to sell them ZFS. With personal stuff, I like my systems to take care of themselves without constant babysitting, and SmartOS or OpenBSD provide just that. I don't dislike Windows; I love UNIX. You can really feel those extra 20 years UNIX had compared to Linux.
I migrated all my stuff to Proxmox for like 2 months. And then went back to SmartOS, because there was something missing... probably elegance, sanity, simplicity, or even something you'd call "hack value".
And here I am, having compared the SmartOS documentation and ease of installation to Proxmox... and with very few complaints I am using Proxmox to host a file server (Samba in a container on bare metal) and OPNsense in a VM.
I remember buying the OpenSolaris Bible in 2008, getting really excited to dig into my second Unix (after FreeBSD). And then the Sun went down on me... and I stuck with Ubuntu for 10 years.
The nice thing about the Proxmox + ZFS setup is that it works, and is even recommended, without hardware RAID controllers. Fewer headaches either way.
I recently wrote a guide [1] on how to use Proxmox with ZFS over iSCSI so you can use the snapshot features of a SAN.
I feel the same. I used a SmartOS distro called Danube Cloud for a long time and am looking to move. I looked at Harvester [1] and OpenNebula, but with everything I know about Kubernetes (and Longhorn), I'm reluctant to use something so heavily based on Kubernetes.
At its peak, I reached out to Joyent multiple times to fix their EFI support for virtualization. The Danube team had similar experiences with them while working on live migration for VMs, and a few months back I did a rebase of the platform image onto a more recent illumos stack.
One of the fundamental issues with illumos is that they don't seem to understand that they need to fix the horrendous platform build to get the community support needed to keep up with the pace of development of other OSes. The platform build is a huge nasty mess of custom shell scripts and file-based status snapshots, and it includes the entire userspace in the kernel build. Basically, if your OpenSSL version is out of whack, the entire thing will fail. Not because it has to, but because it was never adapted to the modern needs of someone who just wants to hack on a kernel. It's fixable, but I don't see any desire to fix it, and even if that desire eventually shows up it might just be too little, too late.
> Oxide is pretty much the company I dream to work at, both for the tech stack, plus the people working there.
Same for me. Oxide is the only company I know that I'd really love to work for. Similar (I think, observing from the outside) to Sun. That's what I dream about.
Unfortunately their pay structure is such that I can't afford it, with a family to support. Maybe when the kid is out of university, if I don't need much income anymore, I can fulfill the dream.
> but they are mass-market consumer oriented which is not interesting.
Kind of. Apple focuses their high-end gear on creative professionals. Us Unix geeks have much more modest needs, which are often satisfied by the average uninspired Dell design. Still, Apple has a decent Unix underneath all that glitter. At the same time, there are almost no desktop-friendly Unixes besides the free crowd. HP and IBM gave up on the Unix workstation market eons ago. IBM's POWER gear can crush the best Xeons and Epycs, but they have nothing to compete with the "good enough" low end.
It’s a shame Oracle doesn’t offer Solaris on their cloud the same way IBM offers AIX (and Z, which, surprisingly, is a certified UNIX as well) on theirs.
Well... For me and a lot of people, it's still good enough - it runs MacPorts, Python, I can compile things and so on, but, TBH, I also have a couple Linux boxes lying around on my desk and on the network one hop away from my desk, so the deep hackability side is solved outside the Mac in my case.
You can always install OpenIndiana and use it. It's been a while since I used it, but I gather it's still a worthy daily driver (plus, Teams, Slack, Outlook, and others simply don't have installable apps for it).