On the one hand, it's impressive that they've apparently built a general-purpose LLM interface for this. On the other hand, even before considering LLM limitations, trying to define app behavior of any complexity using plain language is a great way to get a buggy and inconsistent mess.
Yes, even when we get designs that have passed multiple rounds of user interviews, and even after developers have gone over them, approved them, and estimated the work, we still end up finding undefined behaviors while coding.
We would need AGI-level AI to code software from English without producing a buggy mess.
On the other hand, current top LLMs are pretty great at writing chunks of code if you specify exactly what you want. I am personally impressed with Claude 3.5.
I would also find developers who could take the slightest vague term and misunderstand it in a way that was "technically correct" but, to everyone else, outrageous.
To counter this, I used to hold short meetings with them every single morning to pull them back into line! We also broke every build into bite-sized chunks so the humans could check they hadn't gone off-piste. We called this Agile.
Poor communication between humans is seemingly insoluble. Perhaps a computer misunderstanding you in real time, so you can correct it immediately, is the best solution?
Exactly. Even when building websites for clients, they often ask for mutually incompatible things within the same sentence, and there's often hidden complexity buried in the most innocuous-sounding sentences.
Indeed. This is one reason why, when humans describe what they want from their software, developers build something entirely different from what was requested.
I feel like syntax actually helps in writing unambiguous logic. Doing away with syntax and writing in plain English might sound nice to someone who is just starting to learn how to program and is tired of not understanding syntax, but doing away with syntax really does a disservice to any actual application development.
Right, English is just a plain bad way to tell a computer to do something. The signal to noise ratio is too low and references are too ambiguous.
You see this with the ridiculous "prompt engineering" crap that has sprung up, where people are slowly reinventing programming languages, but completely unspecified. Something that could be specified in 4 lines of godforsaken YAML ends up taking 40,000 words of very bizarre English and using enough energy to roast a whole pig.
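As an invented illustration of that point (the policy and all its field names are made up here): the same behavior as a few lines of structured data versus the prose you'd otherwise pile into a prompt.

```python
# Hypothetical retry policy as four lines of structured data:
retry_policy = {
    "max_attempts": 5,
    "backoff": "exponential",
    "base_delay_ms": 100,
    "give_up_after_s": 30,
}

# ...versus the English equivalent: "please retry up to five times,
# waiting a bit longer each time, starting around a tenth of a second,
# and give up entirely after about thirty seconds" -- and then hope
# the model reads "a bit longer" the way you meant it.
print(retry_policy["max_attempts"])  # 5
```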
Yeah, what I'm saying is I'm unconvinced that this goal is a good idea to begin with. Even if executed perfectly, I don't think it will be able to compete with programming languages for the purpose of programming.
Whether it's a namespace collision or not, it's uncool of them to steal the name of Apache Spark, which was an influential project in machine learning, and must have helped generative AI get to where it is today.
Apache Spark casually gets called Spark, and it especially gets called Spark when referring to subprojects. For instance there's Spark Core, Spark Streaming, and PySpark. https://en.wikipedia.org/wiki/Apache_Spark
Trademarks are automatic when certain criteria such as usage in commerce are met, at least under the statutory or common law of most/all US states and some other countries with English colonial histories; certain US federal protection under the Lanham Act can also be automatic. You’re probably thinking of registered trademarks which are indeed not automatic. Both kinds are often illegal to infringe, but unregistered trademarks have fewer and weaker remedies in court.
Whenever you see the raised TM symbol, that’s claiming trademark status but not necessarily registered trademark status. The R in a circle is restricted to registered trademarks.
You can see here that Apache’s trademark for Spark is unregistered, but that doesn’t make it invalid (at least in the US and the other countries I alluded to): https://www.apache.org/foundation/marks/list/#unreg_s
Fun fact: it’s also possible to register trademarks with US states, but that’s almost always worse than doing so federally with the USPTO, and federal registration is valid nationwide. State-level trademark registration is mainly a historical artifact predating the federal trademark system, but it’s still technically available.
Wouldn't be the first time Microsoft (and other big corporations; anyone remember Apple iOS vs Cisco IOS?) behaves this way [0]. I'm sure there are more examples out there.
We (the GitHub Next team) use and love Apache Spark. So we made sure to connect with ASF before releasing GitHub Spark and confirm that they were comfortable with us using this name.
We felt there was sufficient difference between the two products that there wouldn't be any confusion, especially with the target audience that GitHub Spark ultimately intends to reach.
That said, we plan to validate this during the Technical Preview phase, since we absolutely want to be respectful of Apache Spark and its impact on software.
The GitHub Next team sounds more like a department than a team. It also sounds odd to hear that a group of that size uses and loves something that is unlikely to be used directly by a large percentage of the team. Yay corporate motivational speak! I could use another look at https://despair.com/collections/demotivators after this.
When GitHub Next asked for this, there was already pressure on ASF to grant it, because they're locked into GitHub, so the permission may not have been given entirely freely. You can say that you're confident it was, but to me it seems impossible to know for sure. It might be that they would have granted it freely anyway, but the thought of their relationship with GitHub likely came to mind while they were considering it. In any case, it's a big ask, because if this takes off, the phrase "Spark application" that another commenter mentioned will soon be ambiguous.
With more than 10, I tend to prefer the term department, though that term could also be used for an organizational division with a smaller number of people.
On the GitHub jobs page, there isn't such a selection, but in autocomplete there are two results for team and none for department or group: https://www.github.careers/careers-home/jobs
They could've called it GitHub Moist, or GitHub Fungus, or GitHub Bequeath. They didn't for a very simple reason:
The vocabulary isn't infinite. And within that finite number there's an even smaller set of words you would use for a product. And within that there's an even smaller set of words that hit some or all of pleasant/short/applicable/relevant.
And Spark is a natural pick because the icon of choice for all things AI-related seems to be the sparkles emoji.
"Experienced developers can still see and edit the code"
I'm curious about the quality of the generated code. Does it adhere to any principles? Is it readable to developers? Is it easily extensible?
I love the idea of a code generator for developers that can generate a scaffold from natural language, but I am dubious of an entire app which can be generated from natural language.
For many people describing an app is the challenge itself. That’s why we have digital product managers who use formal methods like user stories to describe the system or write specs and requirements documents for engineers.
There's no way it won't suck, so it's good that GitHub built it in a way that lets you iterate and called it experimental. We need much better LLMs, or some other technique, before we can have anything like this.
People have been trying this "building software in plain English" bullshit for at least 65 years; look into the history of COBOL. It didn't work last time, or the time before, or the 50 times before that. It absolutely will work someday, but there's no compelling reason to believe that day is coming in the next month or two.
Oh yeah, COBOL absolutely worked as a programming language, but it never lived up to being a plain-English replacement for code, which was a common promise at the time.
Of course, but to evaluate or train better LLMs you first need some framework/application like GH Spark working.
Just look at Aider and how well it's able to estimate LLM coding performance.
As with everything in the past, we first need to build something that is helpful for some cases and then improve it with every generation.
I believe Spark will be great for internal tooling or personal side projects; you don't have to handle every corner case in such applications.
GitHub Spark looks like v0.dev, and v0 is much faster than me at building a good UI in React. It also often creates better UIs than I do. But after the basic interface is up, it's better for me to connect it to the backend or add interactivity myself (with the help of inline Copilot completions, of course).
In the end, it does speed up my development of applications.
This is going in the wrong direction. Instead of using plain English to describe programs, we should rather use programming languages to describe the rules of our society.
Because nowadays lawyers, judges, and politicians write and interpret the rules of our society in plain English, and the outcome is a mess. E.g. all those tax loopholes are in principle programming bugs, which should have been caught by (automated) tests, but doing this in plain English is just much too ambiguous.
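A toy sketch of that idea, with all rules and rates invented for illustration: if a tax rule is code, an automated check can flag a "loophole" the way a unit test flags a bug, for instance splitting one income into two filings to lower the total owed.

```python
def tax_due(income: float) -> float:
    """Toy progressive tax: 10% on the first 10,000, 25% above (rates invented)."""
    if income <= 10_000:
        return income * 0.10
    return 10_000 * 0.10 + (income - 10_000) * 0.25

def splitting_is_a_loophole(income: float) -> bool:
    """Does filing the same income as two halves lower the total tax?"""
    return 2 * tax_due(income / 2) < tax_due(income)

# The automated check exposes what prose law would leave ambiguous:
print(splitting_is_a_loophole(20_000))  # True: 2 * 1000 < 3500
```

In prose, the same rule would span pages and the split-filing trick would be a matter of interpretation; as code, it's a failing property check.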
I'm thinking a lot about these tools, and the biggest challenge in the space seems to be figuring out the right unit of work for AI to take on.
A single class/function is too small to be that helpful; a whole app is too big and complex. A whole ticket is also still too big. What's in between? If you could divide projects into units of work of consistent complexity, calibrated to AI's abilities, then you could probably get really good results.
The problem with AI is the error-checking piece. It's nice that it can do this, but I don't see evidence it can validate what it generates at scale. Even then, how would you validate an AI validation?
Might be the "new Excel" as in bringing some light programming to non technical people, but hopefully only for internal tools and things I don't have to use.
AI might be nice for generating stuff. But like all code generators, it has the issue of supporting the generated code long term.
With AI it's even worse: you don't get reproducible runs. You try to fix a simple layout bug, and suddenly some updated AI generates something completely different because it learned new tricks; you can't go back to what you had before, and now the whole page doesn't work anymore.
Yeah, even Claude Artifacts have replaced scripting for me to some extent. For example, if I need some data transformation: "Create a web app that takes text input and does X, Y, and Z."
It's correct for me 99% of the time, and the remainder I can trivially ask it to tweak something. (especially since for those kind of one-off tools, I really don't care about the actual UI and styling)
Beats figuring out the right incantation of jq, regex, or whatever other tool I only use every three months. And I can trivially go back to that artifact and iterate on it later.
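For a made-up example of the kind of one-off transformation this replaces (the data and field name are invented): pulling a field out of JSON records and sorting it, which in jq would be a one-liner you've forgotten by the next time you need it, is also just a few lines of throwaway code.

```python
import json

# Hypothetical one-off task: extract the "name" field from a list of
# JSON records and sort the results.
# (In jq, roughly: jq -r '.[].name' | sort)
records = '[{"name": "beta"}, {"name": "alpha"}]'

names = sorted(item["name"] for item in json.loads(records))
print(names)  # ['alpha', 'beta']
```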