What do people mean when they say “transpiler”? (composition.al)
77 points by tosh on Sept 3, 2017 | 74 comments


Back in the mid-80s, I frequented BIX (the BYTE Information eXchange). There was a discussion group for the C programming language, and someone mentioned this new C++ language. I naively asked, "C++? That's a preprocessor, isn't it?" I was thinking of the original Cfront, of course, which translated C++ to C.

Bjarne Stroustrup jumped in and gently chewed me out: "The Cfront compiler is not a preprocessor! It is a full-fledged compiler just like any other. Sure, it uses C as the compilation target right now, but it won't always be that way. It could generate native code, but compiling to C let us get up and running more quickly and on more hardware architectures than we could if we targeted machine code right away."

(Those weren't his exact words, but definitely the gist of it. Some things like this stick in your mind.)

If "compiler" was good enough for Bjarne, it's definitely good enough for me.

Now jumping ahead to the present, consider Kotlin. Its compiler can target JVM bytecode and JavaScript, and they're working on native support using LLVM for the final code generation pass.

Is Kotlin a transpiler when it targets JavaScript, and only a true compiler when it targets the JVM or native code?

It just seems like a silly distinction. Why bother having a separate name? A compiler is a compiler.


It is silly. I think part of it is simply a cultural divide between old school C programmers and young front end web developers.

There used to be two spaces after a period, oxford commas, and 'literally' meant 'as it was written'. Languages change (for better or worse). Kids these days.


Hey, you're welcome on my lawn any time.

Some years ago I went on an editing binge on Wikipedia, finding a bunch of templates that "misused" the phrase "due to" and changed them to "because of". As we all know, you must get this right:

"The flight delay was due to bad weather."

vs.

"The flight was delayed because of bad weather."

but never:

"The flight was delayed due to bad weather."

The next day, someone reverted all my changes and asked, "Mike, have you looked at a dictionary lately?"

Oops.

In that moment I was enlightened: I realized I could care less.

I even leave out the Oxford comma as often as I use it. In fact I leave out a lot of commas I used to put in.

But there is one case where the Oxford comma is still necessary. If you leave it out of this joke it ruins it:

"There are two hard problems in computer science: naming things, cache expiration, and off by one errors."


> I realized I could care less.

I see what you did there. Well done.


Webster doesn't mention any change in use, simply:

"care less : not to care — used positively and negatively with the same meaning <I could care less what happens> <I couldn't care less what happens>"

¯\_(ツ)_/¯


I could or couldn't care less.

The material is very flammable or inflammable.

https://www.merriam-webster.com/words-at-play/flammable-or-i...

English is confusing.


The first doesn't mean the same as the second, though. O.o


(On the off-chance you're not being tongue in cheek): No, "used positively and negatively with the same meaning", as in the two differently phrased idioms mean the same thing:

http://www.dictionary.com/e/could-care-less/

http://www.slate.com/blogs/lexicon_valley/2016/04/05/the_rea...

https://xkcd.com/1576/ (!)


> the different intonations used in saying “I couldn’t care less” versus “I could care less.” [dictionary.com link]

> people who say “I could care less” are omitting the first part: “Like I could care less,” [Slate link]

These quotes seem to sum up those links.


I suppose, these days, for the sake of clarifying that I'm being sarcastic, I should just start saying: "I cloud care less".


To be perfectly honest, I'm kind of 50/50 between a genuine vexation with "could care less" being semantically wrong, and a cheerful agreement that usage defines meaning. The world's inconsistent and so occasionally, in response, am I. :P


Just use I'd'nt careless.


Though it could simply mean the realization that one can choose to care less about prescriptive language rules


Yes I think that's exactly what Stratoscope meant, and it was funny because that expression is a classic language peeve, as it is an illogical deformation (of the phrase "I couldn't care less") that is becoming the norm... but in that situation it made perfect sense. So I was showing appreciation for that nice touch.


Even funnier, when I wrote that I actually did just intend it in the sense of "could care less" and "couldn't care less" meaning the same thing. It was an example of how I had been too fussy about language rules.

But now that you've explained it, I see it works the other way too!

"Finally I learned to care less about these kinds of language rules."

There is something delicious about two seemingly contradictory phrases that mean the same thing, and one phrase that can mean two opposite things.


Cache invalidation. Cache expiration is easy.


Thank you! That's what I get for trying to tell a joke at 2AM...


Can you explain why the "never" was even "wrong"? That seems totally fine to me, and I don't understand why you would think otherwise.


No, I can't explain it. It was just an arbitrary rule I was taught as a kid - and that was quite some time ago! There was never a reason given for it, just one of many rules we were supposed to remember.

A lot of other people were taught the same rule, but hopefully they've realized as I finally did that it just doesn't matter.

Here are some interesting discussions from both sides of the fence:

https://www.google.com/search?q=due+to+because+of

Joseph Emonds argued in the paper "Grammatically Deviant Prestige Constructions" that the underlying reason for some of these arbitrary distinctions was as a social class signifier: if you used "correct" grammar it helped indicate to others that you were a member of a more prestigious class:

http://fine.me.uk/emonds


> 'literally' meant 'as it was written'

Literally has been used to mean figuratively for well over 100 years. (If I wasn't on my phone I'd paste in a list of usages from famous literature going back 150 years or so, that I have stored on my computer)

That's one of the big issues with prescriptivism—often the person making or enforcing the prescriptions has no idea of what the truth actually is. Especially in language, but it crops up in software as well.


You are correct about the word "literally," but I am in the camp that feels we should proceed cautiously in such matters. Just because a published professional got away with a mistake or made a deliberate artistic choice in 1905 doesn't mean it should become mainstream usage. When the word "literally" can mean, casually, either "literally" or "figuratively" then it ceases to have any real meaning. The figurative use is best when used with extreme obvious hyperbole. For me to say "I care enough about this post that I am literally standing up right now" when I am, indeed, sitting down, should be seen as a mis-use that weakens the word. Done frequently and it introduces avoidable entropy into language.

See also: enormity.

We really do need to take some care here or else we'll soon be spelling "lose" as "loose." Although that particular issue is probably coming from non-native speakers, it's being picked up and used by others and may eventually become accepted. That will not ease our ability to communicate.


The first time I saw 'enormity' used as a synonym of 'vastness', it was as if I had stumbled while walking, as nothing in the text up to that point indicated anything bad happening. 'Enormity', however, invites misunderstanding, and it would be better if it were retired, though that is not going to happen.


Yeah, when I see that so often it makes me think I am loosing my mind.

Wait, maybe that's a good thing: "loose" (v): to free from restraint, to make less rigid, tight, or strict. (M-W)

Here's an odd one for you. Many of my younger friends say "on accident" instead of "by accident", e.g. "He didn't mean to do that. It was on accident."

It bugs me to no end; it just seems wrong.

But it's surprisingly logical: after all, we say "on purpose", so why not "on accident" for the opposite?

I don't think I will start saying or writing it that way, though!


I noticed a lot of what was illogical about English became apparent when I started interacting with students and professors who had different native tongues.


> When the word "literally" can mean, casually, either "literally" or "figuratively" then it ceases to have any real meaning.

i don't think i can agree here, at least not with the gist. in a sense, words already don't have 'real meaning' (they're just noises or scribbles depending on the medium), their meaning or communicative value is bestowed by their actual use in a specific language, place, time, etc.

if you mean to say that we can't communicate the non-figurative sense of 'literally', consider: when someone whines 'you mean figuratively, not literally', this necessarily implies that the complainer understood what was said. if they can comprehend this distinction, it would seem utility remains and the complaint is informed mostly by a resistance to change (imo).

> We really do need to take some care here or else we'll soon be spelling "lose" as "loose."

this seems like a separate issue. the words 'lose' and 'loose' are not being used interchangeably (afaik); my sense of it is that it's just an issue of transcription because the spelling is extremely similar.


I would suggest that if someone "whines" 'you mean figuratively, not literally' that they may be double-checking (depends on context of course). Or perhaps the complaint may be due to the fact that the complainer does not know that "literally" can mean "figuratively."

I agreed that the word "literally" can be used figuratively, but this use needs to be obvious- kind of like pronouns. I personally prefer to avoid that use because I am not a particularly talented wordsmith.


> doesn't mean it should become mainstream usage

Perhaps, but what I'm arguing is something slightly different: that the idea "literally used to only mean literally but recently it started being used to mean figuratively" is a myth.


You are correct. I am whining about something like that happening, and noting that while correct its use (obvious hyperbole) is subtle and not one that many who use it that way pick up on.


I have prescriptivist tendencies, but I agree that "It has to be that way because it's always been that way" is a stupid argument.

I'm simply of the opinion that language that leads to misunderstanding is bad. That's why I correct anyone who I suspect to use the word "literally" for the opposite of its meaning, and why I use Oxford commas.

I also like when the meaning of a sentence is inferable by only knowing the words and simple grammar rules. That's why I hate "could of" or "I could care less".

I will also roll my eyes if you ask me to use double spaces after a period. That rule was born from a technical limitation that's long gone.

In essence I think that conserving ancient rules and meanings is just one branch of prescriptivism, and it's unfair to judge everything prescriptivist based on that one radical branch.


I like that you used the Oxford comma in the sentence about it.


People need to spend more time reading real books and less time reading made up rules.


'Transpiler' is an ugly smushword though. I hope it dies.


I don't know if I'm right or not but I tend to use the word "transpiler" to talk about something that transforms into the same language as before.

So for example Babel is a transpiler since it turns JavaScript into JavaScript - it just allows you to use certain features (usually of a later version, or a draft) in earlier versions. So I write es8 and use babel to turn it into es5 compatible javascript, I'd call that transpiling.

However, sticking with JavaScript, I'd say that coffeescript -> javascript is compilation since coffeescript is not the same language and very syntactically different.

So I guess converting one language into another syntactically different language is a compiler. While converting a language into the same language but transforming specific parts for whatever reason (such as interop or backwards compat) is a transpiler.

That's how I tend to think of it anyway.

But I agree, probably could just use compiler for all of them, especially if not using it means added ambiguity.


I'd say a transpiler is a specific type of compiler. But yeah, it's not clear if the extra name is really useful.


A "compiler" translates code from one form to another, and while historically that other form is machine code, there's no requirement that this is the case: Java translates to JVM bytecode.

People get upset when told Ruby is a "compiled" language, as technically the code is compiled into bytecode, because overall this is in the context of an interpreted language. Things do get blurry when it boils down to JVM bytecode in JRuby, which then goes on to have a JIT pass done on it...

Even the CPU itself "compiles" the incoming instructions into internal ops. It's translating code into code.

It's compilers all the way down.


Many other people in this thread are saying this too, but they all define the "specific" characteristics of a "transpiler" differently :-)


> Is Kotlin a transpiler when it targets JavaScript, and only a true compiler when it targets the JVM or native code?

I'd say Kotlin is a language with different implementations, like clojure, ruby and others.


Does Kotlin have an abstract specification, or is it defined by a reference implementation?


RIP Byte


I have written transpilers and would not want to use the term compiler because it would be a deceptive description of the work involved. It would be a pretentious aggrandisement akin to saying I built a car because I changed its air freshener.

A production ready optimising compiler is one of the greatest achievements in software engineering. A level of complexity and sophistication few of us can reach. A transpiler is often just a trivial text transformation that a junior programmer can do. The purpose may seem superficially similar but the effort and expertise is so totally incomparable that a specific term is justified.

edit: it reminds me of the difference between a ship and boat. A transpiler will use someone else's compiler/interpreter but a compiler won't use someone else's transpiler. The two terms are meaningful to those intimate enough to need to distinguish degrees of sophistication but seem redundant to those who merely use them.


Sure, if you're doing a simple text transformation akin to the C preprocessor, then maybe that shouldn't be called a compiler. But we already have a word for that: "preprocessor". Which is probably why, as I mentioned in my other comment, Bjarne was miffed when I called Cfront a preprocessor - it was certainly nothing like the C preprocessor!

I guess in today's terminology I would have called it a "transpiler", because it simply transformed one source language to a fairly similar source language and didn't have to optimize the final machine code.

But there's a lot more to a compiler than optimization. Take TypeScript for example. Even though it generates JavaScript - and JavaScript that looks very much like the original TypeScript code if you're not having it translate newer syntax to older syntax - it does quite a bit of rather sophisticated work with all the type inference and type checking.

TypeScript doesn't need to worry too much about optimization, because it knows that the JavaScript engine that eventually runs the compiled code will do a bunch of JIT optimization.

Similarly, a compiler that targets LLVM or the JVM can rely on those engines to do much of the optimization. But the compiler is still a nontrivial piece of work.

Maybe I would suggest the term "preprocessor" for something that really is just a simple text transformation, one that you might implement with regular expressions or hand off to a junior programmer.


I see a preprocessor as something distinct from a transpiler - a way to extend a language rather than transform it into another one.

This debate is really just about how terms fit a continuum from extension (supported by language constructs) -> preprocessor (add features that can't be supported by host language extensions) -> transpiler (support a totally different but conceptually similar language) -> compiler (conceptual differences from target requiring a more self-standing implementation that tackles fundamental problems).

The separation between the levels of sophistication is fuzzy and a matter of degree. A single project might legitimately mature through the stages. For example, I don't object to calling TypeScript a compiler once it began to generate code for sophisticated concepts not supported by the underlying language.


Sure, but there's a simpler and already-well-accepted convention for doing this. Say "interpreter" if it takes a program and causes its effects to happen, or "compiler" if it takes a program and emits an equivalent program. If somebody says "transpiler", just mentally replace it with "compiler". Done!

Composition of compilers still works as normal. You can use tombstone diagrams to reason about complex chains of compilers and interpreters. And it demystifies the act of building compilers, which is important because otherwise you'll be shaming junior engineers for attempting what is not just the pinnacle of computer science but also our bread and butter.

Seriously! Of all the definitions of "transpiler" that I've heard of so far, "compiler but shitty" is the worst yet.
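
For anyone who wants the interpreter/compiler convention above in concrete form, here is a minimal Python sketch. The toy tuple language and the names `interpret` and `compile_to_python` are made up for illustration; the only point is that one function makes the program's effect happen while the other emits an equivalent program.

```python
# Toy language (hypothetical): nested tuples such as ("add", 1, ("mul", 2, 3)).

def interpret(expr):
    """Interpreter: take a program and make its effect (here, a value) happen."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "add" else a * b


def compile_to_python(expr):
    """Compiler: take a program and emit an equivalent program (Python source)."""
    if isinstance(expr, int):
        return str(expr)
    op, left, right = expr
    symbol = "+" if op == "add" else "*"
    return f"({compile_to_python(left)} {symbol} {compile_to_python(right)})"


program = ("add", 1, ("mul", 2, 3))
source = compile_to_python(program)  # a string of Python source
result = interpret(program)          # the value itself
```

Feeding `source` to Python's own `eval` should give the same value as `interpret(program)`, which is the composition of a compiler with an interpreter that the comment describes.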


I think you are making a very valid point, but is there any practical problem with simply calling them translators?


My take is that a "compiler" always compiles down (i.e. throws away information that cannot be recovered from the result), while a transpiler may actually be lossless—you could potentially write another transpiler to go in the other direction, and the two together would form a bijective function.

On the other hand, sometimes "transpiration" involves both throwing away information, and then either heuristically recovering information (i.e. compiling to assembler, followed by decompiling to the target language) or inventing information (i.e. compiling to object code and then wrapping that object code in a VM written in the target language.) You wouldn't call a program that involved either of these a "compiler"; it would most certainly be a "transpiler" only.


Citation(s) needed for "compiler" always compiles down.


Despite the variation in meaning, I think what ‘transpiler’ means locally to a particular technology group is usually well understood by its community. For example, if you’re a JavaScript developer and someone mentions a transpiler, I think it’s safe to say you immediately think of something like Babel (although interestingly, it describes itself as a compiler) and it’s generally well understood that it means a tool that takes either a JavaScript alternative or currently browser-unsupported JS and builds something that a wide range of browsers support.

Is local understanding good enough? I’d say probably, because if you’re unsure what a particular transpiler does, you could just research that particular tool. And, in the case of JavaScript at least, when you’re in deep enough to learn about transpilers and build systems, the myriad tools available for slightly different approaches to the same job is probably more of a concern than the generic word used to describe them.


I know it when I see it. Having more precise words makes communication faster and I agree that people who talk about transpilers know what they mean. I suspect that most people who criticise the term understand its meaning very well too. They just don't like it.


I'd say if it doesn't output bytecode, it's a transpiler. In bytecode I'd include JVM, LLVM and actual machine code. There is some ambiguity for Python, since Python files are transformed to Python bytecode before being interpreted.
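
The Python ambiguity can be seen from the standard library itself. A small sketch, assuming CPython; the exact instructions printed by `dis` differ between versions, so no output is shown here.

```python
import dis

# compile() turns source text into a code object holding bytecode.
code = compile("x = 1 + 2", "<example>", "exec")

# Print the bytecode instructions (version dependent).
dis.dis(code)

# The same code object is what the interpreter then executes.
namespace = {}
exec(code, namespace)
```

So under the "outputs bytecode" definition, the `compile` step would count as a compiler even though the language is usually described as interpreted.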


So Chicken Scheme uses a transpiler and not a compiler? What about Nim?

I don't think that's a sufficient definition when these fully capable languages call their tools that output C code "compilers."

Honestly, the article's definition with levels of abstraction seems reasonable. The languages I mentioned are high-level and C operates at a lower level of abstraction, therefore they use compilers.


Indeed, Wikipedia also defines it this way. Compiling from Nim to C is moving from a high-level of abstraction to a lower level and so Nim is a compiler.


Why do they have to be mutually exclusive? Transpilers are the subset of compilers that target high level languages.


Does that mean that a "compiler" which outputs the WebAssembly text format, relying on other tools to convert that to the WASM binary format, is a transpiler?


That is an interesting edge case. I might amend my definition based on 'ease of translation' arguing that e.g. webassembly text and macro-less x86 assembly are so close to bytecode as to be equivalent.

In the end though, it seems hopeless to keep a sharp definition of transpilers, or of compilers for that matter. Would something that turns WebAssembly text into WASM binary qualify as a compiler? Does an x86 assembler qualify? What about headless browsers that create screenshots?


How about this definition?

A compiler is anything which takes an input and produces some kind of executable code. That "executable code" might be machine code, bytecode, another intermediate language (e.g. WASM text, JavaScript, C, etc.), or whatever.

A transpiler is a form of a compiler which could be implemented as a parser + pretty printer; something where the semantics match an existing language, instead of having a complete "code generator". CoffeeScript is a good example of that, and so would be, for example, a language which interprets Python code with braces and insignificant indentation, and outputs Python code without braces but with correct indentation.

One problem with this definition is that the C preprocessor would fall into the compiler category, because it produces C code. Maybe it would make sense to amend the definition of compiler to exclude macro languages; or maybe it would make sense to include them in the overall compiler category and give them their own subcategory like transpilers.
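
A rough Python sketch of the braces example, to show how little machinery a "parser + pretty printer" style tool can need. The dialect is hypothetical: it assumes every block opens with a trailing `{` and closes with a `}` on its own line, with no braces inside strings or dict literals. The function name `braces_to_indent` is made up for this illustration.

```python
def braces_to_indent(source: str) -> str:
    """Re-print a brace-delimited toy dialect as indented Python.

    Assumption (hypothetical dialect): a block opens with a trailing "{"
    and closes with "}" alone on a line. Input indentation is ignored.
    """
    out, depth = [], 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            depth -= 1          # block closed: dedent
            continue
        if line.endswith("{"):
            # Block header: swap the brace for a colon, then indent what follows.
            out.append("    " * depth + line[:-1].rstrip() + ":")
            depth += 1
        else:
            out.append("    " * depth + line)
    return "\n".join(out) + "\n"


braced = """
def total(xs) {
acc = 0
for x in xs {
acc += x
}
return acc
}
"""

python_source = braces_to_indent(braced)

# The output is ordinary Python, so the existing interpreter can run it.
namespace = {}
exec(python_source, namespace)
```

The semantics are entirely Python's; the tool only changes how blocks are written down, which is the sense of "transpiler" proposed above.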


And how about a C "compiler" that outputs assembly, and relies on an assembler to generate machine code?


One case the author left out was translation to "idiomatic" higher level language, which involves identifying abstractions that are intrinsically present, but were never explicitly expressed, in the original code. For example I've been working on a translator/converter from C to a memory-safe subset of C++[1]. The output is intended to be readable and maintainable as source code in its own right.

So when for example, encountering a pointer in the original C source, you have to determine from context whether the author is using it as an iterator to a fixed-sized array, an iterator to a dynamically-sized array, an "owning" reference to a dynamically allocated object, or just a weak/observer reference to an object, and translate to the appropriate "higher-level" element.

So in a sense it's a "decompiler" to a (higher-level) language the code was never compiled from. As a source-to-source transformer, presumably it would qualify as a "transpiler". But does that term have a connotation that the output is just an intermediate translation not intended to be maintained or used directly?

[1] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...


I think you are describing a source to source translator, and an awfully fancy one at that. A translator generally would be trying to map or maintain the structural idioms in the source code, allowing for human understanding or maintenance of the translated code. But to improve the coding style by detecting latent idioms seems a bit much even for most translators.

I always considered a transpiler to be closer to a translator than to a compiler, but without the translator's concern for maintaining human-readable code. For me, the boundary between transpiler and compiler is in the runtimes.

A transpiler would be targeting the whole runtime of a target language, i.e. one with non-trivial type systems, flow control/exception handling, and memory management/GC. The transpiler maps source language runtime concepts to target language runtime concepts in a relatively straightforward fashion. A compiler targets some lower-level abstract machine language and provides its own distinct runtime system as well.

A similar boundary exists for interpreters, where a meta-interpreter does relatively high-level source-to-source conversion before delegating to a target language interpreter and its runtime. These meta-interpreters are akin to transpilers, while full blown interpreters are akin to compilers, targeting a lower level abstract machine and providing their own runtime systems.


> where a meta-interpreter does relatively high-level source-to-source conversion before delegating to a target language interpreter

This is most emphatically not what Prolog meta-interpreters do. (Those are the ones I am most familiar with.) They do not build up new source code, they interpret given terms in "new" ways that are not built into the Prolog implementation. Systems that build new source code are called expansions or sometimes macros.

I don't think the Lisp world would call such systems meta-interpreters either. There, too, you have macros that transform source code.

Do you have a reference that uses the term "meta-interpreter" in the way you are describing?


Sorry, it's been many years and I misremembered the term "meta-circular evaluator", i.e. everybody's toy implementation of an interpreter with a REPL and very simplistic (if any) changes to the source language being interpreted.


> a translator/converter from C to a memory-safe subset of C++

There you go. Both "translator" and "converter" are great names for this. No extra jargon needed.

There really is way too much jargon already, and more often than not it obfuscates rather than clearing things up. Resist the temptation.


Perhaps a "compiler" transforms to lower-level languages (which includes Javascript, C code, VM bytecode, and chip instructions), a "decompiler" will transform to higher-level languages, and a "transpiler" will compile then decompile.


My main issue with the term "transpiler" is not its existence per se. It is the effect that its existence has on a large portion of developers, having them reject some kinds of technologies for the wrong reasons.

If we look back to when the term "transpiler" was made popular (not necessarily coined), it is fairly widely acknowledged that it was through CoffeeScript, which defined itself as such. In a sense, I think CoffeeScript was right (or at least not wrong) to define itself as "transpiler" rather than "compiler". After all, it was a syntax tree to syntax tree transformation, technically involving no more than a parser and a pretty-printer (not saying there's anything wrong with that).

However, because CoffeeScript was the first language to compile to JavaScript, swaths of developers have associated the term "transpiler" with "compiles to JS" or "compiles to another language that also happens to be used as source language". Now, all these developers will systematically refuse to call TypeScript, ClojureScript, Scala.js, etc. "true compilers" (let me list a few more in alphabetical order so I'm not perceived as totally biased: BuckleScript, Elm, Flow, Kotlin, PureScript). Instead they insist on calling them "transpilers" and associating their characteristics with those of CoffeeScript.

Now that CoffeeScript is falling out of favor (in part because most of what it brought to the table has been picked up by ES2015), this category of developers associates any kind of language that compiles to JS as a thing that no one would ever want to use. Worse, they will often consider such languages as insults to their craft. "Why don't you just learn JavaScript and code in it?", they ask. This is a cultural problem, because this mindset prevents them from even considering what other languages can bring to them.

Technologically speaking, there is absolutely nothing separating ClojureScript/Scala.js from Clojure/Scala. The former compile to JavaScript; the latter to JVM bytecode; but the amount of compiler engineering that goes into all these compilers is basically the same. However, somehow, culturally, they are fundamentally different: ClojureScript/Scala.js devs should just learn JavaScript, while it's OK for Clojure/Scala devs not to "learn Java".

In the end, the existence of the term "transpiler" has a negative effect on the perception that many developers have on the quality of languages that compile to JS. You will often read things like "such transpilers always leak JavaScript in the end", which is simply not true for ClojureScript and Scala.js. Or "interop with JS libraries is always an issue with transpilers"; again, not true. And those misconceptions make them reject and bash on similar technologies, for no good reason.

And that is why I loathe the term "transpiler".


> In the end, the existence of the term "transpiler" has a negative effect on the perception that many developers have on the quality of languages that compile to JS.

Is this actually true? Have there been actual widespread complaints about, say, TypeScript's quality because CoffeeScript was labeled similarly?


Fair enough. I think TypeScript enjoys some "protection" from the obvious complaints because it advertises itself as (and it is) "just JavaScript", with types. Therefore the complaints I often see about other languages compiling to JS are kind of moot or very easy to dismiss. For example "it eventually leaks JS" is dismissed as "yeah, duh! it is JS".

TBH I do not have too much contact with TS other than its type definitions, so the experience I describe may not be relevant to TypeScript. I have repeatedly seen it for languages that provide a different set of abstractions than JS, though.


The post you're responding to is probably referring to languages that are further afield from JavaScript than Typescript (or coffeescript).

After wrestling to get PureScript code within a factor of four difference between IE 11 and modern Chrome, I think the suspicion a lot of folks with a background in JS have for "compiled" v. "transpiled" is well deserved.

Just because you target a runtime doesn't mean the nuts and bolts of ergonomics like existing library usage, profiling, and debugging are covered well.


I would define a transpiler as a compiler that targets a language that is (i) high level and (ii) typically a source language in its own right. I think that accurately captures most people's usage of the term. I don't see anything objectionable about having a term for a specific subset of compilers.


Your definition also includes decompilers though, which go from low-level to high-level. Or is a decompiler a type of transpiler? Oh brother!


You want to know what they mean?

"I haven't studied CS and don't understand that all compilers are just translation layers from one language to another"


Words mean whatever the people want them to mean. Just pick whatever word you want. That's how language evolves.


Yes, especially people as a collective rather than individuals.


XKCD related to the topic: https://www.xkcd.com/1860/


Erm.. in social discourse maybe, but in technical language it's generally quite explicit. This is why we have the design pattern language: to communicate technical concepts effectively.

If someone refers to a MVC implementation as Command-Pattern, correct them. The same goes for language definitions.


I'd say AST to AST translation is a transpiler. It keeps the high-level structure of the program intact.

Once you break down the AST into basic blocks/CFG, then you have a compiler (if the output is from the "lowered" representation that has lost its high-level shape).
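
A small Python sketch of the "AST to AST" side of this distinction, using the standard `ast` module (needs Python 3.9+ for `ast.unparse`). The rewrite rule and the class name `SquareToMultiply` are hypothetical; the point is only that the function definition and return statement come out with the same high-level shape they went in with.

```python
import ast


class SquareToMultiply(ast.NodeTransformer):
    """Rewrite `name ** 2` as `name * name`, leaving the rest of the tree intact."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if (
            isinstance(node.op, ast.Pow)
            and isinstance(node.left, ast.Name)
            and isinstance(node.right, ast.Constant)
            and node.right.value == 2
        ):
            return ast.BinOp(
                left=node.left,
                op=ast.Mult(),
                right=ast.Name(id=node.left.id, ctx=ast.Load()),
            )
        return node


source = "def area(r):\n    return 3 * r ** 2\n"
tree = ast.parse(source)
new_tree = ast.fix_missing_locations(SquareToMultiply().visit(tree))
output = ast.unparse(new_tree)  # back to source: still a def with a return

namespace = {}
exec(output, namespace)
```

By contrast, handing the same source to `compile()` lowers it to a flat bytecode sequence in which the original statement structure is no longer directly visible, which is roughly the "compiler" side of the line drawn above.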



