Haskell's Type Classes: We Can Do Better (degoes.net)
97 points by andars on Sept 30, 2015 | 72 comments


An alternative approach is under development [1] for OCaml: modular implicits [2], a mix of ideas from SML/OCaml modules, Scala implicits, and Haskell type classes.

More than one module implementation may be provided for a given module signature, and the implementation actually used is chosen from the implicit modules available in the current context. This results in an error if an implicit module cannot be chosen unambiguously.

[1] https://github.com/ocamllabs/ocaml-modular-implicits

[2] http://www.lpw25.net/ml2014.pdf
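Since modular implicits aren't in mainline OCaml, here is a sketch in standard OCaml of the explicit module passing that the proposal would automate. The names (`SHOW`, `Show_int`) are made up for illustration; in the paper's syntax the first-class-module argument below would become an implicit `{S : SHOW}` parameter resolved from the modules in scope.

```ocaml
module type SHOW = sig
  type t
  val show : t -> string
end

module Show_int = struct
  type t = int
  let show = string_of_int
end

(* Today the module witness is threaded by hand; with modular
   implicits the compiler would pick Show_int from the context. *)
let show (type a) (module S : SHOW with type t = a) (x : a) : string =
  S.show x

let () = print_endline (show (module Show_int) 42)
```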


(very similar systems are already implemented in Agda and Coq)


The requirement to be globally unambiguous is the #1 wart of type classes. Rather than enforcing it with the compiler it would be better to make that restriction unnecessary. It should be possible to define a type class instance local to a module. Locally unambiguous yes, globally unambiguous no.


That's not good. We can put an instance into a runtime box, pass it to another module, and now there are potentially two different instances to choose from in the other module. We'd have to ban moving around instances in runtime data if we are to preserve class coherence, which is unsatisfactory.


> and now there are potentially two different instances to choose from in the other module [...] to preserve class coherence

I know the standard arguments for class coherence (Set etc.), but I haven't found any of them convincing. Relying on two values coming from two different places to be the same indicates that the design of the code is wrong. It's like writing a function f x y that must only ever be called with identical arguments x = y. The right thing to do then is to give f only one argument to begin with.


I consider coherence seriously mainly because Edward Kmett supports it, and he has immense experience regarding the relative merits of coherent vs. incoherent classes in practical development. One core Kmett argument that I subscribe to is that 95% of the time there is no newtype noise needed, and in that case coherence is very nice, since one doesn't have to think about class hierarchies (which can get very complex, with many alternate instance derivations) as long as the code typechecks.

But aside from that, I'm not all that convinced of the importance of coherence in a dependent setting either! My intuition is that types should encode all the properties that we'd like to be able to use in instances. In other words, even if we don't get coherence, we can use types to constrain the implementation of instances as much as we like.

Highly ambiguous instance resolution (like a bunch of Monoid Int instances lying around) is a bit of a pain though, and there could be some "using instance" feature/syntax that locally throws out all instances except one.

Additionally, as of now there are no large production code bases that use Coq style type classes in a proper dependent language. There are very few sizable applications in dependent languages to begin with. So we don't know a lot about the practical caveats of that design, while there is a lot of experience and know-how about Haskell-style classes, which is one reason why we should at least consider salvaging coherence in future languages.

Generally, I think that type classes are a rather ugly and haphazard construction compared to the succinct elegance of the type theories of core languages; still, eventually type classes worm their way into languages (Agda, Coq) because people really like them and like to use them, so a language designer should consider them. I wish there was an elegant/general formal treatment that captures the essential features, in any case.


What if you had a way to disambiguate in such cases, just like you do with method overriding?

Tie the type definition to the point where the data is created. Default to the local definition, but have a way to select the other one if you need some particular interaction between modules. Wouldn't that solve the conflict?


But then my ability to insert elements into a Set depends on the location where those elements were created...that sounds awful.


That's exactly the same problem you have with every other Haskell name. Have you had much trouble with that for function names?


The origin of a name is statically known in the location (i.e. file, expression, whatever). The origin of data is dynamic and can vary at runtime based on arbitrary previous evaluations. Unless you're going to propose making types unique (i.e., not compatible) depending on their origin and enforce never mixing them in the compiler.

At which point you have basically reinvented "newtype"
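This is in fact the standard Haskell answer today: since Int has no single canonical Monoid, the base library disambiguates with newtypes rather than local instances. A small, runnable sketch:

```haskell
import Data.Monoid (Sum (..), Product (..))

-- Two lawful Monoid structures on Int; the newtype wrapper, not the
-- surrounding context, selects which instance is used.
sumAll :: [Int] -> Int
sumAll = getSum . foldMap Sum

productAll :: [Int] -> Int
productAll = getProduct . foldMap Product

main :: IO ()
main = do
  print (sumAll [1, 2, 3, 4])      -- 10
  print (productAll [1, 2, 3, 4])  -- 24
```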


If you care about coherence, you should be very dissatisfied with GHC anyway.


Current GHC by default either prevents me from getting incoherence or at least throws warnings on orphan instances. What matters is whether a) someone not caring about incoherence writes coherent classes by default, just by not trying to actively screw up the system, and b) if you're actively trying to write coherent classes, you have a very easy job. On both counts GHC does well enough.


Yes, as long as you never use any third-party library and place all your code into a single module, GHC doesn't completely fail on its promises.

My concern is that this PhD-style development suffices for most Haskell users, but is not practical for any non-small project in the real world with more than one developer.


> is not practical for any non-small project in the real world with more than one developer.

This is not my experience of working with Haskell in the real world on a non-non-small project with (many) more than one developer.


Nor mine, FWIW.


The point is a little more subtle than this, or than how the article made it out to be.

Global coherence and confluence (which GHC attempts to enforce unless told otherwise) is great for refactoring and equational reasoning and (yes) mathematics and programming at large, as Ed Kmett's fantastic "Typeclasses v. the World" talk covers:

https://www.youtube.com/watch?v=hIZxTQP1ifo

Global uniqueness of instances (which GHC does NOT enforce) is bad for modular programming, or programming at large, which is a point some of the commenters here are making.

http://blog.ezyang.com/2014/07/type-classes-confluence-coher...

The submitted HN link here has a terrible title. What the author is proposing is enforcing typeclass laws, which is a great idea (but difficult in practice to implement). Practice and time have shown that lawless classes are the ones that people hate, the ones that don't add any benefit and in fact make it harder to reason about large Haskell programs.

I hope people don't take away from this link that classes in Haskell are bad. Haskell is three things: pure, functional, and typeclasses. They're part of the identity of Haskell; they're the reason we set off on this wonderful journey into the uncharted oceans of functional programming back in 1990. Typeclasses are AWESOME. And they can be improved.

Postscript: Kmett also has a package that persuades the compiler to let you have dynamically generated instances. It has the same user experience as local implicits, which I think a lot of people here are asking for:

http://hackage.haskell.org/package/reflection/src/examples/C...

    λ> using (Monoid (+) 0) $ mappend mempty 12
       12
GHC is a big tent.


Has anyone here shipped anything production worthy with dependently typed languages? I'm serious. There are like 10 PhDs in France working on a certified compiler toolchain for C and that's it. Meanwhile Go and JavaScript are taking over the world and they're anything but dependently typed. Maybe there's a reason for that? Maybe the reason is that after a certain point the validation overhead of advanced type systems is just not worth it.

Yes, I know the article isn't about dependent types per se but many of the features he wants get pretty close to dependent types since that is pretty much the state of the art when it comes to advanced type systems with lawful abstractions and whatnot.


The reason is twofold. One, research into dependent types is still ongoing. Two, type system researchers promote provably correct programming as the advantage of dependent types. IMO the far more important advantage of dependent types is expressiveness and simplicity: they generalize a vast swath of different type system features and extensions to a handful of simple constructs. Well, arguably just a single construct: it generalizes the usual function type A -> B to the dependent function type x:A -> B(x).

This subsumes a lot of things like generics, existential types, GADTs, higher kinded types, rank-n polymorphism, etc. You can even use it to do record types, polymorphic variants, modules, refinement types, etc. The dependent function type is a universal lego block for types.

Even more striking, it unifies the type level with the value level. Instead of having a language of types and a language of expressions there is just one language. The type List<T> takes a type T and gives you the type of lists with elements of type T. In dependently typed languages this is just a function List(T) that takes a type and returns a type. Instead of having F<T> at the type level and f(x) at the value level, there's only one kind of function that does both.
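As a concrete sketch, in Lean 4 (chosen here just as one example of a dependently typed language; the names are illustrative):

```lean
-- Type constructors are ordinary functions from types to types:
def MyList (T : Type) : Type := List T

-- The same arrow builds value-level and type-level functions;
-- the dependent function type (x : A) → B x covers both.
def twice (T : Type) (f : T → T) (x : T) : T := f (f x)

#check MyList              -- MyList : Type → Type
#eval twice Nat (· + 1) 0  -- 2
```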

I have no doubt that we will eventually be programming in a dependently typed language, the only question is when.


    I have no doubt that we will eventually be 
    programming in a dependently typed language, 
    the only question is when.
For the reasons I give in another posting in this thread (inability to extend dependent types beyond pure functions, and forcing the full cost of verification onto the early stages of software projects, when specifications are unstable), I predict the opposite: dependently typed verification will give way to program logic, the only question is when.


Why would dependent types not extend beyond pure functions? Idris has both dependent types and non-pure functions, is that not satisfactory? The fact that you can write software that provably satisfies a strict specification in dependently typed languages doesn't mean that you must. It is perfectly possible to program in them like you would in a normal language. That's what I was trying to get at: dependent types are also more expressive, not just more constraining. Now that I know about dependent types I regularly run into situations where I think "gosh, if I had dependent types I wouldn't have had to put in these type casts or handle these error cases that I know for sure cannot happen".

So even if you ultimately want to reason about your programs using an external program logic (which I predict we do not, but we'll see), then for expressiveness you still want your language to have dependent types for the reason I mentioned above:

> Two, type system researchers promote provably correct programming as the advantage of dependent types. IMO the far more important advantage of dependent types is expressiveness and simplicity: they generalize a vast swath of different type system features and extensions to a handful of simple constructs.

Furthermore, if you want it's possible to use a program logic style of verification in a dependent type system. You can write your program as normal and later prove lemmas about functions in the program. For instance instead of saying

    reverse : Vec n a -> Vec n a
or

    reverse : (xs : List a) -> {ys : List a | length xs == length ys}
you can do

    reverse : List a -> List a
and a lemma

    reverse_preserves_length : forall xs. length (reverse xs) == length xs
It's just not the only option you have.

What's this forall you say? A dependent function type of course!
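The program-then-lemma style above can be sketched in Lean 4 (standing in for the Idris-style signatures; this assumes the library's `List.length_reverse` lemma is available to `simp`):

```lean
-- the plain, specification-free program
def rev (xs : List α) : List α := xs.reverse

-- a lemma proved afterwards; the (xs : ...) binder is exactly a
-- dependent function taking xs and returning a proof about it
theorem rev_preserves_length (xs : List α) :
    (rev xs).length = xs.length := by
  simp [rev]
```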


Thanks for your reply.

   Idris has ... is that not satisfactory?
I'm not an expert in Idris, but I think effects are encapsulated / encoded somehow in Idris. I don't think full Idris is a logic, only the pure sublanguage (please correct me if I'm wrong), so you lose the coincidence between programs and proofs. That probably means that there is some "encoding tax" when reasoning about stateful computation. The more complicated the effects, the higher the encoding tax. Just imagine you'd have to reduce full concurrency (e.g. pi-calculus) to lambda-calculus. That would be painful.

    It is perfectly possible to program in them like you would in a normal language.
I don't think that's true with current systems. For a start you have to prove termination, even when you use simple types. (Which leads to another shortcoming of the Curry-Howard approach to verification: non-termination.)

     I regularly run into situations where I think "gosh, if I had dependent types
Sure, and I often think the same. But that doesn't address the key point: (1) you can always do the same with program logic, prove that you don't need a cast, but more importantly (2) in a lot of large-scale programming projects, you don't have good specifications up-front. Instead you have a vague idea, and try to knock out a prototype as soon as possible to show to the customer, who will typically reply "that's not quite what I want, instead ..." and you go back to the drawing board and iterate. The specifications and the program coevolve. In the prototype stages you don't actually care if the program needs a cast or not; you just put in the simplest thing that works, and fix/optimise later, probably only after heavy profiling. Who cares if a missing-cast exception is thrown during the prototype phase? And for large projects the prototyping phase might already involve a team of 50 programmers. Since good specifications are typically only available at the end of a project, when the software is stable, it only really makes sense to pay the (currently enormous) cost of full verification at the end of the project. With program logic you can easily delay verification until the end of the project (but don't have to); with Curry-Howard based verification you can't.

    Two, type system researchers promote provably correct programming as the advantage of dependent types.
Program logic researchers promote provably correct programming as the advantage of program logic. Anyway, programs can only be provably correct w.r.t. a specification, but in most programming projects a meaningful specification is not available in the early stages, so any talk about provable correctness is inapplicable.

    they generalize a vast swath of different type system features 
I fully agree with you here, and it's beautiful. But the beauty stands in the way of type inference, and that makes full dependent types extremely costly for most (but not all) forms of software engineering. As B. Pierce says, the Hindley-Milner polymorphic typing system "has hit a sweetspot" between expressivity, type-inference and simplicity.

    Furthermore, if you want it's possible to use a program logic style of verification in a dependent type system.
I'm not totally sure I can follow you here. How does this get around the totality issue?


I don't understand what you mean by encoding tax. Can you give an example? Certainly lots and lots of work to make an actual practical and usable system still needs to happen, but I don't see any fundamental limitations with respect to effects, including non-termination.

> For a start you have to prove termination, even when you use simple types.

This is not the case. E.g. in Idris you have non-terminating functions. In NuPRL the whole thing is built on potentially non-terminating functions. Of course the functions that you use as proof witnesses have to terminate, but that's true in any system, since a non-terminating proof witness means that the proof is bogus ("Theorem: Socrates is a goat. Proof: Socrates is a goat because Socrates is a goat."). The functions in the program that the specification is saying something about do not have to be proven to terminate.
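A minimal Lean 4 illustration of the distinction (the goat proof, more or less):

```lean
-- Programs may be non-terminating if marked `partial`:
partial def spin : Nat := spin

-- But a circular term is rejected as a *proof* witness; uncommenting
-- the next line fails the termination check, as it would prove anything:
-- theorem goat : False := goat
```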

> (1) you can always do the same with program logic, prove that you don't need a cast, but more imporantly (2) in a lot of large-scale programming projects, you don't have good specifications up-front

I don't want to prove that I didn't need the cast or check, I want to not have the cast or check in the program in the first place. This is about types that are expressive enough to model the actual shape of the data, not about proving that a program is correct relative to some specification. By analogy with lacking generics / parametric polymorphism like old versions of Java: if I write a program that uses a List I need to cast to and from Object. With generics I can use a List<T> and do not need the cast. I don't want to write the program with casts and then prove that I didn't need them, I want to write the program with List<T> without casts. For the same reason that you still want generics even if you use a program logic to reason about your programs, you still want dependent types even if you use a program logic to reason about your programs.

> With program logic you can easily delay verification until the end of the project (but don't have to), with Curry-Howard based verification you can't.

Why? You can write the same program without specification as you would in the program logic system, and then prove lemmas about it.

> the beauty stands in the way of type inference

Type inference is overrated anyway, and with dependent types you don't actually lose type inference. You do lose it for the parts of the programs that use fancy types, but you wouldn't be able to write those at all in a language with full type inference. There is no fundamental reason that a dependently typed language couldn't have the same type inference for non-dependently typed pieces of the program as the language that only supports non-dependent types.

By the way, and I admit this is more of a theoretical curiosity, IIRC full type inference for dependent types is decidable if you do not have implicit parameters. That does make sense: you can either infer types, or values, but not both at the same time. Of course in practice you much rather have implicit parameters than type inference. From the proof assistant side of Curry-Howard type inference is also kind of strange: you infer the theorem (the type) from the proof (the program). So it's like writing down a proof, but having it inferred what the proof is actually proving.

> the Hindley-Milner polymorphic typing system "has hit a sweetspot"

Where sweetspot = local optimum :) Meanwhile, OCaml is getting GADTs ;-)


    I don't understand what you mean by encoding tax. Can you give an
    example? Certainly lots and lots of work to make an actual
    practical and usable system still needs to happen, but I don't
    see any fundamental limitations with respect to effects,
    including non-termination.
Look at the theory of natural numbers. We can reason about them in Peano arithmetic, or in ZFC using one of the well-known ways of encoding natural numbers as sets. Because Peano arithmetic has a model in ZFC, every proof in Peano arithmetic can be converted into a proof in ZFC. But proof search in ZFC is more expensive than proof search in Peano arithmetic, because the latter is less general than the former. ZFC allows you to ask questions like "333 \in \pi", which is perfectly meaningful as a statement about sets, but not meaningful in Peano arithmetic. The "encoding tax" is the increase in proof search space when moving from a specific to a general theory.

Let me give you a second example. Most programming languages can be encoded into \pi-calculus, and pi-calculus has a good logic called Hennessy–Milner logic (HML). You can use HML to reason about all \pi-calculus behaviour. Since \pi-calculus is a very expressive model of computing, HML is extremely fine-grained. Now assume you want to reason about ML programs. You do this by translating them into \pi-calculus and then using HML. But due to the fine-grainedness of HML this will be very painful. It's much more efficient to develop a logic for ML itself, and avoid paying the "encoding tax".

As I said, I don't know Idris well, but I assume that the proof language does not have state and other effects, so they need to be coded up as pure functions. I expect there to be an "encoding tax" to be paid for this translation of effects into pure functions.

    This is not the case. E.g. in Idris you have non-terminating functions.
Yes, but they are not in your proof language. You can always code up anything.

    By analogy with lacking
    generics / parametric polymorphism like old versions of Java:
This is not a good analogy, because type-inference can handle parametric polymorphism, whereas general type-dependency on values cannot be dealt with in that form.

    You can write the same program without specification as you
    would in the program logic system, and then prove lemmas about
    it.
My experiences in Coq and Agda suggest that that's not the case.

    There is no fundamental reason that a dependently typed language
    couldn't have the same type inference for non-dependently typed
    pieces of the program as the language that only supports
    non-dependent types.
If it's so simple, then we should have systems that do this. We don't have such systems, hence it's probably not that simple.

    By the way, and I admit this is more of a theoretical curiosity,
    IIRC full type inference for dependent types is decidable if you
    do not have implicit parameters.
I don't know what you mean by "implicit parameters". Type inference might be decidable for some weak inexpressive dependently typed systems, but MLTT, CoC, nope?

    OCaml is getting GADTs ;-)
Yep, and that preserves type inference, which is the key factor. So it's Hindley-Milner still.


> As I said, I don't know Idris well, but I assume that the proof language does not have state and other effects, so they need to be coded up as pure functions. I expect there to be an "encoding tax" to be paid for this translation of effects into pure functions.

There is general consensus that the best way to handle effects is a pure base language with an effect system on top. Idris takes a good shot at this, but I doubt it's the last word. Eff and Koka and F* seem particularly promising as well.

> This is not a good analogy, because type-inference can handle parametric polymorphism, whereas general type-dependency on values cannot be dealt in that form.

Why did type inference become the #1 criterion for a type system in this discussion? There are far more important criteria than type inference, such as expressiveness. If your type system is not expressive enough you wouldn't be able to write the program that can't be type inferred at all. Why is that a win?

> Yes, but they are not in your proof language. You can always code up anything.

No sound system can have non-terminating functions as proof witnesses. You can reason about potentially non-terminating functions just fine in dependently typed languages. Look in particular at NuPRL, which is built around the concept.

> If it's so simple, then we should have systems that do this. We don't have such systems, hence it's probably not that simple.

Doing type inference for non HM-typeable parts of programs is theoretically simple. For the parts of the program that do not have type annotations, just infer types with an HM inference algorithm; if it succeeds, great, if it fails, signal a type error. Just like in ML. The reason that it hasn't been done is likely a combination of reasons: (1) not publishable, (2) a lot of work, (3) type inference just isn't that important, especially if you're only doing it for the HM-typeable fragment.

> My experiences in Coq and Agda suggest that that's not the case.

Coq and Agda are not the pinnacle of the possibilities of dependent types. As I said, it's ongoing research. My point is that the style of writing a program and then separately proving lemmas about it is perfectly possible in a dependently typed language.

I would also suggest that your experience may not yet have been that extensive if you didn't know about implicit parameters?

> I don't know what you mean by "implicit parameters". Type inference might be decidable for some weak inexpressive dependently typed systems, but MLTT, CoC, nope?

Implicit parameters let you mark function parameters as implicit and they will be inferred at the call site if the compiler can deduce that they must have one particular value in order for the program to type check. It's like the reverse of type inference: instead of inferring the type of an expression, we infer the expression from the type. It's kind of related to type classes, which let the compiler infer type class dictionaries from types.
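For instance, in Lean 4, braces mark a parameter as implicit and the compiler infers it at each call site:

```lean
-- {α : Type} is never written by the caller; it is inferred
-- from the type of the explicit argument x.
def myId {α : Type} (x : α) : α := x

#eval myId 3        -- α is inferred to be Nat
#eval myId "three"  -- α is inferred to be String
```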

Look at the Sage work for full inference for refinement types. I think the same strategy could work for full dependent types. Of course it's not really what you want in practice, since in a dependent type system the most precise type you infer for a function simply restates what the function computes. That is, if you have the function (\x -> x+2) then the type will simply say "a function that returns its argument +2". Inferring the most precise HM type gives reasonable results precisely because HM is weak, so you don't get a type that's "too precise" ;-)

> Yep, and that preserves type inference, which is the key factor. So it's Hindley-Milner still.

Type inference for GADTs is undecidable.


     There is general consensus that the best way to handle effects is
     a pure base language with an effect system on top.
No, there is absolutely no consensus on this. This approach is only popular with researchers who pray at the church of lambda. Essentially all programming languages that are not research prototypes and that are used in industrial practice don't follow this approach. There are valid reasons for this. A much better approach in my opinion is to start with an unrestricted, effectful language and identify fragments with fewer effects via types. The problem is that adding interesting effects on top of something super restricted like pure functions leads to a high "encoding tax". That's because pure functions are not very expressive.

     Why did type inference become the #1 criterion for a type system
     in this discussion?
We have been talking about large-scale software engineering, and whether Curry-Howard or program logic is a more scalable approach towards program verification. My claim is that in large-scale software engineering the single most important criterion is to minimise the cost of change. Type-inference has 0 cost of change, since it's fully automatic. I cannot see how a verification tool that I'm forced to use in prototype phases when there is no stable specification is cost-effective in large-scale software engineering.

     There are far more important criteria than type inference, such
     as expressiveness.
Sheer expressiveness is not interesting. You can produce trivial typing systems where every program is typable. What really matters is to find "sweetspots" between the following desirable features of typing systems.

- Expressiveness.

- Strength of properties guaranteed by typing.

- Speed (including decidability) of type-checking or type-inference.

Unfortunately there are hard trade-offs between them.

     Doing type inference for non HM-typeable parts of programs is
     theoretically simple.
I don't believe it's that simple to do it well, and people do publish in this area; see for example the Trellys project by Vilhelm Sjöberg. I think it's an unsolved problem. It may be possible to get a pragmatically viable system that combines proving and programming in a unified system, but we are not there yet, and whether the result is superior to, or on par with, simple programming languages and an associated logic can only be considered when such a language is available. And note that work like Sjöberg's is about combining pure functional programming with constructive logic in one system. Even if some kind of polished Haskell with dependent types can be built, whether that then generalises to state, concurrency, timing etc. in a clean way is another open question.

     I would also suggest that your experience may not yet have been
     that extensive if you didn't know about implicit parameters?
Having used Agda, I know about them, I just wasn't sure what you meant. The term is used with different meanings in different contexts. For example they work differently in Scala. And no, it's not the implicit parameters that make type inference for Agda's MLTT undecidable.

     Look at the Sage work for full inference for refinement types.
Do you mean Cormac Flanagan's Sage? That's not full type inference. Sage uses hybrid type-checking and inserts dynamic checks where type inference fails. This might be a very good way of doing things in practice, I don't know. Maybe not.

     Type inference for GADTs is undecidable.
Are you sure? http://research.microsoft.com/pubs/79812/outsidein-icfp09.pd...


> A much better approach in my opinion is to start with an unrestricted, effectful language and identify fragments with fewer effects via types.

This is precisely what an effect system is, and what the languages that I mentioned do. So it seems that you agree with the church of lambda on this point?

> We have been talking about large-scale software engineering, and whether Curry-Howard or program logic is a more scalable approach towards program verification. [...] I cannot see how a verification tool that I'm forced to use in prototype phases when there is no stable specification is cost-effective in large-scale software engineering.

I've explained several times that the assumption that you need to do full on verification from the start with a CH style system is incorrect. You can write your prototype without any verification and add lemmas and theorems about it when the spec stabilizes. This is not a fundamental difference between CH style vs program logic style.

> Sheer expressiveness is not interesting. You can produce trivial typing systems where every program is typable.

Well yes, you also need "well typed programs don't go wrong" of course, which dependent type systems do have. I don't think decidability of type checking is desirable, by the way. What you really want is a system that checks statically as much as possible, but can check dynamically that which it can't verify statically. Rather than two possibilities "correct" vs "incorrect" you want three: "definitely correct", "definitely incorrect" and "can't determine statically, so let's check dynamically". In the last case conventional systems just reject the program as incorrect, but I think that is not as helpful.
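A tiny Haskell sketch of that third outcome, with a run-time guard standing in for the dynamically checked case (the names here are made up for illustration):

```haskell
-- A refinement ("the list is non-empty") that a weak static checker
-- can't always decide; the smart constructor checks it dynamically.
newtype NonEmptyList a = NonEmptyList [a]

fromList' :: [a] -> NonEmptyList a
fromList' [] = error "refinement violated: empty list"
fromList' xs = NonEmptyList xs

-- Code downstream of the check may rely on the refinement.
safeHead :: NonEmptyList a -> a
safeHead (NonEmptyList (x : _)) = x
safeHead (NonEmptyList [])      = error "unreachable: guarded by fromList'"
```

So `safeHead (fromList' [1,2,3])` evaluates to 1, and the empty case fails at the check site rather than deep inside `safeHead`.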

> Even if some kind of polished Haskell with dependent types can be built, whether that then generalises to state, concurrency, timing etc. in a clean way is another open question.

More open than a program logic system that does the same? I don't think the approaches are that different in this regard. With state/concurrency/etc you work with a description of the computation (like in Haskell you work with a description of an IO computation). Those satisfy certain laws which you can use to reason about them. There is of course a huge design space to be explored here, but that design space isn't all that different than the design space of a program logic style system for the same.

> The term is used with different meanings in different contexts.

I see, yes. They're sometimes called implicit parameters, sometimes implicit arguments, and there are differences in whether only unification is used or also type class / canonical structures / Scala-style implicit parameters kind of non-canonical inference.

> That's not full type inference. Sage uses hybrid type-checking and inserts dynamic checks where type inference fails.

The dynamic checks are for when type checking fails, not type inference. The type inference will always infer a type if the term is typeable. Of course it may sometimes infer a type that the type checker can't statically validate, so you need dynamic checks then. There is no incompleteness-defeating magic here, but if the term is typeable then type inference will find a type for it, whether or not the type checker can statically confirm it's valid. I did say this was more of a theoretical curiosity.

> Are you sure?

That paper itself says so in the first sentence?


   This is precisely what an effect system is, and what the
   languages that I mentioned do. So it seems that you agree with
   the church of lambda on this point?
No, Eff starts with a pure lambda calculus and adds effects on top.

   I've explained several times that the assumption that you need
   to do full on verification from the start with a CH style
   system is incorrect. You can write your prototype without any
   verification and add lemmas and theorems about it when the
   spec stabilizes.
As I said, I don't really see that. There is research going on, like the aforementioned Trellys, that seeks to build such systems, but they are not there yet.

   Well yes, you also need "well typed programs don't go wrong"
   of course, which dependent type systems do have.
Well-typed programs do go wrong! Here is a well-typed OCaml program:

    List.hd [];;
When I run it, an exception is thrown. The point is: what counts as "not going wrong" is a fairly arbitrary definition. And for the trivial typing system I can define programs that go wrong as the empty set.
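To make the arbitrariness concrete, here is a Haskell sketch (names are mine): the very same operation is "well typed but crashes" or "total" purely depending on which type you assign it.

```haskell
import Data.List.NonEmpty (NonEmpty(..))

-- Partial: well typed under the usual list type, but crashes on []
unsafeFirst :: [a] -> a
unsafeFirst (x:_) = x
unsafeFirst []    = error "empty list"

-- Total: with a richer type, the "wrong" input is unrepresentable
safeFirst :: NonEmpty a -> a
safeFirst (x :| _) = x
```

`unsafeFirst []` type-checks but throws, exactly like `List.hd [];;` above; `safeFirst` cannot be handed an empty list in the first place. Which of the two "goes wrong" is a definitional choice.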

   I don't think decidability of type checking is desirable, by
   the way. What you really want is a system that checks
   statically as much as possible, but can check dynamically that
   which it can't verify statically. Rather than two
   possibilities "correct" vs "incorrect" you want
   three: "definitely correct", "definitely incorrect" and "can't
   determine statically, so let's check dynamically". In the last
   case conventional systems just reject the program as
   incorrect, but I think that is not as helpful.
I don't think this will be useful, because then every typo will turn into a dynamic check, and you end up with a dynamically typed language for all practical purposes.

   More open than a program logic system that does the same?
Yes, vastly more open. Those problems are essentially solved for program logics.

   I don't think the approaches are that different in this
   regard. With state/concurrency/etc you work with a description
   of the computation
Yes, you can work by encoding, but then you pay the "encoding tax" in verification.

   The dynamic checks are for when type checking fails, not type
   inference. The type inference will always infer the type if the
   term is typeable.
That's not how I understand the Sage paper [1], which says: "Our implementation performs bidirectional type checking (Pierce and Turner 1998), allowing many types to be inferred locally, but does not yet perform full type inference (a much more technically challenging problem)."

   That paper itself says so in the first sentence?
But the paper also presents a complete inference algorithm for GADTs. The point is: it depends on exactly what you mean by GADTs. I don't know what OCaml does here. I stopped using OCaml just before GADTs were added.

[1] https://sage.soe.ucsc.edu/sage-tr.pdf


This is the paper about type inference: https://users.soe.ucsc.edu/~cormac/papers/esop07.pdf

I think I've understood your points and made mine, so let's see how it pans out :) Thanks for the discussion!


Does selling CompCert to Airbus count as being "production worthy"? I would (not) be amused if people tried to put JavaScript in plane navigation systems.

Do not forget that most dependently typed languages (Coq first among them) are proof languages. The goal is not (only) to write programs but to prove things, and a rather large number of theorems have been proven that way.

Dependently typed languages for programming (as opposed to proving) are very much cutting edge research. Is it so surprising that they are not used in the wild?

Edit: whoops, missed the "here" in parent's comment. No, I don't work on CompCert. :)


> Dependently typed languages for programming (as opposed to proving) are very much cutting edge research. Is it so surprising that they are not used in the wild?

How long will this be the case? Seems like they've been "cutting edge research" for a few decades, and the Curry-Howard correspondence itself is like 80 years old. Surely if there were a real demand for these kinds of programming languages, we would have seen some industry penetration at some point, right?


I think this is exactly what's happening, but through the back door: rather than everyone switching to Haskell, you see features pioneered by languages like Haskell being implemented in more mainstream languages.

Also, do not forget Haskell's roots were to create a programming language environment where people studying compilers could easily experiment with new features. Hence Haskell's large number of language extensions.


Right. What I'm saying is: how long do these dependently typed language features have to "incubate" before they make it into Java? Because implementations of dependently typed languages have existed for as long as Haskell has, and we've known that it's been possible to capture "proofs" in a type system for almost a century. So how much longer do you expect it will take?

Or, maybe, there just isn't a real industry demand for dependent types. Hell, many engineers still have trouble convincing their employers to write robust tests, let alone full proofs of correctness.


There's been plenty of industry penetration for languages with more advanced type systems. Not sure why the arbitrary line is being drawn here at dependent type systems.

Some of this is about the penetration of certain ideas. Like, we had to wait for multiple cores on commodity hardware before functional programming started to cross over from the academic community to the professional web programmer.

That shift helps your average programmer to think a bit more mathematically, and then it's basically about whether the programmer starts to appreciate types - and to me that seems to be less about purity of argument than the professional atmosphere the programmer is in. Are you prototyping and moving fast and breaking stuff, or do you care more about longevity and correctness?

And then once you start getting more into static typing and functional programming, a lot of these other ideas start to become more attractive and intriguing.

It's like a lot of ideas that circulate around for a long time before they really take hold; you're waiting for a certain critical mass of system inputs and momentum, and then things can start to change really fast.


True certified programming has a lot of overhead but there are definitely areas where it'll be worth it: medical devices, car systems, mars rovers, aviation, etc. The University of Washington is actively working on verifying radiation therapy machine software and early results are already used in production for example. I think the reason certified programming hasn't been more widely adopted yet is because the tooling is terrible at the moment and we still have some important bits of theory to work out (e.g. homotopy type theory & other equality issues). It'll get there. Projects like Lean & Dafny at MSR, Adam Chlipala's work at MIT, and the large amount of theory work being done at CMU, Cornell, UPenn, Inria and other places are very promising.

My bet is that the tooling & theory will continue to improve and eventually the overhead will become low enough that a significant portion (~10-20%) of software will be certified.

I recommend reading Adam Chlipala's vision statement for inspiration on what certified programming could enable: http://adam.chlipala.net/summary.html


Dependently typed languages are wonderful in theory and a royal pain in the butt in practice. It's a great example of the "devil in the details" kind of thing.

For example, let's say you define a type of lists of size n. Great. You can only assign lists of the same size, and you can only obtain a list of size n by appending an element to a list of size n-1, or the compiler will complain. Now you can even prove that your program is formally correct. What's not to like?

And then, reality knocks at your door.

How do you instantiate such a type? It needs to be instantiated right away with n elements or your program won't compile. But these elements are coming from the runtime, say, a database. How the hell does this work?

Anyone taking a dependently typed language just a step beyond prototypes quickly realizes how crippling this approach is, and the languages will give you all kinds of escape hatches to weaken the type system just so you can use the darn thing. But it's futile, because at the end of the day, a program is a living thing that relies on side effects and external inputs, and sophisticated systems are just not designed to cope with human fuzziness.

The reason why dependently typed systems never became popular is not because the world hasn't realized yet how awesome they are: it's because they just aren't a good fit to solve real problems.


Your "I-can't-work-with-this-unsized-list-because-I-need-to-pass-it-to-something-that-takes-a-sized-list" problem seems like something that one might _think_ would be a problem if one understood advanced types on a conceptual or theoretical level but not how to use them in real-world code. It's actually pretty easy to deal with cases like this... At the end of the day, you just end up having to code for both the case where the list matches the size requirements of the code you want to call and the case where it doesn't. For example, you could pass the unsized list to a function which performs validation of its actual runtime size, returning, e.g., an `Option[List[Int] @@ MinSize[_5]]` (if I may use Scala-speak). You'd then have to handle both the case where the `Option` is a `Some` and the case where it's a `None`. Ultimately, it ends up being extremely similar to checked exceptions: you are simply forced to write code to handle all cases, rather than writing code that allows some of them to blow up at runtime (while telling yourself "that case could never happen in the real world" or "that's an edge case, we'll just ignore it").


> At the end of the day, you just end up having to code for both the case where the list matches the size requirements of the code you want to call and the case where it doesn't.

I don't follow. If your list is not the right size, your program doesn't compile, period.

If it does, you're not using a dependently typed language.

Am I missing something?


Yes, you're missing this, I think: When you call a function which takes, say, a list-5, all you need to do is to prove to the compiler that you've checked that the list is indeed of length 5 before you pass it along.

The key insight here is that the check may be performed at compile time or runtime as long as the compiler can see that you've performed the check. It can do this e.g. in pattern match clauses. For example in code like

        case inputList of
           [a,b,c,d,e] -> branch-A
           _           -> branch-B
Idris will automatically have a proof that inputList is of length 5 in branch-A, which means you'd be free to call any function which expects a list-of-length-5 in branch-A without further ceremony. This will not be the case for branch-B.

(And if you think about it... if you didn't allow for runtime checks to satisfy such proof obligations, there would be essentially no way for dependently typed programs to depend on any runtime values at all.)

Aside: Another thing which I think has been missed in this entire discussion is that you don't have to actually use List-of-size-N as your type. If you don't actually care about the size of the list in the code you're writing, you're also allowed to just use List-of-any-finite-size as your data type. (Of course this could cause a certain amount of friction if the standard library is all based on List-of-size-N, but it shouldn't be too bad in e.g. Idris.)
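The same pattern can be sketched in Haskell with a length-indexed vector (GADTs and DataKinds assumed; names are mine, not from the thread). The pattern match in `fromList5` is the runtime check, and a successful match hands the rest of the program a statically-sized value:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
import Data.Kind (Type)

data Nat = Z | S Nat

-- A list whose length is tracked in its type.
data Vec (n :: Nat) (a :: Type) where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

type Five = 'S ('S ('S ('S ('S 'Z))))

-- The runtime check: succeeds only if the input has exactly 5 elements.
fromList5 :: [a] -> Maybe (Vec Five a)
fromList5 [a, b, c, d, e] =
  Just (VCons a (VCons b (VCons c (VCons d (VCons e VNil)))))
fromList5 _ = Nothing

-- A function that statically requires a 5-element vector.
middle :: Vec Five a -> a
middle (VCons _ (VCons _ (VCons x _))) = x
```

`fmap middle (fromList5 [1,2,3,4,5])` gives `Just 3`, while a 4-element input simply yields `Nothing` instead of an unsatisfiable proof obligation.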


It depends on the language. For instance in Scala you can simulate collections of fixed size as in https://github.com/milessabin/shapeless/blob/master/core/src...

Then, at runtime, you can cast an unsized list to a sized list giving - say - a function List[A] => Option[Sized[List[A], N5]] that will fail if the list is not of size 5. From then on you can go on letting the compiler keep track of lengths.

Of course, this does not work at all if you have no clue what size your input will have.


I believe that you don't need to know what size your input will have. Rather, you need to know what size the thing you will be calling will need, right?


I think you are: you're missing the possibility of writing a total function of type `List[Int] => Option[List[Int] @@ MinSize[_5]]` (I'm still writing Scala-ish types because that's where I feel comfortable). This gets you into a world where you can deal with your previously-unsized `List` as a sized `List` (either you can map over your `Option` or you can fold the `Some` and `None` cases together by providing some default).


I wish we (both potential users, and developers of these languages) would start seeing these things more like enhanced tests than perfect manifestations of mathematical truth.

Nobody would ever question the value of a unit-testing framework because it requires you to instantiate objects and doesn't hit the database and the ORM. (Yes, there are testing strategies that cover all of that, but for unit-testing application logic, you generally want to skip all of that.) Dependent types should let you do the same thing that a test would do, except exhaustively: if you add a new email address to an account with N addresses, you can then index N+1 of those addresses, but no more. The N-address account coming from the ORM isn't considered an "escape hatch" that "weakens the test system": it's just how it works.

I want stronger types for exhaustive testing of business logic on individual functions -- the sort of things that a comment would say (e.g. "Returns a subset of this parameter if it is provided, otherwise..."), except machine-checked. I don't need my entire program to have a type that Messrs. Curry and Howard like. In all probability, my entire program is structured as an infinite loop anyway.

I don't know that any systems really let you get here (other than maybe extraction of particular functions into Haskell or ML, but really the people who would voluntarily write their entire app in Haskell or ML are not the right target audience).


> How the hell does this work?

It's easy, since it works as it always did. You check whether the list coming from the database has the correct size n, and then the type system knows that and lets you pass said list to a function that expects a list of size n.

    list = GetAListFromTheDatabase()
    if(list.size == n){
       // type checker knows that the size is n here
       foo(list);
    }else{
       error("Database returned a list of incorrect size.");
    }


In a dependently typed language, an if statement isn't just a run-time check, it also provides a compile-time proof that its condition is true in one branch and false in the other branch. So your insinuation that run-time checks are not possible in dependently typed languages is false; you just have to prove at compile-time that all possible results of the run-time check have a valid execution path. So if you have an unsized array A, you can't just write `return A[5]`, but you can write `if (A.length() > 5) { return A[5]; } else { return 0; }`.


You're misguided. What you describe is not really an issue for dependent types. Your attitude seems to be shared by a lot of people, so I'll try to clear things up a bit. I think it helps if I use refinement types (as used in F*), which are just dependent types in disguise.

Taking your example, I would have written a function f that takes a list (with an arbitrary type of elements) and produces an integer. Its basic ML type would be:

  f : list 'a -> int
Now taking your example, assume that, for whichever reason, f only works for lists of length 5. This would probably never arise in practice, but I'm just going with your example. First, we have to say what "works" means. Intuitively, it means that f produces a reasonable output. Let's say we have defined a function p which says whether the output is reasonable. In F*, I can state all of that by giving my function f the type:

  f : l:list 'a{length l = 5} -> x:int{p x = True} 
I can only give this type if I can convince the compiler that the result really satisfies p - this is one point I'll get back to later.

Now assume I want to use this function later in my code to deal with some user provided input. Assume I have a function read_int_list:

  read_int_list : () -> list int
I can now combine this function with f, but only if I ensure that the length of the read list is 5:

  let process_input () =
    let inp = read_int_list () in
    if length inp = 5
      then f inp
      else -1  // error
This will now be accepted by the compiler, since it will be able to deduce that inp is an admissible input to f, because the function call was guarded by the if condition (I'll also get back to this). Note that the code above is absolutely no different from what you would write in any "regular" programming language (including ML). What is different is that in a regular language you could forget to add the condition - but not in F*. Of course, the error handling code above is not sensible.

So interacting with user input and output is absolutely NOT a problem for dependently typed languages and their ilk. In fact, there is a full-blown implementation of the TLS protocol in F*.

What is a problem, however, are the two points I hinted at: convincing the compiler that your statements are true. This is done in one of two ways. First, you can manually give "hints" to the compiler, which is normally very tedious. Second, the compiler can try proving them automatically (F* for example leverages an SMT solver), which is great when it works, but often doesn't.

So the real problem is that producing proofs is quite time consuming, and thus expensive. The world, as you put it, is not completely wrong: very often, a buggy program is cheaper than a correct one. Heck, people still stick to OpenSSL over the above verified TLS implementation - and for good reasons. The exceptions being programs used in NASA rovers and such - which is where you're starting to see dependent types being used. I certainly won't use them for my projects - at least not until the systems get massively better at producing proofs. It's partly happened with the advent of the SMT solvers, but it's nowhere near enough.


That's one of the reasons Pascal failed.


I've been doing Scala for 5+ years, real-world production code. From here it feels like the popularity of types is going up and up (and within Scala they're becoming more popular e.g. look at doobie). Haskell and OCaml look to be getting more popular, not less. Javascript is a special case because of the browser. Go is mostly displacing Python and Ruby. There's certainly no exodus of ex-Haskell programmers fleeing for less typed languages.


The value proposition I've always read is related to gains in expressiveness and correctness. Programming is eventually going to become like Advanced Chess (human+computer team) and entire classes of mistakes will cease to exist.

I don't think I can be convinced that correct-by-construction methods are wrongheaded. I am convinced that humans aren't very good at wrangling precise and complex and large ideas without failure.


    after a certain point the validation overhead of advanced type systems is just not worth it.
But what makes you think you need dependent types to do full specification and verification of a program? Instead you could use program logic. Write the program in any language you like and later, when the program is finished, verify that it meets its specification using program logic. This separates programming from verification completely. In my opinion this is the future of certified programming. Dependent types are too heavyweight to support any conventional programming process.

Another advantage is that unlike Curry-Howard based verification, program logic does not have to be constructive, instead you can use classical reasoning with excluded middle. This tends to yield shorter proofs.

See Microsoft's Dafny [1] or CFML [2] for prototype tool chains implementing program-logic-based verification. Facebook's Infer tool [3], while not allowing full verification, is also based on program logic.

[1] http://research.microsoft.com/en-us/projects/dafny

[2] http://www.chargueraud.org/softs/cfml

[3] https://github.com/facebook/infer


> This separates programming from verification completely.

This is simply not the case. Significant verification projects always plan for verification and do verification in tandem. This is definitely true for all Dafny projects to date as well. Verification is just so much harder than unverified programming that you generally want to do everything you can while programming to make verification easier. This may become less true as proof automation improves but it will always be a tension.

> But what makes you think you need dependent types to do full specification [...] ?

Also, keep in mind that Dafny and similar tools can only express first-order logic; this can significantly restrict the expressiveness of your specification language and intermediate lemmas.

> program logic does not have to be constructive, instead you can use classical reasoning with excluded middle.

This is a silly point. First off proof assistants like Coq aren't inconsistent with the law of the excluded middle (only NuPRL is I believe) so you can always assume it as an axiom if you'd like. Most projects don't do this because constructivism is a benefit not a burden in verification: it allows you to make distinctions that aren't possible in classical logic.

I agree that Dafny and similar projects are very exciting though; Dafny's tight integration with Z3 provides an awesome level of automation that will keep getting better and better.


     Significant verification projects always plan for verification
     and do verification in tandem.
That may be the case, but the point is that dependently typed programming languages force you to verify every intermediate step, while program logic gives you the option to do so.

A key problem in contemporary software engineering is that specifications constantly change (aka "customers don't know what they want"). Agile software engineering methodologies have mostly taken over from waterfalls because agile is designed from the start to minimise the cost of change (how successful they are at minimising the cost of change is another matter). If you have to redo not just the code itself and tests but also proofs every time your customer changes the spec, the costs of software development increase rapidly. Look at e.g. Java exception specifications for a similar but much more benign problem. The program logic approach to verification does not force the full cost of verification on the programmer, because verification can be deferred until the program specification is stable and its code passes all tests. This enables a hybrid approach to program correctness: in early stages of the software engineering process you use lightweight methods such as Hindley-Milner-like typing systems and testing, and only in later stages do you go for logic-based approaches. Let me repeat that the program logic approach always allows you to specify and verify from day one, so program logic gives you the best of both worlds.

    Dafny and similar tools can only express first order logic
That's a design choice of Dafny's and orthogonal to dependent types vs program logic. Moreover, all program logics where observational completeness has been proven (which means you can prove exactly the program properties that are contextually observable) were based on first-order logic. This is a pretty good indication that higher-order is not needed for sheer expressivity. Higher-order logics are but a convenience that enables shorter specifications and proofs.

    This is a silly point. First off proof assistants like Coq aren't
    inconsistent with the law of the excluded middle
It's a serious point. Look at the influential nominal approaches to handling bound names. The reason they were implemented in Isabelle and are heavily used there, but are basically not used in Coq, despite nominal techniques being perfectly constructive, is that nominal techniques have extremely short and uniform non-constructive proofs which can easily be handled by Isabelle's proof automation, while the corresponding constructive proofs are long, non-uniform and not easily handled by Coq's proof automation.

The real reason everybody uses dependent types is that they are perfectly matched for the verification of pure functional programs, since proofs are pure functional programs. This is beautiful. But as soon as you want to verify non-functional programs, that beautiful coincidence evaporates.

This is another key shortcoming of the dependent types approach: it's unlikely that it can be extended in a beautiful way to low-level programming, stateful programming, timed programming, concurrent programming.


I'm shipping massive amounts of code written in Haskell - production worthy for a real product - and the benefits of using Haskell over any other mainstream imperative language have been fairly significant.


Haskell is not a real dependently typed language. It may try to quack like one, but compared to Idris/Agda, it's doing a very bad job of it.

I'm pretty sure the parent didn't include Haskell.


Haskell has decent experimental support for dependent types. I've been using the CLaSH Haskell-to-hardware conversion system, which makes heavy use of type-level arithmetic, with pretty good results.


Type-level arithmetic is sugar for clever GADTs, not full-blown dependent types. Try to do things in Agda/Idris and come back to Haskell; the expressiveness is not nearly comparable.


Are Nats in Agda and Idris not also just GADT-based Peano numbers?


The toyish version shown in tutorials, yes. The real stuff, no: it's just integers.

And anyway, my point was that you can do much much more than type level arithmetic.


His comparison sounded much more like "research languages vs. mainstream imperative languages", and Haskell definitely came from research land, just as Idris and Agda did.

Plus someone is seriously working on dependent types in Haskell - so no, I think it's fitting.


The formally verified Odd Order Theorem[0] using Coq was quite an accomplishment (though probably not what you had in mind).

[0] https://hal.inria.fr/hal-00816699/file/main.pdf


Note the weakness of the Java solution. Since its interfaces offer no support for virtual constructors (hence the design pattern), you have to instantiate a factory before you can get an empty value.


Possible now with Java 8 default methods.


It's interesting to contrast this with the attitude in the D community (or at least Andrei and Walter), which has also come to the conclusion that typeclasses are a bad idea–but because they're too strongly typed rather than not strongly typed enough, and the solution is to throw them out in favor of dynamic "typing" at the type level through ad-hoc templates.

My two cents is that this is just like every other static vs. dynamic typing debate, and it's a deep tradeoff. I've been pretty pleased with typeclasses as an easy-to-use sweet spot between complexity/mental overhead and safety. But I'm certainly willing to believe we can do better, of course.


There are a number of good points here, but I am surprised by the fact that John wants a global coherence check, since in our conversations he has always insisted on the primacy of local reasoning.


I think the article mixes up "type" and "newtype". "newtype Email = String" doesn't make sense.


"type" wouldn't work in this context, as it is literally just an alias, so the compiler cannot use it to determine which instance to use. In contrast, newtype actually creates a new type that acts as a wrapper around the old one. I believe what the author meant to say was "newtype Email = Email String".
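A small illustration of the difference (hypothetical `Email` example in the article's spirit): a `type` alias shares every instance with its underlying type, while a `newtype` can carry its own.

```haskell
newtype Email = Email String

-- The newtype can have its own instance, distinct from String's.
instance Show Email where
  show (Email s) = "<email: " ++ s ++ ">"

-- With `type Email = String`, the instance above would be rejected:
-- a type alias is indistinguishable from String at instance resolution.
greeting :: Email -> String
greeting e = "Sending to " ++ show e
```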


This post engages in plenty of armchair speculation, and is overall horribly vague, or just nonsense ("addOneTwo works on any type that’s provably isomorphic to Integer"). It's more like a plea for all things that seem intuitively nice and good, mashed together without any deep consideration for whether they can be realized consistently. Type classes are still an open research topic, especially when combined with dependent types. Off the top of my head:

Class coherence is the property that any two instances of the same types are equal. It's a strong guarantee that boosts the robustness and usability of type classes significantly. Scala doesn't have it; instead it has an arcane set of rules that determine the resolution of implicits in the face of ambiguity, and it's a huge pain for more complicated use cases.

Named instances are incompatible with coherence. We could just put one named instance into a runtime object, and voila, it's indistinguishable from any other instance. We could remedy this by indexing the type of an instance with a label, which isn't better than choosing instances with newtype wrappers.

Similarly, local instances and explicit instance application imply incoherence.

Also, laws for classes are an already existing feature in Idris, Agda and Coq. The problem is that laws for anything are by far best realized via dependent types, and dependent types have a nontrivial interaction with classes that is an open problem. With dependent types, checking whether two instance heads are different could be just undecidable, depending on what constructs one allows there.

It's a possibility to neuter the language fragment that goes into instance heads, so that propositional equality becomes decidable over it - in a way Haskell already does this, by disallowing type families in instances. But in a dependent language it could be rather awkward that a fair amount of useful types cannot be made instances.

Alternatively, one could require proofs from the programmer in each instance that the instance head is propositionally different from all other instance heads in scope. But this would also require much thought, and I haven't given it yet.

Also, if our classes are actually coherent, we would like to have an internal proof of this property (as opposed to in current Haskell where we just sort of know that this is the case), but I have no idea how to do this best. Maybe the instance resolution procedure could be modeled internally in a strong enough language, and coherence could be given as a theorem, and there could be some syntactic sugar for code with classes.

Or we could have both coherent and incoherent classes, each explicitly marked as such. Or we could have coherent classes by proving that a certain class is necessarily coherent, independently from any instance resolution procedure. There are many such classes, but not nearly all that we frequently use, and some classes are coherent but we can't prove it in currently implemented type theories because the proof relies on parametricity (Functor is such a class).


Thanks for the informative post. Could you please clarify a couple of things for me?

> Named instances are incompatible with coherence. We could just put one named instance into a runtime object

Could you give an example of what you mean in, e.g., Haskell (assuming some fictional facility for named instances existed)?

> With dependent types, checking whether two instance heads are different

What are instance heads?


> instance heads

Instance heads are the part of an instance declaration that the compiler dispatches on: in `instance Show Int`, the head is `Show Int`, and resolution matches on the type `Int`.
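To make the terminology concrete, here's a minimal sketch (the `Wrapped` type is made up for illustration) showing where the head sits relative to the context in a standard Haskell instance:

```haskell
newtype Wrapped a = Wrapped a

-- In the declaration below, `Show (Wrapped a)` (after the `=>`) is the
-- instance head: it's what the compiler matches against when resolving a
-- constraint. `Show a` (before the `=>`) is the context, which is checked
-- only after the head matches.
instance Show a => Show (Wrapped a) where
  show (Wrapped x) = "Wrapped " ++ show x

main :: IO ()
main = putStrLn (show (Wrapped (42 :: Int)))
-- prints: Wrapped 42
```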

> named instances

You can see an example of named instances in Idris (very Haskell-like syntax) here (http://idris.readthedocs.org/en/latest/tutorial/classes.html...). We could put named instances into runtime objects (in Haskell: `data ShowInstance a = Show a => Instance`, which does exactly this), thereby forgetting the name, which could cause ambiguities down the line. But even without instances in runtime boxes, having different instances for the same type means there must be instance resolution rules that the programmer has to keep in mind. For example, if we have two monoid instances for `Int`, one with addition and one with multiplication, we somehow need to make sure that the monoid operations mean what we want them to mean, so that we don't unexpectedly get addition instead of multiplication.
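For reference, Haskell's standard answer to the two-monoids-for-`Int` problem is coherence plus newtypes: `Sum` and `Product` from `Data.Monoid` each carry exactly one instance, so the type alone determines which operation you get and there is never an ambiguity to resolve:

```haskell
import Data.Monoid (Sum (..), Product (..))

-- The instance is chosen by the (new)type, not by a name or by scope,
-- so addition and multiplication can never be confused.
main :: IO ()
main = do
  print (getSum     (mconcat (map Sum     [2, 3, 4 :: Int])))  -- 9
  print (getProduct (mconcat (map Product [2, 3, 4 :: Int])))  -- 24
```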


> It's a strong guarantee that boosts the robustness and usability of type classes significantly.

If only there were a compiler that enforced these rules.


There's some Template Haskell magic that generates QuickCheck properties for your instances' typeclass laws. That takes care of some of the article's worries.


No need for TH, actually. Write a function that takes a `proxy a` and returns, e.g., a Tasty `TestTree`: `testMyClassLaws :: proxy a -> TestTree`. Then in your test module, add something like `testGroup "MyClassLaws for MyType" [testMyClassLaws (Proxy :: Proxy MyType)]` somewhere in your list of tests.
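The same Proxy-dispatch idea can be sketched with nothing but base, dropping the Tasty/QuickCheck dependency and checking the laws on hand-picked samples instead of random ones (the function name and samples here are illustrative):

```haskell
import Data.Proxy (Proxy (..))

-- Checks the monoid laws on the supplied samples; the Proxy argument
-- pins down which Monoid instance is under test.
monoidLawsHold :: (Monoid a, Eq a) => Proxy a -> [a] -> Bool
monoidLawsHold _ xs =
  all (\x -> mappend mempty x == x) xs
    && all (\x -> mappend x mempty == x) xs
    && and [ mappend (mappend x y) z == mappend x (mappend y z)
           | x <- xs, y <- xs, z <- xs ]

main :: IO ()
main = print (monoidLawsHold (Proxy :: Proxy [Int]) [[], [1], [2, 3]])
-- prints: True
```

In a real test suite you'd feed this `Arbitrary`-generated values rather than a fixed list, which is exactly what the Tasty/QuickCheck version above buys you.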


John de Goes mentions at the start of his article how Scala implements type classes without an extra language construct. But he doesn't show what his example would look like in Scala. So I wrote that out for my own amusement.

You can find the gist here. https://gist.github.com/joost-de-vries/949fbea6345fec1ac4ef



