Return type polymorphism in Haskell

0
23




Tags
Haskell

In this brief post I want to discuss a fairly unusual feature of Haskell –
functions that can be parameterized by their return type.

Parametric vs. ad-hoc polymophism

It’s worth beginning with a quick discussion of the two most common kinds of
compile-time polymorphism present in Haskell: parametric polymophism and
ad-hoc polymorphism.

Parametric polymorphism is possible when we can define a certain operation to
work similarly on any type. A simple example is the list type [a], on which
many operations are defined in a way that completely disregards what the actual
type a is. For instance, the function length can find the length of the
list without relying or caring about a. A slightly more interesting example
is map, defined as follows for lists:

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

This will work on any list, regardless of what it contains – integers, other
lists or some complicated user-defined type. The definition is written in a way
that is oblivious to the actual type of a. This kind approach is commonly
called generic programming; in C++, it’s represented with templates.

This approach is sometimes limiting, however, because we may actually want
functions to do something slightly different for every type (or at least for
some types). This brings us to ad-hoc polymoprhism, which is represented by
either function overloading or template specialization in C++.

Ad-hoc polymorphism in Haskell is achieved by using typeclasses. As an example,
consider the built-in class Ord. If you define the <= operator for your
type, it's then considered to comply with the Ord class and we can do some
interesting things with it . For example, we can implement merge-sorting as
follows:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

msort :: Ord a => [a] -> [a]
msort [x] = [x]
msort xs = merge (msort left) (msort right)
  where
    (left, right) = (take halflen xs, drop halflen xs)
    halflen = length xs `div` 2

Note that msort works on a list of some type a, similarly to map; in
this respect it's parametrically polymorphic, with a twist. The type constraint
Ord a => tells the compiler that it's only legal to invoke msort on
lists of type a if a implements the Ord class.

Most built-in Haskell types implement the Ord ord class, so we can use
msort right away:

> msort [2,1,3,8,5]
[1,2,3,5,8]

However, if we want to use it on user-defined types, we'll need to implement the
Ord class manually:

data Person = Person
  { lastName :: String
  , firstName :: String
  } deriving (Eq, Show)

-- Simple example of making Person sortable by defining Ord.
instance Ord Person where
  (Person lx fx) <= (Person ly fy) =
    if lx == ly
      then fx <= fy
      else lx <= ly

Note that this definition of <= relies on some semantic properties of the
custom type (that last name is usually sorted before first name) that Haskell
has no way of knowing. This is why ad-hoc is often necessary - per-type
definitions describe a real-life domain in some way; for example, had Person
had some unique ID it would be perhaps more correct to sort people by this ID.

Having done this, we can now sort a list of Persons:

> let folks = [(Person "Jones" "Jan"), (Person "Jones" "Areo"),
               (Person "Falcon" "Hugo"), (Person "Spearson" "Britney")]
> msort folks
[Person {lastName = "Falcon", firstName = "Hugo"},
 Person {lastName = "Jones", firstName = "Areo"},
 Person {lastName = "Jones", firstName = "Jan"},
 Person {lastName = "Spearson", firstName = "Britney"}]

It's important to reiterate that this example showcases both kinds of
polymoprhism. msort is parametrically polymoprhic - it's the same code that
will work for any type a, as long as a implements Ord. On the other
hand, the <= operator is ad-hoc polymorphic - it works on different types,
but each type should define its own version.

Return-type polymophism

Now that we have the above covered, it's time to turn to the main topic of this
post - return-type polymophism. This is a fairly unique and cool aspect of
Haskell, especially if you come from the C++ world where templates provide some
degree of parametricity but it's limited in certain other ways.

Let's start with an example, the built-in function read:

read :: Read a => String -> a

The read function reads input from a string, which must be completely consumed
by the input process.

Something interesting is happening here: read is parameterized by type
a, but this type is not one of its arguments; no, the argument of read
is simply a String, which is a concrete type. a appears only in the
return type.

Let's try to parse an integer by using read:

> read "1"

:46:1:
    No instance for (Read a0) arising from a use of `read'
    The type variable `a0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    <...>

Oops, what went wrong? The issue here is that 1 could be of several types,
and Haskell doesn't know which one to choose. We can fix that with an explicit
type annotation:

But 1 can also be of other types:

So read can truly return multiple types , depending on how it's being
called. This is return-type polymorphism.

Here's a more intriguing example:

> putStrLn (take (read "2") (read ""haskell""))
ha

Haskell didn't complain about read "2", even though no type annotation was
provided. What's going on? The answer is type inference. Haskell looks at that
code and sees the result of read "2" being fed into take, as its first
argument. The type of take is Int -> [a] -> [a], so the result of
read "2" is placed in an Int. This interaction of type inference with
return-type polymophism is one of the most impressive features of Haskell,
IMHO!

Now let's turn to a slightly more interesting example - monoids. I've mentioned
the Monoid type class before, in the context of folds.

class Monoid a where

  The class of monoids (types with an associative binary operation that has an
  identity).

To create a new Monoid, we have to define two functions:

mempty :: a

  Identity of mappend

mappend :: a -> a -> a

  An associative operation

Note the type of mempty - it takes no arguments, but returns an arbitrary
type a - return-type polymorphism. Here's how Haskell defines Monoid for
lists:

instance Monoid [a] where
        mempty  = []
        mappend = (++)
        mconcat xss = [x | xs <- xss, x <- xs]

With this definition, we could use mempty for strings as follows:

For numbers, things are somewhat more interesting because there are two ways to
define monoids on integers. One is using addition as the associative operation,
another is using multiplication. In the former case, the zero element is 0, in
the latter it's 1. Since there is no scenario which is clearly better than
the other, Haskell doesn't define Monoid for Int:

> mempty :: Int

:11:1:
    No instance for (Monoid Int) arising from a use of `mempty'
    Possible fix: add an instance declaration for (Monoid Int)

Rather, it adds two new types that wrap Int: Sum and Product. Here's
an example:

> mempty :: Sum Int
Sum {getSum = 0}
> mappend (Sum 6) (Sum 7)
Sum {getSum = 13}

As before with read, note how mempty returns a different type based on
what's expected from it. Type inference picks the right overload! In the
following sample, type inference knows that mappend takes two arguments of
the same type and the first one is a String, so it invokes the
String-returning version of mempty:

> mappend "Foobar" mempty
"Foobar"

Similarly, here the Sum-returning version is invoked due to type inference:

> mappend (Sum 10) mempty
Sum {getSum = 10}

Providing multiple capabilities from the same function

I'll conclude with another cool example of return-type polymoprhism from the
Haskell standard library: regular expression matching . Haskell defines the
RegexLike typeclass as an interface for regex-like matchers. It has the
usual zoo of methods one can define (and if undefined, will use one another as
the default implementation), with another class for the more generic usage:

-- | RegexContext is the polymorphic interface to do matching.  Since
-- 'target' is polymorphic you may need to suply the type explicitly
-- in contexts where it cannot be inferred.
-- <...>
class (RegexLike regex source) => RegexContext regex source target where
  match :: regex -> source -> target
  matchM :: (Monad m) => regex -> source -> m target

Feel free to dig in the source to see how it's all hooked up (a good example of
building high-level abstractions with Haskell!), but match is wrapped in the
=~ operator, which behaves polymophically based on the expected return type.
In Bool context, is finds whether the match exists at all (this corresponds
to the matchTest method of RegexLike):

> "january" =~ "an(ua)*" :: Bool
True

While in Int context, it counts the number of matches (this corresponds to
matchCount):

> "january" =~ "an(ua)*" :: Int
1
> "january" =~ "a" :: Int
2

And so on... there are a few more possibilities (like returning the actual list
of matches). Depending on your point of view, this is either extremely cool or
scary, because sometimes the programmer has to perform type inference in their
head to follow the path the compiler is taken, which can make code tricky to
understand.


Read More

This site uses Akismet to reduce spam. Learn how your comment data is processed.