Fake World Haskell, Part 1

October 7th, 2009

Introduction

Real World Haskell (RWH) is an amazing book. Along with Learn You a Haskell, it’s one of the main resources available for Haskell newbies/intermediates who are looking for a comprehensive guide. It especially focuses on using “everyday life” coding examples to show how things are done. Many concepts are taught by example in this way.

RWH is a big book. And it’s not just fluff: it’s big and concentrated–you can tell that it could easily be twice as big. I was unable to digest it on one reading, and have read many of the chapters many times. I’m not the sharpest knife in the drawer, but I’m an experienced programmer in other languages, and I’m sure there are others in my boat: programmers who are intrigued by Haskell and functional programming, but are looking for more resources to make the journey easier.

I’m writing this, therefore, to be a sort of “companion” to RWH. From it, I’m going to steal much of the organization and flow. The authors of RWH didn’t have the luxury of being able to go back and explain things they’d already explained, and as such I found that following the examples could be difficult. The goal of this text is to provide a more gentle guide through the examples presented in RWH.

Who is this for?

Hopefully, any Haskell newbie will benefit from reading this, even if they haven’t had any issues following along with RWH. But it’s especially geared toward those who have found RWH’s pace to be a little too rigorous, and wouldn’t mind some reinforcement about concepts already touched upon. I’m going to assume that syntax largely isn’t a problem; the first couple chapters of RWH cover most of what I think you’d need to know to follow along (and do so quite well). I won’t, however, assume total familiarity with the concepts the syntax represents (in other words, a datatype definition shouldn’t look weird, but it’s okay if you don’t know what a “data constructor” is).

I’m also going to break a longstanding tradition as far as variable names. As RWH points out, since function definitions in Haskell are short and succinct, short variable names often don’t hamper readability much once you are used to them. For me however, this was (and is) quite an adjustment. I believe the new reader will have plenty of opportunity to acclimate to this style without needing to do so while also trying to figure out what’s going on conceptually. I’m going to take care to avoid one-letter variables and abbreviations where possible.

Part 1: Companion to Chapter 5

I’m going to start with the first “real world” example in RWH, taking some time to re-explain stuff that was brought up earlier in the book as I see fit. This article will cover the definition of a datatype and implementation of accessor functions.

Onward to JSON

JSON (JavaScript Object Notation) is a mini-language designed to represent data, usually to store or transmit said data. It’s an alternative to column-based formats like CSV, and has a simple syntax. We’d like to be able to use Haskell to store data in this format, and then read it back again.

JSON has some notion of datatypes, based (unsurprisingly) on JavaScript’s types: strings, numbers, booleans, arrays, objects, and null.

It would make sense to have representations of those types in Haskell. Of course, most of them are trivial to implement, as they already exist in Haskell. We’ll make a new algebraic data type to represent any JSON type; let’s call it JsonType:

data JsonType =

Remember, “JsonType” is the type constructor, which will be how we refer to the type in the type signatures of functions. On the other side of the = we’ll define our data constructors, which is how we create JsonType data.

data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull

We can use these data constructors like functions in order to create JsonType values. We can see that the first 3 constructors are just wrappers for the existing Haskell types String, Double, and Bool. This means we can write something like JsonBool True to represent the true value that would appear in a JSON file.
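To make “constructors work like functions” a bit more concrete, here is a small sketch. The deriving (Show) line is my addition so that GHCi can print the values (deriving comes up properly a little later), and the names sampleValues and wrappedNumbers are made up for illustration.

```haskell
-- The four constructors defined so far. "deriving (Show)" is an addition
-- for this sketch only, so that the values can be printed.
data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    deriving (Show)

-- Constructors that take an argument are applied just like functions.
-- JsonNull takes no argument, so it is already a value on its own.
sampleValues :: [JsonType]
sampleValues = [JsonString "hello", JsonNumber 3.5, JsonBool True, JsonNull]

-- JsonNumber has the type Double -> JsonType, so it can be handed to map
-- like any other function.
wrappedNumbers :: [JsonType]
wrappedNumbers = map JsonNumber [1.5, 2.5]
```

If you load this in GHCi, asking for :type JsonNumber should report Double -> JsonType, which is exactly the type of a one-argument function.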

Why not just use the types we already have?

Types exist so that we can more exactly specify our intent, both for programmers (including ourselves), and for programming tools that can help prove the correctness of a given program. In C, for example, it is common to use a #define or an enum to give a name to an otherwise ambiguous value.

#define TRUE 1

Now we can more clearly express our intent in situations like flag = TRUE. This is a step in the right direction, but there is nothing stopping you from doing things like taking the square root of flag, because the name TRUE is really just veneer for an integer, and the compiler sees it as such.

We made a JsonType so we can distinctly operate on values associated with our task. The JSON language has strings, but does not care about the things Haskell strings/chars care about (length, case, etc.). What is significant to JSON is how they will be parsed and represented in comparison to numbers. From our standpoint, JSON’s strings and numbers are both JsonTypes that we will write functions to work on, sharing the fact that they represent a JSON value, and notwithstanding the fact that they bear resemblance to existing Haskell types.

On the other hand, it would be silly to not take advantage of similar functionality that would work just as well on a JsonString as a Haskell String, if we need to. For that reason, we’ll see it is painless to “extract” the Haskell value from its JsonString wrapper and play with it as needed.

Compound JSON Types

We haven’t yet talked about a couple of important JSON types: objects and arrays. These let us describe structured data (“dictionary style,” with key-value pairs, and laundry list style, respectively). We can represent compound types in our Haskell library quite simply:

-- ...
    | JsonArray [JsonType]

Whoa, let’s stop there already. On the surface, we’re doing the same thing as we did for JsonNumber and JsonBool: we make a data constructor, and make it take an existing Haskell type as a parameter (in this case, a list). But there’s an additional wrinkle: we need to state what kind of values this “JSONy list” should be able to hold. Well, in JSON, you can put any JSON type inside an array: a string, a boolean, even another array if you want. We need a way to express “this list can hold all JSON types.” And it just so happens we have a type that represents all JSON types, including JSON arrays themselves: JsonType! This is an example of a recursive type definition, or a type that refers to itself in its own definition. Don’t worry about the fact that we aren’t finished defining it yet!

And finally:

-- ...
    | JsonObject [(String, JsonType)]

This is also a recursive definition, as it’s a JsonType that contains a JsonType. If you recall, comma-delimited values inside parentheses represent a tuple (a fixed-sized collection of values). Thus we’re saying a JsonObject is a list of pairs with a String key and a JsonType value.
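Here is a rough sketch of what a nested value looks like once both compound constructors are in place. The order data is invented for the example, and sampleOrder is just a name I picked.

```haskell
-- All six constructors, including the two compound ones.
data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]

-- Hypothetical data, roughly corresponding to the JSON text
--   {"customer": "Ann", "paid": false, "items": [9.5, 20, null]}
sampleOrder :: JsonType
sampleOrder = JsonObject
    [ ("customer", JsonString "Ann")
    , ("paid",     JsonBool False)
    , ("items",    JsonArray [JsonNumber 9.5, JsonNumber 20, JsonNull])
    ]
```

Notice that the array holds a mix of JsonNumber and JsonNull values. That is fine, because every element is still a JsonType, which is all the list cares about.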

Let’s add default instances to our new type–Haskell can automatically give our new type the ability to be compared, sorted, and printed. We just have to say the words!

data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]
    deriving (Eq, Ord, Show)

The above complete type definition is suitable for some playin’ around in GHCi. Save it to a file (for example, SimpleJSON.hs) and open GHCi in the directory where you saved it. Use the :load command to tell it about your new definition.

ghci> :load SimpleJSON
[1 of 1] Compiling Main ( SimpleJSON.hs, interpreted )
Ok, modules loaded: Main.
ghci> :type JsonString "hello"
JsonString "hello" :: JsonType
ghci> 3.1 + JsonNumber 2.2
…error…

The important thing to note is that using a data constructor (also called a value constructor) like JsonString creates a value of type JsonType. As the programmer who wrote the type, you know that it is a thin veneer over existing Haskell types, but other programmers who use the type don’t (and shouldn’t). For example, just because you know that a JsonNumber is basically a Double, trying to treat it like one (as in the last example above) is illegal.
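While we are playing around, here is a small sketch of what the three derived instances let us do. The names sameNumber, differentConstructors, stringBeforeNumber, and printed are made up for this example. Note that deriving gives us comparison and printing only, not arithmetic, which is why adding a Double to a JsonNumber is rejected.

```haskell
data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]
    deriving (Eq, Ord, Show)

-- Eq: values are equal when they use the same constructor and hold
-- equal contents.
sameNumber :: Bool
sameNumber = JsonNumber 2.2 == JsonNumber 2.2        -- True

differentConstructors :: Bool
differentConstructors = JsonNull == JsonBool False   -- False

-- Ord: the derived ordering compares constructors in the order they were
-- declared, so any JsonString sorts before any JsonNumber.
stringBeforeNumber :: Bool
stringBeforeNumber = JsonString "zebra" < JsonNumber 0   -- True

-- Show: produces the same text GHCi prints for a value.
printed :: String
printed = show (JsonArray [JsonNumber 1, JsonNull])
-- "JsonArray [JsonNumber 1.0,JsonNull]"
```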

But um, so?

In its current form, our new type isn’t very useful; we can really only use JsonType values for making JsonArrays or JsonObjects (or with the automatic stuff Haskell did thanks to our deriving statement). Creating a library in Haskell (and in many languages) is a matter of creating the necessary types, and the operations that work on those types. That means we need to define some functions!

What functions do we need? Now’s a good time for a look at the Big Picture:

What is our goal?

I think it’s important not to lose sight of why we created JsonType in the midst of learning how to do it. So keeping in mind that our end goal is translating data to JSON and back again, it makes sense that we need some functions to help with this conversion.

Our JsonType serves as a good “programmy” representation of JSON; we can now write functions that read JSON-formatted files or streams (for example, customer order data from a website) and convert that text into Haskell by outputting JsonType values. Likewise, we can imagine writing functions that then extract the useful data out of JsonTypes, for actual use (for example, in order to total up the line items of a customer’s order, we need to get the “number part” out of JsonNumber so we can add them). Finally, we’ll probably want a way to spit out JSON to the outside world, and we can write functions that can output JSON when given data in JsonType form.

Let’s put that in words. Programmy words.

A great way to write programs in a language with an expressive type system is to express our goal via “function blueprints,” if you will–just writing the type signatures without worrying about implementation yet. Let’s take a shot at it, writing types for what we outlined as desired functionality:

-- Going from real JSON to our Haskell JsonType
fromJson :: String -> JsonType
-- Extracting actual data for manipulation
extractData :: JsonType -> ??
-- Saving data as JSON
toJson :: JsonType -> String

Not too bad so far. We can imagine how fromJson and toJson will work: given a string containing JSON-formatted text, we will figure out a way to parse the text and end up with a JsonType. Similarly, if we have a JsonType, we can figure out how to print it with JSON syntax. The tricky signature to pin down is the one for extracting values from our new type; after all, the result could be any of several different types (a Double, a String, a list of stuff, etc.). So we’ll need to break that down more:

extractString :: JsonType -> String
extractDouble :: JsonType -> Double
extractBool   :: JsonType -> Bool
extractObject :: JsonType -> [(String, JsonType)]  -- Remember, "object" in JSON terms is a collection of key-value pairs
extractArray  :: JsonType -> [JsonType]  -- Remember, we are representing a JSON array as a list

Now we’re getting somewhere! Oh wait, we forgot one of our JsonTypes: JsonNull. Looking at our type definition, we didn’t wrap an existing Haskell type to make a JsonNull; “null” can really only be one thing (the absence of a value), and therefore it’s not necessary to “extract” data out of it (we’d always get the same thing!). Of course, we do need a way of telling if we are dealing with a null value:

isNull :: JsonType -> Bool

Why boilerplate?

Users of dynamically-typed languages are probably not excited about having to write a function for every different type that can be extracted from a JsonType. Haskell recognizes that dealing with types sometimes requires more typing (no pun intended), and provides a shortcut or two to help out. The primary one is record syntax, which I won’t talk about here. Suffice it to say it’s quite easy to have Haskell automatically create “accessor functions” like our extract functions above for a type we define. So don’t worry too much!

Implementation

Now that we’ve sketched out some functions, we can start implementing them. During this process, we might think of other functions that would be nice to have. I personally like to pretend such functions exist, then go back and implement them once I’m done with whatever I was doing.

Stubbing out functions

It often happens that we want to compile our source file in order to test a particular function we’ve written. It’s probably the case that we want to do this before implementing every function we’ve dreamed up. We shouldn’t let unimplemented functions prevent us from a quick compile to test what we’ve already done!

It’s therefore useful to be able to “stub out” a function that we haven’t written yet. This is done very simply:

someFunction = undefined

You might say the special value undefined counts as any type, so code using it will compile even if you’ve already written the type signature for the stubbed function.
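As a sketch of how this plays out with our own blueprints, the file below stubs out fromJson and toJson while isNull gets a real definition (the same one we’ll arrive at later in this article). Which functions to stub first is just my choice for the example.

```haskell
data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]
    deriving (Eq, Ord, Show)

-- Not written yet. The stubs exist only so that the file compiles.
fromJson :: String -> JsonType
fromJson = undefined

toJson :: JsonType -> String
toJson = undefined

-- Already written, and testable right away despite the stubs above.
isNull :: JsonType -> Bool
isNull jsonData = jsonData == JsonNull
```

The file loads, and isNull can be tried out in GHCi immediately. The catch is that actually calling a stubbed function fails at runtime with an exception, so stubs are a temporary convenience and not something to leave lying around.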

Let’s start off by implementing our extractor functions, as they are straightforward. The key here is going to be pattern matching.

extractString :: JsonType -> String
extractString (JsonString innerString) = innerString

Remember, when we define functions in Haskell, we can not only specify parameters, but how those parameters were created. Our type signature says we accept a JsonType parameter, and in our definition we specify a pattern that says specifically we want a JsonType that was created via the JsonString data constructor. That’s already cool, but furthermore it allows us to give the underlying data the type was created with a name. That is, in this case, we can now name the Haskell String inside the JsonString and use it! And use it we do–by returning it as the value of our extractString function.

Error handling

We just made our first accessor function by matching a JsonType value that was created with the JsonString constructor. This works fine for situations like:

ghci> let valueParsed = JsonString "hello"
ghci> extractString valueParsed
"hello"

But try this:

ghci> let valueParsed = JsonNumber 23
ghci> extractString valueParsed

Whoops! Our extractString function can take any sort of JsonType, so the above code compiles, but it blows up at runtime when it finds that its argument wasn’t created via the JsonString constructor.
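If you want to see the failure without crashing your GHCi session, here is a rough sketch. It leans on exception-handling machinery from Control.Exception that we haven’t covered, so feel free to skim it. The helper extractionCrashes is a name I invented; all it does is report whether the version 1 extractor blew up on a given value.

```haskell
import Control.Exception (PatternMatchFail, evaluate, try)

data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]
    deriving (Eq, Ord, Show)

-- Version 1 of the extractor: only the JsonString case is handled.
extractString :: JsonType -> String
extractString (JsonString innerString) = innerString

-- Hypothetical helper: run the extractor and report whether it failed.
-- evaluate forces the result inside IO so that try can catch the
-- pattern match failure.
extractionCrashes :: JsonType -> IO Bool
extractionCrashes value = do
    result <- try (evaluate (extractString value)) :: IO (Either PatternMatchFail String)
    case result of
        Left _  -> return True
        Right _ -> return False
```

Calling extractionCrashes (JsonNumber 23) should give True, while extractionCrashes (JsonString "hello") should give False. As an aside, GHC can point out missing cases like this at compile time if you turn on warnings with -Wall.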

Since our ultimate goal is to parse arbitrary JSON data, this won’t do. If extractString fails to parse its argument, it doesn’t mean our whole program should crash, it just means we need to try a different extractor function. So clearly we need a way of indicating failure in a less drastic way, without throwing an exception. Many languages leave how to achieve this up to you: Perhaps a special value is returned, like -1, or false, or null. Then future code has to know that special value and check for it explicitly.

Haskell has a nice solution to “returning null” that leverages the type system so there’s no guesswork in future code: we can write a function that maybe returns a useful value, and there’s an actual type for that! This means that code that uses the return value knows in advance that it’s only maybe going to get a useful value; it can’t blindly march forward without checking first. Hence, you’ll never run into the “null reference exception” problem that we’ve all dealt with in other languages.

So, how do we do this? Well, we need to find out how the JsonType that is passed in was created, and if it matches the JsonString constructor, then we return a useful value. If not, we return a failure value. Future code can then check if the failure value was returned, and if so, try a different extractor or whatever.

Ugh, does that mean boilerplate again?

We all hate writing “tests for null.” They are ubiquitous and as such lead to our code containing a lot of boilerplate if-statements. Fortunately, Haskell provides a neat way of hiding this particular breed of inelegance, and we will learn about it in the future. To avoid introducing too much at once, however, we’ll explicitly check for the time being.

Our extractor implementations, version 2

The type that represents “maybe a value” is called, interestingly enough, Maybe. Maybe is the name of the type constructor, just like JsonType is the name of our type’s type constructor. So we write our type signatures something like this:

extractString :: JsonType -> Maybe String

That reads pretty well, doesn’t it? Notice how we can still specify the type of the value that might be returned (String in this case).

Now we need to know how to create a Maybe value. Just like JsonType has a number of value constructors (JsonString, JsonNumber, etc.), Maybe has a couple as well. The constructor for making a value that represents “no useful value” is called Nothing. It’s kinda like our JsonNull constructor: it doesn’t wrap any other value, because there’s no value to wrap. The constructor for representing a useful value is called Just. Just takes a parameter–the value to wrap. Here is Maybe in action:

extractString :: JsonType -> Maybe String
extractString (JsonString innerString) = Just innerString
extractString somethingElse            = Nothing

So now it looks like we have two versions of our extractString function, and indeed we do: one for taking JsonStrings, and one for anything else. Haskell tries them from top to bottom and uses the first one whose pattern matches. So if the function receives a JsonString, we create the “useful” Maybe value, wrapping the String we got. Otherwise, we just return Nothing.
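To see the difference from version 1, here is the same pair of experiments we ran in GHCi earlier, written out as a small sketch. The names foundString and noString are made up for the example.

```haskell
data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]
    deriving (Eq, Ord, Show)

extractString :: JsonType -> Maybe String
extractString (JsonString innerString) = Just innerString
extractString somethingElse            = Nothing

-- The value was built with JsonString, so the first equation matches.
foundString :: Maybe String
foundString = extractString (JsonString "hello")   -- Just "hello"

-- The value was built with JsonNumber, so we fall through to the second
-- equation and get Nothing instead of a crash.
noString :: Maybe String
noString = extractString (JsonNumber 23)           -- Nothing
```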

As an (important) aside, note that we can use whatever name we want for “somethingElse.” This is because we aren’t trying to match an argument’s specific constructor; we’re basically just saying “match anything and give it the name somethingElse.” It is pointless (and can be misleading) to give a name to a value we aren’t going to use. In our extractor function, we don’t care about the value of somethingElse; we are simply interested in catching all values that weren’t created via JsonString. Therefore, Haskell programmers use the name _ (the underscore) in cases like this. It still matches everything, but makes it clear you aren’t interested in what was matched. We’ll use the underscore from now on for this purpose.

The rest of our extractors are similar:

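extractDouble :: JsonType -> Maybe Double
extractDouble (JsonNumber number) = Just number
extractDouble _                   = Nothing

extractBool :: JsonType -> Maybe Bool
extractBool (JsonBool bool)       = Just bool
extractBool _                     = Nothing

extractObject :: JsonType -> Maybe [(String, JsonType)]
extractObject (JsonObject object) = Just object
extractObject _                   = Nothing

extractArray :: JsonType -> Maybe [JsonType]
extractArray (JsonArray array)    = Just array
extractArray _                    = Nothing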

We can also write our isNull function quite easily:

isNull jsonData = jsonData == JsonNull

That was easy since Haskell figured out how to compare one JsonType with another automatically, because in our type declaration we told it to derive Eq. We’ll talk about what Eq is later.
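Before wrapping up, here is a sketch of the “explicit checking” style mentioned earlier, using the extractors we just wrote. Both describe and totalNumbers are hypothetical helpers I made up: the first tries one extractor and falls back to another, and the second adds up the numbers in a list of JSON values (think of the line items in a customer’s order).

```haskell
data JsonType = JsonString String
    | JsonNumber Double
    | JsonBool Bool
    | JsonNull
    | JsonArray [JsonType]
    | JsonObject [(String, JsonType)]
    deriving (Eq, Ord, Show)

extractString :: JsonType -> Maybe String
extractString (JsonString innerString) = Just innerString
extractString _                        = Nothing

extractDouble :: JsonType -> Maybe Double
extractDouble (JsonNumber number) = Just number
extractDouble _                   = Nothing

-- Try one extractor, and fall back to another if it returns Nothing.
describe :: JsonType -> String
describe value =
    case extractString value of
        Just text -> "string: " ++ text
        Nothing   ->
            case extractDouble value of
                Just number -> "number: " ++ show number
                Nothing     -> "something else"

-- Add up the numbers in a list of JSON values, skipping anything that
-- is not a JsonNumber.
totalNumbers :: [JsonType] -> Double
totalNumbers []             = 0
totalNumbers (value : rest) =
    case extractDouble value of
        Just number -> number + totalNumbers rest
        Nothing     -> totalNumbers rest
```

For example, describe (JsonNumber 23) gives "number: 23.0", and totalNumbers [JsonNumber 9.5, JsonNull, JsonNumber 20] gives 29.5. The nested case expressions are exactly the kind of boilerplate that Haskell lets us tidy up later.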

Wrapping up part 1

To review, we’ve talked about making our own types, and what a type constructor and data constructor are. We’ve talked about the value of having distinct representations of distinct entities. You should feel comfortable defining accessor functions to get at data contained in a type, and using the Maybe type to represent a possible lack of value. We’ll continue next time with more from RWH, chapter 5.
