CS 3110 Spring 2015
Problem Set 3
Version 0 (last modified March 12, 2015)
Soft deadline: 3/26/15 11:59pm
Hard deadline: 3/28/15 1:00pm

Overview

In this assignment you will implement several functions over infinite streams of data, an algorithm for inferring types, and an interpreter for OCalf, a functional language containing the core features of OCaml.

Objectives

This assignment should help you develop the following skills:
• Developing software with a partner.
• Working with a larger OCaml source code base.
• Using software development tools like source control.
• Understanding the environment model and type inference.
• Experience with infinite data structures and lazy evaluation.

Additional references

The following supplementary materials may be helpful in completing this assignment:
• Lectures 7, 8
• Recitations 7, 8
• The OCaml List Module and the Printf Module
• Git Tutorial
• Real World OCaml, Chapter 6

Academic integrity

You are allowed to work with one other student in this class for this problem set. The sharing of code is only allowed between you and your partner; sharing code among groups is strictly prohibited. Please review the course policy on academic integrity.

Provided code

This is a larger project than those you have worked on in PS1 and 2. Here is an overview of the release code. We strongly encourage you to read all of the .mli files before you start coding, so that you know what functions are available and understand the structure of the source code. Here is a brief list of the provided modules:

Streams is the skeleton code for part 2.

Eval and Infer are the skeleton code for parts 3 and 4. In addition to the functions that you must implement, there are various unimplemented helper functions defined in the .ml files. We found these helper functions useful when we implemented the assignment, but you should feel free to implement them, change them, extend them, or remove them. They will not be tested.
Printer contains functions for printing out values of various types. This will be extremely helpful for testing and debugging.

Examples and Meta contain example values to help you start testing. Examples contains all of the examples from this document; Meta contains an extended example for parts 3 and 4.

Ast and TypedAst contain the type definitions for parts 3 and 4 respectively. The TypedAst module also contains the annotate and strip functions for converting between the two.

Parser contains parse functions that allow you to read values from strings. It will be useful for testing and debugging parts 3 and 4.

Running your code

In addition to using cs3110 test to test your code, we expect you to interact extensively with your code in the toplevel. To support this, we have provided a .ocamlinit file that automatically makes all of your compiled modules available in the toplevel. (Note that on Linux, files whose names start with ‘.’ are hidden; use ls -A to include them in a directory listing.) For convenience, it also opens many of them.

A .ocamlinit file is just an OCaml file that is automatically #used by the toplevel when it starts. We encourage you to add to it; any time you find yourself typing the same thing into the toplevel more than once, add it to your .ocamlinit!

In order for the files to load correctly, you must compile them first. If you change them, the changes will not be reflected in the toplevel until you recompile, even if you restart it. The file examples.ml references all other files in the project, so you should be able to recompile everything by simply running cs3110 compile examples.ml.

Part 1: Source control (5 points)

You are required to use some version control system for this project. We recommend using git. As part of good software engineering practices, you should use version control from the very beginning of the project. For a tutorial on git, see the git tutorial in the additional references section.
You will submit your version control logs in the file logs.txt. These can be extracted from git by running “git log --stat > logs.txt” at the command line.

Part 2: Lazy Evaluation (30 points)

A stream is an infinite sequence of values. We can model streams in OCaml using the type

    type 'a stream = Stream of 'a * (unit -> 'a stream)

Intuitively, the value of type 'a represents the current element of the stream and the unit -> 'a stream is a function that generates the rest of the stream.

“But wait!” you might ask, “why not just use a list?” It’s a good question. We can create arbitrarily long finite lists in OCaml and we can even make simple infinite lists. For example:

    # let rec sevens = 7 :: sevens;;
    val sevens : int list = [7; 7; 7; 7; 7; 7; ...]

However, there are some issues in using these lists. Long finite lists must be explicitly represented in memory, even if we only care about a small piece of them. Also, most standard functions will loop forever on infinite lists:

    # let eights = List.rev_map ((+) 1) sevens;;
    (loops forever)

In contrast, streams provide a compact, space-efficient way to represent conceptually infinite sequences of values. The magic happens in the tail of the stream, which is evaluated lazily, that is, only when we explicitly ask for it. Assuming we have a function map_str : ('a -> 'b) -> 'a stream -> 'b stream that works analogously to List.map and a stream sevens_str, we can call

    # let eights_str = map_str ((+) 1) sevens_str;;
    val eights_str : int stream

and obtain an immediate result. This result is the stream obtained after applying the map function to the first element of sevens_str. Map is applied to the second element only when we ask for the tail of this stream.

In general, a function of type (unit -> 'a) is called a thunk. Conceptually, thunks are expressions whose evaluation has been delayed. We can explicitly wrap an arbitrary expression (of type 'a) inside of a thunk to delay its evaluation.
To force evaluation, we apply the thunk to a unit value. To see how this works, consider the expression

    let v = failwith "yolo";;
    Exception: Failure "yolo"

which immediately raises an exception when evaluated. However, if we write

    let lazy_v = fun () -> failwith "yolo";;
    val lazy_v : unit -> 'a = <fun>

then no exception is raised. Instead, when the thunk is forced, we see the error

    lazy_v ();;
    Exception: Failure "yolo"

In this way, wrapping an expression in a thunk gives rise to lazy evaluation.

Exercise 1: Implementing Stream Functions (20 points).

(a) Implement a function

    take : int -> 'a stream -> 'a list

that returns the first n elements of the stream s in a list. If n < 0, return an empty list.

(b) Implement a function

    repeat : 'a -> 'a stream

that produces a stream whose elements are all equal to x.

(c) Implement a function

    map : ('a -> 'b) -> 'a stream -> 'b stream

that (lazily) applies the input function to each element of the stream.

(d) Implement a function

    diag : 'a stream stream -> 'a stream

that takes a stream of streams of the form

$$
\begin{array}{cccc}
s_{0,0} & s_{0,1} & s_{0,2} & \cdots \\
s_{1,0} & s_{1,1} & s_{1,2} & \cdots \\
s_{2,0} & s_{2,1} & s_{2,2} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{array}
$$

and returns the stream containing just the diagonal elements

$$ s_{0,0} \quad s_{1,1} \quad s_{2,2} \quad \cdots $$

(e) Implement a function

    suffixes : 'a stream -> 'a stream stream

that takes a stream of the form

$$ s_0 \quad s_1 \quad s_2 \quad s_3 \quad s_4 \quad \cdots $$

and returns a stream of streams containing all suffixes of the input

$$
\begin{array}{cccc}
s_0 & s_1 & s_2 & \cdots \\
s_1 & s_2 & s_3 & \cdots \\
s_2 & s_3 & s_4 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{array}
$$

(f) Implement a function

    interleave : 'a stream -> 'a stream -> 'a stream

that takes two streams $s_0\ s_1\ \cdots$ and $t_0\ t_1\ \cdots$ as input and returns the stream

$$ s_0 \quad t_0 \quad s_1 \quad t_1 \quad s_2 \quad t_2 \quad \cdots $$

Exercise 2: Creating Infinite Streams (10 points).

In this exercise you will create some interesting infinite streams. We highly recommend using the printing functions we have provided in printer.ml. A part of being a good software engineer is being able to implement tools that will help you test and implement code.
You will have the opportunity to implement your own print function in a later exercise, but for now, you may use the ones we have provided. See printer.mli for more details.

(a) The Fibonacci numbers $a_i$ can be specified by the recurrence relation $a_0 = 0$, $a_1 = 1$, and $a_n = a_{n-1} + a_{n-2}$. Create a stream fibs : int stream whose nth element is the nth Fibonacci number:

    print_stream_int (fibs ());;
    [0, 1, 1, 2, 3, 5, 8, 13, ...]

(b) The irrational number π can be approximated via the formula

$$ \pi = 4 \sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1}. $$

Write a stream pi : float stream whose nth element is the nth partial sum in the formula above:

    print_stream_float (pi ());;
    [4., 2.66666666667, 3.46666666667, 2.89523809524, 3.33968253968, 2.97604617605, 3.28373848374, 3.01707181707, ...]

(c) The look-and-say sequence $a_i$ is defined recursively; $a_0 = 1$, and $a_n$ is generated from $a_{n-1}$ by reading off the digits of $a_{n-1}$ as follows: for each consecutive sequence of identical digits, pronounce the number of consecutive digits and the digit itself. For example:
– the first element is the number 1,
– the second element is 11 because the previous element is read as “one one”,
– the third element is 21 because 11 is read as “two one”,
– the fourth element is 1211 because 21 is read as “one two, one one”,
– the fifth element is 111221 because 1211 is read as “one one, one two, two one”.

Write a stream look_and_say : int list stream whose nth element is a list containing the digits of $a_n$ ordered from most significant to least significant.

As with the Fibonacci and pi streams, we strongly suggest you test your implementation. To print the look-and-say sequence, you can use

    print_stream_int_list (look_and_say ());;
    [[ 1 ], [ 1 1 ], [ 2 1 ], [ 1 2 1 1 ], [ 1 1 1 2 2 1 ], [ 3 1 2 2 1 1 ], [ 1 3 1 1 2 2 2 1 ], [ 1 1 1 3 2 1 3 2 1 1 ], ...]
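As a recap of the mechanics used throughout Part 2, the following is a minimal sketch of how the thunk in a stream cell delays the computation of the tail. The names from, head, tail, and forced are illustrative assumptions and are not part of the release code; only the stream type itself comes from this document.

```ocaml
(* The stream type given in Part 2. *)
type 'a stream = Stream of 'a * (unit -> 'a stream)

(* Illustration only: counts how many tails have been forced so far. *)
let forced = ref 0

(* Hypothetical helper: the stream n, n+1, n+2, ...
   The recursive call sits inside a thunk, so it runs only on demand. *)
let rec from n =
  Stream (n, fun () -> incr forced; from (n + 1))

(* Hypothetical helpers: the current element and the (forced) tail. *)
let head (Stream (x, _)) = x
let tail (Stream (_, rest)) = rest ()

(* Building the stream creates a single cell; no tail is forced yet. *)
let nats = from 0

(* Reaching the third element forces exactly two tails. *)
let third = head (tail (tail nats))

let () = Printf.printf "third = %d, tails forced = %d\n" third !forced
```

Because the recursive call to from is wrapped in fun () -> ..., defining nats terminates immediately even though the stream is conceptually infinite. A counter like forced can be a handy way to check that your own stream functions are as lazy as you expect.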
Part 3: ML interpreter (40 points)

In this part of this assignment, you will implement the evaluation of an interpreter for a subset of OCaml called OCalf. This will provide functionality similar to the top-level loop we have been using for much of the semester.

Background on interpreters

An interpreter is a program that takes a source program and computes the result described by the program. An interpreter differs from a compiler in that it carries out the computations itself rather than emitting code which implements the specified computation.

A typical interpreter is implemented in several stages:

1. Parsing: the interpreter reads a string containing a program, and converts it into a representation called an abstract syntax tree (AST). An AST is a convenient data structure for representing programs: it is a tree containing a node for each subexpression; the children of the node are the subexpressions of the expression. For example, here is an abstract syntax tree representing the program fun x -> 3 + x:

    [Figure: the string "fun x -> 3 + x" and its AST (a fun node with children x and +; the + node has children 3 and x), related by parse and print.]

    utop # parse_expr "fun x -> 3 + x";;
    - : expr = Fun ("x", BinOp (Plus, Int 3, Var "x"))

2. Type checking: the interpreter verifies that the program is well-formed according to the type system for the language (e.g. the program does not contain expressions such as 3 + true).

3. Evaluation: the interpreter evaluates the AST into a final value (or runs forever).

We have provided you with a type Ast.expr that represents abstract syntax trees, as well as a parser that transforms strings into ASTs and a printer that transforms ASTs into strings. Your task for this part will be to implement evaluation (typechecking is part 4).

Your tasks

For this part of the assignment, you will be working in the file eval.ml. In particular, you will implement the function

    eval : Eval.environment -> Ast.expr -> Eval.value

which evaluates OCalf expressions as prescribed by the environment model.
For example, we can evaluate the expression “if false then 3 + 5 else 3 * 5” in the empty environment as follows:

    eval [] (parse_expr "if false then 3 + 5 else 3 * 5");;
    - : value = VInt 15

OCalf has the following kinds of values:

1. Unit, integers, booleans, and strings.
2. Closures, which represent functions and the variables in their scopes.
3. Variants, which represent user-defined values like Some 3 and None; unlike OCaml, all variant values must have an argument. For example, None and Nil are represented as None () and Nil ().
4. Pairs, which represent 2-tuples. Unlike OCaml, all tuples are pairs; you can represent larger tuples as pairs of pairs (e.g. (1,(2,3)) instead of (1,2,3)).

As a simple example of evaluation, using the notation learned in class, [] :: 7 + 35 || 42. Your interpreter should evaluate both sides of the + operator, which are already values (7 and 35), and then return an integer representing the sum of values 7 and 35, i.e. 42. More details and examples on the environment model can be found in the lecture notes.

Because you can run OCalf programs without type checking them first, your interpreter may come across expressions it can’t evaluate (such as 3 + true or match 3 with | 1 -> false). These expressions should evaluate to the special value VError.

Exercise 3: Implement eval without LetRec or Match (30 points).

We recommend the following plan for this exercise:

(a) Implement eval for the primitive types (Unit, Int, etc.), BinOp (which represents binary operations like +, * and ^), If, Var, Fun, Pair, and Variant. Note: for OCalf, the comparison operators =, <, <=, >, >=, and <> only operate on integers.

(b) Implement eval for App and Let.

Exercise 4: Implement LetRec (5 points).

Extend eval to include the evaluation of LetRec. Evaluating let rec expressions is tricky because you need a binding in the environment for the variable being defined before you can evaluate its definition.
In lecture we extended the definition of a closure to account for recursive functions; in this assignment we will use another common technique called backpatching.

To evaluate the expression let rec f = e1 in e2 using backpatching, we first evaluate e1 in an environment where f is bound to a dummy value. If the value of f is used while evaluating e1, it is an error (this would occur if the programmer wrote let rec x = 3 + x, for example). However, it is likely that e1 will evaluate to a closure that contains a binding for f in its environment. Once we have evaluated e1 to v1, we imperatively update the binding of f within v1 to refer to v1. This “ties the knot”, allowing v1 to refer to itself. We then proceed to evaluate e2 in the environment where f is bound to v1.

To support backpatching, we have defined the environment type to contain binding refs instead of just bindings. This enables the bindings to be imperatively updated.

Exercise 5: Implement Match (5 points).

Extend eval to support match statements. Take care to correctly bind the variables that are defined by the pattern. You may find it helpful to implement a function

    find_match : pattern -> value -> environment option

to check for a match and return the bindings if one is found.

If the value being matched does not match any of the patterns in a match expression, the pattern matching is said to be inexhaustive. You do not need to check whether pattern matchings are inexhaustive or whether there are repeated match cases. Your implementation should return VError if pattern matching fails at run-time.

Part 4: Type inference (50 points)

For this part of the assignment, you will implement type inference for your interpreted language. OCaml’s type inference algorithm proceeds in three steps. First, each node of the AST is annotated with a different type variable.
Second, the AST is traversed to collect a set of equations between types that must hold for the expression to be well-typed. Finally, those equations are solved to find an assignment of types to the original variables (this step is called unification).

For example, suppose we were typechecking the expression (fun x -> 3 + x) (this example is infer_example in the Examples module of the provided code). The annotation phase would add a new type variable to each subexpression, yielding

    print_aexpr (annotate infer_example);;
    (fun (x : 't04) -> ((3 : 't02) + (x : 't01) : 't03) : 't05)

    [Figure: annotate maps the AST of fun x -> 3 + x to an annotated tree with root fun:'t05, whose children are x:'t04 and +:'t03; the + node has children 3:'t02 and x:'t01.]

Next, we would collect equations from each node. From our typing rules, we know that (fun x -> e) : t1 -> t2 if e:t2 under the assumption x:t1. Stated differently, (fun (x:t1) -> e:t2) : t3 if and only if t3 = t1 -> t2. Therefore, while collecting constraints from the fun node above, we output the constraint 't05 = 't04 -> 't03.

Similarly, we must also collect constraints from the + node. Our typing rules tell us that e1 + e2 : int if and only if e1 : int and e2 : int. Put another way, (e1:t1) + (e2:t2) : t3 if and only if t1 = int, t2 = int, and t3 = int. Therefore, while collecting constraints from the + node above, we output the three equations 't03 = int, 't02 = int, and 't01 = int.

We would also recursively collect the constraints from the 3 node and the x node. The rules for typing 3 say that 3 : t if and only if t = int, so we output the constraint 't02 = int. The rules for variables tell us that if x had type t when it was bound then it has type t when it is used. Therefore, when collecting constraints from the x:'t01 node, we output the equation 't01 = 't04.
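The equations derived above can be written down as plain data and checked against a candidate assignment of types. The sketch below is only an illustration: it uses a hypothetical miniature type language (typ) rather than the definitions in TypedAst, and it checks a solution instead of computing one, so it is neither collect nor unify.

```ocaml
(* Hypothetical miniature type language, for illustration only;
   the assignment's real types are defined in TypedAst. *)
type typ =
  | TInt
  | TVar of string
  | TArrow of typ * typ

(* The equations derived in the text for fun x -> 3 + x. *)
let equations = [
  (TVar "t05", TArrow (TVar "t04", TVar "t03"));  (* fun node *)
  (TVar "t03", TInt);                              (* + node: its result *)
  (TVar "t02", TInt);                              (* + node: the operand 3 *)
  (TVar "t01", TInt);                              (* + node: the operand x *)
  (TVar "t02", TInt);                              (* 3 node *)
  (TVar "t01", TVar "t04");                        (* x node *)
]

(* Replace each variable bound in subst by its assigned type
   (a single pass, which suffices when subst maps to variable-free types). *)
let rec apply subst t =
  match t with
  | TInt -> TInt
  | TVar v -> (try List.assoc v subst with Not_found -> t)
  | TArrow (a, b) -> TArrow (apply subst a, apply subst b)

(* An assignment satisfies the equations if it makes both sides identical. *)
let satisfies subst eqns =
  List.for_all (fun (l, r) -> apply subst l = apply subst r) eqns

(* Candidate: int for 't01 through 't04, and int -> int for 't05. *)
let candidate = [
  ("t01", TInt); ("t02", TInt); ("t03", TInt); ("t04", TInt);
  ("t05", TArrow (TInt, TInt));
]

let () =
  Printf.printf "candidate satisfies all equations: %b\n"
    (satisfies candidate equations)
```

Representing equations as pairs of types keeps the collection step independent of the solving step, which is the same separation the three-phase algorithm relies on.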
Putting this together, we have

    print_eqns (collect [] (annotate infer_example));;
    't01 = int
    't02 = int
    't02 = int
    't03 = int
    't04 = 't01
    't05 = 't04 -> 't03

Finally, these equations are solved in the unification step, assigning int to 't01 through 't04, and int->int to 't05. Substituting this back into the original expression, we have

    print_aexpr (infer [] infer_example);;
    (fun (x : int) -> ((3 : int) + (x : int) : int) : int -> int)

If the constraints output by collect are not satisfiable (for example they may require int = bool or 't = 't -> 't), then it is impossible to give a type to the expression, so a type error is raised.

If the system is underconstrained, then there will be some unsubstituted variables left over after the unified variables are plugged in. In this case, these variables could have any value, so the typechecker gives them user-friendly names like 'a and 'b.

Your type inference tasks

We have provided the annotate and unify phases of type inference for you; your job is to implement the collect phase.

Exercise 6: Implement collect without variants (40 points).

We recommend the following plan for implementing collect.

1. Implement collect for Fun, Plus, Int, and Var. This will allow you to typecheck the infer_example example as described above.
2. Implement the collect function for all syntactic forms except Variant and Match.
3. Implement the collect function for Match statements, but leave out the handling of PVariant patterns. Match statements are a bit trickier because you need to make sure the bindings from the patterns are available while checking the bodies of the cases.

Exercise 7: Implement collect with variants (10 points).

Extend collect to handle variant types. Variant types are tricky because you need to make use of a type definition to determine how the types of arguments of a constructor relate to the type that the constructor belongs to.
Type definitions are represented by TypedAst.variant_specs, and a list of them is provided to the collect function. Each variant spec contains a list vars of “free variables” (like the 'a in 'a list), a type name ("list" in the case of 'a list), and a list of constructors (with their types). See Examples.list_spec and Examples.option_spec for examples.

Deriving the correct constraints from a variant_spec requires some subtlety. Consider the following example:

    (Some (1 : t1) : t2, Some ("where" : t3) : t4)

(this is Examples.infer_variant). A naive way to typecheck it would be to collect the constraints 't2 = 'a option and 't1 = 'a from the Some 1 subexpression, and the constraints 't4 = 'a option and 't3 = 'a from the Some "where" subexpression. However, this would force 'a to be both string and int, so the expression would not typecheck.

A better approach is to think of the constructor type as a template from which you create constraints. You can correctly generate the constraints by creating a new type variable for each of the free variables of the variant_spec, and substituting them appropriately in the constructor’s type.

In the above example, you might create the variable 'x for 'a while collecting Some 1, and output the constraints 't1 = 'x and 't2 = 'x option, and you might create the variable 'y for 'a while collecting Some "where", and output the constraints 't3 = 'y and 't4 = 'y option. This would allow infer_variant to type check correctly.

Exercise 8: [Optional] Implement let-polymorphism (0 points).

Note: This question is completely optional and will not affect your grade in any way.

Note 2: This problem may require you to reorganize your code somewhat. Make sure you make good use of source control, helper functions, and unit tests to ensure that you don’t break your existing code while working on it!

The type inference algorithm described above allows us to give expressions polymorphic types: fun x -> x will be given the type 'a -> 'a.
However, it does not let us use them polymorphically. Consider the following expression:

    let (any : t1) = (fun (x : t3) -> (x : t4)) : t2 in (any 1, any "where")

(this is Examples.infer_poly). OCaml will give this expression the type int * string, but our naive inference algorithm fails.

The problem is similar to the subtlety with variants above: we will generate the constraints (typeof any) = (typeof 1) -> t2 and (typeof any) = (typeof "where") -> t2, but these constraints can only be solved if int = typeof 1 = typeof "where" = string.

The solution is also similar: every time a variable defined by a let expression is used, we create a new type variable corresponding to the use. We use the constraints generated while type checking the body of the let as a template for a new set of constraints on the new variable. Using the constraints from the bodies of lets as templates is called let-polymorphism.

In the above example, while type checking the body of the let expression, we will generate the constraints t1 = t2, t2 = t3 -> t4, and t3 = t4. While checking the expression (any:t5) 1, we can recreate these constraints with new type variables t1’, t2’, and so on. We will output the constraints t1’ = t2’, t2’ = t3’ -> t4’, and t3’ = t4’. We also will output the constraint t5 = t1’.

Because the type variables are cloned, they are free to vary independently. Because we have cloned their constraints, they must be used in a manner that is consistent with their definition.

Part 5: Written questions (10 points)

In the file written.txt or written.pdf, answer the following questions.

Exercise 9: Unification (5 points).

We did not ask you to implement unify, but the algorithm is essentially the same as the one you use to solve algebraic equations. To give you a sense of how it works, solve the following type equations for 't1, 't2, and 't3.
    't5 = 't4 -> 't7
    't5 = int -> 't6
    't6 = bool
    't1 = 't6 -> 't5
    't2 = 't7 list
    't3 = bool * 't4

Exercise 10: Factoring (5 points).

The type definitions Ast.expr and TypedAst.annotated_expr are nearly identical. Moreover, there are a number of simple helper functions that look very similar. Describe how you might create a single type and helper function that make the definitions of expr, annotated_expr, annotate, strip, and subst_expr all one-liners.

Be sure to make it possible to create an expr without any reference to types (that is, don’t just define expr as a synonym for annotated_expr).

Exercise 11: Feedback (0 points).

Let us know what parts of the assignment you enjoyed, what parts you didn’t enjoy, and any other comments that would help us improve the assignment for the future.

What to submit

You should submit the following files on CMS:
• streams.ml containing your stream implementations
• eval.ml containing your interpreter
• infer.ml containing your type checker
• written.txt or written.pdf containing your responses to the written questions
• logs.txt containing your source control logs