In this repository you will find an haskell's implementation of the portion of the micro-ml programming language proposed for the first part of the pratical assignment of ICL. You will find a typechecker, a parser, an interpreter and a compiler for the language.
Due to circumstances we've decided to write the report here in this README file.
To build you need to have ghc
(haskell compiler), gnu make
and a version of java installed (for the compiler). After making sure you have both installed you can just run make
, at the root of the repo, and it will drop out of some box an executable which full name from the root the project will be bin/main
.
Its one executable and not two. The professor can run just the executable providing no arguments and it will display what it expects to receive. But to sum it up if you run bin/main -i <filename>
it will run the interpreter and print the results. If you run bin/main -c <filename>
(this one really needs to executed in the root of the project as shown here) it will generate a jar file with the same base name as <filename>
but ending with .jar
. Which means if you run bin/main -c path/to/file.mml
it will generate a file name file.jar
on the current directory. (To run the jar file use java -jar <generated-jar-filename>
). You can also run a repl but using bin/main repl
and a life editor (which unfortunately only interprets and doesn't compile) by using bin/main editor
and it will allow you to write code and when you hit CTRL+D it will run the interpreter of the text you provide.
Note: The compiler will use the jar file stored in the bin/ folder to run jasmine that one of the reasons you will need to run the executable from the root of the project when using the compiler. It will also compile the java files in the folder stdlib/ and add to the final jar.
When compiling one can also see the code jasmine
assembler generated by the compiler by providing an environment variable named DEBUG, for example DEBUG=1 bin/main -c path/to/file
will print on the terminal the code for each jasmine
file generated.
Here below we list the context-free grammar
of our parser:
<program> ::= <decl> EOF
<decl> ::= <sequence> | <letBlock> | <whileExpr> | <ifExpr> | <funDecl>
<sequence> ::= <assigment> (";" <sequence>)*
<assigment> ::= <expr> (":=" <assigment>)?
<expr> ::= <logicalOr> ( "&&" <logicalOr> )*
<logicalOr> ::= <comparison> ( "||" <comparison>)*
<comparison> ::= <term> (( ">" | "<" | "==" | "!=" | ">=" | "<=" ) <term> )*
<term> ::= <factor> (( "+" | "-" ) <term> )*
<factor> ::= <primary> (( "*" | "/" ) <factor> )*
<unary> ::= ("-"|"~"|"!"|"new"|"print" | "println") <unary> | <call>
<call> ::= <primary> (<primary>)*
<primary> ::= "true" | "false" | Num | "()" | "(" <decl> ")" | ID
<letBlock> ::= "let" ( Id (":"<type>)? "=" <decl> ";;" )+ "in" <decl> "end"
<funDecl> ::= "fun" <func-pars> "->" <decl> "end"
<ifExpr> ::= "if" <expr> "then" <decl> ("else" <decl>)? "end"
<whileExpr> ::= "while" <expr> "do" <decl> "end"
<func-pars> ::= "()" | ( "(" Id ":" <type> ")" | Id )+
<type> ::= "int" | "bool" | "unit" | "ref" <type> | <type> "->" <type>
A few things are worth noting:
- You will need to provide
;;
at the end of each variable attribution this is due to what is explained in the next bullet point. - You don't need parenthesis neither commas to do function calls. Its much like you do in Haskell which means you can't do
f -1
since function calls have the highest priority, it has to bef (-1)
etc... - Function declaration is pretty much like you do in
Ocaml
- There no actual way to on the function declaration specify what it returns unless if you
:
after the variable name (that will hold the function) - Also as you might already see in the type definition (which is pretty much what the professor required) it doesn't make difference of types like
int -> int -> bool -> bool
andint -> (int -> bool) -> bool
, which might seen to be the same but aren't since the latter expects a function as the 2nd argument which the first an integer. This is just a parser inconvenience, one can always dofun (x : int) (f : int -> bool) (b : bool) -> ... end
and it will work as expected.
About the AST, we tried to be clever and spare as much as possible, not to endup with tons of nodes so the full extend of it might not be easy to understand. But in the beginning of of src/Types.hs you will find a list of all the official nodes. I don't know if you will even peek at the code, but I left this message just in case.
As extra features we've done:
- Type inference and polymorphism
- Partial application of functions
- We have some tests on the folder
tests/
(the ones that haveerror
on their name should fail on the type checker with the sole exception oftests/func-partial-hard.mml
- explained below) - Our error messages show the line where it occur (it also shows the position but its not great since we started to do unifications ... One can be sure that the error is on the token that starts at where it points but not exactly at the position is pointing when it is reporting a typing error)
Regarding the partial application, it works as one would expect but due to technical problems and lack of time it doesn't support polymorphism. One example that illustrates that is the one found in the file tests/func-partial-hard.mml
file:
let
print2 = fun x y -> println x ; println y end ;;
print1 = print2 10 ;;
in
print1 () ;
print1 true
end
One would expect that it would work, but the truth is that it won't since the type of print1
becomes int -> unit -> unit
after the line print1 ()
. This is due to let-polymorphism technique, that in this case would need to see which variables should still be free after doing print2 10
. I don't know if was supposed to work or not, I didn't manage to get enough time to think about it.
Fortunately or unfortunately all the program files are in the src/
dir which might confuse one that tries to see how the usage of each one. I thought in using subfolders to but delayed it and in the end couldn't do it due to time. But if I was to create a folder it would look something like this:
.
└── src
├── compiler
│ ├── Compiler.hs
│ └── Serializer.hs
├── core
│ ├── Core.hs
│ └── Types.hs
├── error
│ └── Errors.hs
├── frontend
│ ├── Parser.hs
│ └── Scanner.hs
├── interpreter
│ └── Interpreter.hs
├── Main.hs
└── typecheck
├── TypeChecker.hs
├── Unification.hs
└── UnionFind.hs
Hopefully this gives some insight into how everything is organized and what does what.
As the professor can see there is a folder named stdlib/
which contains some java classes and functions that is used on the run time by our compiler. For things like partial application, references and functions definition. They are included in the final jar that will also the rest of the program.
In order to support this, we needed to use a uniform representation of functions, and so we did. All our functions implements the interface found in stdlib/Func.java
which basically has the method apply that has the following signature:
Object apply(Object[] args);
This means that we do wrap and unwrapping of values to and from Object in order to support polymorphism.
In order to support partial application we made a function that can be found on stdlib/Utils.java
that basically receives an instance of stdlib/Func.java
and an array of Object
and does some trick. Its pretty simple java code, and when an partial application is needed that function is called.
We also wanted to have a single references so all the references are actually object of the class found in stdlib/Ref.java
which has a field name value
that has the type Object
. So we also do wrap and unwrapping when storing and restoring what the reference store.
To represent a unit we used the class found in stdlib/Unit.java
which has the static final
field named SINGLE
this is loaded to the stack wherever the language produces a unit type.
One small detail I wanted to point out is that although a branch of the sum type that makes the AstNode
named PartialApplication
. This one is not produced by the compiler but, later when doing the substitutions of the types in the end of typecheck function to produced the final TypedAst
when a call is made with less than the number of arguments expected, that one is turned into a ParatialApplication
node. Furthermore there was made a small change on the unification algorithm part found in the unify
function in src/Unification.hs
file. Which basically when is unifying two functions type if one has more types/expects more types than the other, it will basically match up until the last of the smaller and then unify the result of the smallest with a function that is the rest that wasn't matched. If this explanation sounds weird, just go and check it out. The code is pretty simple.