Pire - Pi-forall's Refactorer

Andreas Reuleaux

2014-2018

Introduction

Pi-forall is a small dependently typed programming language developed by Stephanie Weirich, mainly for educational purposes, and implemented in Haskell.

While less powerful than its better-known cousins (Idris, Agda, Coq etc.), its internals may be easier to comprehend than theirs, due to its small size, but thanks also to Stephanie's great OPLSS lectures in 2013 and 2014, where she presented Pi-forall in a step by step manner (not to mention numerous other great presentations of hers); cf. section #p4a_101 for some more details about Pi-forall.

There are thus several different versions of Pi-forall available; the one considered here is the final version (in the final directory) of the code accompanying her OPLSS14 presentation (in Pi-forall's 2014 branch, i.e. its main branch).

Pi-forall is therefore also interesting as a target for refactoring, and Pire aims to be Pi-forall's refactorer (pronounced any way you like: "pi-re" for "Pi-forall re-factorer", or "pire" like "hire", or, as we do, in French, where it sounds even worse :-)).

Some noteworthy points:

Pi-forall 101

Starting from just the bare minimum of language constructs:

a, A ::=
  x            -- variables
  \x . a       -- lambda abstraction
  a b          -- function application
  (x: A) -> B  -- function type constructor (aka pi-type)
  Type         -- the type of types
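For illustration, such a core syntax could be represented in Haskell roughly as follows (a hypothetical sketch with plain strings for variables; Pi-forall's actual implementation represents binders with the unbound library instead):

```haskell
-- Hypothetical sketch: the bare core syntax as a plain data type.
data Term
  = Var String           -- x
  | Lam String Term      -- \x . a
  | App Term Term        -- a b
  | Pi String Term Term  -- (x: A) -> B
  | Type                 -- the type of types
  deriving (Show, Eq)

-- the polymorphic identity function, \A . \x . x
idTerm :: Term
idTerm = Lam "A" (Lam "x" (Var "x"))
```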

Pi-forall is presented at various stages of increasing complexity in the course of Stephanie's discussion:

Pi-forall has all the characteristics of a dependently typed language:

The language is somehow limited:

At the same time some aspects of Pi-forall are relatively elaborate:

Pi-forall's type system is certainly its most interesting part, and at the center of Stephanie's discussion: the typing rules can be broken down into two different groups, checking and inference rules, as is reflected in the implementation.
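To illustrate the idea behind that split (a hypothetical toy, not Pi-forall's actual code): a bidirectional checker for a simply typed fragment, with an infer function for the inference rules and a check function for the checking rules; unannotated lambdas can only be checked, applications are inferred:

```haskell
import qualified Data.Map as M

-- toy types and terms (assumed names, far simpler than Pi-forall's)
data Ty = TyNat | TyArr Ty Ty deriving (Show, Eq)
data Tm = Var String | App Tm Tm | Lam String Tm | Zero

type Ctx = M.Map String Ty

-- inference rules: compute a type from the term
infer :: Ctx -> Tm -> Maybe Ty
infer _   Zero      = Just TyNat
infer ctx (Var x)   = M.lookup x ctx
infer ctx (App f a) = case infer ctx f of
  Just (TyArr dom cod) | check ctx a dom -> Just cod
  _                                      -> Nothing
infer _   (Lam _ _) = Nothing  -- no annotation: lambdas are check-only

-- checking rules: push an expected type into the term
check :: Ctx -> Tm -> Ty -> Bool
check ctx (Lam x b) (TyArr dom cod) = check (M.insert x dom ctx) b cod
check ctx tm        ty              = infer ctx tm == Just ty
```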

Pire: Source code and Installation

Installation instructions as of November 2018 (ghc 8.2.2, cabal 2.0), some sections yet to be refined.

Pire consists of

and includes Pi-forall's original source code in particular. Pire is organised in several sub-projects (sub-directories):

In theory it should be possible to install (and use) Pire with basic git and cabal commands only, cf. section #standard_installation below.

In practice it may be easier, however, to use our shake tool chain instead (which we provide as well for convenience, cf. section #shake), the shake commands being useful for Pire development in particular.

standard installation with git and cabal

clone Pire, and create a sandbox

git clone git://git.a-rx.info/pire
cd pire
cabal sandbox init

update the available packages, and do a dry run first to see what packages would be installed

cabal update
cabal install --dry-run

tip: maybe use highlight-versions to see the details in a more fine-grained manner (in colour)

cabal update
cabal install highlight-versions
cabal install --dry-run | highlight-versions

for example

and if you are happy with this dry run, do the real installation:

cabal install

continue exploring Pire (cf. section #simple_examples below), and exercise cabal's possibilities, depending on interest for example:

simple examples (new syntax)

If your installation went well, you should be able to use Pire on the cabal repl command line.

cabal new-repl

Tip: depending on your installation (e.g. with or without testing enabled) it may be necessary to specify cabal's target more precisely (pire is our main target):

cabal new-repl pire

Try parsing some expressions for example (or create them with smart constructors: lam etc.), pretty print them, and so on; start by loading some Pire modules:

>>> :m +Parser NoPos

or load a predefined list of Pire modules (_pire is a laundry list of further imports, for convenience):

>>> :l _pire

then continue (get rid of position information with nopos):

>>> parse expr   "\\x  .  a  "
...

>>> nopos $ parse expr   "\\x  .  a  "
LamPAs [(RuntimeP,"x",Nothing)] (Scope (V (F (V "a"))))

>>> pp $ nopos $ parse expr   "\\x  .  a  "
\x . a

>>> lam "x" $ V "z"
Lam "x" (Scope (V (F (V "z"))))

>>> pp $ lam "x" $ V "z"
\x . z
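The Scope, F and B constructors appearing in these outputs reflect a locally nameless representation of binders (in the style of the bound library): B marks a variable bound by the enclosing scope, F wraps free names. What a smart constructor like lam does can be sketched in a toy version (hypothetical and much simplified compared to Pire's actual Syntax):

```haskell
-- toy locally nameless encoding: a variable is either bound (B)
-- or free (F), and Lam closes over one more layer of Var
data Var a = B | F a deriving (Show, Eq)
data Tm a  = V a | Tm a :@ Tm a | Lam (Tm (Var a))
  deriving (Show, Eq)

instance Functor Var where
  fmap _ B     = B
  fmap f (F a) = F (f a)

instance Functor Tm where
  fmap f (V a)    = V (f a)
  fmap f (a :@ b) = fmap f a :@ fmap f b
  fmap f (Lam t)  = Lam (fmap (fmap f) t)

-- the smart constructor: abstract x, turning its free occurrences
-- into the bound marker B, and wrapping all other names in F
lam :: Eq a => a -> Tm a -> Tm a
lam x = Lam . fmap (\y -> if y == x then B else F y)
```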

>>> nopos $ parse expr  "if cond then    a else   b"
If (V "cond") (V "a") (V "b") Nothing

>>> pp $ nopos $ parse expr  "if cond then    a else   b"
if cond then a else b 

You may want to try out Pire's white space aware parsers (use expr_ instead of expr for expressions, and likewise for declarations, matches etc), which all parse (absy, token tree) pairs:

>>> nopos $ parse expr_  "\\x  .  a  "
(LamPAs [(RuntimeP,"x",Nothing)] (Scope (V (F (V "a")))),Node [Pair (Token "\\") (Ws ""),Node [Pair (Binder "x") (Ws "  ")],Pair (Token ".") (Ws "  "),Abstract ["x"] (Scope (Pair (Id (F (Id "a"))) (Ws "  ")))])

>>> nopos $ parse expr_  "if cond then    a else   b"
(If (V "cond") (V "a") (V "b") Nothing,Node [Pair (Token "if") (Ws " "),Pair (Id "cond") (Ws " "),Pair (Token "then") (Ws "    "),Pair (Id "a") (Ws " "),Pair (Token "else") (Ws "   "),Pair (Id "b") (Ws "")])

>>> ppr $ fst $ nopos $ parse expr_  "if cond then    a else   b"
If (V "cond") (V "a") (V "b") Nothing

>>> ppr $ snd $ nopos $ parse expr_  "if cond then    a else   b"
Node [Pair (Token "if") (Ws " ")
     ,Pair (Id "cond") (Ws " ")
     ,Pair (Token "then") (Ws "    ")
     ,Pair (Id "a") (Ws " ")
     ,Pair (Token "else") (Ws "   ")
     ,Pair (Id "b") (Ws "")]

>>> pp $ fst $ nopos $ parse expr_  "if cond then    a else   b"
if cond then a else b 

>>> pp $ snd $ nopos $ parse expr_  "if cond then    a else   b"
if cond then    a else   b

etc. Building upon these parsers, navigation in the syntax tree, and renaming are the next steps in your journey of exploring refactorings.
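The round-trip idea behind these (absy, token tree) pairs can be sketched in a few lines of plain Haskell (a hypothetical toy, far simpler than Pire's actual parsers and token trees): the absy side is convenient for analysis, while the token side retains enough whitespace to reprint the input exactly:

```haskell
import Data.Char (isAlphaNum, isSpace)

-- token text paired with the whitespace that followed it
data Tok = Pair String String deriving (Show, Eq)

-- parse an identifier together with its trailing whitespace,
-- returning (absy, token) plus the unconsumed rest of the input
ide_ :: String -> ((String, Tok), String)
ide_ s =
  let (name, rest) = span isAlphaNum s
      (ws, rest')  = span isSpace rest
  in ((name, Pair name ws), rest')

-- printing the absy side loses layout; printing the token side
-- reproduces the consumed input exactly
ppTok :: Tok -> String
ppTok (Pair t ws) = t ++ ws
```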

old syntax

Maybe you want to try out Pire's old syntax (defined in modules OldSyntax, OldParser, OldSmart, OldNavigation, etc.), where white space is kept within the absy.

The old syntax can be explored with the -DOLDSYNTAX flag (with :l _pire loading the same predefined list of Pire modules as above, but with OldSyntax instead of Syntax, OldParser / Parser etc.).

>>> :set -DOLDSYNTAX
>>> :l _pire
...

>>> ppr $ nopos $ parse expr_  "if cond then    a else   b"
If_ (IfTok "if" (Ws " "))
    (Ws_ (V "cond") (Ws " "))
    (ThenTok "then" (Ws "    "))
    (Ws_ (V "a") (Ws " "))
    (ElseTok "else" (Ws "   "))
    (Ws_ (V "b") (Ws ""))
    (Annot Nothing)

>>> pp $ nopos $ parse expr_  "if cond then    a else   b"
if cond  then    a  else    b 

>>> nopos $ parse expr_  "\\x  .  a  "
LamPAs_ (LamTok "\\" (Ws "")) [(RuntimeP,Binder_ "x" (Ws "  "),Annot Nothing)] (Dot "." (Ws "  ")) (Scope (Ws_ (V (F (V "a"))) (Ws "  ")))

>>> pp $ nopos $ parse expr_  "\\x  .  a  "
\x  .  a 

>>> ppr $ nopos $ parse expr_  "\\x  .  a  "
LamPAs_ (LamTok "\\" (Ws ""))
        [(RuntimeP,Binder_ "x" (Ws "  "),Annot Nothing)]
        (Dot "." (Ws "  "))
        (Scope (Ws_ (V (F (V "a"))) (Ws "  ")))

shortcuts

Tip: here are some handy shortcuts for the cabal repl (put them in your ~/.ghci):

:{
let cmdP _ = return $ unlines
             $ ""
             :":l _pire"
             :[]
     :}

-- aka "pire"
:def p cmdP


:{
let cmdOldP _ = return $ unlines
             $ ""
             :":set -DOLDSYNTAX"
             :":l _pire"
             :[]
      :}

-- aka "old pire"
:def oldp cmdOldP

Pire modules can then be loaded with :p (or :oldp, respectively).

We haven't found a convenient way yet to unset the -DOLDSYNTAX flag: once Pire's old syntax has been loaded,

>>> :oldp

there is no easy way to switch back to Pire's new syntax; our advice for now is thus to quit the cabal repl and start anew:

$ cabal new-repl
>>> :p

shake tool chain

The shake tool chain builds upon cabal, and allows (among other things):

and there are various good reasons to use it, for development, or just for building Pire:

We therefore offer our shake tool chain as an option for building Pire as well (files Build.hs, Boot.hs, build.sh and their dependencies: Tools.hs, TShake.hs, Flags.hs etc.), and discuss the shake commands that we typically use for installation and other administrative tasks (cf. section #standard_installation above for the more standard cabal commands).

We only require that the cabal files distributed with Pire via git are built without any extra flags:

Here is a list of tasks (and flags) that we typically perform with shake, as well as some background information (organising principles):

Here is a sample session of ours:

clean out our code base (to start anew)

./build.sh -boot -R -B -V cl

bootstrap a sandbox

./build.sh -boot -R -B -V bootstrap

and build cabal files (with some additional dependencies, flags for testing, and docs)

./build.sh -boot -R -B -V --moredeps --morecode --docs --discover --roundtrip cb

do a dry run (we are lazy and keep the flags, though they are not really needed here; the target dry is short for --dry install)

./build.sh -boot -R -B -V --moredeps --morecode --docs --discover --roundtrip dry

(if happy with our dry run:) install

./build.sh -boot -R -B -V --moredeps --morecode --docs --discover --roundtrip install

run our test suite (cf. #testing below)

./build.sh -B -V runalltests

build our haddocks (copied to directory html)

./build.sh -B -V hd

build our orgmode/html/pdf docs (docs/html, docs/pdf)

./build.sh -B -V html
./build.sh -B -V pdf

etc.

testing

This section needs updating (in particular the old part below). Short summary: the recommended way is to use the different test suites, relying on our shake tool chain (runutests for our unit tests):

$ ./build.sh -B -V runutests
$ ./build.sh -B -V runfiletests
$ ./build.sh -B -V rundoctests
$ ./build.sh -B -V runroundtrip
$ ./build.sh -B -V runtestsuite

or all of them together with:

$ ./build.sh -B -V runalltests

These require additional dependencies and flags to be set, however (runtestsuite e.g. requires the executable discover, which is not built with Pire's core installation, etc.).

These dependencies can be ensured (and the flags set) by providing additional flags at cabal file creation:

$ ./build.sh -B -V --discover cb

maybe use

$ ./build.sh -B -V --moredeps --discover cb

for even more dependencies, and recursively if you wish:

$ ./build.sh -R -B -V --moredeps --discover cb

This will then set the discover flag, and thus build discover as well.

Tip

we try to keep dependencies to a minimum (and want Pire's core installation to be fast):

strictly speaking the additional dependencies above (--moredeps etc.) are not needed if you rely on cabal's testing features (Pire's #standard_installation):

i.e. create a cabal file without any additional dependencies as the default pire.cabal:

./build.sh -B -V cb

and install with

./build.sh -boot -R -B -V --tests --doctests --roundtrip  --discover  install

(maybe add --dry-run for a dry run).

this calls:

cabal install --enable-tests -f "roundtrip" -f "discover" -f ...

The downside is that all the tests are then run at installation time (which can be slow) …

To test the old syntax modules (OldSyntax, OldParser, OldSmart, OldNavigation, etc.), cf. section #oldsyntax:

Run the test suites above with --oldsyntax; this currently works only for the unit tests and the testsuite:

$ ./build.sh -B -V --oldsyntax runutests
$ ./build.sh -B -V --oldsyntax runtestsuite

(This still fails for some cases in the testsuite, yet to be sorted out.) It shouldn't hurt, though, to run the complete testsuite with --oldsyntax (even though not all of our tests take the old syntax into account yet):

$ ./build.sh -B -V --oldsyntax runalltests

the individual tests, one by one

Here is a more detailed list of our individual tests (yet to be updated):

  1. unit tests: #utests
  2. file tests: #filetests
  3. doctests
  4. roundtrip test

unit tests

$ cabal configure --enable-tests
$ cabal repl utest

load them with either

>>> :l tests/Tests.hs

or

>>> :l Tests

and run them

>>> main' "*"

or just:

>>> main

file tests

likewise for the file tests, eg.:

>>> :l tests/FileTests.hs
>>> main' "*"

Pi-forall in Pire

As mentioned already, Pire includes Pi-forall by means of two sub-projects (sub-directories):

It is then possible, for example, to use Pi-forall's parser and its main program goFilename:

$ cabal new-repl pire-pi-forall
>>> :m +PiForall.Parser
>>> parseExpr   "   \\x  .  a  "
Right (Pos "<interactive>" (line 1, column 4) (Lam (<(x,{Annot Nothing})> Pos "<interactive>" (line 1, column 11) (Var a))))

>>> :m +PiForall.Main
>>> goFilename "../pitests/Nat.pi"
...

etc.

export more parsers

old docs, yet to be updated/refined:

One thing that was useful in Pire's old installation and should therefore be rescued (as it is not provided out of the box by the above approach): making more of Pi-forall's parsers available. PiForall.Parser exports just parseExpr, but for more fine-grained experimentation/testing one may want to test Pi-forall's individual parsers as well: expr, decl etc. In addition to the commands above one can then test, for example (the expr parser here):

$ cabal new-repl pire-pi-forall
>>> runFreshM (evalStateT (runParserT (do { whiteSpace; v <- expr; eof; return v}) [] "<interactive>" " Succ n") emptyConstructorNames)
...

The Trifecta parsers: a tour

old docs, yet to be updated

Start the cabal repl and load the _pire.hs file (just a laundry list of further imports, for convenience); the system answers with a long list of modules being loaded (shortened here):

$ cabal new-repl
...
Loaded GHCi configuration from /home/rx/cfg/hs/dot-ghci
[ 1 of 51] Compiling Hint             ( Hint.hs, interpreted )
[ 2 of 51] Compiling OldSyntax        ( OldSyntax.hs, interpreted )
[ 3 of 51] Compiling OldSmart         ( OldSmart.hs, interpreted )
[ 4 of 51] Compiling OldNoPos         ( OldNoPos.hs, interpreted )
[ 5 of 51] Compiling OldNavigation    ( OldNavigation.hs, interpreted )
[ 6 of 51] Compiling OldForget        ( OldForget.hs, interpreted )
[ 7 of 51] Compiling OldText2String   ( OldText2String.hs, interpreted )
[ 8 of 51] Compiling PrettyCore       ( PrettyCore.hs, interpreted )
[ 9 of 51] Compiling PrettyCommon     ( PrettyCommon.hs, interpreted )
[10 of 51] Compiling Refactor         ( Refactor.hs, interpreted )
[11 of 51] Compiling Size             ( Size.hs, interpreted )
[12 of 51] Compiling Syntax           ( Syntax.hs, interpreted )
...

(or cabal new-repl pire …)

>>> :l _pire
...
[ 1 of 52] Compiling Hint             ( Hint.hs, interpreted )
[ 2 of 52] Compiling PrettyCore       ( PrettyCore.hs, interpreted )
[ 3 of 52] Compiling PrettyCommon     ( PrettyCommon.hs, interpreted )
[ 4 of 52] Compiling Refactor         ( Refactor.hs, interpreted )
[ 5 of 52] Compiling Size             ( Size.hs, interpreted )
[ 6 of 52] Compiling Syntax           ( Syntax.hs, interpreted )
[ 7 of 52] Compiling Smart            ( Smart.hs, interpreted )
[ 8 of 52] Compiling Pretty2          ( Pretty2.hs, interpreted )
[ 9 of 52] Compiling Desugar          ( Desugar.hs, interpreted )
[10 of 52] Compiling TC               ( TC.hs, interpreted )
[11 of 52] Compiling TT               ( TT.hs, interpreted )
[12 of 52] Compiling Term             ( Term.hs, interpreted )
...
>>>

Parse an identifier:

>>> parse (whiteSpace *> ide) "  wow this rocks"
"wow"

Now parse the same identifier with a white-space aware parser, i.e. including the whitespace following it:

>>> parse (whiteSpace *> id2_) "  wow this rocks"
("wow",Pair (Id "wow") (Ws " "))

Naming conventions are as follows:

old syntax

In the old syntax both expr and expr_ parse the same data structures, expressions (Expr); likewise decl and decl_ both parse declarations (Decl), etc. Pairs of constructors can then be found in Expr,

More complicated expressions can be parsed, too:

>>> nopos $ parse (whiteSpace *> expr) "   \\n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))  "
LamPAs [(RuntimeP,"n",Annot Nothing)] (Scope (If ((V (F (V "nat_leq")) :@ V (B 0)) :@ Nat 0) (Nat 1) ((V (F (V "mult")) :@ V (B 0)) :@ Paren (V (F (V "fac")) :@ Paren ((V (F (V "minus")) :@ V (B 0)) :@ Nat 1))) (Annot Nothing)))

And the equivalent expression in concrete syntax (expr_ instead of expr):

>>> nopos $ parse (whiteSpace *> expr_) "   \\n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))  "
LamPAs_ (LamTok "\\" (Ws "")) [(RuntimeP,Binder_ "n" (Ws " "),Annot Nothing)] (Dot "." (Ws " ")) (Scope (If_ (IfTok "if" (Ws " ")) ((Ws_ (V (F (V "nat_leq"))) (Ws " ") :@ Ws_ (V (B 0)) (Ws " ")) :@ Nat_ 0 "0" (Ws " ")) (ThenTok "then" (Ws " ")) (Nat_ 1 "1" (Ws " ")) (ElseTok "else" (Ws " ")) ((Ws_ (V (F (V "mult"))) (Ws " ") :@ Ws_ (V (B 0)) (Ws " ")) :@ Paren_ (ParenOpen "(" (Ws "")) (Ws_ (V (F (V "fac"))) (Ws " ") :@ Paren_ (ParenOpen "(" (Ws "")) ((Ws_ (V (F (V "minus"))) (Ws " ") :@ Ws_ (V (B 0)) (Ws " ")) :@ Nat_ 1 "1" (Ws "")) (ParenClose ")" (Ws ""))) (ParenClose ")" (Ws "  "))) (Annot Nothing)))

Of course one wants some kind of pretty printing here:

>>> pp $ fromSuccess $ parseString (runInnerParser $ evalStateT (whiteSpace *> expr) piInit) beginning " \\n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))  "
\n . if nat_leq n 0 then 1 else mult n ((fac ((minus n 1))))

So what, you might say? See what happens if we parse the same expression with some comments in between, first with the expr parser as above:

>>> pp $ nopos $ parse (whiteSpace *> expr) "   \\n . if {-foo-} nat_leq n 0 then {-bar-} 1 else mult n (fac (minus n 1))  "
\n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))

The comments are lost - they are comments, after all, for the absy parser. But now parse them with the white-space aware parser (expr_ instead of expr):

(new syntax:)

>>> pp $ fst $ parse expr_ "\\n . if {-foo-} nat_leq n 0 then {-bar-} 1 else mult n (fac (minus n 1))  "
\n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))


>>> pp $ snd $ parse expr_ "\\n . if {-foo-} nat_leq n 0 then {-bar-} 1 else mult n (fac (minus n 1))  "
\n . if {-foo-} nat_leq n 0 then {-bar-} 1 else mult n (fac (minus n 1))  

See how they are kept!

OK, lots more stuff to explain, and to do of course, just a rough overview at this point:

The Zipper implementation

As of November 2018 the zipper implementation has changed completely; this section thus needs updating.

(old docs as of August 2015, describing changes in our zipper implementation at the time): the basic idea is this:

The original zipper implementation (closely following the Zippers chapter of Learn you a Haskell) uses a list of breadcrumbs like so:

type Zipper a = (a, [Crumbs a])

the first part of this pair being the current focus, and the crumbs (or context) being a means to get back up in the tree, or to reconstruct the original tree.

Thus, e.g. in an apply, if we take a left turn:

left (Exp (lhs :@ rhs), bs) = rsucceed (Exp lhs, (AppLeftCrumb $ rhs) : bs)

we return the left hand side as the new focus, and keep track of the fact that we have taken a left turn, together with the right hand side, in the AppLeftCrumb (the Exp is just a means to cope with the fact that our tree contains exprs/decls/modules etc.)

When going up, we have to handle those breadcrumbs

up (Exp e, AppLeftCrumb rhs : bs) = rsucceed (Exp $ e :@ rhs, bs)

This works well enough, but obviously there are many different situations to take into account, and therefore many different constructors for breadcrumbs.

I have now changed the breadcrumbs to be a list of functions instead:

type Zipper a = (Tr a, [Tr a -> Tr a])

(where Tr a is the tree of exprs, decls or modules). Thus, to take a left turn, we just add another function to the list:

left (Exp (l :@ r), bs) = rsucceed (Exp l, (\(Exp l') -> Exp $ l' :@ r) : bs)

This function serves the same purpose: it tells us how to get back, but there is no need to handle any data structures for breadcrumbs, thus resulting in much shorter code.

Another advantage is that the left turn and the function telling us how to get back up are visually in the same place (whereas formerly they were in two different functions: left and up). up is trivial now: just an application.
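The whole scheme fits in a few lines when replayed on a toy tree of applications (hypothetical names; Pire's real zipper works on its Tr type of exprs/decls/modules and returns its results via rsucceed rather than Maybe):

```haskell
-- a minimal expression tree: variables and applications
data E = Leaf String | E :@: E deriving (Show, Eq)

-- focus plus a list of function breadcrumbs
type Zip = (E, [E -> E])

-- going left records, as a function, how to rebuild the apply
left :: Zip -> Maybe Zip
left (l :@: r, bs) = Just (l, (\l' -> l' :@: r) : bs)
left _             = Nothing

-- up is just an application of the most recent breadcrumb
up :: Zip -> Maybe Zip
up (e, b : bs) = Just (b e, bs)
up (_, [])     = Nothing

-- rebuild the whole tree by replaying all breadcrumbs
top :: Zip -> E
top (e, bs) = foldl (flip ($)) e bs
```

Descending into `f x`, replacing the focus, and rebuilding with top yields the edited tree, which is exactly the step-by-step climbing that upToBinding needs.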

The only caveat, I would say, is that one cannot easily look at these functions as breadcrumbs (but I can imagine implementing some Show instance for these Tr a → Tr a functions at some point).

I owe this idea of using functions to Darryl McAdams, author of the From Zipper to Lens tutorial. It is a nice tutorial (meant as a motivation for Lens), but I would claim that he is not implementing zippers there: zippers do allow climbing up the tree in a step by step manner, and we need that, for upToBinding / upToBinding' at least (and therefore keep a list of functions). With just a single function kept in his zippers, one can only get back straight to the top of the tree.

Pure lenses are not suited for zippers, for this very reason, as I understand it, but there are of course other zipper implementations, one of them being based on lenses: Control.Zipper.

I am currently doing some experimentation with them, likewise with Lens.Action, and the code therefore has some additional dependencies on lens-action and zippers at the moment. They are not strictly necessary, and may be removed again in the future.

(Another generic approach would be scrap-your-zippers, maybe worth looking into.)

Anyway, for the time being, I have decided to stick with our simple hand-written zippers, and have stolen just this one idea: to use functions (lists of functions, i.e.) as breadcrumbs. Pi-forall being not that huge, fortunately, this approach is still feasible. I am glad that the zipper implementation gets simpler this way (and I can imagine that it could be shortened even more, with Control.Zipper e.g.).

Pi-forall's Parsec parser

While Pi-forall's parser is arguably its least interesting part (and barely mentioned in Stephanie's lectures), parsing is the starting point for any serious implementation (and hence refactoring) work.

Moreover, as the absy parser developed here is merely a rewrite of Pi-forall's Parsec parser in Trifecta, some examples of how this Parsec parser can be used may be worth looking at, both as a Parsec recap and to get a first touch of some Pi-forall code.

Say we are given a small Pi-forall input file Sample.pi with some function declarations:

-- -*- haskell -*-
-- Sample.pi

module Fac where

import Nat

-- data Nat : Type where
--   Zero
--   Succ of (Nat)

two : Nat
two = Succ (Succ Zero)

-- adapted from nat_eq in Nat
-- cf also http://mazzo.li/posts/AgdaSort.html

nat_leq : Nat -> Nat -> Bool
nat_leq = \ x y .
  case x of
     Zero -> True
     Succ m -> case y of
            Zero -> False
            Succ n -> nat_leq m n

fac: Nat -> Nat
fac = \n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))

foo: Nat -> Nat
foo = \n. fac (mult 2 n)

bar: Nat -> Nat
bar = \n. let fac = ((\x . plus x 2) : Nat -> Nat) in plus (foo n) (fac (mult 3 n))

Never mind too much the details of those declarations at this point, just note a few points:

The usual way of interacting with Pi-forall is calling its goFilename function in Main.hs, which parses, typechecks, and pretty-prints the given Pi-forall source file; in the case of Sample.pi above:

>>> import PiForall.Main
>>> goFilename "samples/Sample.pi"
processing Sample.pi...
Parsing File "samples/Nat.pi"
Parsing File "samples/Sample.pi"
type checking...
Checking module "Nat"
Checking module "Fac"
module Fac where
import Nat
two : Nat
two = 2
nat_leq : Nat -> Nat -> Bool
nat_leq = \x y .
            case x of
              Zero -> True
              (Succ m) ->
                case y of
                  Zero -> False
                  (Succ n) -> nat_leq m n
fac : Nat -> Nat
fac = \n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))
foo : Nat -> Nat
foo = \n . fac (mult 2 n)
bar : Nat -> Nat
bar = \n . let fac = \x . plus x 2 in plus (foo n) (fac (mult 3 n))

As can be seen:

While goFilename is a convenient way to parse, typecheck and pretty-print our Pi-forall code all in one go, sometimes we want more fine-grained control, and want to see intermediate results, i.e. the abstract syntax tree.

The following examples are meant as hints on how to "look under the hood" and discover more details, rather than as a complete description of the system.

Tip

The output of some of the following examples is very long, and has thus been shortened — not just by hand, but on the ghci (cabal repl) command line as well, so that they can nevertheless be used as doctests. Having imported some modules initially:

import PiForall.Modules
import PiForall.PrettyPrint
import Control.Monad.Except
import Data.Either.Combinators

we'd just write

runExceptT (getModules ["samples"] "Sample.pi")

e.g., and see a list of modules being parsed, i.e. many lines of output; and we'd see the same list of modules if we just return them after parsing:

runExceptT (getModules ["samples"] "Sample.pi") >>= return

We can get the result of the Either monad with fromRight' from the either package:

runExceptT (getModules ["samples"] "Sample.pi") >>= return . fromRight'

and we'd see the same output, if we show that list of modules and print the resulting String:

runExceptT (getModules ["samples"] "Sample.pi") >>= putStrLn . show . fromRight'

even though, technically, we are looking at the String representation of the modules here. Taking this one step further, we can shorten the output to the first 400 characters:

>>> runExceptT (getModules ["samples"] "Sample.pi") >>= putStrLn . ((++"...") . take 400) . show . fromRight'
Parsing File "samples/Nat.pi"
Parsing File "samples/Sample.pi"
[Module {moduleName = "Nat", moduleImports = [], moduleEntries = [Data "Nat" Empty [ConstructorDef "samples/Nat.pi" (line 12, column 3) "Zero" Empty,ConstructorDef "samples/Nat.pi" (line 13, column 3) "Succ" (Cons Runtime _ (Pos "samples/Nat.pi" (line 13, column 12) (TCon "Nat" [])) Empty)],Sig is_zero (Pos "samples/Nat.pi" (line 15, column 11) (Pi (<(_1,{TCon "Nat" []})> TyBool))),Def is_zero (Po...

Adjust the following examples to your needs in this manner.

Assuming the module imports above, ie. initially just:

import PiForall.Modules
import PiForall.PrettyPrint
import Control.Monad.Except
import Data.Either.Combinators

you will need more imports in the course of the following examples (import them now, or later when needed):

import PiForall.Syntax
import PiForall.Parser
-- import Unbound.LocallyNameless
import Unbound.Generics.LocallyNameless
import Control.Monad.State.Lazy
import Text.Parsec
import qualified Data.Set as S

Given that our Sample.pi file imports the Nat module, we cannot use the parsing functions from Parser.hs directly, but only those from Modules.hs (namely getModules). Cribbing from the source code of goFilename, we can get the abstract syntax tree of our Sample.pi file by running getModules in the Control.Monad.Except monad, like so:

>>> runExceptT (getModules ["samples"] "Sample.pi") >>= putStrLn . ((++"...") . take 400) . show . fromRight'
Parsing File "samples/Nat.pi"
Parsing File "samples/Sample.pi"
[Module {moduleName = "Nat", moduleImports = [], moduleEntries = [Data "Nat" Empty [ConstructorDef "samples/Nat.pi" (line 12, column 3) "Zero" Empty,ConstructorDef "samples/Nat.pi" (line 13, column 3) "Succ" (Cons Runtime _ (Pos "samples/Nat.pi" (line 13, column 12) (TCon "Nat" [])) Empty)],Sig is_zero (Pos "samples/Nat.pi" (line 15, column 11) (Pi (<(_1,{TCon "Nat" []})> TyBool))),Def is_zero (Po...

The result is really a list of modules, Nat.pi and Sample.pi, in the Either monad. Normally we'd be interested only in the last module parsed (i.e. Sample.pi here; again, only the beginning of the output is shown):

>>> runExceptT (getModules ["samples"] "Sample.pi") >>= putStrLn . ((++"...") . take 400) . show . last . fromRight'
Parsing File "samples/Nat.pi"
Parsing File "samples/Sample.pi"
Module {moduleName = "Fac", moduleImports = [ModuleImport "Nat"], moduleEntries = [Sig two (Pos "samples/Sample.pi" (line 13, column 7) (TCon "Nat" [])),Def two (Pos "samples/Sample.pi" (line 14, column 7) (DCon "Succ" [Arg Runtime (Paren (Pos "samples/Sample.pi" (line 14, column 13) (DCon "Succ" [Arg Runtime (DCon "Zero" [] (Annot Nothing))] (Annot Nothing))))] (Annot Nothing))),Sig nat_leq (Pos ...

Now we can use disp for pretty printing the result:

>>> runExceptT (getModules ["samples"] "Sample.pi") >>= return . disp . last . fromRight'
Parsing File "samples/Nat.pi"
Parsing File "samples/Sample.pi"
module Fac where
import Nat
two : Nat
two = 2
nat_leq : Nat -> Nat -> Bool
nat_leq = \x y .
            case x of
              Zero -> True
              (Succ m) ->
                case y of
                  Zero -> False
                  (Succ n) -> nat_leq m n
fac : Nat -> Nat
fac = \n . if nat_leq n 0 then 1 else mult n ((fac ((minus n 1))))
foo : Nat -> Nat
foo = \n . fac ((mult 2 n))
bar : Nat -> Nat
bar = \n .
        let fac = ((\x . plus x 2) : Nat -> Nat) in
        plus ((foo n)) ((fac ((mult 3 n))))

If we were looking at the contents of just a single self-contained file, i.e. without any imports of other modules, say at a file Simple.pi:

-- Simple.pi


-- fromSuccess  `liftM` parseFromFileEx (runInnerParser $ evalStateT (whiteSpace *> many decl) piInit) "samples/Simple.pi"
-- resp
-- parseFromFileEx (runInnerParser $ evalStateT (whiteSpace *> many decl) piInit) "samples/Simple.pi" >>= return . fromSuccess
-- parseFromFileEx (runInnerParser $ evalStateT (whiteSpace *> many decl) piInit) "samples/Simple.pi" >>= return . nopos . fromSuccess



module Fac where


-- caseExpr w/ prelude
-- case x of
--    Zero -> True
--    Succ m -> False


-- caseExpr w/ prelude
-- case y of
--     Zero -> False
--     Succ n -> nat_leq m n



-- caseExpr w/ prelude
-- case x of
--    Zero -> True
--    Succ m -> case y of
--           Zero -> False
--           Succ n -> nat_leq m n


-- lambdaPAs w/ prelude
-- \ x y .
--   case x of
--      Zero -> True
--      Succ m -> case y of
--             Zero -> False
--             Succ n -> nat_leq m n



data Nat : Type where
  Zero
  Succ of (Nat)


two : Nat
two = Succ (Succ Zero)


nat_leq : Nat -> Nat -> Bool
nat_leq = \ x y .
  case x of
     Zero -> True
     Succ m -> case y of
            Zero -> False
            Succ n -> nat_leq m n


fac: Nat -> Nat
fac = \n . if nat_leq n 0 then 1 else mult n (fac (minus n 1))

then we could use parseModuleFile from Parser.hs directly. (We need to provide emptyConstructorNames from Syntax.hs, so that the parser can collect information about the data types and constructors found (Nat, Zero, Succ, …), running in the State monad internally):

>>> runExceptT (parseModuleFile  emptyConstructorNames  "samples/Simple.pi") >>= return . disp . fromRight'
Parsing File "samples/Simple.pi"
module Fac where
data Nat : Type where
  Zero
  Succ of (_ : Nat)
two : Nat
two = 2
nat_leq : Nat -> Nat -> Bool
nat_leq = \x y .
            case x of
              Zero -> True
              (Succ m) ->
                case y of
                  Zero -> False
                  (Succ n) -> nat_leq m n
fac : Nat -> Nat
fac = \n . if nat_leq n 0 then 1 else mult n ((fac ((minus n 1))))

The situation is less complicated (does not require any file I/O) if we just want to parse a string as an expression:

>>> parseExpr   "   \\x  .  a  "
Right (Pos "<interactive>" (line 1, column 4) (Lam (<(x,{Annot Nothing})> Pos "<interactive>" (line 1, column 11) (Var a))))
>>> fromRight' $ parseExpr   "   \\x  .  a  "
Pos "<interactive>" (line 1, column 4) (Lam (<(x,{Annot Nothing})> Pos "<interactive>" (line 1, column 11) (Var a)))

Again, pretty printing is convenient for reading the result:

>>> disp $ fromRight' $ parseExpr   "   \\x  .  a  "
\x . a

Parser.hs exports only parseModuleFile and parseExpr, but we can of course exercise the individual building blocks of the parser by loading Parser.hs directly (assuming the imports above): i.e. we can parse expressions, declarations, signature definitions, variables etc. (shown here: parsing an expression):

>>> runFreshM (evalStateT (runParserT (do { whiteSpace; v <- expr; eof; return v}) [] "<interactive>" " Succ n") emptyConstructorNames)
Right (Pos "<interactive>" (line 1, column 2) (App (Var Succ) (Var n)))

This time we have to "unpeel the onion" of our monad transformer stack ourselves: the aforementioned StateT keeps track of constructor names, and FreshM from unbound is responsible for names and variable bindings.

Note how, without any further knowledge of Succ, Succ n is just parsed as a function application. This may not be what we want: an admittedly ad-hoc prelude helps to recognise Succ as a data constructor rather than as a regular function:

>>> let prelude = ConstructorNames (S.fromList ["Nat"]) (S.fromList ["Zero", "Succ"])
>>> runFreshM (evalStateT (runParserT (do { whiteSpace; v <- expr; eof; return v}) [] "<interactive>" " Succ n") prelude)
Right (Pos "<interactive>" (line 1, column 2) (DCon "Succ" [Arg Runtime (Var n)] (Annot Nothing)))

Backtracking in parsers and Pi-forall mimicry

This chapter describes some implementation details: how Pire's Trifecta parser originally behaved differently from Pi-forall's Parsec parser, and how it now mimics Pi-forall / Parsec's exact behaviour.

These details may not be essential for understanding Pire's approach to refactorings, but they may be interesting as a bug-hunting experience (if one may call Pire / Trifecta's original behaviour a bug), and they should give some insight into what's happening under the hood.

We have solved the problem of parsing (A->B) (and similar more complicated cases, cf. the examples below) in exactly the same manner as Pi-forall does.

To make a long story short: All of Pi-forall's test files parse fine now (comparing the result of untie $ parse …​ with Pi-forall's parser), as covered by the tests.

tl;dr

Strictly speaking: yes, maybe it's not that important whether the fresh variables used read as

_, _1, _2, _3, … - as was always the case with Pire's Trifecta parser (so far)

or, as in Pi-forall / Parsec's case, with numbering that depends on the level of parentheses / number of arrows etc.

Nevertheless, we feel much more comfortable now that we have found a way to make our Trifecta parser behave exactly in Pi-forall's manner (Pi-forall mimicry), and understand what's going on.

We can thus continue to be very strict in our comparisons of results (parse & untie vs. Pi-forall's original parser), and would hope that this way other possible subtle differences / future issues won't go unnoticed.

A few examples may be worth looking at (and yes, there is a system to this, it just may not be obvious at first how it works :-) ):

> import PiForall.Parser as P
> import Pire.NoPos
> import Data.Either.Combinators (fromRight')

> nopos $ fromRight' $ P.parseExpr "A->B"
Pi (<(_,{Var A})> Var B)

> nopos $ fromRight' $ P.parseExpr "A->B->C"
Pi (<(_,{Var A})> Pi (<(_1,{Var B})> Var C))

> nopos $ fromRight' $ P.parseExpr "A->B->C->D"
Pi (<(_,{Var A})> Pi (<(_1,{Var B})> Pi (<(_2,{Var C})> Var D)))

> nopos $ fromRight' $ P.parseExpr "A->B->C->D->E"
Pi (<(_,{Var A})> Pi (<(_1,{Var B})> Pi (<(_2,{Var C})> Pi (<(_3,{Var D})> Var E))))

> nopos $ fromRight' $ P.parseExpr "(A->B)"
Paren (Pi (<(_,{Var A})> Var B))

> nopos $ fromRight' $ P.parseExpr "(A->B->C)"
Paren (Pi (<(_,{Var A})> Pi (<(_1,{Var B})> Var C)))

> nopos $ fromRight' $ P.parseExpr "(A->B->C->D)"
Paren (Pi (<(_,{Var A})> Pi (<(_1,{Var B})> Pi (<(_2,{Var C})> Var D))))

> nopos $ fromRight' $ P.parseExpr "(A->B->C->D->E)"
Paren (Pi (<(_,{Var A})> Pi (<(_1,{Var B})> Pi (<(_2,{Var C})> Pi (<(_3,{Var D})> Var E)))))

> nopos $ fromRight' $ P.parseExpr "((A->B))"
Paren (Paren (Pi (<(_2,{Var A})> Var B)))

> nopos $ fromRight' $ P.parseExpr "((A->B->C))"
Paren (Paren (Pi (<(_4,{Var A})> Pi (<(_5,{Var B})> Var C))))

> nopos $ fromRight' $ P.parseExpr "((A->B->C->D))"
Paren (Paren (Pi (<(_6,{Var A})> Pi (<(_7,{Var B})> Pi (<(_8,{Var C})> Var D)))))

> nopos $ fromRight' $ P.parseExpr "((A->B->C->D->E))"
Paren (Paren (Pi (<(_8,{Var A})> Pi (<(_9,{Var B})> Pi (<(_10,{Var C})> Pi (<(_11,{Var D})> Var E))))))

> nopos $ fromRight' $ P.parseExpr "(((A->B)))"
Paren (Paren (Paren (Pi (<(_8,{Var A})> Var B))))

> nopos $ fromRight' $ P.parseExpr "(((A->B->C)))"
Paren (Paren (Paren (Pi (<(_16,{Var A})> Pi (<(_17,{Var B})> Var C)))))

> nopos $ fromRight' $ P.parseExpr "(((A->B->C->D)))"
Paren (Paren (Paren (Pi (<(_24,{Var A})> Pi (<(_25,{Var B})> Pi (<(_26,{Var C})> Var D))))))

> nopos $ fromRight' $ P.parseExpr "(((A->B->C->D->E)))"
Paren (Paren (Paren (Pi (<(_32,{Var A})> Pi (<(_33,{Var B})> Pi (<(_34,{Var C})> Pi (<(_35,{Var D})> Var E)))))))

> nopos $ fromRight' $ P.parseExpr "((((A->B))))"
Paren (Paren (Paren (Paren (Pi (<(_26,{Var A})> Var B)))))

> nopos $ fromRight' $ P.parseExpr "((((A->B->C))))"
Paren (Paren (Paren (Paren (Pi (<(_52,{Var A})> Pi (<(_53,{Var B})> Var C))))))

> nopos $ fromRight' $ P.parseExpr "((((A->B->C->D))))"
Paren (Paren (Paren (Paren (Pi (<(_78,{Var A})> Pi (<(_79,{Var B})> Pi (<(_80,{Var C})> Var D)))))))

etc.

We started out reducing Pire's parser and later Pi-forall's parser to the bare minimum of functions that would still show the above behaviour (or lack thereof in Pire's case initially), i.e. got rid of declarations, and lots more stuff.

Then we added some writing capabilities to Pire's parser: we ran it in the RWS (Reader-Writer-State) monad rather than just in the State monad, which allowed us to sprinkle some tell's into the code, and we also added some increment function calls here and there (incrementing the state's counter for fresh variables).
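The instrumentation idea can be sketched as a toy RWS computation (the names branch and increment are hypothetical stand-ins mirroring the tell's and increment calls described here, not Pire's actual code):

```haskell
import Control.Monad.Trans.RWS (RWS, runRWS, tell, modify, get)

-- Hypothetical 'increment': bump the fresh-variable counter in the state.
increment :: RWS () String Int ()
increment = modify (+ 1)

-- A toy "branch" that logs where it is, uses a fresh variable,
-- and returns the fresh variable's name.
branch :: RWS () String Int String
branch = do
  tell "in colon branch\n"
  increment
  n <- get
  return ("_" ++ show n)

main :: IO ()
main = print (runRWS branch () 0)
-- prints ("_1",1,"in colon branch\n"): result, final state, writer log
```

Running the instrumented parser then yields both the parse result and a trace of which branches were visited, plus the final counter value.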

We experimented with commenting out some of the code as well, e.g. choices of the grammar.

That way we found that one particular choice function, within the local definition of beforeBinder in expProdOrAnnotOrParens, was originally the culprit:

beforeBinder = parens $
  choice [do e1 <- try (term >>= (\e1 -> colon >> return e1))
             e2 <- expr
             return $ Colon' e1 e2
         , do e1 <- try (term >>= (\e1 -> comma >> return e1))
              e2 <- expr
              return $ Comma' e1 e2
         , Nope <$> expr]

We could change Pire's parsing behaviour by adding some increments and tell's (it still wouldn't behave like Pi-forall's parser, though), like so:

choice [
  do
    tell "in colon branch\n"
    e1 <- try (term >>= (\e1 -> colon >> return e1))
    e2 <- expr
    return $ Colon' e1 e2
, do
    tell "in comma branch\n"
    e1 <- try (term >>= (\e1 -> comma >> return e1))
    e2 <- expr
    return $ Comma' e1 e2
, do
    tell "in paren branch\n"
    increment
    Nope <$> expr
]

Likewise, commenting out for example the middle comma choice above (and in Pi-forall's parser) had an impact on the choice of fresh variables.

We were still far from real Pi-forall mimicry, no matter what we tried (where we put our increment function calls etc.): we couldn't make Pire's parser behave in Pi-forall's way. But at least this got us on the right track for understanding what's going on:

Pi-forall's parser is defined as:

type LParser a = ParsecT
                    String                      -- The input is a sequence of Char
                    [Column] (                  -- The internal state for Layout tabs
                       StateT ConstructorNames
                       FreshM)                  -- The internal state for generating fresh names,
                    a                           -- the type of the object being parsed

Never mind the details, just notice that the monad transformer stack is:

ParsecT (… (StateT … (FreshM)))

with FreshM within ParsecT. It is possible to use Parsec the other way around (Parsec inside StateT)

StateT…​ Parsec

it's just not done that way in Pi-forall, and the two behave differently:

the StateT … Parsec stack does backtracking for the state in case a choice fails,

the ParsecT (… (StateT … (FreshM))) stack does not do (cannot do) any backtracking for that inner state, cf. for example Roman Cheplyaka's blog: https://ro-che.info/articles/2012-01-02-composing-monads
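The difference can be demonstrated in miniature with the transformers library, substituting Maybe for the parser (a simplified analogy, not Parsec itself): with StateT over Maybe, a failing branch's state changes are rolled back; with MaybeT over State, they survive.

```haskell
import Control.Applicative ((<|>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..), runMaybeT)
import qualified Control.Monad.Trans.State as S

-- A branch that bumps a fresh-name counter and then fails,
-- in the "state outside" style (cf. StateT … Parsec):
failing1 :: S.StateT Int Maybe ()
failing1 = S.modify (+ 1) >> lift Nothing

-- The same branch with the state inside (cf. ParsecT … (StateT …)):
failing2 :: MaybeT (S.State Int) ()
failing2 = lift (S.modify (+ 1)) >> MaybeT (return Nothing)

main :: IO ()
main = do
  -- StateT over Maybe: the failed branch's increment is rolled back.
  print (S.execStateT (failing1 <|> return ()) 0)               -- Just 0
  -- MaybeT over State: the increment survives the failure.
  print (S.execState (runMaybeT (failing2 <|> return ())) 0)    -- 1
```

In the first stacking, <|> reruns the alternative from the original state; in the second, the state threads straight through the failure, just like the fresh-name counter does in Pi-forall's ParsecT stack.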

The Trifecta parser is written in this latter style:

StateT …​(inner trifecta parser)

and does backtracking. We were happy with the parser so far (this is the style of the ' parser), and if we didn't want to mimic Pi-forall's behaviour, our choice of fresh vars would seem more natural.

Trifecta parsers cannot be written in a monad transformer (TrifectaT) style, as far as we are aware: it is not in the documentation, and we have never seen a Trifecta parser as a monad transformer wrapping a state; it's always the state wrapping the Trifecta parser.

A Pi-forall grammar reduced to the bare minimum shows what is going on:

expr ->
        -- pi-type, right-associativity with the help of buildExpressionParser table
        term "->" term

        | term

term ->
        -- App
        factor @: factor

        | factor

factor -> var

          -- annotation Ann
          | "(" term ":" expr ")"

          -- Prod
          | "(" term "," expr ")"

          -- Paren
          | "(" expr ")"
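The reduced grammar can be sketched with ReadP from base (a toy model: ReadP backtracks freely and carries no fresh-name state, so it sidesteps exactly the issue discussed here; Pi-forall's binders and App are elided, and the AST constructors V, Arr, Ann, Prod, Paren are made up for this sketch):

```haskell
import Text.ParserCombinators.ReadP

-- Toy AST; Pi-forall's Pi-binder (with its fresh variable)
-- is simplified to a plain Arr constructor.
data E = V Char | Arr E E | Ann E E | Prod E E | Paren E
  deriving (Eq, Show)

expr :: ReadP E
expr = chainr1 term (Arr <$ string "->")   -- right-associative pi-type

term :: ReadP E                            -- App elided for brevity
term = factor

factor :: ReadP E
factor = (V <$> satisfy (`elem` ['A' .. 'Z']))
     +++ paren (Ann  <$> term <*> (char ':' *> expr))   -- annotation
     +++ paren (Prod <$> term <*> (char ',' *> expr))   -- product
     +++ (Paren <$> paren expr)
  where paren = between (char '(') (char ')')

-- keep only parses that consume the whole input
parseE :: String -> [E]
parseE s = [ e | (e, "") <- readP_to_S (expr <* eof) s ]

main :: IO ()
main = do
  print (parseE "(A->B)")   -- [Paren (Arr (V 'A') (V 'B'))]
  print (parseE "(A,B)")    -- [Prod (V 'A') (V 'B')]
```

For "(A->B)" the annotation and product branches both parse the inner term and then fail on the missing : or ,, falling through to the Paren branch; in ReadP that failed work leaves no trace, whereas in Pi-forall's stack each failed attempt has already consumed fresh variables.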

Now see what happens when we parse (A -> B) as an expr: well, we must find a term, and in turn a factor then:

Oh, it starts with (, cannot just be a var then, but must be one of the other cases, so try them (in this order):

| "(" term ":" expr ")"
| "(" term "," expr ")"
| "(" expr ")"

so have a look at what's inside the outer parentheses: (A->B), and try to parse that as a term: well, that's possible: (A->B) is a term (thereby using some fresh variables for the Π-type found), it is just not followed by a :, so put it back.

in the Trifecta case: backtracking clears the state as if we had never used any fresh vars for the Π-type found

in Pi-forall's case: the fresh vars for the term / Π-type (A->B) are used forever.

try the next possibility, which is similar: parsing (A->B) as a term succeeds (and renders fresh variables used, in the case of Pi-forall), it is just not followed by a ,

so finally go for the last case: ( expr ).

This also explains why longer arrow chains like ((A->B->C->D)) make a difference: use _, _1, _2 in the first attempt at parsing the inner (A->B->C->D), _3, _4, _5 in the second attempt, and finally return _6, _7, _8

So how could we mimic Pi-forall's behaviour?

Well the solution was to rely on lookAhead:

choice [do { e1 <- try (term >>= (\e1 -> colon >> return e1)) ;
               e2 <- expr ;
               return $ Colon' e1 e2
             }
         , do { lookAhead term ;
                e1 <- try (term >>= (\e1 -> comma >> return e1)) ;
                e2 <- expr ;
                return $ Comma' e1 e2
              }
          , do { lookAhead term ;
                 lookAhead term ;
                 Nope <$> expr
               }
          ]

If we are in the second (comma) case, we must already have tried the first case (and succeeded with term): lookAhead doesn't consume the input, but triggers the state's counter increment (as if no backtracking had happened)

Likewise, if we are in the third case, we must have looked at the other two cases already (and found a term each time): thus call lookAhead twice.
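The effect of lookAhead can be illustrated with a toy StateT parser (hypothetical names, not Pire's or Parsec's actual combinators): it restores the input position after running its argument, but keeps the fresh-name counter's increments.

```haskell
import qualified Control.Monad.Trans.State as S

-- Toy parser: the state is (fresh-name counter, remaining input).
type P a = S.StateT (Int, String) Maybe a

fresh :: P String            -- allocate a fresh variable name
fresh = do
  (n, inp) <- S.get
  S.put (n + 1, inp)
  return ("_" ++ show n)

sym :: Char -> P Char        -- consume one expected character
sym c = do
  (n, inp) <- S.get
  case inp of
    (x : rest) | x == c -> S.put (n, rest) >> return x
    _                   -> S.StateT (const Nothing)

-- Run p, then restore the input position but keep the counter,
-- mimicking how a lookAhead leaves inner-monad effects intact.
lookAhead :: P a -> P a
lookAhead p = do
  (_, inp) <- S.get
  x <- p
  (n, _) <- S.get
  S.put (n, inp)
  return x

main :: IO ()
main = do
  -- plain parse: consumes the 'A', counter at 1
  print (S.execStateT (fresh >> sym 'A') (0, "AB"))              -- Just (1,"B")
  -- lookAhead: input untouched, counter increment survives
  print (S.execStateT (lookAhead (fresh >> sym 'A')) (0, "AB"))  -- Just (1,"AB")
```

This is precisely what the doubled lookAhead term calls exploit: each call re-parses the term without consuming input, while its fresh-variable allocations stick, reproducing Pi-forall's counter values.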

OK, we are very glad we got this sorted out; we will continue with the finishing touches for a new Pire release soon, and turn our attention to the refactorings.

Changes

complete rewrite as of December 2017; a summary of major changes and some design decisions (ordered by topic), recent and older ones:

concrete syntax and white space aware parsing, naming conventions

testing

t2s is handy when working on the command line

position decoration…

not currently implemented…

pretty printing

simple exprs

described here are the steps to build the (html, pdf) docs from their .adoc sources.

as mentioned already: rendered versions of these docs can be found at http://a-rx.info/pire

building the docs

building the code snippets

doctest snippets

building the html

building the font metric files

building the pdf

remarks


  1. (sub-projects being a feature of cabal 2.x, recognised by the cabal new-* commands, and organised in file cabal.project).

  2. We use the general term cabal repl to mean its newer form cabal new-repl in particular.