2019-07-29

Programming Algorithms: A Crash Course in Lisp

The introductory post for this book, unexpectedly, received quite a lot of attention, which is nice since it prompted some questions, and one of them I planned to address in this chapter.

I expect that there will be two main audiences, for this book:

  • people who’d like to advance in algorithms and writing efficient programs — the major group
  • lispers, either accomplished or aspiring who also happen to be interested in algorithms

This introductory chapter is, primarily, for the first group. After reading it, the rest of the book’s Lisp code should become understandable to you. Besides, you’ll know the basics to run Lisp and experiment with it if will you so desire.

For the lispers, I have one comment and one remark. You might be interested to read this part just to understand my approach of utilizing the language throughout the book. Also, you’ll find my stance regarding the question that was voiced several times in the comments: whether it’s justified to use some 3rd-party extensions and to what extent or should the author vigilantly stick to only the tools provided by the standard.

The Core of Lisp

To effortlessly understand Lisp, you’ll have to forget, for a moment, any concepts of how programming languages should work that you might have acquired from your prior experience in coding. Lisp is simpler; and when people bring their Java, C or Python approaches to programming with it, first of all, the results are suboptimal in terms of code quality (simplicity, clarity, and beauty), and, what’s more important, there’s much less satisfaction from the process, not to mention very few insights and little new knowledge gained.

It is much easier to explain Lisp if we begin from a blank slate. In essence, all there is to it is just an evaluation rule: Lisp programs consist of forms that are evaluated by the compiler. There are 3+2 ways how that can happen:

  • self-evaluation: all literal constants (like 1, "hello", etc.) are evaluated to themselves. These literal objects can be either built-in primitive types (1) or data structures ("hello")
  • symbol evaluation: separate symbols are evaluated as names of variables, functions, types or classes depending on the context. The default is variable evaluation, i.e. if we encounter a symbol foo the compiler will substitute in its place the current value associated with this variable (more on this a little bit later)
  • expression evaluation: compound expressions are formed by grouping symbols and literal objects with parenthesis. The following form (oper 1 foo) is considered a “functional” expression: the operator name is situated in the first position (head), and its arguments, if any, in the subsequent positions (rest). There are 3 ways to evaluate a functional expression:
    • there are 25 special operators that are defined in lower-level code and may be considered something like axioms of the language: they are pre-defined, always present, and immutable. Those are the building blocks, on top of which all else is constructed, and they include the sequential block operator, the conditional expression if, and the unconditional jump go, to name a few. If oper is the name of a special operator, the low-level code for this operator that deals with the arguments in its own unique way is executed
    • there’s also ordinary function evaluation: if oper is a function name, first, all the arguments are evaluated with the same evaluation rule, and then the function is called with the obtained values
    • finally, there’s macro evaluation. Macros provide a way to change the evaluation rule for a particular form. If oper names a macro, its code is substituted instead of our expression and then evaluated. Macros are a major topic in Lisp, and they are used to build a large part of the language, as well as provide an accessible way, for the users, to extend it. However, they are orthogonal to the subject of this book and won’t be discussed in further detail here. You can delve deeper into macros in such books as On Lisp or Let Over Lambda

It’s important to note that, in Lisp, there’s no distinction between statements and expressions, no special keywords, no operator precedence rules, and other similar arbitrary stuff you can stumble upon in other languages. Everything is uniform; everything is an expression in a sense that it will be evaluated and return some value.

A Code Example

To sum up, let’s consider an example of the evaluation of a Lisp form. The following one implements the famous binary search algorithm (that we’ll discuss in more detail in one of the following chapters):

(when (> (length vec) 0)
  (let ((beg 0)
        (end (length vec)))
    (do ()
        ((= beg end))
      (let ((mid (floor (+ beg end) 2)))
        (if (> (? vec mid) val)
            (:= beg (1+ mid))
            (:= end mid))))
    (values beg
            (? vec beg)
            (= (? vec beg) val))))

It is a compound form. In it, the so-called top-level form is when, which is a macro for a one-clause conditional expression: an if with only the true-branch. First, it evaluates the expression (> (length vec) 0), which is an ordinary function for a logical operator > applied to two args: the result of obtaining the length of the contents of the variable vec and a constant 0. If the evaluation returns true, i.e. the length of vec is greater than 0, the rest of the form is evaluated in the same manner. The result of the evaluation, if nothing exceptional happens, is either false (which is called nil, in Lisp) or 3 values returned from the last form (values ...). ? is the generic access operator, which abstracts over different ways to query data structures by key. In this case, it retrieves the item from vec at the index of the second argument. Below we’ll talk about other operators shown here.

But first I need to say a few words abut RUTILS. It is a 3rd-party library that provides a number of extensions to the standard Lisp syntax and its basic operators. The reason for its existence is that Lisp standard is not going to change ever, and, as eveything in this world, it has its flaws. Besides, our understanding of what’s elegant and efficient code evolves over time. The great advantage of the Lisp standard, however, which counteracts the issue of its immutability, is that its authors had put into it multiple ways to modify and evolve the language at almost all levels starting from even the basic syntax. And this addresses our ultimate need, after all: we’re not so interested in changing the standard as we’re in changing the language. So, RUTILS is one of the ways of evolving Lisp and its purpose is to make programming in it more accessible without compromising the principles of the language. So, in this book, I will use some basic extensions from RUTILS and will explain them as needed. Surely, using 3rd-party tools is the question of preference and taste and might not be approved by some of the Lisp old-times, but no worries, in your code, you’ll be able to easily swap them for your favorite alternatives.

The REPL

Lisp programs are supposed to be run not only in a one-off fashion of simple scripts, but also as live systems that operate over long periods of time experiencing change not only of their data but also code. This general way of interaction with a program is called Read-Eval-Print-Loop (REPL), which literally means that the Lisp compiler reads a form, evaluates it with the aforementioned rule, prints the results back to the user, and loops over.

REPL is the default way to interact with a Lisp program, and it is very similar to the Unix shell. When you run your Lisp (for example, by entering sbcl at the shell) you’ll drop into the REPL. We’ll preceede all REPL-based code interactions in the book with a REPL prompt (CL-USER> or similar). Here’s an example one:

CL-USER> (print "Hello world")
"Hello world" 
"Hello world"

A curious reader may be asking why "Hello world" is printed twice. It’s a proof that everything is an expression in Lisp. :) The print “statement”, unlike in most other languages, not only prints its argument to the console (or other output stream), but also returns it as is. This comes very handy when debugging, as you can wrap almost any form in a print not changing the flow of the program.

Obviously, if the interaction is not necessary, just the read-eval part may remain. But, what’s more important, Lisp provides a way to customize every stage of the process:

  • at the read stage special syntax (“syntax sugar”) may be introduced via a mechanism called reader macros
  • ordinary macros are a way to customize the eval stage
  • the print stage is conceptually the simplest one, and there’s also a standard way to customize object printing via the Common Lisp Object System’s (CLOS) print-object function
  • and the loop stage can be replaced by any desired program logic

Basic Expressions

The structural programming paradigm states that all programs can be expressed in terms of 3 basic constructs: sequential execution, branching, and looping. Let’s see how these operators are expressed in Lisp.

Sequential Execution

The simplest program flow is sequential execution. In all imperative languages, it is what is assumed to happen if you put several forms in a row and evaluate the resulting code block. Like this:

CL-USER> (print "hello") (+ 2 2)
"hello"
4

The value returned by the last expression is returned as the value of the whole sequence.

Here, the REPL-interaction forms an implicit unit of sequential code. However, there are many cases when we need to explicitly delimit such units. This can be done with the block operator:

CL-USER> (block test
           (print "hello")
           (+ 2 2))
"hello"
4

Such block has a name (in this example: test). This allows to prematurely end its execution by using an operator return-from:

CL-USER> (block test
           (return-from test 0)
           (print "hello")
           (+ 2 2))
0

A shorthand return is used to exit from blocks with a nil name (which are implicit in most of the looping constructs we’ll see further):

CL-USER> (block nil
           (return 0)
           (print "hello")
           (+ 2 2))
0

Finally, if we don’t even plan to ever prematurely return from a block, we can use the progn operator that doesn’t require a name:

CL-USER> (progn
           (print "hello")
           (+ 2 2))
"hello"
4

Branching

Conditional expressions calculate the value of their first form and, depending on it, execute one of several alternative code paths. The basic conditional expression is if:

CL-USER> (if nil
             (print "hello")
             (print "world"))
"world"
"world"

As we’ve seen, nil is used to represent logical falsity, in Lisp. All other values are considered logically true, including the symbol T or t which directly has the meaning of truth.

And when we need to do several things at once, in one of the conditional branches, it’s one of the cases when we need to use progn or block:

CL-USER> (if (+ 2 2)
             (progn
               (print "hello")
               4)
             (print "world"))
"hello"
4

However, often we don’t need both branches of the expressions, i.e. we don’t care what will happen if our condition doesn’t hold (or holds). This is such a common case that there are special expressions for it in Lisp — when and unless:

CL-USER> (when (+ 2 2)
           (print "hello")
           4)
"world"
4
CL-USER> (unless (+ 2 2)
           (print "hello")
           4)
NIL

As you see, it’s also handy because you don’t have to explicitly wrap the sequential forms in a progn.

One other standard conditional expression is cond, which is used when we want to evaluate several conditions in a row:

CL-USER> (cond
           ((typep 4 'string)
            (print "hello"))
           ((> 4 2)
            (print "world")
            nil)
           (t
            (print "can't get here")))
"world"
NIL

The t case is a catch-all that will trigger if none of the previous conditions worked (as its condition is always true). The above code is equivalent to the following:

(if (typep 4 'string)
    (print "hello")
    (if (> 4 2)
        (progn
          (print "world")
          nil)
        (print "can't get here")))

There are many more conditional expressions in Lisp, and it’s very easy to define your own with macros (it’s actually, how when, unless, and cond are defined), and when there arises a need to use a special one, we’ll discuss its implementation.

Looping

Like with branching, Lisp has a rich set of looping constructs, and it’s also easy to define new ones when necessary. This approach is different from the mainstream languages, that usually have a small number of such statements and, sometimes, provide an extension mechanism via polymorphism. And it’s even considered to be a virtue justified by the idea that it’s less confusing for the beginners. It makes sense to a degree. Still, in Lisp, both generic and custom approaches manage to coexist and complement each other. Yet, the tradition of defining custom control constructs is very strong. Why? One justification for this is the parallel to human languages: indeed, when and unless, as well as dotimes and loop are either directly words from the human language or are derived from natural language expressions. Our mother tongues are not so primitive and dry. The other reason is because you can™. I.e. it’s so much easier to define custom syntactic extensions in Lisp than in other languages that sometimes it’s just impossible to resist. :) And in many use cases they make the code much more simple and clear.

Anyway, for a complete beginner, actually, you have to know the same number of iteration constructs as in any other language. The simplest one is dotimes that iterates the counter variable a given number of times (from 0 to (- times 1)) and executes the body on each iteration. It is analogous to for (int i = 0; i < times; i++) loops found in C-like languages.

CL-USER> (dotimes (i 3)
           (print i))
0
1
2
NIL

The return value is nil by default, although it may be specified in the loop header.

The most versatile (and low-level) looping construct, on the other hand, is do:

CL-USER> (do ((i 0 (1+ i))
              (prompt (read-line) (read-line)))
             ((> i 1) i)
           (print (pair i prompt))
           (terpri))
foo

(0 "foo") 
bar

(1 "bar") 

2

do iterates a number of variables (zero or more) that are defined in the first part (here, i and prompt) until the termination condition in the second part is satisfied (here, (> i 1)), and as with dotimes (and other do-style macros) executes its body — rest of the forms (here, print and terpri, which is a shorthand for printing a newline). read-line reads from standard input until newline is encountered and 1+ returns the current value of i increased by 1.

All do-style macros (and there’s quite a number of them, both built-in and provided from external libraries: dolist, dotree, do-register-groups, dolines etc.) have an optional return value. In do it follows the termination condition, here — just return the final value of i.

Besides do-style iteration, there’s also a substantially different beast in CL ecosystem — the infamous loop macro. It is very versatile, although somewhat unlispy in terms of syntax and with a few surprising behaviors. But elaborating on it is beyond the scope of this book, especially since there’s an excellent introduction to loop in Peter Seibel’s "LOOP for Black Belts".

Many languages provide a generic looping construct that is able to iterate an arbitrary sequence, a generator and other similar-behaving things — usually, some variant of foreach. We’ll return to such constructs after speaking about sequences in more detail.

And there’s also an alternative iteration philosophy: the functional one, which is based on higher-order functions (map, reduce and similar) — we’ll cover it in more detail in the following chapters, also.

Procedures and Variables

We have covered the 3 pillars of structural programming, but one essential, in fact, the most essential, construct still remains — variables and procedures.

What if I told you that you can perform the same computation many times, but changing some parameters… OK, OK, pathetic joke. So, procedures are the simplest way to reuse computations, and procedures accept arguments, which allows to pass values into their bodies. A procedure, in Lisp, is called lambda. You can define one like this: (lambda (x y) (+ x y)). When used, such procedure — also often called a function, although it’s quite different from what we consider a mathematical function — and, in this case, it’s called an anonymous function as it doesn’t have any name — will produce the sum of its inputs:

CL-USER> ((lambda (x y) (+ x y)) 2 2)
4

It is quite cumbersome to refer to procedures by their full code signature, and an obvious solution is to assign names to them. A common way to do that in Lisp is via the defun macro:

CL-USER> (defun add2 (x y) (+ x y))
ADD2
CL-USER> (add2 2 2)
4

The arguments of a procedure are examples of variables. Variables are used to name memory cells whose contents are used more than once and may be changed in the process. They serve different purposes:

  • to pass data into procedures
  • as temporary placeholders for some varying data in code blocks (like loop counters)
  • as a way to store computation results for further reuse
  • to define program configuration parameters (like the OS environment variables, which can also be thought of as arguments to the main function of our program)
  • to refer to global objects that should be accessible from anywhere in the program (like *standard-output* stream)
  • and more

Can we live without variables? Theoretically, well, maybe. At least, there’s the so-called point-free style of programming that strongly discourages the use of variables. But, as they say, don’t try this at home (at least, until you know perfectly well what you’re doing :) Can we replace variables with constants, or single-assignment variables, i.e. variables that can’t change over time? Such approach is promoted by the so called purely functional languages. To a certain degree, yes. But, from the point of view of algorithms development, it makes life a lot harder by complicating many optimizations if not totally outruling them.

So, how to define variables in Lisp? You’ve already seen some of the variants: procedural arguments and let-bindings. Such variables are called local or lexical, in Lisp parlance. That’s because they are only accessible locally throughout the execution of the code block, in which they are defined. let is a general way to introduce such local variables, which is lambda in disguise (a thin layer of syntax sugar over it):

CL-USER> (let ((x 2))
           (+ x x))
4
CL-USER> ((lambda (x) (+ x x))
          2)
4

While with lambda you can create a procedure in one place, possibly, assign it to a variable (that’s what, in essence, defun does), and then apply many times in various places, with let you define a procedure and immediately call it, leaving no way to store it and re-apply again afterwards. That’s even more anonymous than an anonymous function! Also, it requires no overhead, from the compiler. But the mechanism is the same.

Creating variables via let is called binding, because they are immediately assigned (bound with) values. It is possible to bind several variables at once:

CL-USER> (let ((x 2)
               (y 2))
           (+ x y))
4

However, often we want to define a row of variables with next ones using the previous ones’ values. It is cumbersome to do with let, because you need nesting (as procedural arguments are assigned independently):

(let ((len (length list)))
  (let ((mid (floor len 2)))
    (let ((left-part (subseq list 0 mid))
          (right-part (subseq list mid)))
      ...)))

To simplify this use case, there’s let*:

(let* ((len (length list))
       (mid (floor len 2))
       (left-part (subseq list 0 mid))
       (right-part (subseq list mid)))
  ...)

However, there are many other ways to define variables: bind multiple values at once; perform the so called “destructuring” binding when the contents of a data structure (usually, a list) are assigned to several variables, first element to the first variable, second to the second, and so on; access the slots of a certain structure etc. For such use cases, there’s with binding from RUTILS, which works like let* with extra powers. Here’s a very simple example:

(with ((len (length list))
       (mid rem (floor len 2))
       ;; this group produces a list of 2 sublists
       ;; that are bound to left-part and right-part
       ;; and ; character starts a comment in lisp
       ((left-part right-part) (group mid list)))
 ...

In the code throughout this book, you’ll only see these two binding constructs: let for trivial and parallel bindings and with for all the rest.

As we said, variables may not only be defined, or they’d be called “constants”, instead, but also modified. To alter the variable’s value we’ll use := from RUTILS (it is an abbreviation of the standard psetf macro):

CL-USER> (let ((x 2))
           (print (+ x x))
           (:= x 4)
           (+ x x))
4
8

Modification, generally, is a dangerous construct as it can create unexpected action-at-a-distance effects, when changing the value of a variable in one place of the code effects the execution of a different part that uses the same variable. This, however, can’t happen with lexical variables: each let creates its own scope that shields the previous values from modification (just like passing arguments to a procedure call and modifying them within the call doesn’t alter those values, in the calling code):

CL-USER> (let ((x 2))
           (print (+ x x))
           (let ((x 4))
             (print (+ x x)))
           (print (+ x x)))
4
8
4

Obviously, when you have two lets in different places using the same variable name they don’t affect each other and these two variables are, actually, totally distinct.

Yet, sometimes it is useful to modify a variable in one place and see the effect in another. The variables, which have such behavior, are called global or dynamic (and also special, in Lisp jargon). They have several important purposes. One is defining important configuration parameters that need to be accessible anywhere. The other is referencing general-purpose singleton objects like the standard streams or the state of the random number generator. Yet another is pointing to some context that can be altered in certain places subject to the needs of a particular procedure (for instance, the *package* global variable determines in what package we operate — CL-USER in all previous examples). More advanced uses for global variables also exist. The common way to define a global variable is with defparameter, which specifies its initial value:

(defparameter *connection* nil
  "A default connection object.")  ; this is a docstring describing the variable

Global variables, in Lisp, usually have so-called “earmuffs” around their names to remind the user of what they are dealing with. Due to their action-at-a-distance feature, it is not the safest programming language feature, and even a “global variables considered harmful” mantra exists. Lisp is, however, not one of those squeamish languages, and it finds many uses for special variables. By the way, they are called “special” due to a special feature, which greatly broadens the possibilities for their sane usage: if bound in let they act as lexical variables, i.e. the previous value is preserved and restored upon leaving the body of a let:

CL-USER> (defparameter *temp* 1)
*TEMP*
CL-USER> (print *temp*)
1
CL-USER> (progn
           (let ((*temp* 2))
             (print *temp*)
             (:= *temp* 4)
             (print *temp*))
           *temp*)
2
4
1

Procedures in Lisp are first-class objects. This means the one you can assign to a variable, as well as inspect and redefine at run-time, and, consequently, do many other useful things with. The RUTILS function call[1] will call a procedure passed to it as an argument:

CL-USER> (call 'add2 2 2)
4
CL-USER> (let ((add2 (lambda (x y) (+ x y))))
           (call add2 2 2))
4

In fact, defining a function with defun also creates a global variable, although in the function namespace. Functions, types, classes — all of these objects are usually defined as global. Though, for functions there’s a way to define them locally with flet:

CL-USER> (foo 1)
;; ERROR: The function COMMON-LISP-USER::FOO is undefined.
CL-USER> (flet ((foo (x) (1+ x)))
           (foo 1))
2
CL-USER> (foo 1)
;; ERROR: The function COMMON-LISP-USER::FOO is undefined.

Comments

Finally, there’s one more syntax we need to know: how to put comments in the code. Only losers don’t comment their code, and comments will be used extensively, throughout this book, to explain some parts of the code examples, inside of them. Comments, in Lisp, start with a ; character and end at the end of a line. So, the following snippet is a comment: ; this is a comment. There’s also a common style of commenting, when short comments that follow the current line of code start with a single ;, longer comments for a certain code block precede it, occupy the whole line or a number of lines and start with ;;, comments for code section that include several Lisp top-level forms (global definitions) start with ;;; and also occupy whole lines. Besides, each global definition can have a special comment-like string, called the “docstring”, that is intended to describe its purpose and usage, and that can be queried programmatically. To put it all together, this is how different comments may look like:

;;; Some code section

(defun this ()
  "This has a curious docstring."
  ...)

(defun that ()
  ...
  ;; this is an interesting block don't you find?
  (block interesting
    (print "hello")))  ; it prints hello

Getting Started

I strongly encourage you to play around with the code presented in the following chapters of the book. Try to improve it, find issues with it, and come up with fixes, measure and trace everything. This will not only help you master some Lisp, but also understand much deeper the descriptions of the discussed algorithms and data structures, their pitfalls and corner cases. Doing that is, in fact, quite easy. All you need is install some Lisp (preferrably, SBCL or CCL), add Quicklisp, and, with its help, RUTILS.

As I said above, the usual way to work with Lisp is interacting with its REPL. Running the REPL is fairly straightforward. On my Mint Linux I’d run the following commands:

$ apt-get install sbcl rlwrap
...
$ rlwrap sbcl
...
* (print "hello world")

"hello world" 
"hello world"
* 

* is the Lisp raw prompt. It’s, basically, the same as CL-USER> prompt you’ll see in SLIME. You can also run a Lisp script file: sbcl --script hello.lisp. If it contains just a single (print "hello world") line we’ll see the “hello world” phrase printed to the console.

This is a working, but not the most convenient setup. A much more advanced environment is SLIME that works inside Emacs (a similar project for vim is called SLIMV). There exists a number of other solutions: some Lisp implementations provide and IDE, some IDEs and editors provide integration.

After getting into the REPL, you’ll have to issue the following commands:

* (ql:quickload :rutilsx)
* (use-package :rutilsx)
* (named-readtables:in-readtable rutilsx-readtable)

Well, that’s enough Lisp you’ll need to know, to start. We’ll get acquainted with other Lisp concepts as they will become needed for the next chapters of this book. Yet, you’re all set to read and write Lisp programs. They may seem unfamiliar, at first, but as you overcome the initial bump and get used to their paranthesised prefix surface syntax, I promise that you’ll be able to recognize and appreciate their clarity and conciseness.

So, as they say in Lisp land, happy hacking!


Footnotes:

[1] call is the RUTILS abbreviation of the standard funcall. It was surely fun to be able to call a function from a variable back in the 60’s, but now it has become so much more common that there’s no need for the prefix ;)

No comments: