2019-08-19

Programming Algorithms: Linked Lists

Linked data structures are in many ways the opposite of the contiguous ones that we have explored to some extent in the previous chapter using the example of arrays. In terms of complexity, they fail where those ones shine (first of all, at random access) — but prevail at scenarios when a repeated modification is necessary. In general, they are much more flexible and so allow the programmer to represent almost any kind of a data structure, although the ones that require such level of flexibility may not be too frequent. Usually, they are specialized trees or graphs.

The basic linked data structure is a singly-linked list.

Just like arrays, lists in Lisp may be created both with a literal syntax for constants and by calling a function — make-list — that creates a list of a certain size filled with nil elements. Besides, there's a handy list utility that is used to create lists with the specified content (the analog of vec).

CL-USER> '("hello" world 111)
("hello" WORLD 111)
CL-USER> (make-list 3)
(NIL NIL NIL)
CL-USER> (list "hello" 'world 111)
("hello" WORLD 111)

An empty list is represented as () and, interestingly, in Lisp, it is also a synonym of logical falsehood (nil). This property is used very often, and we'll have a chance to see that.

If we were to introduce our own lists, which may be quite a common scenario in case the built-in ones' capabilities do not suit us, we'd need to define the structure "node", and our list would be built as a chain of such nodes. We might have wanted to store the list head and, possibly, tail, as well as other properties like size. All in all, it would look like the following:

(defstruct list-cell
  data
  next)

(defstruct our-own-list
  (head nil :type (or list-cell nil))
  (tail nil :type (or list-cell nil)))

CL-USER> (let ((tail (make-list-cell :data "world")))
           (make-our-own-list
            :head (make-list-cell
                   :data "hello"
                   :next tail)
            :tail tail))
#S(OUR-OWN-LIST
   :HEAD #S(LIST-CELL
            :DATA "hello"
            :NEXT #S(LIST-CELL :DATA "world" :NEXT NIL))
   :TAIL #S(LIST-CELL :DATA "world" :NEXT NIL))

Lists as Sequences

Alongside arrays, list is the other basic data structure that implements the sequence abstract data type. Let's consider the complexity of basic sequence operations for linked lists:

  • so-called random access, i.e. access by index of a random element, requires O(n) time as we have to traverse all the preceding elements before we can reach the desired one (n/2 operations on average)
  • yet, once we have reached some element, deleting it or inserting something after it takes O(1)
  • subsequencing is also O(n)

Getting the list length, in the basic case, is also O(n) i.e. it requires full list traversal. It is possible, though, to store list length as a separate slot, tracking each change on the fly, which means O(1) complexity. Lisp, however, implements the simplest variant of lists without size tracking. This is an example of a small but important decision that real-world programming is full of. Why is such a solution the right thing™, in this case? Adding the size counter to each list would have certainly made this common length operation more effective, but the cost of doing that would've included: increase in occupied storage space for all lists, a need to update size in all list modification operations, and, possibly, a need for a more complex cons cell implementation[1]. These considerations make the situation with lists almost opposite to arrays, for which size tracking is quite reasonable because they change much less often and not tracking the length historically proved to be a terrible security decision. So, what side to choose? A default approach is to prefer the solution which doesn't completely rule out the alternative strategy. If we were to choose a simple cons-cell sans size (what the authors of Lisp did) we'll always be able to add the "smart" list data structure with the size field, on top of it. Yet, stripping the size field from built-in lists won't be possible. Similar reasoning is also applicable to other questions, such as: why aren't lists, in Lisp, doubly-linked. Also, it helps that there's no security implication as lists aren't used as data exchange buffers, for which the problem manifests itself.

For demonstration, let's add the size field to our-own-list (and, meanwhile, consider all the functions that will need to update it...):

(defstruct our-own-list
  (head nil :type (or list-cell nil))
  (tail nil :type (or list-cell nil))
  (size 0 :type (integer 0)))

Given that obtaining the length of a list, in Lisp, is an expensive operation, a common pattern in programs that require multiple requests of the length field is to store its value in some variable at the beginning of the algorithm and then use this cached value, updating it if necessary.

As we see, lists are quite inefficient in random access scenarios. However, many sequences don't require random access and can satisfy all the requirements of a particular use case using just the sequential one. That's one of the reasons why they are called sequences, after all. And if we consider the special case of list operations at index 0 they are, obviously, efficient: both access and addition/removal is O(1). Also, if the algorithm requires a sequential scan, list traversal is rather efficient too, although not as good as array traversal for it still requires jumping over the memory pointers. There are numerous sequence operations that are based on sequential scans. The most common is map, which we analyzed in the previous chapter. It is the functional programming alternative to looping, a more high-level operation, and thus simpler to understand for the common cases, although less versatile.

map is a generic function, in Lisp, which means that it may work with different types of sequences. It takes as the first argument the target sequence type (if nil is supplied it won't create the resulting sequence and so will be used just for side-effects). Here is a polymorphic example involving lists and vectors:

CL-USER> (map 'vector '+
              '(1 2 3 4 5)
              #(1 2 3))
#(2 4 6)

map applies the function provided as its second argument (here, addition) sequentially to every element of the sequences that are supplied as other arguments, until one of them ends, and records the result in the output sequence. map would have been even more intuitive, if it just had used the type of the first argument for the result sequence, i.e. be a "do what I mean" dwim-map, while a separate advanced variant with result-type selection might have been used in the background. Unfortunately, the current standard scheme is not for change, but we can define our own wrapper function:

(defun dwim-map (fn seq &rest seqs)
  "A thin wrapper over MAP that uses the first SEQ's type for the result."
  (apply 'map (type-of seq) fn seqs))

map in Lisp is, historically, used for lists. So there's also a number of list-specific map variants that predated the generic map, in the earlier versions of the language, and are still in wide use today. These include mapcar, mapc, and mapcan (replaced in RUTILS by a safer flat-map). Now, let's see a couple of examples of using mapping. Suppose that we'd like to extract odd numbers from a list of numbers. Using mapcar as a list-specific map we might try to call it with an anonymous function that tests its argument for oddity and keeps them in such case:

CL-USER> (mapcar (lambda (x) (when (oddp x) x))
                 (range 1 10))
(1 NIL 3 NIL 5 NIL 7 NIL 9)

However, the problem is that non-odd numbers still have their place reserved in the result list, although it is not filled by them. Keeping only the results that satisfy (or don't) certain criteria and discarding the others is a very common pattern that is known as "filtering". There's a set of Lisp functions for such scenarios: remove, remove-if, and remove-if-not, as well as RUTILS' complements to them keep-if and keep-if-not. We can achieve the desired result adding remove to the picture:

CL-USER> (remove nil (mapcar (lambda (x) (when (oddp x) x))
                             (range 1 10)))
(1 3 5 7 9)

A more elegant solution will use the remove-if(-not) or keep-if(-not) variants. remove-if-not is the most popular among these functions. It takes a predicate and a sequence and returns the sequence of the same type holding only the elements that satisfy the predicate:

CL-USER> (remove-if-not 'oddp (range 1 10))
(1 3 5 7 9)

Using such high-level mapping functions is very convenient, which is why there's a number of other -if(-not) opertaions, like find(-if(-not)), member(-if(-not)), position(-if(-not)), etc.

The implementation of mapcar or any other list mapping function, including your own task-specific variants, follows the same pattern of traversing the list accumulating the result into another list and reversing it, in the end:

(defun simple-mapcar (fn list)
  (let ((rez ()))
    (dolist (item list)
      (:= rez (cons (call fn item) rez)))
    (reverse rez)))

The function cons is used to add an item to the beginning of the list. It creates a new list head that points to the previous list as its tail.

From the complexity point of view, if we compare such iteration with looping over an array we'll see that it is also a linear traversal that requires twice as many operations as with arrays because we need to traverse the result fully once again, in the end, to reverse it. Its advantage, though, is higher versatility: if we don't know the size of the resulting sequence (for example, in the case of remove-if-not) we don't have to change anything in this scheme and just add a filter line ((when (oddp item) ...), while for arrays we'd either need to use a dynamic array (that will need constant resizing and so have at least the same double number of operations) or pre-allocate the full-sized result sequence and then downsize it to fit the actual accumulated number of elements, which may be problematic when we deal with large arrays.

Lists as Functional Data Structures

The distinction between arrays and linked lists in many ways reflects the distinction between the imperative and functional programming paradigms. Within the imperative or, in this context, procedural approach, the program is built out of low-level blocks (conditionals, loops, and sequentials) that allow for the most fine-tuned and efficient implementation, at the expense of abstraction level and modularization capabilities. It also heavily utilizes in-place modification and manual resource management to keep overhead at a minimum. An array is the most suitable data-structure for such a way of programming. Functional programming, on the contrary, strives to bring the abstraction level higher, which may come at a cost of sacrificing efficiency (only when necessary, and, ideally, only for non-critical parts). Functional programs are built by combining referentially transparent computational procedures (aka "pure functions") that operate on more advanced data structures (either persistent ones or having special access semantics, e.g. transactional) that are also more expensive to manage but provide additional benefits.

Singly-linked lists are a simple example of functional data structures. A functional or persistent data structure is the one that doesn't allow in-place modification. In other words, to alter the contents of the structure a fresh copy with the desired changes should be created. The flexibility of linked data structures makes them suitable for serving as functional ones. We have seen the cons operation that is one of the earliest examples of non-destructive, i.e. functional, modification. This action prepends an element to the head of a list, and as we're dealing with the singly-linked list the original doesn't have to be updated: a new cons cell is added in front of it with its next pointer referencing the original list that becomes the new tail. This way, we can preserve both the pointer to the original head and add a new head. Such an approach is the basis for most of the functional data structures: the functional trees, for example, add a new head and a new route from the head to the newly added element, adding new nodes along the way — according to the same principle.

It is interesting, though, that lists can be used in destructive and non-destructive fashion likewise. There are both low- and high-level functions in Lisp that perform list modification, and their existence is justified by the use cases in many algorithms. Purely functional lists render many of the efficient list algorithms useless. One of the high-level list modification function is nconc. It concatenates two lists together updating in the process the next pointer of the last cons cell of the first list:

CL-USER> (let ((l1 (list 1 2 3))
               (l2 (list 4 5 6)))
           (nconc l1 l2)  ; note no assignment to l1
           l1)            ; but it is still changed
(1 2 3 4 5 6)

There's a functional variant of this operation, append, and, in general, it is considered distasteful to use nconc for two reasons:

  • the risk of unwarranted modification
  • funny enough, the implementation of nconc, actually, isn't mandated to be more efficient than that of append

So, forget nconc, append all the lists!

Using append we'll need to modify the previous piece of code because otherwise the newly created list will be garbage-collected immediately:

CL-USER> (let ((l1 (list 1 2 3))
               (l2 (list 4 5 6)))
           (:= l1 (append l1 l2))
           l1)
(1 2 3 4 5 6)

The low-level list modification operations are rplaca and rplacd. They can be combined with list-specific accessors nth and nthcdr that provide indexed access to list elements and tails respectively. Here's, for example, how to add an element in the middle of a list:

CL-USER> (let ((l1 (list 1 2 3)))
           (rplacd (nthcdr 0 l1)
                   (cons 4 (nthcdr 1 l1)))
           l1)
(1 4 2 3)

Just to re-iterate, although functional list operations are the default choice, for efficient implementation of some algorithms, you'll need to resort to the ugly destructive ones.

Different Kinds of Lists

We have, thus far, seen the most basic linked list variant — a singly-linked one. It has a number of limitations: for instance, it's impossible to traverse it from the end to the beginning. Yet, there are many algorithms that require accessing the list from both sides or do other things with it that are inefficient or even impossible with the singly-linked one, hence other, more advanced, list variants exist.

But first, let's consider an interesting tweak to the regular singly-linked list — a circular list. It can be created from the normal one by making the last cons cell point to the first. It may seem like a problematic data structure to work with, but all the potential issues with infinite looping while traversing it are solved if we keep a pointer to any node and stop iteration when we encounter this node for the second time. What's the use for such structure? Well, not so many, but there's a prominent one: the ring buffer. A ring or circular buffer is a structure that can hold a predefined number of items and each item is added to the next slot of the current item. This way, when the buffer is completely filled it will wrap around to the first element, which will be overwritten at the next modification. By our buffer-filling algorithm, the element to be overwritten is the one that was written the earliest for the current item set. Using a circular linked list is one of the simplest ways to implement such a buffer. Another approach will be to use an array of a certain size moving the pointer to the next item by incrementing an index into the array. Obviously, when the index reaches array size it should be reset to zero.

A more advanced list variant is a doubly-linked one, in which all the elements have both the next and previous pointers. The following definition, using inheritance, extends our original list-cell with a pointer to the previous element. Thanks to the basic object-oriented capabilities of structs, it will work with the current definition of our-own-list as well, and allow it to function as a doubly-linked list.

(defstruct (list-cell2 (:include list-cell))
  prev)

Yet, we still haven't shown the implementation of the higher-level operations of adding and removing an element to/from our-own-list. Obviously, they will differ for singly- and doubly-linked lists, and that distinction will require us to differentiate the doubly-linked list types. That, in turn, will demand invocation of a rather heavy OO-machinery, which is beyond the subject of this book. Instead, for now, let's just examine the basic list addition function, for the doubly-linked list:

(defun our-cons2 (data list)
  (when (null list) (:= list (make-our-own-list)))
  (let ((new-head (make-list-cell2
                   :data data 
                   :next (when list @list.head))))
    (:= @list.head.prev new-head)
    (make-our-own-list
     :head new-head
     :tail @list.tail
     :size (1+ @list.size))))

The first thing to note is the use of the @ syntactic sugar, from RUTILS, that implements the mainstream dot notation for slot-value access (i.e. @list.head.prev refers to the prev field of the head field of the provided list structure of the assumed our-own-list type, which in a more classically Lispy, although cumbersome, variants may look like one of the following: (our-cons2-prev (our-own-list-head list)) or (slot-value (slot-value list 'head) 'prev)[2]).

More important here is that, unlike for the singly-linked list, this function requires an in-place modification of the head element of the original list: setting its prev pointer. Immediately making doubly-linked lists non-persistent.

Finally, the first line is the protection against trying to access the null list (that will result in a much-feared, especially in Java-land, null-pointer exception class of error).

At first sight, it may seem that doubly-linked lists are more useful than singly-linked ones. But they also have higher overhead so, in practice, they are used quite sporadically. We may see just a couple of use cases on the pages of this book. One of them is presented in the next part — a double-ended queue.

Besides doubly-linked, there are also association lists that serve as a variant of key-value data structures. At least 3 types may be found in Common Lisp code, and we'll briefly discuss them in the chapter on key-value structures. Finally, a skip list is a probabilistic data structure based on singly-linked lists, that allows for faster search, which we'll also discuss in a separate chapter on probabilistic structures. Other more esoteric list variants, such as self-organized list and XOR-list, may also be found in the literature — but very rarely, in practice.

FIFO & LIFO

The flexibility of lists allows them to serve as a common choice for implementing a number of popular abstract data structures.

Queue

A queue or FIFO has the following interface:

  • enqueue an item at the end
  • dequeue the first element: get it and remove it from the queue

It imposes a first-in-first-out (FIFO) ordering on the elements. A queue can be implemented directly with a singly-linked list like our-own-list. Obviously, it can also be built on top of a dynamic array but will require permanent expansion and contraction of the collection, which, as we already know, isn't the preferred scenario for their usage.

There are numerous uses for the queue structures for processing items in a certain order (some of which we'll see in further chapters of this book).

Stack

A stack or LIFO (last-in-first-out) is even simpler than a queue, and it is used even more widely. Its interface is:

  • push an item on top of the stack making it the first element
  • pop an item from the top: get it and remove it from the stack

A simple Lisp list can serve as a stack, and you can see such uses in almost every file with Lisp code. The most common pattern is result accumulation during iteration: using the stack interface, we can rewrite simple-mapcar in an even simpler way (which is idiomatic Lisp):

(defun simple-mapcar (fn list)
  (let ((rez ()))
    (dolist (item list)
      (push (call fn item) rez))
    (reverse rez)))

Stacks hold elements in reverse-chronological order and can thus be used to keep the history of changes to be able to undo them. This feature is used in procedure calling conventions by the compilers: there exists a separate segment of program memory called the Stack segment, and when a function call happens (beginning from the program's entry point called the main function in C) all of its arguments and local variables are put on this stack as well as the return address in the program code segment where the call was initiated. Such an approach allows for the existence of local variables that last only for the duration of the call and are referenced relative to the current stack head and not bound to some absolute position in memory like the global ones. After the procedure call returns, the stack is "unwound" and all the local data is forgotten returning the context to the same state in which it was before the call. Such stack-based history-keeping is a very common and useful pattern that may be utilized in userland code likewise.

Lisp itself also uses this trick to implement global variables with a capability to have context-dependent values through the extent of let blocks: each such variable also has a stack of values associated with it. This is one of the most underappreciated features of the Lisp language used quite often by experienced lispers. Here is a small example with a standard global variable (they are called special in Lisp parlance due to this special property) *standard-output* that stores a reference to the current output stream:

CL-USER> (print 1)
1
1
CL-USER> (let ((*standard-output* (make-broadcast-stream)))
           (print 1))
1

In the first call to print, we see both the printed value and the returned one, while in the second — only the return value of the print function, while it's output is sent, effectively, to /dev/null.

Stacks can be also used to implement queues. We'll need two of them to do that: one will be used for enqueuing the items and another — for dequeuing. Here's the implementation:

(defstruct queue
  head
  tail)

(defun enqueue (item queue)
  (push item @queue.head))

(defun dequeue (queue)
  ;; Here and in the next condition, we use the property that an empty list
  ;; is also logically false. This is discouraged by many Lisp style-guides,
  ;; but, in many cases, such code is not only more compact but also more clear.
  (unless @queue.tail
    (do ()
        ((null @queue.head))  ; this loop continues until head becomes empty
      (push (pop @queue.head) @queue.tail)))
      ;; By pushing all the items from the head to the tail we reverse
      ;; their order — this is the second reversing that cancels the reversing
      ;; performed when we push the items to the head and it restores the original order.
  (when @queue.tail
    (values (pop @queue.tail)
            t)))  ; this second value is used to indicate that the queue was not empty
 
CL-USER> (let ((q (make-queue)))
           (print q)
           (enqueue 1 q)
           (enqueue 2 q)
           (enqueue 3 q)
           (print q)
           (print q)
           (dequeue q)
           (print q)
           (enqueue 4 q)
           (print q)
           (dequeue q)
           (print q)
           (dequeue q)
           (print q)
           (dequeue q)
           (print q)
           (dequeue q))
#S(QUEUE :HEAD NIL :TAIL NIL) 
#S(QUEUE :HEAD (3 2 1) :TAIL NIL) 
#S(QUEUE :HEAD (3 2 1) :TAIL NIL) 
#S(QUEUE :HEAD NIL :TAIL (2 3)) 
#S(QUEUE :HEAD (4) :TAIL (2 3)) 
#S(QUEUE :HEAD (4) :TAIL (3)) 
#S(QUEUE :HEAD (4) :TAIL NIL) 
#S(QUEUE :HEAD NIL :TAIL NIL) 
NIL  ; no second value indicates that the queue is now empty

Such queue implementation still has O(1) operation times for enqueue/dequeue. Each element will experience exactly 4 operations: 2 pushs and 2 pops (for the head and tail).

Another stack-based structure is the stack with a minimum element, i.e. some structure that not only holds elements in LIFO order but also keeps track of the minimum among them. The challenge is that if we just add the min slot that holds the current minimum, when this minimum is popped out of the stack we'll need to examine all the remaining elements to find the new minimum. We can avoid this additional work by adding another stack — a stack of minimums. Now, each push and pop operation requires us to also check the head of this second stack and, in case the added/removed element is the minimum, push it to the stack of minimums or pop it from there, accordingly.

A well-known algorithm that illustrates stack usage is fully-parenthesized arithmetic expressions evaluation:

(defun arith-eval (expr)
  "EXPR is a list of symbols that may include:
   square brackets, arithmetic operations, and numbers."
  (let ((ops ())
        (vals ())
        (op nil))
    (dolist (item expr)
      (case item
        ([ ) ; do nothing
        ((+ - * /) (push item ops))
        (] (:= op (pop ops)
               val (pop vals))
           (case op
             (+ (:+ val (pop vals)))
             (- (:- val (pop vals)))
             (* (:* val (pop vals)))
             (/ (:/ val (pop vals))))
           (push val vals))
        (otherwise (push item vals))))
    (pop vals)))

CL-USER> (arith-eval '([ 1 + [ [ 2 + 3 ] * [ 4 * 5 ] ] ] ]))
101

Deque

A deque is a short name for a double-ended queue, which can be traversed in both orders: FIFO and LIFO. It has 4 operations: push-front and push-back (also called shift), pop-front and pop-back (unshift). This structure may be implemented with a doubly-linked list or likewise a simple queue with 2 stacks. The difference for the 2-stacks implementation is that now items may be pushed back and forth between head and tail depending on the direction we're popping from, which results in worst-case linear complexity of such operations: when there's constant alteration of front and back directions.

The use case for such structure is the algorithm that utilizes both direct and reverse ordering: a classic example being job-stealing algorithms, when the main worker is processing the queue from the front, while other workers, when idle, may steal the lowest priority items from the back (to minimize the chance of a conflict for the same job).

Stacks in Action: SAX Parsing

Custom XML parsing is a common task for those who deal with different datasets, as many of them come in XML form, for example, Wikipedia and other Wikidata resources. There are two main approaches to XML parsing:

  • DOM parsing reads the whole document and creates its tree representation in memory. This technique is handy for small documents, but, for huge ones, such as the dump of Wikipedia, it will quickly fill all available memory. Also, dealing with the deep tree structure, if you want to extract only some specific pieces from it, is not very convenient.
  • SAX parsing is an alternative variant that uses the stream approach. The parser reads the document and, upon completing the processing of a particular part, invokes the relevant callback: what to do when an open tag is read, when a closed one, and with the contents of the current element. These actions happen for each tag, and we can think of the whole process as traversing the document tree utilizing the so-called "visitor pattern": when visiting each node we have a chance to react after the beginning, in the middle, and in the end.

Once you get used to SAX parsing, due to its simplicity, it becomes a tool of choice for processing XML, as well as JSON and other formats that allow for a similar stream parsing approach. Often the simplest parsing pattern is enough: remember the tag we're looking at, and when it matches a set of interesting tags, process its contents. However, sometimes, we need to make decisions based on the broader context. For example, let's say, we have the text marked up into paragraphs, which are split into sentences, which are, in turn, tokenized. To process such a three-level structure, with SAX parsing, we could use the following outline (utilizing CXML library primitives):

(defclass text-sax (sax:sax-parser-mixin)
  ((parags :initform nil :accessor sax-parags)
   (parag :initform nil :accessor sax-parag)
   (sent :initform nil :accessor sax-sent)
   (tag :initform nil :accessor sax-tag)))

(defmethod sax:start-element ((sax text-sax)
                              namespace-uri local-name qname attrs)
  (declare (ignore namespace-uri qname attrs))
  (:= (sax-tag sax) (mkeyw local-name))

(defmethod sax:end-element ((sax text-sax)
                            namespace-uri local-name qname)
  (declare (ignore namespace-uri qname))
  (with-slots (tag parags sent) sax
    (case tag
      (:paragraph (push (reverse parag) parags)
                  (:= parag nil))
      (:sentence (push (reverse sent) parag)
                 (:= sent nil)))))

(defmethod sax:characters ((sax text-sax) text)
  (when (eql :token (sax-tag sax))
    (push text (sax-sent sax)))

(defmethod sax:end-document ((sax text-sax))
  (reverse (sax-parags sax)))

This code will return the accumulated structure of paragraphs from the sax:end-document method. And two stacks: the current sentence and the current paragraph are used to accumulate intermediate data while parsing. In a similar fashion, another stack of encountered tags might have been used to exactly track our position in the document tree if there were such necessity. Overall, the more you'll be using SAX parsing, the more you'll realize that stacks are enough to address 99% of the arising challenges.

Lists as Sets

Another very important abstract data structure is a Set. It is a collection that holds each element only once no matter how many times we add it there. This structure may be used in a variety of cases: when we need to track the items we have already seen and processed, when we want to calculate some relations between groups of elements,s and so forth.

Basically, its interface consists of set-theoretic operations:

  • add/remove an item
  • check whether an item is in the set
  • check whether a set is a subset of another set
  • union, intersection, difference, etc.

Sets have an interesting aspect that an efficient implementation of element-wise operations (add/remove/member) and set-wise (union/...) require the use of different concrete data-structures, so a choice should be made depending on the main use case. One way to implement sets is by using linked lists. Lisp has standard library support for this with the following functions:

  • adjoin to add an item to the list if it's not already there
  • member to check for item presence in the set
  • subsetp for subset relationship query
  • union, intersection, set-difference, and set-exclusive-or for set operations

This approach works well for small sets (up to tens of elements), but it is rather inefficient, in general. Adding an item to the set or checking for membership will require O(n) operations, while, in the hash-set (that we'll discuss in the chapter on key-value structures), these are O(1) operations. A naive implementation of union and other set-theoretic operations will require O(n^2) as we'll have to compare each element from one set with each one from the other. However, if our set lists are in sorted order set-theoretic operations can be implemented efficiently in just O(n) where n is the total number of elements in all sets, by performing a single linear scan over each set in parallel. Using a hash-set will also result in the same complexity.

Here is a simplified implementation of union for sets of numbers built on sorted lists:

(defun sorted-union (s1 s2)
  (let ((rez ()))
    (do ()
        ((and (null s1) (null s2)))
      (let ((i1 (first s1))
            (i2 (first s2)))
        (cond ((null i1) (dolist (i2 s2)
                           (push i2 rez))
                         (return))
              ((null i2) (dolist (i1 s1)
                           (push i1 rez))
                         (return))
              ((= i1 i2) (push i1 rez)
                         (:= s1 (rest s1)
                             s2 (rest s2)))
              ((< i1 i2) (push i1 rez)
                         (:= s1 (rest s1)))
              ;; just T may be used instead
              ;; of the following condition
              ((> i1 i2) (push i2 rez)
                         (:= s2 (rest s2))))))
    (reverse rez)))

CL-USER> (sorted-union '(1 2 3)
                       '(0 1 5 6))
(0 1 2 3 5 6)

This approach may be useful even for unsorted list-based sets as sorting is a merely O(n * log n) operation. Even better though, when the use case requires primarily set-theoretic operations on our sets and the number of changes/membership queries is comparatively low, the most efficient technique may be to keep the lists sorted at all times.

Merge Sort

Speaking about sorting, the algorithms we discussed for array sorting in the previous chapter do not work as efficient for lists for they are based on swap operations, which are O(n), in the list case. Thus, another approach is required, and there exist a number of efficient list sorting algorithms, the most prominent of which is Merge sort. It works by splitting the list into two equal parts until we get to trivial one-element lists and then merging the sorted lists into the bigger sorted ones. The merging procedure for sorted lists is efficient as we've seen in the previous example. A nice feature of such an approach is its stability, i.e. preservation of the original order of the equal elements, given the proper implementation of the merge procedure.

(defun merge-sort (list comp)
  (if (null (rest list))
      list
      (let ((half (floor (length list) 2)))
        (merge-lists (merge-sort (subseq seq 0 half) comp)
                     (merge-sort (subseq seq half) comp)
                     comp))))

(defun merge-lists (l1 l2 comp)
  (let ((rez ())
    (do ()
        ((and (null l1) (null l2)))
      (let ((i1 (first l1))
            (i2 (first l2)))
        (cond ((null i1) (dolist (i l2)
                           (push i rez))
                         (return))
              ((null i2) (dolist (i l1)
                           (push i rez))
                         (return))
              ((call comp i1 i2) (push i1 rez)
                                 (:= l1 (rest l1)))
              (t (push i2 rez)
                 (:= l2 (rest l2))))))
    (reverse rez)))

The same complexity analysis as for binary search applies to this algorithm. At each level of the recursion tree, we perform O(n) operations: each element is pushed into the resulting list once, reversed once, and there are at most 4 comparison operations: 3 null checks and 1 call of the comp function. We also need to perform one copy per element in the subseq operation and take the length of the list (although it can be memorized and passed down as the function call argument) on the recursive descent. This totals to not more than 10 operations per element, which is a constant. And the height of the tree is, as we already know, (log n 2). So, the total complexity is O(n * log n).

Let's now measure the real time needed for such sorting, and let's compare it to the time of prod-sort (with optimal array accessors) from the Arrays chapter:

CL-USER> (with ((lst (random-list 10000))
                (vec (make-array 10000 :initial-contents lst)))
           (print-sort-timings "Prod" 'prod-sort vec)
           (print-sort-timings "Merge " 'merge-sort lst))
= Prodsort of random vector =
Evaluation took:
  0.048 seconds of real time
= Prodsort of sorted vector =
Evaluation took:
  0.032 seconds of real time
= Prodsort of reverse sorted vector =
Evaluation took:
  0.044 seconds of real time
= Merge sort of random vector =
Evaluation took:
  0.007 seconds of real time
= Merge sort of sorted vector =
Evaluation took:
  0.007 seconds of real time
= Merge sort of reverse sorted vector =
Evaluation took:
  0.008 seconds of real time

Interestingly enough, Merge sort turned out to be around 5 times faster, although it seems that the number of operations required at each level of recursion is at least 2-3 times bigger than for quicksort. Why we got such result is left as an exercise to the reader: I'd start from profiling the function calls and looking where most of the time is wasted...

It should be apparent that the merge-lists procedure works in a similar way to set-theoretic operations on sorted lists that we've discussed in the previous part. It is, in fact, provided in the Lisp standard library. Using the standard merge, Merge sort may be written in a completely functional and also generic way to support any kind of sequences:

(defun merge-sort (seq comp)
  (if (or (null seq)  ; avoid expensive length calculation for lists
          (<= (length seq) 1))
      seq
      (let ((half (floor (length seq) 2)))
        (merge (type-of seq)
               (merge-sort (subseq seq 0 half) comp)
               (merge-sort (subseq seq half) comp)
               comp))))

There's still one substantial difference of Merge sort from the array sorting functions: it is not in-place. So it also requires the O(n * log n) additional space to hold the half sublists that are produced at each iteration. Sorting and merging them in-place is not possible. There are ways to somewhat reduce this extra space usage but not totally eliminate it.

Parallelization of Merge Sort

The extra-space drawback of Merge sort may, however, turn irrelevant if we consider the problem of parallelizing this procedure. The general idea of parallelized implementation of any algorithm is to split the work in a way that allows reducing the runtime proportional to the number of workers performing those jobs. In the ideal case, if we have m workers and are able to spread the work evenly the running time should be reduced by a factor of m. For the Merge sort, it will mean just O(n/m * log n). Such ideal reduction is not always achievable, though, because often there are bottlenecks in the algorithm that require all or some workers to wait for one of them to complete its job.

Here's a trivial parallel Merge sort implementation that uses the eager-future2 library, which adds high-level data parallelism capabilities based on the Lisp implementation's multithreading facilities:

(defun parallel-merge-sort (seq comp)
  (if (or (null seq) (<= (length seq) 1))
      seq
      (with ((half (floor (length seq) 2))
             (thread1 (eager-future2:pexec
                       (merge-sort (subseq seq 0 half) comp)))
             (thread2 (eager-future2:pexec
                       (merge-sort (subseq seq half) comp))))
        (merge (type-of seq)
               (eager-future2:yield thread1)
               (eager-future2:yield thread2)               
               comp))))

The eager-future2:pexec procedure submits each merge-sort to the thread pool that manages multiple CPU threads available in the system and continues program execution not waiting for it to return. While eager-future2:yield pauses execution until the thread performing the appropriate merge-sort returns.

When I ran our testing function with both serial and parallel merge sorts on my machine, with 4 CPUs, I got the following result:

CL-USER> (with ((lst1 (random-list 10000))
                (lst2 (copy-list lst1)))
           (print-sort-timings "Merge " 'merge-sort lst1)
           (print-sort-timings "Parallel Merge " 'parallel-merge-sort lst2))
= Merge sort of random vector =
Evaluation took:
  0.007 seconds of real time
  114.29% CPU
= Merge sort of sorted vector =
Evaluation took:
  0.006 seconds of real time
  116.67% CPU
= Merge sort of reverse sorted vector =
Evaluation took:
  0.007 seconds of real time
  114.29% CPU
= Parallel Merge sort of random vector =
Evaluation took:
  0.003 seconds of real time
  266.67% CPU
= Parallel Merge sort of sorted vector =
Evaluation took:
  0.003 seconds of real time
  266.67% CPU
= Parallel Merge sort of reverse sorted vector =
Evaluation took:
  0.005 seconds of real time
  220.00% CPU

A speedup of approximately 2x, which is also reflected by the rise in CPU utilization from around 100% (i.e. 1 CPU) to 250%. These are correct numbers as the merge procedure is still executed serially and remains the bottleneck. There are more sophisticated ways to achieve optimal m times speedup, in Merge sort parallelization, but we won't discuss them here due to their complexity.

Lists and Lisp

Historically, Lisp's name originated as an abbreviation of "List Processing", which points both to the significance that lists played in the language's early development and also to the fact that flexibility (a major feature of lists) was always a cornerstone of its design. Why are lists important to Lisp? Maybe, originally, it was connected with the availability and the good support of this data structure in the language itself. But, quickly, the focus shifted to the fact that, unlike other languages, Lisp code is input in the compiler not in a custom string-based format but in the form of nested lists that directly represent the syntax tree. Coupled with superior support for the list data structure, it opens numerous possibilities for programmatic processing of the code itself, which are manifest in the macro system, code walkers and generators, etc. So, "List Processing" turns out to be not about lists of data, but about lists of code, which perfectly describes the main distinctive feature of this language...


Footnotes:

[1] While, in the Lisp machines, cons cells even had special hardware support, and such change would have made it useless.

[2] Although, for structs, it is implementation-dependent if this will work. In the major implementations, it will.

2019-08-12

Programming Algorithms: Arrays

Arrays are, alongside structs, the most basic data structure and, at the same time, the default choice for implementing algorithms. A one-dimensional array that is also called a "vector" is a contiguous structure consisting of the elements of the same type. One of the ways to create such arrays, in Lisp, is this:

CL-USER> (make-array 3)
#(0 0 0)

The printed result is the literal array representation. It happens that the array is shown to hold 0's, but that's implementation-dependent. Additional specifics can be set during array initialization: for instance, the :element-type, :initial-element, and even full contents:

CL-USER> (make-array 3 :element-type 'list :initial-element nil)
#(NIL NIL NIL)
CL-USER> (make-array 3 :initial-contents '(1.0 2.0 3.0))
#(1.0 2.0 3.0)

If you read back such an array you'll get a new copy with the same contents:

CL-USER> #(1.0 2.0 3.0)
#(1.0 2.0 3.0)

It is worth noting that the element type restriction is, in fact, not a limitation the default type is T[1]. In this case, the array will just hold pointers to its elements that can be of arbitrary type. If we specify a more precise type, however, the compiler might be able to optimize storage and access by putting the elements in memory directly in the array space. This is, mainly, useful for numeric arrays, but it makes multiple orders of magnitude difference for them for several reasons, including the existence of vector CPU instructions that operate on such arrays.

The arrays we have created are mutable, i.e. we can change their contents, although we cannot resize them. The main operator to access array elements is aref. You will see it in those pieces of code, in this chapter, where we care about performance.

CL-USER> (let ((vec (make-array 3 :initial-contents '(1.0 2.0 3.0))))
           (print (aref vec 0))
           (print (? vec 1))
           (:= (aref vec 2) 4.0))
           (print (? vec 2))
           (aref vec 3))
1.0 
2.0 
4.0
; Evaluation aborted on #<SIMPLE-TYPE-ERROR expected-type: (MOD 3) datum: 3>

In Lisp, array access beyond its boundary, as expected, causes an error.

It is also possible to create constant arrays using the literal notation #(). These constants can, actually, be changed in some environments, but don't expect anything nice to come out of such abuse — and the compiler will warn you of that:

CL-USER> (let ((vec #(1.0 2.0 3.0)))
           (:= (aref vec 2) nil)
           (print vec))
; caught WARNING:
;   Destructive function (SETF AREF) called on constant data.
;   See also:
;     The ANSI Standard, Special Operator QUOTE
;     The ANSI Standard, Section 3.2.2.3
; 
; compilation unit finished
;   caught 1 WARNING condition

#(1.0 2.0 NIL)

RUTILS provides more options to easily create arrays with a shorthand notation:

CL-USER> #v(1 2 3)
#(1 2 3)
CL-USER> (vec 1 2 3)
#(1 2 3)

Although the results seem identical they aren't. The first version creates a mutable analog of #(1 2 3), and the second also makes it adjustable (we'll discuss adjustable or dynamic arrays next).

Arrays as Sequences

Vectors are one of the representatives of the abstract sequence container type that has the following basic interface:

  • inquire the length of a sequence — performed in Lisp using the function length
  • access the element by index — the RUTILS ? operator is the most generic variant while the native one for arrays is aref and a more general elt, for all built-in sequences (this also includes lists and, in some implementations, user-defined, so-called, extensible sequences)
  • get the subsequence — the standard provides the function subseq for this purpose

These methods have some specific that you should mind:

  • the length function, for arrays, works in O(1) time as length is tracked in the array structure. There is an alternative (more primitive) way to handle arrays, employed, primarily, in C when the length is not stored, and, instead, there's a special termination "symbol" that indicates the end of an array. For instance, C strings have a '\0' termination character, and arrays representing command-line arguments, in the Unix syscalls API for such functions as exec, are terminated with null-pointers. Such an approach is, first of all, not efficient from the algorithmic point of view as it requires O(n) time to query the array's length. But, what's even more important, it has proven to be a source of a number of catastrophic security vulnerabilities — the venerable "buffer overflow" family of errors
  • the subseq function creates a copy of the part of its argument, which is an expensive operation. This is the functional approach that is a proper default, but many of the algorithms don't involve subarray mutation, and, for them, a more efficient variant would be to use a shared-structure variant that doesn't make a copy but merely returns a pointer into the original array. Such option is provided, in the Lisp standard, via the so-called displaced arrays, but it is somewhat cumbersome to use, that's why a more straightforward version is present in RUTILS which is named slice
CL-USER> (with ((vec (vec 1 2 3))
                (part (slice vec 2)))
           (print part)
           (:= (? part 0) 4)
           (print part)
           vec)

#(3)
#(4)
#(1 2 4)

Beyond the basic operations, sequences in Lisp are the target of a number of higher-order functions, such as find, position, remove-if etc. We'll get back to discussing their use later in the book.

Dynamic Vectors

Let's examine arrays from the point of view of algorithmic complexity. General-purpose data structures are usually compared by their performance on several common operations and, also, space requirements. These common operations are: access, insertion, deletion, and, sometimes, search.

In the case of ordinary arrays, the space used is the minimum possible: almost no overhead is incurred except, perhaps, for some meta-information about array size. Array element access is performed by index in constant time because it's just an offset from the beginning that is the product of index by the size of a single element. Search for an element requires a linear scan of the whole array or, in the special case of a sorted array, it can be done in O(log n) using binary search.

Insertion (at the end of an array) and deletion with arrays is problematic, though. Basic arrays are static, i.e. they can't be expanded or shrunk at will. The case of expansion requires free space after the end of the array that isn't generally available (because it's already occupied by other data used by the program) so it means that the whole array needs to be relocated to another place in memory with sufficient space. Shrinking is possible, but it still requires relocation of the elements following the deleted one. Hence, both of these operations require O(n) time and may also cause memory fragmentation. This is a major drawback of arrays.

However, arrays definitely should be the default choice for most algorithms. Why? First of all, because of the other excellent properties arrays provide and also because, in many cases, lack of flexibility can be circumvented in a certain manner. One common example is iteration with accumulation of results in a sequence. This is often performed with the help of a stack (as a rule, implemented with a linked list), but, in many cases (especially, when the length of the result is known beforehand), arrays may be used to the same effect. Another approach is using dynamic arrays, which add array resizing capabilities. And only in the case when an algorithm requires contiguous manipulation (insertion and deletion) of a collection of items or other advanced flexibility, linked data structures are preferred.

So, the first approach to working around the static nature of arrays is possible when we know the target number of elements. For instance, the most common pattern of sequence processing is to map a function over it, which produces the new sequence of the same size filled with results of applying the function to each element of the original sequence. With arrays, it can be performed even more efficiently than with a list. We just need to pre-allocate the resulting vector and set its elements one by one as we process the input:

(defun map-vec (fn vec)
  "Map function FN over each element of VEC
   and return the new vector with the results."
  (let ((rez (make-array (length vec))))
    (dotimes (i (length vec))
      (:= (aref rez i) (call fn (aref vec i))))
    rez))

CL-USER> (map-vec '1+ #(1 2 3))
#(2 3 4)

We use a specific accessor aref here instead of generic ? to ensure efficient operation in the so-called "inner loop" — although, there's just one loop here, but it will be the inner loop of many complex algorithms.

However, in some cases we don't know the size of the result beforehand. For instance, another popular sequence processing function is called filter or remove-if(-not) in Lisp. It iterates over the sequence and keeps only elements that satisfy/don't satisfy a certain predicate. It is, generally, unknown how many elements will remain, so we can't predict the size of the resulting array. One solution will be to allocate the full-sized array and fill only so many cells as needed. It is a viable approach although suboptimal. Filling the result array can be performed by tracking the current index in it or, in Lisp, by using an array with a fill-pointer:

(defun clumsy-filter-vec (pred vec)
  "Return the vector with only those elements of VEC
   for which calling pred returns true."
  (let ((rez (make-array (length vec) :fill-pointer t)))
    (dotimes (i (length vec))
      (when (call pred (aref vec i))
        (vector-push (aref vec i) rez)))
    rez))

CL-USER> (describe (clumsy-filter-vec 'oddp #(1 2 3)))
#(1 3)
  [vector]
Element-type: T
Fill-pointer: 2
Size: 3
Adjustable: yes
Displaced-to: NIL
Displaced-offset: 0
Storage vector: #<(SIMPLE-VECTOR 3) {100E9AF30F}>

Another, more general way, would be to use a "dynamic vector". This is a kind of an array that supports insertion by automatically expanding its size (usually, not one element at a time but proportionally to the current size of the array). Here is how it works:

CL-USER> (let ((vec (make-array 0 :fill-pointer t :adjustable t)))
           (dotimes (i 10)
             (vector-push-extend i vec)
             (describe vec)))
#(0)
  [vector]
Element-type: T
Fill-pointer: 1
Size: 1
Adjustable: yes
Displaced-to: NIL
Displaced-offset: 0
Storage vector: #<(SIMPLE-VECTOR 1) {100ED9238F}>

#(0 1)
Fill-pointer: 2
Size: 3

#(0 1 2)
Fill-pointer: 3
Size: 3

#(0 1 2 3)
Element-type: T
Fill-pointer: 4
Size: 7

...

#(0 1 2 3 4 5 6 7)
Fill-pointer: 8
Size: 15

#(0 1 2 3 4 5 6 7 8)
Element-type: T
Fill-pointer: 9
Size: 15

#(0 1 2 3 4 5 6 7 8 9)
Element-type: T
Fill-pointer: 10
Size: 15

For such "smart" arrays the complexity of insertion of an element becomes asymptotically constant: resizing and moving elements happens less and less often the more elements are added. With a large number of elements, this comes at a cost of a lot of wasted space, though. At the same time, when the number of elements is small (below 20), it happens often enough, so that the performance is worse than for a linked list that requires a constant number of 2 operations for each insertion (or 1 if we don't care to preserve the order). So, dynamic vectors are the solution that can be used efficiently only when the number of elements is neither too big nor too small.

Why Are Arrays Indexed from 0

Although most programmers are used to it, not everyone understands clearly why the choice was made, in most programming languages, for 0-based array indexing. Indeed, there are several languages that prefer a 1-based variant (for instance, MATLAB and Lua). This is quite a deep and yet very practical issue that several notable computer scientists, including Dijkstra, have contributed to.

At first glance, it is "natural" to expect the first element of a sequence to be indexed with 1, second — with 2, etc. This means that if we have a subsequence from the first element to the tenth it will have the beginning index 1 and the ending — 10, i.e. be a closed interval also called a segment: [1, 10]. The cons of this approach are the following:

  1. It is more straightforward to work with half-open intervals (i.e. the ones that don't include the ending index): especially, it is much more convenient to split and merge such intervals, and, also, test for membership. With 0-based indexing, our example interval would be half-open: [0, 10).

  2. If we consider multi-dimensional arrays that are most often represented using one-dimensional ones, getting an element of a matrix with indices i and j translates to accessing the element of an underlying vector with an index i*w + j or i + j*h for 0-based arrays, while for 1-based ones, it's more cumbersome: (i-1)*w + j. And if we consider 3-dimensional arrays (tensors), we'll still get the obvious i*w*h + j*h + k formula, for 0-based arrays, and, maybe, (i-1)*w*h + (j-1)*h + k for 1-based ones, although I'm not, actually, sure if it's correct (which shows how such calculations quickly become untractable). Besides, multi-dimensional array operations that are much more complex than mere indexing also often occur in many practical tasks, and they are also more complex and thus error-prone with base 1.

There are other arguments, but I consider them to be much more minor and a matter of taste and convenience. However, the intervals and multi-dimensional arrays issues are quite serious. And here is a good place to quote one of my favorite anecdotes that there are two hard problems in CS: cache invalidation and naming things,.. and off-by-one errors. Arithmetic errors with indexing are a very nasty kind of bug, and although it can't be avoided altogether 0-based indexing turns out to be a much more balanced solution.

Now, using 0-based indexing, let's write down the formula for finding the middle element of an array. Usually, it is chosen to be (floor (length array) 2). This element will divide the array into two parts, left and right, each one having length at least (1- (floor (length array) 2): the left part will always have such size and will not include the middle element. The right side will start from the middle element and will have the same size if the total number of array elements is even or be one element larger if it is odd.

Multi-Dimensional Arrays

So far, we have only discussed one-dimensional arrays. However, more complex data-structures can be represented using simple arrays. The most obvious example of such structures is multi-dimensional arrays. There's a staggering variety of other structures that can be built on top of arrays, such as binary (or, in fact, any n-ary) trees, hash-tables, and graphs, to name a few. If we have a chance to implement the data structure on an array, usually, we should not hesitate to take it as it will result in constant access time, good cache locality contributing to faster processing and, in most cases, efficient space usage.

Multi-dimensional arrays are a contiguous data-structure that stores its elements so that, given the coordinates of an element in all dimensions, it can be retrieved according to a known formula. Such arrays are also called tensors, and in case of 2-dimensional arrays — matrices. We have already seen one matrix example in the discussion of complexity:

#2A((1 2 3)
    (4 5 6))

A matrix has rows (first dimension) and columns (second dimension). Accordingly, the elements of a matrix may be stored in the row-major or column-major order. In row-major, the elements are placed row after row — just like on this picture, i.e., the memory will contain the sequence: 1 2 3 4 5 6. In column-major order, they are stored by column (this approach is used in many "mathematical" languages, such as Fortran or MATLAB), so raw memory will look like this: 1 4 2 5 3 6. If row-major order is used the formula to access the element with coordinates i (row) and j (column) is: (+ (* i n) j) where n is the length of the matrix's row, i.e. its width. In the case of column-major order, it is: (+ i (* j m)) where m is the matrix's height. It is necessary to know, which storage style is used in a particular language as in numeric computing it is common to intermix libraries written in many languages — C, Fortran, and others — and, in the process, incompatible representations may clash.[2]

Such matrix representation is the most obvious one, but it's not exclusive. Many languages, including Java, use iliffe vectors to represent multi-dimensional arrays. These are vectors of vectors, i.e. each matrix row is stored in a separate 1-dimensional array, and the matrix is the vector of such vectors. Besides, more specific multi-dimensional arrays, such as sparse or diagonal matrices, may be represented using more efficient storage techniques at the expense of a possible loss in access speed. Higher-order tensors may also be implemented with the described approaches.

One classic example of operations on multi-dimensional arrays is matrix multiplication. The simple straightforward algorithm below has the complexity of O(n^3) where n is the matrix dimension. The condition for successful multiplication is equality of height of the first marix and width of the second one. The cubic complexity is due to 3 loops: by the outer dimensions of each matrix and by the inner identical dimension.

(defun m* (m1 m2)
  (let ((n (array-dimension m1 1))
        (n1 (array-dimension m1 0))
        (n2 (array-dimension m2 1))
        (rez (make-array (list n1 n2))))
    (assert (= n (array-dimension m2 1)))
    (dotimes (i n1)
      (dotimes (j n2)
        (let ((cur 0))
          (dotimes (k n)
            ;; :+ is the incrementing analog of :=
            (:+ cur (* (aref m1 i k)
                       (aref m2 k j))))
          (:= (aref rez i j) cur))))
    rez))

There are more efficient albeit much more complex versions using divide-and-conquer approach that can work in only O(n^2.37), but they have significant hidden constants and, that's why, are rarely used in practice, although if you're relying on an established library for matrix operations, such as the Fortran-based BLAS/ATLAS, you will find one of them under-the-hood.

Binary Search

Now, let's talk about some of the important and instructive array algorithms. The most prominent ones are searching and sorting.

A common sequence operation is searching for the element either to determine if it is present, to get its position or to retrieve the object that has a certain property (key-based search). The simple way to search for an element in Lisp is using the function find:

CL-USER> (let ((vec #v((pair :foo :bar) (pair :baz :quux))))
           (print (find (pair :foo :bar) vec))
           (print (find (pair :foo :bar) vec :test 'equal))
           (print (find (pair :bar :baz) vec :test 'equal))
           (print (find :foo vec :key 'lt)))
NIL
(:FOO :BAR) 
NIL
(:FOO :BAR) 

In the first case, the element was not found due to the wrong comparison predicate: the default eql will only consider to structures the same if it's the same object, and, in this case, there will be two separate pairs with the same content. So, the second search is successful as equal performs deep comparison. Then the element is not found as it is just not present. And, in the last case, we did the key-based search looking just at the lt element of all pairs in vec.

Such search is called sequential scan because it is performed in a sequential manner over all elements of the vector starting from the beginning (or end if we specify :from-end t) until either the element is found or we have examined all the elements. The complexity of such search is, obviously, O(n), i.e. we need to access each element of the collection (if the element is present we'll look, on average, at n/2 elements, and if not present — always at all n elements).

However, if we know that our sequence is sorted, we can perform the search much faster. The algorithm used for that is one of the most famous algorithms that every programmer has to know and use, from time to time — binary search. The more general idea behind it is called "divide and conquer": if there's some way, looking at one element, to determine the outcome of our global operation for more than just this element we can discard the part, for which we already know that the outcome is negative. In binary search, when we're looking at an arbitrary element of the sorted vector and compare it with the item we search for:

  • if the element is the same we have found it
  • if it's smaller all the previous elements are also smaller and thus uninteresting to us — we need to look only on the subsequent ones
  • if it's greater all the following elements are not interesting

Thus, each time we can examine the middle element and, after that, can discard half of the elements of the array without checking them. We can repeat such comparisons and halving until the resulting array contains just a single element.

Here's the straightforward binary search implementation using recursion:

(defun bin-search (val vec &optional (pos 0))
  (if (> (length vec) 1)
      (with ((mid (floor (+ beg end) 2))
             (cur (aref vec mid)))
        (cond ((< cur val) (bin-search val
                                       (slice vec (1+ mid))
                                       (+ pos mid 1)))
              ((> cur val) (bin-search val
                                       (slice vec 0 (1+ mid))
                                       pos))
              (t (+ pos mid))))
      (when (= (aref vec 0) val)
        pos)))

If the middle element differs from the one we're looking for it halves the vector until just one element remains. If the element is found its position (which is passed as an optional 3rd argument to the recursive function) is returned. Note that we assume that the array is sorted. Generally, there's no way to quickly check this property unless we examine all array elements (and thus lose all the benefits of binary search). That's why we don't assert the property in any way and just trust the programmer :)

An important observation is that such recursion is very similar to a loop that at each stage changes the boundaries we're looking in-between. Not every recursive function can be matched with a similar loop so easily (for instance, when there are multiple recursive calls in its body an additional memory data structure is needed), but when it is possible it usually makes sense to choose the loop variant. The pros of looping is the avoidance of both the function calls' overhead and the danger of hitting the recursion limit or the stack overflow associated with it. While the pros of recursion are simpler code and better debuggability that comes with the possibility to examine each iteration by tracing using the built-in tools.

Another thing to note is interesting counter-intuitive arithmetic of additional comparisons. In our naive approach, we had 3 cond clauses, i.e. up to 2 comparisons to make at each iteration. In total, we'll look at (log n 2) elements of our array, so we have no more than (/ (1- (log n 2)) n) chance to match the element with the = comparison before we get to inspect the final 1-element array. I.e. with the probability of (- 1 (/ (1- (log n 2)) n)) we'll have to make all the comparisons up to the final one. Even for such small n as 10 this probability is 0.77 and for 100 — 0.94. And this is an optimistic estimate for the case when the element searched for is actually present in the array, which may not always be so. Otherwise, we'll have to make all the comparisons. Effectively, these numbers prove the equality comparison meaningless and just a waste of computation, although from "normal" programmer intuition it might seem like a good idea to implement early exit in this situation...

Finally, there's also one famous non-obvious bug associated with the binary search that was still present in many production implementations, for many years past the algorithm's inception. It's also a good example of the dangers of forfeiting boundary conditions check that is the root of many severe problems plaguing our computer systems by opening them to various exploits. The problem, in our code, may manifest in systems that have limited integer arithmetic with potential overflow. In Lisp, if the result of summing two fixnums is greater than most-positive-fixnum (the maximum number that can be represented directly by the machine word) it will be automatically converted to bignums, which are a slower representation but with unlimited precision:

CL-USER> most-positive-fixnum
4611686018427387903
CL-USER> (type-of most-positive-fixnum)
(INTEGER 0 4611686018427387903)
CL-USER> (+ most-positive-fixnum most-positive-fixnum)
9223372036854775806
CL-USER> (type-of (+ most-positive-fixnum most-positive-fixnum))
(INTEGER 4611686018427387904)

In many other languages, such as C or Java, what will happen is either silent overflow (the worst), in which case we'll get just the remainder of division of the result by the maximum integer, or an overflow error. Both of these situations are not accounted for in the (floor (+ beg end) 2) line. The simple fix to this problem, which makes sense to keep in mind for future similar situations, is to change the computation to the following equivalent form: (+ beg (floor (- end beg) 2)). It will never overflow. Why? Try to figure out on your own ;)

Taking all that into account and allowing for a custom comparator function, here's an "optimized" version of binary search that returns 3 values:

  • the final element of the array
  • its position
  • has it, actually, matched the element we were searching for?
(defun bin-search (val vec &key (less '<) (test '=) (key 'identity))
  (when (plusp (length vec))
    (let ((beg 0)
          (end (length vec)))
      (do ()
          ((= beg end))
        (let ((mid (+ beg (floor (- end beg) 2))))
          (if (call less (call key (aref vec mid)) val)
              (:= beg (1+ mid))
              (:= end mid))))
      (values (aref vec beg)
              beg
              (call test (call key (aref vec beg)) val)))))

How many loop iterations do we need to complete the search? If we were to take the final one-element array and expand the array from it by adding the discarded half it would double in size at each step, i.e. we'll be raising 2 to the power of the number of expansion iterations (initially, before expansion — after 0 iterations — we have 1 element, which is 2^0, after 1 iteration, we have 2 elements, after 2 — 4, and so on). The number of iterations needed to expand the full array may be calculated by the inverse of exponentiation — the logarithmic function. I.e. we'll need (log n 2) iterations (where n is the initial array size). Shrinking the array takes the same as expanding, just in the opposite order, so the complexity of binary search is O(log n).

How big is the speedup from linear to logarithmic complexity? Let's do a quick-and-dirty speed comparison between the built-in (and optimized) sequential scan fucntion find and our bin-search:

CL-USER> (with ((size 100000000)
                (mid (1+ (/ size 2)))
                (vec (make-array size)))
           (dotimes (i size)
             (:= (? vec i) i))
           (time (find mid vec))
           (time (bin-search mid vec)))
Evaluation took:
  0.591 seconds of real time
  0.595787 seconds of total run time (0.595787 user, 0.000000 system)
  100.85% CPU
  ...
Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  ...

Unfortunately, I don't have enough RAM on my notebook to make bin-search take at least a millisecond of CPU time. We can count nanoseconds to get the exact difference, but a good number to remember is that (log 1000000 2) is approximately 20, so, for the million elements array, the speedup will be 50000x!

The crucial limitation of binary search is that it requires our sequence to be pre-sorted because sorting before each search already requires at least linear time to complete, which kills any performance benefit we might have expected. There are multiple situations when the pre-sort condition may hold without our intervention:

  • all the data is known beforehand and we can sort it just once prior to running the search, which may be repeated multiple times for different values
  • we maintain the sorted order as we add data. Such an approach is feasible only if addition is performed less frequently than search. This is often the case with databases, which store their indices in sorted order

A final note on binary search: obviously, it will only work fast for vectors and not linked sequences.

Binary Search in Action

In one consumer internet company I was working for, a lot of text processing (which was the company's bread-and-butter) relied on access to a huge statistical dataset called "ngrams". Ngrams is a simple Natural Language Processing concept: basically, they are phrases of a certain length. A unigram (1gram) is a single word, a bigram — a pair of words, a fivegram — a list of 5 words. Each ngram has some weight associated with it, which is calculated (estimated) from the huge corpus of texts (we used the crawl of the whole Internet). There are numerous ways to estimate this weight, but the basic one is to just count the frequency of the occurance of a specific ngram phrase in the corpus.

The total number of ngrams may be huge: for our case, the whole dataset, on disk, measured in tens of gigabytes. And the application requires constant random access to it. Using an off-the-shelf database would have incurred us too much overhead as such systems are general-purpose and don't optimize for the particular use cases, like the one we had. So, a special-purpose solution was needed. In fact, now there is readily-available ngrams handling software, such as KenLM. We have built our own, and, initially, it relied on binary search of the in-memory dataset to answer the queries. Considering the size of the data, what do you think was the number of operations required? I don't remember it exactly, but somewhere between 25 and 30. For handling tens of gigabytes or hundreds of millions/billions of ngrams — quite a decent result. And, most important, it didn't exceed our application's latency limits! The key property that enabled such solution was the fact that all the ngrams were known beforehand and hence the dataset could be pre-sorted. Yet, eventually, we moved to an even faster solution based on perfect hash-tables (that we'll discuss later in this book).

One more interesting property of this program was that it took significant time to initialize as all the data had to be loaded into memory from disk. During that time, which measured in several dozens of minutes, the application was not available, which created a serious bottleneck in the whole system and complicated updates as well as put normal operation at additional risk. The solution we utilized to counteract this was also a common one for such cases: lazy loading in memory using the Unix mmap facility.

Sorting

Sorting is another fundamental sequence operation that has many applications. Unlike searching, the sorted sequence, there is no single optimal algorithm for sorting, and different data structures allow different approaches to it. In general, the problem of sorting a sequence is to place all of its elements in a certain order determined by the comparison predicate. There are several aspects that differentiate sorting functions:

  • in-place sorting is a destructive operation, but it is often desired because it may be faster and also it preserves space (especially relevant when sorting big amounts of data at once). The alternative is copying sort
  • stable: whether 2 elements, which are considered the same by the predicate, retain their original order or may be shuffled
  • online: does the function require to see the whole sequence before starting the sorting process or can it work with each element one-by-one always preserving the result of processing the already seen part of the sequence in the sorted order

One more aspect of a particular sorting algorithm is its behavior on several special kinds of input data: already sorted (in direct and reversed order), almost sorted, completely random. An ideal algorithm should show better than average performance (up to O(1)) on the sorted and almost sorted special cases.

Over the history of CS, sorting was and still remains a popular research topic. Not surprisingly, several dozens of different sorting algorithms were developed. But before discussing the prominent ones, let's talk about "Stupid sort" (or "Bogosort"). It is one of the sorting algorithms that has a very simple idea behind, but an outstandingly nasty performance. The idea is that among all permutations of the input sequence there definitely is the completely sorted one. If we were to take it, we don't need to do anything else. It's an example of the so-called "generate and test" paradigm that may be employed when we know next to nothing about the nature of our task: then, put some input into the black box and see the outcome. In the case of bogosort, the number of possible inputs is the number of all permutations that's equal to n!, so considering that we need to also examine each permutation's order the algorithm's complexity is O(n * n!) — quite a bad number, especially, since some specialized sorting algorithms can work as fast as O(n) (for instance, Bucket sort for integer numbers). On the other hand, if generating all permutations is a library function and we don't care about complexity such an algorithm will have a rather simple implementation that looks quite innocent. So you should always inquire about the performance characteristics of 3rd-party functions. And, by the way, your standard library sort function is also a good example of this rule.

(defun bogosort (vec comp)
  (dolist (variant (all-permutations vec))
    (dotimes (i (1- (length variant)))
                ;; this is the 3rd optional argument of dotimes header
                ;; that is evaluated only after the loop finishes normally
                ;; if it does we have found a completely sorted permutation!
                (return-from bogosort variant))
      (when (call comp (? variant (1+ i)) (? variant i))
        (return)))))  ; current variant is not sorted, skip it

O(n^2) Sorting

Although we can imagine an algorithm with even worse complexity factors than this, bogosort gives us a good lower bound on the sorting algorithm's performance and an idea of the potential complexity of this task. However, there are much faster approaches that don't have a particularly complex implementation. There is a number of such simple algorithms that work in quadratic time. A very well-known one, which is considered by many a kind of "Hello world" algorithm, is Bubble sort. Yet, in my opinion, it's quite a bad example to teach (sadly, often it is taught) because it's both not very straightforward and has poor performance characteristics. That's why it's never used in practice. There are two other simple quadratic sorting algorithms that you actually have a chance to encounter in the wild, especially, Insertion sort that is used rather frequently. Their comparison is also quite insightful, so we'll take a look at both, instead of focusing just on the former.

Selection sort is an in-place sorting algorithm that moves left-to-right from the beginning of the vector one element at a time and builds the sorted prefix to the left of the current element. This is done by finding the "largest" (according to the comparator predicate) element in the right part and swapping it with the current element.

(defun selection-sort (vec comp)
  (dotimes (i (1- (length vec)))
    (let ((best (aref vec i))
          (idx i))
      (dotimes (j (- (length vec) i 1))
        (when (call comp (aref vec (+ i j 1)) best)
          (:= best (aref vec (+ i j 1))
              idx (+ i j 1)))))
    (rotatef (aref vec i) (aref vec idx)))  ; this is the lisp's swap operator
  vec)

Selection sort requires a constant number of operations regardless of the level of sortedness of the original sequence: (/ (* n (- n 1)) 2) — the sum of the arithmetic progression from 1 to n, because, at each step, it needs to fully examine the remainder of the elements to find the maximum, and the remainder's size varies from n to 1. It handles equally well both contiguous and linked sequences.

Insertion sort is another quadratic-time in-place sorting algorithm that builds the sorted prefix of the sequence. However, it has a few key differences from Selection sort: instead of looking for the global maximum in the right-hand side it looks for a proper place of the current element in the left-hand side. As this part is always sorted it takes linear time to find the place for the new element and insert it there leaving the side in sorted order. Such change has great implications:

  • it is stable
  • it is online: the left part is already sorted, and, in contrast with selection sort, it doesn't have to find the maximum element of the whole sequence in the first step, it can handle encountering it at any step
  • for sorted sequences it works in the fastest possible way — in linear time — as all elements are already inserted into proper places and don't need moving. The same applies to almost sorted sequences, for which it works in almost linear time. However, for reverse sorted sequences, its performance will be the worse. In fact, there is a clear proportion of the algorithm's complexity to the average offset of the elements from their proper positions in the sorted sequence: O(k * n), where k is the average offset of the element. For sorted sequences k=0 and for reverse sorted it's (/ (- n 1) 2).
(defun insertion-sort (vec comp)
  (dotimes (i (1- (length vec)))
    (do ((j i (1- j)))
        ((minusp j))
      (if (call comp (aref vec (1+ j)) (aref vec j))
          (rotatef (aref vec (1+ j)) (aref vec j))
          (return))))
  vec)

As you see, the implementation is very simple: we look at each element starting from the second, compare it to the previous element, and if it's better we swap them and continue the comparison with the previous element until we reach the array's beginning.

So, where's the catch? Is there anything that makes Selection sort better than Insertion? Well, if we closely examine the number of operations required by each algorithm we'll see that Selection sort needs exactly (/ (* n (- n 1)) 2) comparisons and on average n/2 swaps. For Insertion sort, the number of comparisons varies from n-1 to (/ (* n (- n 1)) 2), so, in the average case, it will be (/ (* n (- n 1)) 4), i.e. half as many as for the other algorithm. In the sorted case, each element is already in its position, and it will take just 1 comparison to discover that, in the reverse sorted case, the average distance of an element from its position is (/ (- n 1) 2), and for the middle variant, it's in the middle, i.e. (/ (- n 1) 4). Times the number of elements (n). But, as we can see from the implementation, Insertion sort requires almost the same number of swaps as comparisons, i.e. (/ (* (- n 1) (- n 2)) 4) in the average case, and it matches the number of swaps of Selection sort only in the close to best case, when each element is on average 1/2 steps away from its proper position. If we sum up all comparisons and swaps for the average case, we'll get the following numbers:

  • Selection sort: (+ (/ (* n (- n 1)) 2) (/ n 2)) = (/ (+ (* n n) n) 2)
  • Insertion sort: (+ (/ (* n (- n 1)) 2) (+ (/ (* (- n 1) (- n 2)) 4) = (/ (+ (* 1.5 n n) (* -2.5 n) 1) 2)

The second number is slightly higher than the first. For small ns it is almost negligible: for instance, when n=10, we get 55 operations for Selection sort and 63 for Insertion. But, asymptotically (for huge ns like millions and billions), Insertion sort will need 1.5 times more operations. Also, it is often the case that swaps are more expensive operations than comparisons (although, the opposite is also possible).

In practice, Insertion sort ends up being used more often, for, in general, quadratic sorts are only used when the input array is small (and so the difference in the number of operations) doesn't matter, while it has other good properties we mentioned. However, one situation when Selection sort's predictable performance is an important factor is in the systems with deadlines.

Quicksort

There is a number of other O(n^2) sorting algorithms similar to Selection and Insertion sorts, but studying them quickly turns boring so we won't. As there's also a number of significantly faster algorithms that work in O(n * log n) time (almost linear). They usually rely on the divide-and-conquer approach when the whole sequence is recursively divided into smaller subsequences that have some property, thanks to which it's easier to sort them, and then these subsequences are combined back into the final sorted sequence. The feasibility of such performance characteristics is justified by the observation that ordering relations are recursive, i.e. if we have compared two elements of an array and then compare one of them to the third element, with a probability of 1/2 we'll also know how it relates to the other element.

Probably, the most famous of such algorithms is Quicksort. Its idea is, at each iteration, to select some element of the array as the "pivot" point and divide the array into two parts: all the elements that are smaller and all those that are larger than the pivot; then recursively sort each subarray. As all left elements are below the pivot and all right — above when we manage to sort the left and right sides the whole array will be sorted. This invariant holds for all iterations and for all subarrays. The word "invariant", literally, means some property that doesn't change over the course of the algorithm's execution when other factors, e.g. bounds of the array we're processing, are changing.

There're several tricks in Quicksort implementation. The first one has to do with pivot selection. The simplest approach is to always use the last element as the pivot. Now, how do we put all the elements greater than the pivot after it if it's already the last element? Let's say that all elements are greater — then the pivot will be at index 0. Now, if moving left to right over the array we encounter an element that is not greater than the pivot we should put it before, i.e. the pivot's index should increment by 1. When we reach the end of the array we know the correct position of the pivot, and in the process, we can swap all the elements that should precede it in front of this position. Now, we have to put the element that is currently occupying the pivot's place somewhere. Where? Anywhere after the pivot, but the most obvious thing is to swap it with the pivot.

(defun quicksort (vec comp)
  (when (> (length vec) 1)
    (with ((pivot-i 0)
           (pivot (aref vec (1- (length vec)))))
      (dotimes (i (1- (length vec)))
        (when (call comp (aref vec i) pivot)
          (rotatef (aref vec i)
                   (aref vec pivot-i))
          (:+ pivot-i)))
      ;; swap the pivot (last element) in its proper place
      (rotatef (aref vec (1- (length vec)))
               (aref vec pivot-i))
      (quicksort (slice vec 0 pivot-i) comp)
      (quicksort (slice vec (1+ pivot-i)) comp)))
  vec)

Although recursion is employed here, such implementation is space-efficient as it uses array displacement ("slicing") that doesn't create new copies of the subarrays, so sorting happens in-place. Speaking of recursion, this is one of the cases when it's not so straightforward to turn it into looping (this is left as an exercise to the reader :) ).

What is the complexity of such implementation? Well, if, on every iteration, we divide the array in two equal halves we'll need to perform n comparisons and n/2 swaps and increments, which totals to 2n operations. And we'll need to do that (log n 2) times, which is the height of a complete binary tree with n elements. At every level in the recursion tree, we'll need to perform twice as many sorts with twice as little data, so each level will take the same number of 2n operations. Total complexity: 2n * (log n 2), i.e. O(n * log n). In the ideal case.

However, we can't guarantee that the selected pivot will divide the array into two ideally equal parts. In the worst case, if we were to split it into 2 totally unbalanced subarrays, with n-1 and 0 elements respectively, we'd need to perform sorting n times and had to perform a number of operations that will diminish in the arithmetic progression from 2n to 2. Which sums to (* n (- n 1)). A dreaded O(n^2) complexity. So, the worst-case performance for quicksort is not just worse, but in a different complexity league than the average-case one. Moreover, the conditions for such performance (given our pivot selection scheme) are not so uncommon: sorted and reverse-sorted arrays. And the almost sorted ones will result in the almost worst-case scenario.

It is also interesting to note that if, at each stage, we were to split the array into parts that have a 10:1 ratio of lengths this would have resulted in n * log n complexity! How come? The 10:1 ratio, basically, means that the bigger part each time is shortened at a factor of around 1.1, which still is a power-law recurrence. The base of the algorithm will be different, though: 1.1 instead of 2. Yet, from the complexity theory point of view, the logarithm base is not important because it's still a constant: (log n x) is the same as (/ (log n 2) (log x 2)), and (/ 1 (log x 2)) is a constant for any fixed logarithm base x. In our case, if x is 1.1 the constant factor is 7.27. Which means that quicksort, in the quite bad case of recurring 10:1 splits, will be just a little more than 7 times slower than, in the best case, of recurring equal splits. Significant — yes. But, if we were to compare n * log n (with base 2) vs n^2 performance for n=1000 we'd already get a 100 times slowdown, which will only continue increasing as the input size grows. Compare this to a constant factor of 7...

So, how do we achieve at least 10:1 split, or, at least, 100:1, or similar? One of the simple solutions is called 3-medians approach. The idea is to consider not just a single point as a potential pivot but 3 candidates: first, middle, and last points — and select the one, which has the median value among them. Unless accidentally two or all three points are equal, this guarantees us not taking the extreme value that is the cause of the all-to-nothing split. Also, for a sorted array, this should produce a nice near to equal split. How probable is stumbling at the special case when we'll always get at the extreme value due to equality of the selected points? The calculations here are not so simple, so I'll give just the answer: it's extremely improbable that such condition will hold for all iterations of the algorithm due to the fact that we'll always remove the last element and all the swapping that is going on. More precisely, the only practical variant when it may happen is when the array consists almost or just entirely of the same elements. And this case will be addressed next. One more refinement to the 3-medians approach that will work even better for large arrays is 9-medians that, as is apparent from its name, performs the median selection not among 3 but 9 equidistant points in the array.

Dealing with equal elements is another corner case for quicksort that should be addressed properly. The fix is simple: to divide the array not in 2 but 3 parts, smaller, larger, and equal to the pivot. This will allow for the removal of the equal elements from further consideration and will even speed up sorting instead of slowing it down. The implementation adds another index (this time, from the end of the array) that will tell us where the equal-to-pivot elements will start, and we'll be gradually swapping them into this tail as they are encountered during array traversal.

Production Sort

I was always wondering how it's possible, for Quicksort, to be the default sorting algorithm when it has such bad worst-case performance and there are other algorithms like Merge sort or Heap sort that have guaranteed O(n * log n) ones. With all the mentioned refinements, it's apparent that the worst-case scenario, for Quicksort, can be completely avoided (in the probabilistic sense) while it has a very nice property of sorting in-place with good cache locality, which significantly contributes to better real-world performance. Moreover, production sort implementation will be even smarter by utilizing Quicksort while the array is large and switching to something like Insertion sort when the size of the subarray reaches a certain threshold (10-20 elements). All this, however, is applicable only to arrays. When we consider lists, other factors come into play that make Quicksort much less plausible.

Here's an attempt at such — let's call it "Production sort" — implementation (the function 3-medians is left as an excercise to the reader).

(defun prod-sort (vec comp &optional (eq 'eql))
  (cond ((< (length vec) 2)
         vec)
        ((< (length vec) 10)
         (insertion-sort vec comp))
        (t
         (rotatef (aref vec (1- (length vec)))
                  (aref vec (3-medians vec comp eq)))
         (with ((pivot-i 0)
                (pivot-count 1)
                (last-i (1- (length vec)))
                (pivot (aref vec last-i)))
           (do ((i 0 (1+ i)))
               ((> i (- last-i pivot-count)))
             (cond ((call comp (aref vec i) pivot)
                    (rotatef (aref vec i)
                             (aref vec pivot-i))
                    (:+ pivot-i))
                   ((call eq (aref vec i) pivot)
                    (rotatef (aref vec i)
                             (aref vec (- last-i pivot-count)))
                    (:+ pivot-count)
                    (:- i))))  ; decrement i to reprocess newly swapped point
           (dotimes (i pivot-count)
             (rotatef (aref vec (+ pivot-i i))
                      (aref vec (- last-i i))))
         (prod-sort (slice vec 0 pivot-i) comp eq)
         (prod-sort (slice vec (+ pivot-i pivot-count)) comp eq))))
  vec)

All in all, the example of Quicksort is very interesting, from the point of view of complexity analysis. It shows the importance of analyzing the worst-case and other corner-case scenarios, and, at the same time, teaches that we shouldn't give up immediately if the worst case is not good enough, for there may be ways to handle such corner cases that reduce or remove their impact.

Performance Benchmark

Finally, let's look at our problem from another angle: simple and stupid. We have developed 3 sorting functions' implementations: Insertion, Quick, and Prod. Let's create a tool to compare their performance on randomly generated datasets of decent sizes. This may be done with the following code and repeated many times to exclude the effects of randomness.

(defun random-vec (size)
  (let ((vec (make-array size)))
    (dotimes (i size)
      (:= (? vec i) (random size)))
    vec))

(defun print-sort-timings (sort-name sort-fn vec)
  ;; we'll use in-place modification of the input vector VEC
  ;; so we need to copy it to preserve the original for future use
  (let ((vec (copy-seq vec))
        (len (length vec)))
    (format t "= ~Asort of random vector (length=~A) =~%"
            sort-name len)
    (time (call sort-fn vec '<))
    (format t "= ~Asort of sorted vector (length=~A) =~%"
            sort-name len)
    (time (call sort-fn vec '<))
    (format t "= ~Asort of reverse sorted vector (length=~A) =~%"
            sort-name len)
    (time (call sort-fn vec '>))))

CL-USER> (let ((vec (random-vec 1000)))
           (print-sort-timings "Insertion " 'insertion-sort vec)
           (print-sort-timings "Quick" 'quicksort vec)
           (print-sort-timings "Prod" 'prod-sort vec))
= Insertion sort of random vector (length=1000) =
Evaluation took:
  0.128 seconds of real time
...
= Insertion sort of sorted vector (length=1000) =
Evaluation took:
  0.001 seconds of real time
...
= Insertion sort of reverse sorted vector (length=1000) =
Evaluation took:
  0.257 seconds of real time
...
= Quicksort of random vector (length=1000) =
Evaluation took:
  0.005 seconds of real time
...
= Quicksort of sorted vector (length=1000) =
Evaluation took:
  5.429 seconds of real time
...
= Quicksort of reverse sorted vector (length=1000) =
Evaluation took:
  2.176 seconds of real time
...
= Prodsort of random vector (length=1000) =
Evaluation took:
  0.008 seconds of real time
...
= Prodsort of sorted vector (length=1000) =
Evaluation took:
  0.004 seconds of real time
...
= Prodsort of reverse sorted vector (length=1000) =
Evaluation took:
  0.007 seconds of real time

Overall, this is a really primitive approach that can't serve as conclusive evidence on its own, but it has value as it aligns well with our previous calculations. Moreover, it once again reveals some things that may be omitted in those calculations: for instance, the effects of the hidden constants of the Big-O notation or of the particular programming vehicles used. We can see that, for their worst-case scenarios, where Quicksort and Insertion sort both have O(n^2) complexity and work the longest, Quicksort comes 10 times slower, although it's more than 20 times faster for the average case. This slowdown may be attributed both to the larger number of operations and to using recursion. Also, our Prodsort algorithm demonstrates its expected performance. As you see, such simple testbeds quickly become essential in testing, debugging, and fine-tuning our algorithms' implementations. So it's a worthy investment.

Finally, it is worth noting that array sort is often implemented as in-place sorting, which means that it will modify (spoil) the input vector. We use that in our test function: first, we sort the array and then sort the sorted array in direct and reverse orders. This way, we can omit creating new arrays. Such destructive sort behavior may be both the intended and surprising behavior. The standard Lisp's sort and stable-sort functions also exhibit it, which is, unfortunately, a source of numerous bugs due to the application programmer forgetfulness of the function's side-effects (at least, this is an acute case, for myself). That's why RUTILS provides an additional function safe-sort that is just a thin wrapper over standard sort to free the programmer's mind from worrying or forgetting about this treacherous sort's property.

A few points we can take away from this chapter:

  1. Array is a goto structure for implementing your algorithms. First, try to fit it before moving to other things like lists, trees, and so on.
  2. Complexity estimates should be considered in context: of the particular task's requirements and limitations, of the hardware platform, etc. Performing some real-world benchmarking alongside back-of-the-napkin abstract calculations may be quite insightful.
  3. It's always worth thinking of how to reduce the code to the simplest form: checking of additional conditions, recursion, and many other forms of code complexity, although, rarely are a game changer, often may lead to significant unnecessary slowdowns.

Footnotes:

[1] or void* in C, or some other type that allows any element in your language of choice

[2] Such incompatibility errors are not a cheap thing: for instance, it is reported that the crash of the first Arian V rocket happened due to interoperation of two programs that used the metric and the imperial measurement systems without explicit conversion of the data. There's an elegant solution to such problem: "dimensional numbers", which a custom reader macro to encode the measure alongside the number. Here is a formula expressed with such numbers:

(defun running-distance-for-1kg-weight-loss (mass)
  (* 1/4 (/ #M37600kJ (* #M0.98m/s2 mass))))

CL-USER> (running-distance-for-1kg-weight-loss #M80kg)
119897.96
CL-USER> (running-distance-for-1kg-weight-loss #I200lb)
105732.45

The output is, of course, in metric units. Unfortunately, this approach will not be useful for arrays encoded by different languages as they are obtained not by reading the input but by referencing external memory. Instead, a wrapper struct/class is, usually, used to specify the elements order.