2019-07-22

"Programming Algorithms" Book


Drago — a nice example of a real-world binary tree

I'm writing a book about algorithms and Lisp. It, actually, started several years ago, but as I experience a constant shortage of quality time to devote to such side activities, short periods of writing alternated with long pauses. Now, I'm, finally, at the stage when I can start publishing it. But I intend to do that, first, gradually in this blog and then put the final version — hopefully, improved and polished thanks to the comments of the first readers — on Leanpub. The book will be freely available with a CC BY-NC-ND license.

The book will have 16 chapters grouped into 3 parts: essential data structures, derivative ones, and advanced algorithms. I plan to publish each one approximately once a week. To finish the process by the end of the year.

I hope the book turns out to be an enlightening read for those who start their career in programming or want to level up in it. At least, I tried to accumulate inside all my experience from production algorithmic development, teaching of these topics, and Lisp programming, over the last 10+ years. Below is a short preface and an introductory chapter about Complexity.

Why Algorithms Matter

In our industry, currently, there seems to prevail a certain misunderstanding of the importance of algorithms for the working programmer. There's often a disconnect between the algorithmic questions posed at the job interviews and the everyday essence of the same job. That's why opinions are voiced that you, actually, don't have to know CS to be successful in the software developer's job. That's true, you don't, but you'd better do if you want to be in the notorious top 10% programmers. For several reasons. One is that, actually, you can find room for algorithms almost at every corner of your work — provided you are aware of their existence. To put it simply, the fact that you don't know a more efficient or elegant solution to a particular programming problem doesn't make your code less crappy. The current trend in software development is that, although the hardware becomes more performant, the software becomes slower faster. There are two reasons for that, in my humble opinion:

  1. Most of the application programmers don't know the inner workings of the underlying platforms. And the number of platform layers keeps increasing.
  2. Most of the programmers also don't know enough algorithms and algorithmic development technics to squeeze the most from their code. And often this means a loss of one or more orders of magnitude of performance.

In the book, I'll address, primarily, the second issue but will also try to touch on the first whenever possible.

Besides, learning the art of solving difficult algorithmic problems trains the brain and makes it more apt to solving various other problems, in the course of your day-to-day work.

Finally, you will be speaking the same lingua franca as other advanced programmers — the tongue that transcends the mundane differences of particular programming languages. And you'll gain a more detached view of those differences, freeing your mind from the dictate of a particular set of choices exhibiting in any one of them.

One of the reasons for this gap of understanding of the value of algorithms, probably, originates from how they are usually presented in the computer science curriculum. First, it is often done in a rather theoretical or "mathematical" way with rigorous proofs and lack of connection to the real world™. Second, the audience is usually freshmen or sophomores who don't have a lot of practical programming experience and thus can't appreciate and relate how this knowledge may be applied to their own programming challenges (because they didn't have those yet) — rather, most of them are still at the level of struggling to learn well their first programming language and, in their understanding of computing, are very much tied to its choices and idiosyncrasies.

In this book, the emphasis is made on the demonstration of the use of the described data structures and algorithms in various areas of computer programming. Moreover, I anticipate that the self-selected audience will comprise programmers with some experience in the field. This makes a significant difference in the set of topics that are relevant and how they can be conveyed. Another thing that helps a lot is when the programmer has a good command of more than one programming language, especially, if the languages are from different paradigms: static and dynamic, object-oriented and functional. These factors allow bridging the gap between "theoretical" algorithms and practical coding, making the topic accessible, interesting, and inspiring.

This is one answer to a possible question: why write another book on algorithms? Indeed, there are several good textbooks and online courses on the topic, of which I'd recommend the most Steven Skienna's The Algorithm Design Manual. Yet, as I said, this book is not at all academic in presentation of the material, which is a norm for other textbooks. Except for simple arithmetic, it contains almost no "math" or proofs. And, although proper attention is devoted to algorithm complexity, it doesn't deal with theories of complexity or computation and similar scientific topics. Besides, all the algorithms and data structures come with some example practical use cases. Last, but not least, there's no book on algorithms in Lisp, and, in my opinion, it's a great topic to introduce the language. The next chapter will provide a crash course to grasp the basic ideas, and then we'll discuss various Lisp programming approaches alongside the algorithms they will be used to implement.

This is an introductory book, not a bible of algorithms. It will draw a comprehensive picture and cover all topics necessary for further advancement of your algorithms knowledge. However, it won't go too deep into the advanced topics, such as persistent or probabilistic data structures, advanced tree, graph, and optimization algorithms, as well as algorithms for particular fields, such as Machine Learning, Cryptography or Computational Geometry. All of those fields require (and usually have) separate books of their own.

A Few Words about Lisp

For a long time, I've been contemplating writing an introductory book on Lisp, but something didn't add up, I couldn't see the coherent picture, in my mind. And then I got a chance to teach algorithms with Lisp. From my point of view, it's a perfect fit for demonstrating data structures and algorithms (with a caveat that students should be willing to learn it), while discussing the practical aspects of those algorithms allows to explain the language naturally. At the same time, this topic requires almost no endeavor into the adjacent areas of programming, such as architecture and program design, integration with other systems, user interface, and use of advanced language features, such as types or macros. And that is great because those topics are overkill for an introductory text and they are also addressed nicely and in great detail elsewhere (see Practical Common Lisp and ANSI Common Lisp).

Why Lisp is great for algorithmic programs? One reason is that the language was created with such use case in mind. It has support for all the proper basic data structures, such as arrays, hash-tables, linked lists, strings, and tuples. It also has a numeric tower, which means no overflow errors and, so, a much saner math. Next, it's created for the interactive development style, so the experimentation cycle is very short, there's no compile-wait-run-revise red tape, and there are no unnecessary constraints, like the need for additional annotations (a.k.a. types), prohibition of variable mutation or other stuff like that. You just write a function in the REPL, run it and see the results. In my experience, Lisp programs look almost like pseudocode. Compared to other languages, they may be slightly more verbose at times but are much more clear, simple, and directly compatible with the algorithm's logical representation.

But why not choose a popular programming language? The short answer is that it wouldn't have been optimal. There are 4 potential mainstream languages that could be considered for this book: C++, Java, Python, and JavaScript. (Surely, there's already enough material on algorithms that uses them). The first two are statically-typed, which is, in itself, a big obstacle to using them as teaching languages. Java is also too verbose, while C++ — too low-level. These qualities don't prevent them from being used in the majority of production algorithm code, in the wild, and you'll, probably, end up dealing with such code sooner than later if not already. Besides, their standard libraries provide great examples of practical algorithm implementation. But, I believe that gaining good conceptual understanding will allow to easily adapt to one of these languages if necessary while learning them in parallel with diving into algorithms creates unnecessary complexity. Python and JS are, in many ways, the opposite choices: they are dynamic and provide some level of an interactive experience (albeit inferior compared to Lisp), but those languages are in many ways anti-algorithmic. Trying to be simple and accessible, they hide too much from the programmer and don't give enough control of the concrete data. Teaching algorithms, using their standard libraries, seems like cheating to me as their basic data structures often are not what they claim to be. Lisp is in the middle: it is both highly interactive and gives enough control of the environment, while not being too verbose and demanding. And the price to pay — the unfamiliar syntax — is really small, in my humble opinion.

Yet, there's another language that is rapidly gaining popularity and is considered by many to be a good choice for algorithmic development — Rust. It's also a static language, although not so ceremonial as Java or C++. However, neither am I an expert in Rust, nor intend to become one. Moreover, I think the same general considerations regarding static languages apply to it.

Algorithmic Complexity

Complexity is a point that will be mentioned literally on every page of this book; the discussion of any algorithm or data structure can't avoid this topic. After correctness, it is the second most important quality of every algorithm — moreover, often correctness alone doesn't matter if complexity is neglected, while the opposite is possible: to compromise correctness somewhat in order to get significantly better complexity. By and large, algorithm theory differs from other subjects of CS in that it concerns not about presenting a working (correct) way to solve some problem but about finding an efficient way to do it. Where efficiency is understood as the minimal (or admissible) number of operations performed and occupied memory space.

In principle, the complexity of an algorithm is the dependence of the number of operations that will be performed on the size of the input. It is crucial to the computer system's scalability: it may be easy to solve the programming problem for a particular set of inputs, but how will the solution behave if the input is doubled, increased tenfold or million-fold? This is not a theoretical question, and an analysis of any general-purpose algorithm should have a clear answer to it.

Complexity is a substantial research topic: a whole separate branch of CS — Complexity Theory — exists to study it. Yet, throughout the book, we'll try to utilize the end results of such research without delving deep into rigorous proofs or complex math, especially since, in most of the cases, measuring complexity is a matter of simple counting. Let's look at the following illustrative example:

(defun mat-max (mat)
  (let (max)
    (dotimes (i (array-dimension mat 0))
      (dotimes (j (array-dimension mat 1))
        (when (or (null max)
                  (> (aref mat i j) max))
          (:= max (aref mat i j)))))
    max))

This function finds the maximum element of a two-dimensional array (matrix):

CL-USER> (mat-max #2A((1 2 3) (4 5 6)))
6

What's its complexity? To answer, we can just count the number of operations performed: at each iteration of the inner loop, there are 2 comparisons involving 1 array access, and, sometimes, if the planets align we perform another access for assignment. The inner loop is executed (array-dimension mat 1) times (let's call it m where m=3), and the outer one — (array-dimension mat 0) (n=2, in the example). If we sum this all up we'll get: n * m * 4 as an upper limit, for the worst case when each sequent array element is larger then the previous. As a rule of thumb, each loop adds multiplication to the formula, and each sequential block adds a plus sign.

In this calculation, there are two variables (array dimensions n and m) and one constant (the number of operations performed for each array element). There exists a special notation — Big-O — used to simplify the representation of end results of such complexity arithmetic. In it, all constants are reduced to 1, and thus m * 1 becomes just m, and also since we don't care about individual array dimension differences we can just put n * n instead of n * m. With such simplification, we can write down the final complexity result for this function: O(n^2). In other words, our algorithm has quadratic complexity (which happens to be a variant of a broader class called "polynomial complexity") in array dimensions. It means that by increasing the dimensions of our matrix ten times, we'll increase the number of operations of the algorithm 100 times. In this case, however, it may be more natural to be concerned with the dependence of the number of operations on the number of elements of the matrix, not its dimensions. We can observe that n^2 is the actual number of elements, so it can also be written as just n — if by `n` we mean the number of elements, and then the complexity is linear in the number of elements (O(n)). As you see, it is crucial to understand what `n` we are talking about!

There are just a few more things to know about Big-O complexity before we can start using it to analyze our algorithms.

1. There are 6 major complexity classes of algorithms:

  • constant-time (O(1))
  • sublinear (usually, logarithmic — O(log n))
  • linear (O(n)) and superlinear (O(n * log n))
  • higher-order polynomial (O(n^c), where c is some constant greater than 1)
  • exponential (O(с^n), where с is usually 2 but, at least, greater than 1)
  • and just plain lunatic complex (O(n!) and so forth) — I call them O(mg), jokingly

Each class is a step-function change in performance, especially, at scale. We'll talk about each one of them as we'll be discussing the particular examples of algorithms falling into it.

2. Worst-case vs. average-case behavior. In this example, we saw that there may be two counts of operations: for the average case, we can assume that approximately half of the iterations will require assignment (which results in 3,5 operations in each inner loop), and, for the worst case, the number will be exactly 4. As Big-O reduces all numbers to 1, for this example, the difference is irrelevant, but there may be others, for which it is much more drastic and can't be discarded. Usually, for such algorithms, both complexities should be mentioned (alongside with ways to avoid worst-case scenarios): a good example is quicksort algorithm described in the subsequent chapter.

3. We have also seen the so-called "constant factors hidden by the Big-O notation". I.e., from the point of view of algorithm complexity, it doesn't matter if we need to perform 3 operations in the inner loop or 30. Yet, it is quite important in practice, and we'll also discuss it below when examining binary search. Even more, some algorithms with better theoretical complexity may be worse in many practical applications due to these hidden factors (for example, until the dataset reaches a certain size).

4. Finally, besides execution time complexity, there's also space complexity, which instead of the number of operations measures the amount of storage space used proportional to the size of the input. In general, similar approaches are applied to its estimation.


No comments: