2019-08-05

Programming Algorithms: Data Structures

The next several chapters will be describing the basic data structures that every programming language provides, their usage, and the most important algorithms relevant to them. And we'll start with the notion of a data-structure and tuples or structs that are the most primitive and essential one. Here is a snippet from this chapter.

Data Structures vs Algorithms

Let's start with a somewhat abstract question: what's more important, algorithms or data structures?

From one point of view, algorithms are the essence of many programs, while data structures may seem secondary. Besides, although a majority of algorithms rely on certain features of particular data structures, not all do. Good examples of the data-structure-relying algorithms are heapsort, search using BSTs, and union-find. And of the second type: the sieve of Erastophenes and consistent hashing.

At the same time, some seasoned developers state that when the right data structure is found, the algorithm will almost write itself. Linus Torvalds, the creator of Linux, is quoted saying:

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

A somewhat less poignant version of the same idea is formulated in the Art of Unix Programming by Eric S. Raymond as the "Rule of Representation":

Fold knowledge into data so program logic can be stupid and robust.

Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare an array initializer expressing a conversion table with an equivalent switch statement. The difference in transparency and clarity is dramatic.

Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.

Data structures are more static than algorithms. Surely, most of them allow change of their contents over time, but there are certain invariants that always hold. This allows reasoning by simple induction: consider only two (or at least a small number of) cases, the base one(s) and the general. In other words, data structures remove, in the main, the notion of time from consideration, and change over time is one of the major causes of program complexity. In other words, data structures are declarative, while most of the algorithms are imperative. The advantage of the declarative approach is that you don't have to imagine (trace) the flow of time through it.

So, this book, like most other books on the subject, is organized around data structures. The majority of the chapters present a particular structure, its properties and interface, and explain the algorithms, associated with it, showing its real-world use cases. Yet, some important algorithms don't require a particular data structure, so there are also several chapters dedicated exclusively to them.

More information about the book may be found on its website.

No comments: