2019-08-30

Programming Algorithms: Key-Values

To conclude the description of essential data structures, we need to discuss key-values (kvs), which are the broadest family of structures one can imagine. Unlike arrays and lists, kvs are not concrete structures. In fact, they span, at least in some capacity, all of the popular concrete ones, as well as some obscure ones.

The main feature of kvs is efficient access to the values by some kind of keys that they are associated with. In other words, each element of such a data structure is a key-value pair that can be easily retrieved if we know the key, and, on the other hand, if we ask for a key that is not in the structure, a null result is also returned efficiently. By "efficiently", we usually mean O(1) or, at least, something sublinear (like O(log n)), although, for some cases, even O(n) retrieval time may be acceptable. See how broad this is! So, a lot of different structures may play the role of key-values.

By the way, there isn't even a single widely-adopted name for such structures. Besides key-values — which isn't such a popular term (I derived it from key-value stores) — in different languages, they are called maps, dictionaries, associative arrays, tables, objects and so on.

In a sense, these are the most basic and essential data structures. They are so essential that some dynamic languages — for example, Lua, explicitly, and JavaScript, without a lot of advertisement — rely on them as the core (sometimes the sole) data structure of the language. Moreover, key-values are used almost everywhere. Below is a list of some of the most popular scenarios:

  • implementation of the object system in programming languages
  • key-value stores are, for the most part, just glorified key-value structures
  • internal tables in the operating system (running process table or file descriptor tables in the Linux kernel), programming language environment or application software
  • all kinds of memoization and caching
  • efficient implementation of sets
  • ad hoc or predefined records for returning aggregated data from function calls
  • representing various dictionaries (in language processing and beyond)

Considering such a wide spread, it may be surprising that, historically, the programming language community only gradually realized the usefulness of key-values. For instance, such languages as C and C++ don't have built-in support for general kvs (if we don't count structs and arrays, which may be considered significantly limited versions). Lisp, on the contrary, was, to some extent, a pioneer in their recognition with the concepts of alists and plists, as well as being one of the first languages to have hash-table support in the standard.

Concrete Key-values

Let's see what concrete structures can be considered key-values and in which cases it makes sense to use them.

Simple Arrays

Simple sequences, especially arrays, may be regarded as a particular variant of kvs that allows only numeric keys with efficient (and fastest) constant-time access. This restriction is serious. However, as we'll see below, it can often be worked around with clever algorithms. As a result, arrays actually play a major role in the key-value space, but not in the most straightforward form. Still, if it is possible to be content with numeric keys and their number is known beforehand, vanilla arrays are the best possible implementation option. Example: OS kernels that have a predefined limit on the number of processes and a "process table" indexed by pid (process id), which lies in the range 0..MAX_PID.
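For illustration, here is a minimal sketch of such an array-based table; the proc struct, the +max-pid+ limit, and the function names are made up for the example:

;; A sketch of an array-based "process table" keyed by pid.
(defparameter +max-pid+ 32768)

(defstruct proc
  pid name)

(defvar *proc-table* (make-array +max-pid+ :initial-element nil))

(defun register-proc (proc)
  (setf (aref *proc-table* (proc-pid proc)) proc))

(defun find-proc (pid)
  ;; O(1) retrieval by the numeric key
  (aref *proc-table* pid))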

So, let's note this curious fact that arrays are also a variant of key-values.

Associative Lists

The main drawback of using simple arrays for kvs is not even the restriction that all keys should somehow be reduced to numbers, but the static nature of arrays, which do not lend themselves well to resizing. As an alternative, we could use linked lists, which do not have this restriction. If the key-value contains many elements, linked lists are clearly not ideal in terms of efficiency. Often, however, the key-value holds very few elements, perhaps only half a dozen or so, and then even a linear scan of the whole list is not an expensive operation. This is where various forms of associative lists enter the scene. They store pairs of keys and values and don't impose any restrictions, neither on the keys nor on the number of elements, but their performance degrades below acceptable once the number of elements grows beyond a couple dozen. Many flavors of associative lists can be invented. Historically, Lisp supports two variants in the standard library:

  • alists (association lists) are lists of cons pairs. A cons pair is the original Lisp data structure, and it consists of two values called the car and the cdr (the names are abbreviations of "Contents of the Address/Decrement part of Register", terms from the IBM 704, the machine on which Lisp was first implemented). Association lists have dedicated operations to find a pair in the list (assoc) and to add an item to it (pairlis), although it may be easier to just push the new cons cell onto it. Modification may be performed simply by altering the cdr of the appropriate cons cell. ((:foo . "bar") (42 . "baz")) is an alist of 2 items with keys :foo and 42, and values "bar" and "baz". As you can see, it's heterogeneous in the sense that it allows keys of arbitrary type.
  • plists (property lists) are flat lists of alternating keys and values. They also have dedicated search (getf) and modify operations (setf getf), while insertion may be performed by calling push twice (on the value, and then the key). The plist with the same data as the previous alist will look like this: (:foo "bar" 42 "baz"). Plists are used in Lisp to represent the keyword function arguments as a whole.
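For reference, here is a short REPL illustration of the standard accessors at work on both representations (the variables are made up for the example):

CL-USER> (defvar *alist* (list (cons :foo "bar") (cons 42 "baz")))
CL-USER> (cdr (assoc 42 *alist*))
"baz"
CL-USER> (push (cons :quux "new") *alist*)  ; insertion is just a push
((:QUUX . "new") (:FOO . "bar") (42 . "baz"))
CL-USER> (defvar *plist* (list :foo "bar" 42 "baz"))
CL-USER> (getf *plist* :foo)
"bar"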

Deleting an item from such lists is quite efficient if we already know the place that we want to clear, but tracking this place if we haven't found it yet is a bit cumbersome. In general, the procedure will be to iterate the list by tails until the relevant cons cell is found and then make the previous cell point to this one's tail. A destructive version for alists will look like this:

(defun alist-del (key alist)
  (loop :for prev := nil :then tail
        :for tail := alist :then (rest tail)
        :while tail
        :when (eql key (car (first tail)))
        :do (return (if prev
                        (progn (setf (rest prev) (rest tail))
                               alist)
                        ;; special case of the first item
                        (rest alist)))
        :finally (return alist)))

However, the standard provides higher-level delete operations for plists (remf) and alists: (remove key alist :key 'car).

Both of these ad-hoc list-based kvs have some historical baggage associated with them and are not very convenient to use. Nevertheless, they can be utilized for some simple scenarios, as well as for interoperability with the existing language machinery. And, however counter-intuitive it may seem, if the number of items is small, alists may be the most efficient key-value data structure.

Another nonstandard but more convenient and slightly more efficient variant of associative lists was proposed by Ron Garret and is called dlists (dictionary lists). It is a cons pair of two lists: the list of keys and the list of values. The dlist for our example will look like this: ((:foo 42) . ("bar" "baz")).
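A lookup in such a structure may be sketched as follows (an illustration only, not Ron Garret's actual implementation):

(defun dlist-get (key dlist)
  ;; keys live in the car of the dlist, values in the cdr,
  ;; at matching positions
  (let ((pos (position key (car dlist))))
    (when pos
      (nth pos (cdr dlist)))))

CL-USER> (dlist-get 42 '((:foo 42) . ("bar" "baz")))
"baz"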

As the interface of different associative lists is a thin wrapper over the standard list API, the general list-processing knowledge can be applied to dealing with them, so we won't spend any more time describing how they work.

Hash-Tables

Hash-tables are, probably, the most common way to do key-values, nowadays. They are dynamic and don't impose restrictions on keys while having an amortized O(1) performance albeit with a rather high constant. The next chapter will be exclusively dedicated to hash-table implementation and usage. Here, it suffices to say that hash-tables come in many different flavors, including the ones that can be efficiently pre-computed if we want to store a set of items that is known ahead-of-time. Hash-tables are, definitely, the most versatile key-value variant and thus the default choice for such a structure. However, they are not so simple and may pose a number of surprises that the programmer should understand in order to use them properly.

Structs

Speaking of structs, they may also be considered a special variant of key-values with a predefined set of keys. In this respect, structs are similar to arrays, which also have a fixed set of keys (from 0 to MAX_KEY). As we already know, structs internally map to arrays, so they may be considered a layer of syntactic sugar that provides names for the keys and handy accessors. Usually, the struct is pictured not as a key-value but rather as a way to make the code more "semantic" and understandable. Yet, if we consider returning an aggregate value from a function call, as the possible set of keys is known beforehand, it's a good stylistic and implementation choice to define a special-purpose one-off struct for this instead of using an alist or a hash-table. Here is a small example — compare the clarity of the alternatives:

(defun foo-adhoc-list (arg)
  (let ((rez (list)))
    ...
    (push "hello" rez)
    ...
    (push arg rez)
    ...
    rez))

CL-USER> (foo-adhoc-list 42)
(42 "hello")

(defun foo-adhoc-hash (arg)
  (let ((rez (make-hash-table)))
    ...
    (:= (gethash :baz rez) "hello")
    ...
    (:= (gethash :quux rez) arg)
    ...
    rez))

CL-USER> (foo-adhoc-hash 42)
#<HASH-TABLE :TEST EQL :COUNT 2 {1040DBFE83}>

(defstruct (foo-rez (:conc-name foo-))
  baz quux)

(defun foo-struct (arg)
  (let ((rez (make-foo-rez)))
    ...
    (:= (foo-baz rez) "hello")
    ...
    (:= (foo-quux rez) arg)
    ...
    rez))

CL-USER> (foo-struct 42)
#S(FOO-REZ :BAZ "hello" :QUUX 42)

Trees

Another versatile option for implementing kvs is to use trees. There are even more tree variants than hash-table ones, and we'll also have dedicated chapters to study them. Generally, the main advantage of trees, compared to simple hash-tables, is the possibility to impose some ordering on the keys (although linked hash-tables also allow for that), while the disadvantage is less efficient operation: O(log n). Also, trees don't require hashing. Another major direction that the usage of trees opens is the possibility of persistent key-value implementations. Some languages, like Java, have standard-library support for tree-based kvs (TreeMap), but most languages delegate dealing with such structures to library authors, for there is a wide choice of specific trees and none of them may serve as the default choice of a key-value structure.

KV Operations

The primary operation for a kv structure is access to its elements by key: to get, set, change, and remove them. As there are so many different variants of concrete kvs, there is a number of different low-level access operations, some of which we have already discussed in the previous chapters and others we will see in the next ones.

Yet, most of the algorithms don't necessarily require the efficiency of built-in accessors, while their clarity will seriously benefit from a uniform generic access operation. Such an operation, as we have already mentioned, is defined by RUTILS and is called generic-elt or ?, for short. We have already seen it in action in some of the examples before. And that's not an accident, as kv access is among the most frequent operations. In the following chapters, we will stick to the rule of using the specific accessors like gethash when we are talking about some structure-specific operations and ? in all other cases — when clarity matters more than low-level considerations. ? is implemented using the CLOS generic function machinery that provides dynamic dispatch to a concrete retrieval operation and allows defining additional variants for new structures as the need arises. Another useful feature of generic-elt is chaining, which allows expressing multiple accesses as a single call. This comes in very handy for nested structures. Consider an example: accessing a field of the struct that is the first element of the list stored under some key in a hash-table: (? x :key 0 'field). If we were to use concrete operations it would look like this: (slot-value (nth 0 (gethash :key x)) 'field).

Below is the backbone of the generic-elt function that handles chaining and error reporting:

(defgeneric generic-elt (obj key &rest keys)
  (:documentation
   "Generic element access in OBJ by KEY.
    Supports chaining with KEYS.")
  (:method :around (obj key &rest keys)
    (reduce #'generic-elt keys :initial-value (call-next-method obj key)))
  (:method (obj key &rest keys)
    (declare (ignore keys))
    (error 'generic-elt-error :obj obj :key key)))

And here are some methods for specific kvs (as well as sequences):

(defmethod generic-elt ((obj hash-table) key &rest keys)
  (declare (ignore keys))
  (gethash key obj))

(defmethod generic-elt ((obj vector) key &rest keys)
  (declare (ignore keys))
  ;; Python-like handling of negative indices as offsets from the end
  (when (minusp key) (setf key (+ (length obj) key)))
  (aref obj key))

(defmethod generic-elt ((obj (eql nil)) key &rest keys)
  (declare (ignore key keys))
  (error "Can't access NIL with generic-elt!"))

generic-setf is a complementary generic function that allows defining setter operations paired with generic-elt. There exists a built-in protocol to make Lisp aware that generic-setf should be called whenever := (or the standard setf) is invoked on a value accessed with ?: (defsetf ? generic-setf).
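For example, with RUTILS loaded, setting via := and chained retrieval work together like this (a small REPL sketch; the variable is made up):

CL-USER> (defvar *tab* (make-hash-table))
CL-USER> (:= (? *tab* :foo) (list 1 2 3))
(1 2 3)
CL-USER> (? *tab* :foo 0)
1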

It is also common to retrieve all keys or values of the kv, which is handled in a generic way by the keys and vals RUTILS functions.

Key-values are not sequences in the sense that they are not necessarily ordered, although some variants are. But even unordered kvs may be traversed in some arbitrary order. Iterating over kvs is another common and essential operation. In Lisp, as we already know, there are two complementary iteration patterns: the functional map- and the imperative do-style. RUTILS provides both of them as mapkv and dokv, although I'd recommend first considering the macro dotable, which is specifically designed to operate on hash-tables.
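For illustration, here is a simple traversal with dotable (a sketch; I'm assuming the RUTILS convention of binding the key and the value, in that order, before the table argument):

(defun print-table (table)
  ;; imperative iteration over all key-value pairs of a hash-table
  (dotable (k v table)
    (format t "~A -> ~A~%" k v)))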

Finally, another common necessity is the transformation between different kv representations, primarily, between hash-tables and lists of pairs, which is also handled by RUTILS with its ht->pairs/ht->alist and pairs->ht/alist->ht functions.
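A typical round-trip between the two representations might look like this (a sketch; I'm assuming the default behavior of these RUTILS functions, and the resulting order is, of course, not guaranteed):

CL-USER> (defvar *ht* (alist->ht '((:foo . "bar") (42 . "baz"))))
CL-USER> (ht->alist *ht*)
((:FOO . "bar") (42 . "baz"))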

As you see, the authors of the Lisp standard library hadn't envisioned generic key-value access protocols, so they are implemented completely in a third-party addon. Yet, what's most important is that the building blocks for doing that were provided by the language, so this case shows the critical importance that these blocks (primarily, CLOS generic functions) have in future-proofing the language's design.

Memoization

One of the major use cases for key-values is memoization — storing the results of previous computations in a dedicated table (cache) to avoid recalculating them. Memoization is one of the main optimization techniques; I'd even say the default one. Essentially, it trades space for speed. And the main issue is that space is also limited so memoization algorithms are geared towards optimizing its usage to retain the most relevant items, i.e. maximize the probability that the items in the cache will be reused.

Memoization may be performed explicitly, in an ad-hoc manner: just set up some key scheme and a table to store the results, and add/retrieve/remove the items as needed. It can also be delegated to the language infrastructure and performed implicitly. For instance, Python provides memoizing decorators (such as functools.lru_cache): once one is applied to a function definition, each call to the function will pass through the assigned cache, using the call arguments as the cache keys. This is how the same feature may be implemented in Lisp, in the simplest fashion:

(defun start-memoizing (fn)
  (stop-memoizing fn)
  (:= (symbol-function fn)
      (let ((table (make-hash-table :test 'equal))
            (vanilla-fn (symbol-function fn)))
        (:= (get fn :cache) table
            (get fn :fn) vanilla-fn)
        (lambda (&rest args)
          (getset# (format nil "~{~A~^|~}" args)
                   table
                   (apply vanilla-fn args))))))

(defun stop-memoizing (fn)
  (when (get fn :fn)
    (:= (symbol-function fn) (get fn :fn)
        (get fn :fn) nil)))

CL-USER> (defun foo (x)
           (sleep 5)
           x) 
CL-USER> (start-memoizing 'foo)
CL-USER> (time (foo 1))
Evaluation took:
  5.000 seconds of real time
CL-USER> (time (foo 1))
Evaluation took:
  0.000 seconds of real time
CL-USER> (time (foo 2))
Evaluation took:
  5.001 seconds of real time

We use a hash-table to store the memoized results. The getset# macro from RUTILS tries to retrieve the item from the table by key and, if it's not present there, performs the calculation given as its last argument returning its result while also storing it in the table at key. Another useful Lisp feature utilized in this facility is called "symbol plist": every symbol has an associated key-value plist. Items in this plist can be retrieved using the get operator.[1]

This approach is rather primitive and has a number of drawbacks. First of all, the hash-table is not limited in capacity, so, if it is used carelessly, a memory leak is inevitable. Another possible issue may occur with the keys, which are determined by simply concatenating the string representations of the arguments — and thus possibly non-unique. Such a bug may be very subtle and hard to track down. Overall, memoization is a source of implicit behavior that always poses potential trouble but sometimes is just necessary. A more nuanced solution would allow us to configure both how the keys are calculated and various parameters of the cache, which we'll discuss next. One more possible decision to make might be about what to cache and what not: for example, we could add a time measurement around the call to the original function and cache the result only when the computation time exceeds a predefined limit.
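For instance, here's how the lambda inside start-memoizing could be modified to cache only slow calls (a sketch; the 0.01-second threshold and the use of plain gethash instead of getset# are arbitrary choices for the example). It relies on the table and vanilla-fn variables from the surrounding let:

(lambda (&rest args)
  (let ((key (format nil "~{~A~^|~}" args)))
    (multiple-value-bind (cached foundp) (gethash key table)
      (if foundp
          cached
          (let* ((start (get-internal-real-time))
                 (rez (apply vanilla-fn args))
                 (elapsed (/ (- (get-internal-real-time) start)
                             internal-time-units-per-second)))
            ;; cache only the results that were expensive to compute
            (when (> elapsed 0.01)
              (setf (gethash key table) rez))
            rez)))))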

Cache Invalidation

The problem of cache invalidation arises when we set some limit on the size of the cache. Once it is full — and a properly set up cache should be full, effectively, all the time — we have to decide which item to remove (evict) when we need to put a new one in. I've already mentioned the saying that cache invalidation is (alongside naming things) the hardest challenge in computer science. In fact, from the point of view of algorithms, it's not: it is rather trivial. The hard part is defining the notion of relevance. There are two general approximations which are used unless there are some specific considerations: frequency of access and time of last access. Let's see the algorithms built around these. Each approach stores some additional data with every key in order to track the respective property.

Second Chance and Clock Algorithms

The simplest approach to cache invalidation, besides random eviction, may be utilized when we are severely limited in the amount of additional space we can use per key. This situation is typical for hardware caches. The minimal possible amount of information to store is 1 bit, and if that's all the space we have, the only option is to use it as a flag indicating whether the item was accessed again after it was put into the cache. This technique is very fast, very simple, and improves cache performance to some extent. There may be two ways of tracking this bit efficiently:

  1. Just use a bit vector (usually called a "bitmap", in such a context) of the same length as the cache size. To select the item for eviction, find the first 0 from the left or right. With the help of one of the hardware instructions from the bit scan family (ffs — find first set, clz — count leading zeros, ctz — count trailing zeros), applied to the complement of the bitmap, this operation can be blazingly fast. In Lisp, we could use the high-level function position:
    (defun find-candidate-second-chance (bitmap)
      (declare (type bit-vector bitmap))
      (position 0 bitmap))
    

    The type declaration is necessary for the implementation to emit the appropriate machine instruction. If you're not confident in that, just disassemble the function and look at the generated machine code:

    CL-USER> (disassemble 'find-candidate-second-chance)
    ; disassembly for FIND-CANDIDATE-SECOND-CHANCE
    ; Size: 228 bytes. Origin: #x103A8E42F0
    ...
    ; 340:       B878D53620       MOV EAX, #x2036D578             ; #<FDEFN SB-KERNEL:%BIT-POSITION/0>
    ...
    

    So, SBCL uses sb-kernel:%bit-position/0, nice. If you look inside this function, though, you'll find out that it's also pretty complicated. And, overall, there are lots of other assembler instructions in this piece, so if our goal is squeezing the last bit out of it there's more we can do:

    • Force the implementation to optimize for speed: put (declaim (optimize (speed 3) (debug 0) (safety 1))) at the top of the file with the function definition or use proclaim in the REPL with the same declarations.
    • Use the low-level function sb-kernel:%bit-position/0 directly.
    • Go even deeper and use the machine instruction directly — SBCL allows that as well: (sb-vm::%primitive sb-vm::unsigned-word-find-first-bit x). But this will be truly context-dependent (on the endianness, hardware architecture, and the size of the bit vector itself, which should fit into a machine word for this technique to work).

    However, there's one problem with the function find-candidate-second-chance: if all the bits are set, it will return nil. We can fall back to selecting the first (or, even better, a random) element in that case, but eventually we'll end up with all elements of the bitmap set to 1, so the method will degrade to simple random choice. It means that we need to periodically reset the bit vector: either on every eviction (this is a good strategy if we happen to hit the cache more often than miss), or after some number of iterations, or after every bit is set to 1.

  2. Another method for selecting a candidate to evict is known as the Clock algorithm. It keeps examining the visited bit of each item, in a cycle: if it's equal to 1, reset it and move to the next item; if it's 0 — select the item for eviction. Basically, it's yet another strategy for dealing with the saturation of the bit vector. Here's how it may be implemented in Lisp with the help of the closure pattern: the function keeps its internal state in a lexical variable that is only accessible from inside the function and whose value persists between calls. The closure is created by the let block, and the variable closed over is i, here:

    (let ((i 0))
      (defun find-candidate-clock (bitmap)
        (declare (type (vector bit) bitmap))
        (loop :with len := (length bitmap)
              :until (zerop (aref bitmap i))
              :do (:= (aref bitmap i) 0)  ; reset the visited bit as we pass it
                  (:+ i)
                  (when (= i len)
                    (:= i 0)))
        i))
    

    Our loop is guaranteed to find a zero bit, at the latest, after we cycle over all the elements and return to the first one that we have reset to zero ourselves. Obviously, here and in other places where it is not stated explicitly, we're talking about single-threaded execution only.

LFU

So, what if we don't have such a serious restriction on the size of the access counter? In this case, a similar algorithm that uses a counter instead of a flag is called least frequently used (LFU) eviction. There is one problem, though: the access counters only grow over time, so some items that were heavily used during a certain period will never be evicted from the cache, even though they may never be accessed again. To counteract this accumulation property, which is similar to the bitmap saturation we've seen in the previous algorithm, a similar measure can be applied. Namely, we'll have to introduce some notion of epochs that reset or diminish the value of all counters. The most common approach to epochs is to right-shift each counter, i.e. divide it by 2. This strategy is called aging. An LFU cache with aging may be called LRFU — least frequently and recently used.
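To make the bookkeeping concrete, here is a minimal sketch of LFU counters with aging; the flat vector of 1-byte counters, parallel to the cache slots, is an assumption made just for this example:

(defun lfu-hit (counters i)
  ;; increment the access counter of slot i, saturating at 255
  (setf (aref counters i) (min 255 (1+ (aref counters i)))))

(defun lfu-victim (counters)
  ;; the slot with the smallest counter is the least frequently used
  (position (reduce #'min counters) counters))

(defun lfu-age (counters)
  ;; aging: halve every counter with a right shift
  (dotimes (i (length counters))
    (setf (aref counters i) (ash (aref counters i) -1))))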

As usual, the question arises of how often to apply aging. The answer is context-dependent and also depends on the size of the access counter. For instance, a 1-byte counter, which can distinguish between 256 levels of access frequency, will usually be good enough, and it rarely makes sense to use a smaller one as most hardware operates on byte-sized units anyway. The common strategies for aging may be:

  • periodically with an arbitrarily chosen interval — which should be enough to accumulate some number of changes in the counters but not to overflow them
  • after a certain number of cache access operations. Such an approach may ensure that the counter doesn't overflow. Say, if we use a 1-byte counter and halve it after every 128 access operations, it can never overflow since, in the worst case, it merely saturates at 255. Or we could perform the shift after every 256 operations and still avoid overflows with high probability

LRU

An alternative approach to LFU is LRU — evict the item that was used the longest time ago. LRU means that we need to store either last-access timestamps or some generation/epoch counters. Another possibility is to utilize access counters, similar to the ones that were used for LFU, except that we initialize them by setting all bits to 1, i.e. to the maximum possible value (255 for a 1-byte counter). The counters are decremented, on each cache access, simultaneously for all items except for the item being accessed. The benefit of such an approach is that it doesn't require accessing any external notion of time, making the cache fully self-contained, which is necessary for some hardware implementations, for instance. The only thing to remember is not to decrement the counter below 0 :)
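Here is a sketch of that counter scheme, again assuming a flat vector of 1-byte counters parallel to the cache slots:

(defun lru-hit (counters i)
  ;; decrement the counters of all slots except the accessed one,
  ;; never going below 0
  (dotimes (j (length counters))
    (unless (= j i)
      (setf (aref counters j) (max 0 (1- (aref counters j)))))))

(defun lru-victim (counters)
  ;; the slot with the smallest counter was accessed longest ago
  (position (reduce #'min counters) counters))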

Unlike LFU, this strategy can't distinguish between a heavily-accessed item and a sparingly-accessed one. So, in the general case, I'd say that LFU with aging (LRFU) should be the default approach, although its implementation is slightly more complex.

Memoization in Action: Transposition Tables

Transposition tables are a characteristic example of the effective usage of memoization, which comes from classic game AI. The same approach may be applied in numerous other areas where lots of computation paths converge and diverge at times. We'll return to similar problems in the last third of this book.

In such games as chess, the same position may be reached via many different sequences of moves. Such sequences are called transpositions, and it is obvious that, regardless of how we reached a certain position, if we have already analyzed it previously, we don't need to repeat the analysis when it comes up again. So, caching the results allows us to save a lot of redundant computation. However, the number of positions that come up during the analysis of a chess game is huge, so we don't stand a chance of remembering all of them. In this case, a good predictor of the chance that a position will occur again is the number of times it has occurred in the past. For that reason, an appropriate caching technique, in this context, is plain LFU. But there's more. Yet another measure of the value of a certain position is how early it occurred in the game tree (since the number of possible developments from it is larger). So, classic LFU should be mixed with this temporal information, yielding a domain-specific caching approach. And the parameters of combining the two measures together are subject to empirical evaluation and research.
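To make this a bit more concrete, here is a toy sketch of an LFU-governed transposition table; position-key and evaluate-position are hypothetical stand-ins for a real engine's position hashing and scoring:

(defvar *transpositions* (make-hash-table :test 'equal))

(defun lookup-or-evaluate (position)
  ;; each entry is a (score . access-count) pair
  (let* ((key (position-key position))        ; hypothetical position hashing
         (entry (gethash key *transpositions*)))
    (if entry
        (progn (incf (cdr entry))             ; LFU bookkeeping
               (car entry))
        (let ((score (evaluate-position position)))  ; hypothetical scoring
          (setf (gethash key *transpositions*) (cons score 1))
          score))))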

There's much more to transposition tables than mentioned in this short introduction. For instance, the keys describing the position may need to include additional information if the history of how it was reached impacts the further game outcome (the castling and repetition rules). Here's, also, a quote from Wikipedia on their additional use in another common chess-playing algorithm:

The transposition table can have other uses than finding transpositions. In alpha-beta pruning, the search is fastest (in fact, optimal) when the child of a node corresponding to the best move is always considered first. Of course, there is no way of knowing the best move beforehand, but when iterative deepening is used, the move that was found to be the best in a shallower search is a good approximation. Therefore this move is tried first. For storing the best child of a node, the entry corresponding to that node in the transposition table is used.

Low-Level Caching

So, memoization is the primary tool for algorithm optimization, and the lower we descend into our computing platform the more this fact becomes apparent. For hardware, it is, basically, the only option. There are many caches in the platform that act behind the scenes, but which have a great impact on the actual performance of your code: the CPU caches, the disk cache, the page cache, and other OS caches. The main issue, here, is the lack of transparency into their operation and sometimes even the lack of awareness of their existence. This topic is, largely, beyond the scope of our book, so if you want to learn more, there's a well-known talk "A Crash Course in Modern Hardware" and an accompanying list of "Latency Numbers Every Programmer Should Know" that you can start with. Here, I can provide only a brief outline.

The most important cache in the system is the CPU cache — or, rather, in most modern architectures, a hierarchy of 2 or 3 caches. There's the infamous von Neumann bottleneck of conventional computer hardware design: the CPU works roughly 2 orders of magnitude faster than it can fetch data from memory. Last time I checked, the numbers were the following: one memory transfer took around 250-300 CPU cycles, i.e. around 300 additions or other primitive instructions could be run during that time. And the problem is that CPUs operate only on data they get from memory, so if the bottleneck didn't exist at all, theoretically, we could have 2 orders of magnitude faster execution. Fortunately, thanks to the use of CPU caches, the degradation in performance is not so drastic: only around an order of magnitude. The cache transfer numbers are the following: from the L1 cache (the fastest and hence smallest), around 5 cycles; from L2, 20-30 cycles; from L3, 50-100 cycles (that's why L3 is not always used, as it's almost on par with the main memory). Why do I say that fastest means smallest? Just because fast memory is more expensive and requires more energy. Otherwise, we could just make all RAM as fast as the L1 cache.

How do these caches operate? This is one of the things that every algorithmic programmer should know, at least in general terms: even if some algorithm seems good on paper, a more cache-friendly one with worse theoretical properties may very well outperform it.

The CPU cache temporarily stores the contents of memory cells (memory words) indexed by their addresses. It operates not on single cells but on sequential blocks of them, the so-called cache lines, and the mapping of memory addresses to cache slots is usually set-associative. A cache line typically holds 64 bytes, i.e. 8 or 16 machine words, so a 32KB L1 cache stores around 512 such blocks. This approach is oriented towards the normal sequential layout of executable code, structures, and arrays (the majority of the memory contents) and the corresponding common memory access pattern, which is also sequential: after reading one memory cell, the processor will usually move on to the next, either because it's the next instruction to execute or the next item in the array being iterated over. That's why so much importance in program optimization folklore is given to cache alignment, i.e. structuring the program's memory so that the things commonly accessed together fit into the same cache line. One example of this principle is the padding of structures with zeroes to align their size to a multiple of 32 or 64. The same applies to code padding with nops. And this is another reason why arrays are a preferred data structure compared to linked lists: when the whole contents fit in the same cache line, processing them is blazingly fast. The catch, though, is that it's practically impossible for normal programmers to directly observe how the CPU cache interacts with their programs. There are no simple tools to make it transparent, so what remains is to rely on the general principles, second-guessing, and trial & error.
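A classic way to observe this effect is to sum a 2D array in row-major versus column-major order: the two functions below are algorithmically identical, but the first reads memory sequentially and, on large arrays, is typically several times faster (a sketch; the exact difference depends on the hardware and the array size):

(defun sum-row-major (arr)
  ;; sequential, cache-friendly traversal (CL arrays are row-major)
  (let ((sum 0))
    (dotimes (i (array-dimension arr 0) sum)
      (dotimes (j (array-dimension arr 1))
        (incf sum (aref arr i j))))))

(defun sum-column-major (arr)
  ;; the same result, but strided access that defeats the cache
  (let ((sum 0))
    (dotimes (j (array-dimension arr 1) sum)
      (dotimes (i (array-dimension arr 0))
        (incf sum (aref arr i j))))))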

Another interesting choice for hardware (and some software) caches is write-through versus write-back behavior. The question is how the cache deals with cached data being modified:

  • either the modifications will be immediately stored to the main storage, effectively, making the whole operation longer
  • or they may, first, be persisted to the cache only, while writing to the backing store (synchronization) will be performed for all the data in the cache at configured intervals

The second option is faster as there's a smaller number of expensive round-trips, but it is less resilient to failure. A good example of the write-back cache in action is the origin of the Windows "Safely remove hardware" option. The underlying assumption is that the data to be written to the flash drive passes through the OS cache, which may be configured in the write-back fashion. In this case, a forced sync is required before disconnecting the device to ensure that the latest version of the cached data is saved to it.

Another example of caching drastically impacting performance, which everyone is familiar with, is paging or swapping — an operation performed by the operating system. When the executing programs together require more (virtual) memory than the size of the RAM that is physically available, the OS saves some of the pages of data that these programs use to a dedicated place on disk known as swap space.

A few points we can take away from this chapter:

  1. Key-values are very versatile and widely-used data structures. Don't limit your understanding of them to a particular implementation choice made by the designers of the programming language you're currently using.
  2. Trading space for time is, probably, the most wide-spread and impactful algorithmic technique.
  3. Caching, which is a direct manifestation of this technique and one of the main applications of key-value data structures, is one of the principal factors impacting program performance on a large scale. It may be utilized by the programmer in the form of memoization, and it will also inevitably be used by the underlying platform, in ways that are hard to control and predict. The area of program optimization for efficient hardware utilization represents a distinct set of techniques, requiring skills that are obscure and not fully systematized.

Footnotes:

[1] Symbol plists represent one of the unpleasant legacy features of the language, in that the most obvious accessor name, namely get, is reserved for working with symbols. Therefore, this name cannot be used for accessing other kinds of data. Historically, symbol plists were the first and only variant of key-values available in the language (at that time, other languages didn't have the slightest idea of such a high-level concept).
