Programming Algorithms: Approximation

This is a snippet of the "Approximation" chapter of the book.

This chapter will be a collection of stuff from somewhat related but still distinct domains. What unites it is that all the algorithms we will discuss are, after all, targeted at calculating approximations to some mathematical functions. There are no advanced data structures involved, neither is the aim to find a clever way to improve the runtime of some common operations. No, these algorithms are about calculations and computing an acceptable result within the allocated time budget.

Combinatorial Optimization

Dynamic Programming is a framework that can be used for finding the optimal value of some loss function when there are multiple configurations of the problem space that result in different values. Such search is an example of discrete optimization for there is a countable number of states of the system and a distinct value of the cost function we're optimizing corresponding to each state. There are also similar problems that have an unlimited and uncountable number of states, but there is still a way to find a global or local optimum of the cost function for them. They comprise the continuous optimization domain. Why is optimization not just a specialized area relevant to a few practitioners but a toolbox that every senior programmer should know how to utilize? The primary reason is that it is applicable in almost any domain: the problem just needs to be large enough to rule out simple brute force. You can optimize how the data is stored or how the packets are routed, how the blueprint is laid out or the servers are loaded. Many people are just not used to looking at their problems this way. Also, understanding optimization is an important prerequisite for having a good grasp of machine learning, which is revolutionizing the programming world.

DP is an efficient and, overall, great optimization approach, but it can't succeed if the problem doesn't have an optimal substructure. Combinatorial Optimization approaches deal with finding a near-optimum for the problems where an exhaustive search requires O(2^n) computations. Such problems are called NP-hard and a classic example of those is the Travelling Salesman (TSP). The task is to find an optimal order of edges in a cycle spanning all vertices of a fully-connected weighted graph. As we saw previously, this problem doesn't have an optimal substructure, i.e. an optimal partial solution isn't necessarily a part of the best overall one, and so taking the shortest edge doesn't allow the search procedure to narrow down the search space when looking at the next vertex. A direct naive approach to TSP will enumerate all the possible variants and select the one with a minimal cost. However, the number of variants is n!, so this approach becomes intractable very fast. A toy example of visiting all the capitals of the 50 US states has 10^64 variants. This is where quantum computers promise to overturn the situation, but while we're waiting for them to mature, the only feasible approach is developing approximation methods that will get us a good enough solution in polynomial (ideally, linear) time. TSP may look like a purely theoretical problem, but it has some real-world applications. Besides vehicle routing, automated drilling and soldering in electronics is another example. Yet, even more important is that there are many other combinatorial optimization problems, but, in essence, the approaches to solving one of them apply to all the rest. I.e., like with shortest path, coming up with an efficient solution to TSP allows to efficiently solve a very broad range of problems over a variety of domains.

So, let's write down the code for the basic TSP solution. As usual, we have to select the appropriate graph representation. From one point of view, we're dealing with a fully-connected graph, so every representation will work and a matrix one will be the most convenient. However, storing an n^2-sized array is not the best option, especially for a large n. A better "distributed" representation might be useful here. Yet, for the TSP graph, an even better approach would be to do the opposite of our usual optimization trick: trade computation for storage space. When the graph is fully-connected, usually, there exists some kind of an underlying metric space that contains all the vertices. The common example is the Euclidian space, in which each vertex has a coordinate (for example, the latitude and longitude). Anyway, whichever way to represent the vertex position is used, the critical requirement is the existence of the metric that may be calculated at any time (and fast). Under such conditions, we don't have to store the edges at all. So, our graph will be just a list of vertices.

Let's use the example with the US state capitals. Each vertex will be representated as a pair of floats (lat & lon). We can retireve the raw data from the Wikipedia article about the US capitols (with an 'o') and extract the values we need with the following code snippet[1], which cuts a few corners:

(defstruct city
  name lat lon)

(defparameter *wp-link* "https://en.wikipedia.org/w/index.php?title=List_of_state_and_territorial_capitols_in_the_United_States&action=edit&section=1")

(defparameter *cs*
  (with ((raw (drakma:http-request *wp-link*))
         (coords-regex (ppcre:create-scanner "\\{\\{coord\\|(\\d+)\\|(\\d+)\\|([.\\d]+)\\|.\\|(\\d+)\\|(\\d+)\\|([.\\d]+)\\|.\\|type"))
         (capitals (list)))
    (flet ((dms->rad (vec off)
             (* (/ pi 180)
                (+     (? vec (+ off 0))
                    (/ (? vec (+ off 1)) 60)
                    (/ (? vec (+ off 2)) 3600)))))
      (dolist (line (split #\Newline (slice raw
                                            (search "{| class=\"wikitable sortable\"" raw)
                                            (search "</textarea><div class='editOptions'>" raw))))
        (when-it (and (starts-with "|" line)
                      (search "{{coord" line))
          (with ((_ coords (ppcre:scan-to-strings coords-regex line))
                 (coords (map* 'read-from-string coords)))
            (push (make-city :name (slice line (position-if 'alpha-char-p line)
                                          (position-if (lambda (ch) (member ch '(#\] #\|)))
                                                       line :start 1))
                           :lat (dms->rad coords 0)
                           :lon (dms->rad coords 3))
    (coerce capitals 'vector)))

CL-USER> (length *cs*)

We also need to define the metric. The calculation of distances on Earth, though, is not so straightforward as on a plain. Usually, as a first approximation, the haversine formula is used that provides the estimate of the shortest distance over the surface "as-the-crow-flies" (ignoring the relief).

(defun earth-dist (c1 c2)
  (with ((lat1 (? c1 'lat))
         (lat2 (? c2 'lat))
         (a (+ (expt (sin (/ (- lat2 lat1) 2))
               (* (cos lat1)
                  (cos lat2)
                  (expt (sin (/ (- (? c2 'lon) (? c1 'lon)) 2)) 
    (* 1.2742e7  ; Earth diameter
       (atan (sqrt a) (sqrt (- 1 a)))))) 

With the metric at our disposal, let's define the function that will calculate the length of the whole path and use it for a number of random paths (we'll use the RUTILS function shuffle to produce a random path).

(defun path-length (path)
  (let ((rez (earth-dist (? path 0) (? path -1))))
    (dotimes (i (1- (length path)))
      (:+ rez (earth-dist (? path i) (? path (1+ i)))))

CL-USER> (path-length *cs*)
CL-USER> (path-length (shuffle *cs*))
CL-USER> (path-length (shuffle *cs*))

We can see that an average path may have a length of around 10k kilometers. However, we don't know anything about the shortest or the longest one, and to find out reliably, we'll have to evaluate 50! paths... Yet, as we accept the sad fact that it is not possible to do with our current technology, it's not time to give up yet. Yes, we may not be able to find the absolute best path, but at least we can try to improve on the random one. Already, the three previous calculations had a variance of 5%. So, if we're lucky, maybe we could hit a better path purely by chance. Let's try a thousand paths using our usual argmin pattern:

(defun random-search (path n)
  (let ((min (path-length path))
        (arg path))
    (loop :repeat n :do
      (with ((path (shuffle path))
             (len (path-length path)))
        (when (< len min)
          (:= min len
              arg path))))
    (values arg

CL-USER> (:= *print-length* 2)
CL-USER> (random-search *cs* 1000)
(#S(CITY :NAME "Atlanta" :LAT 0.5890359059538811d0 ...)
 #S(CITY :NAME "Montpelier, Vermont" :LAT 0.772521512027179d0 ...) ...)

OK, we've got a sizable 20% improvement. What about 1,000,000 combinations?

CL-USER> (time (random-search *cs* 1000000))
Evaluation took:
  31.338 seconds of real time
(#S(CITY :NAME "Boise, Idaho" :LAT 0.7612723873453388d0 ...)
 #S(CITY :NAME "Helena, Montana" :LAT 0.813073800024579d0 ...) ...)

Cool, another 15%. Should we continue increasing the size of the sample? Maybe, after a day of computations, we could get the path length down by another 20-30%. And that's already a good gain. Surely, we could also parallelize the algorithm or use a supercomputer in order to analyze many more variants. But there should be something smarter than simple brute force, right?

More details about of the book may be found on its website.