Lisp, the Universe and Everything

2018-11-27

Structs vs Parametric Polymorphism

Recently, Tamas Papp wrote about one problem he had with Lisp in the context of scientific computing: that it's impossible to specialize methods on parametric types.
While you can tell a function that operates on arrays that these arrays have element type double-float, you cannot dispatch on this, as Common Lisp does not have parametric types.
I encountered the same issue while developing the CL-NLP Lisp toolkit for natural language processing. For instance, I needed to specialize methods on sentences, which may come in different flavors: as lists of tokens, vectors of tokens, lists of strings, or some more elaborate data structure with attached metadata. Here's some example code. There's a generic function that performs various tagging jobs (POS, NER, SRL etc.). It takes two arguments: the first — as with all CL-NLP generic functions — is the tagger object that is used for algorithm selection and configuration, as well as for storing intermediate state when necessary. The second one is the sentence being tagged. Here are two of its possible methods:
(defmethod tag ((tagger ap-dict-postagger) (sent string)) ...)
(defmethod tag ((tagger ap-dict-postagger) (sent list)) ...)
The first method processes a raw string, which means we should invoke some pre-processing machinery that tokenizes it and then, basically, call the second method, which will perform the actual tagging of the resulting tokens. So, list here means a list of tokens. But what if we already have the tokenization but haven't created the token objects, i.e. a list of strings is supplied as the input to the tag method? The CLOS machinery doesn't have a way to distinguish the two cases, so we'd have to resort to using typecase inside the method, which is exactly what defmethod is meant to replace as a transparent and extensible alternative.

Well, in most other languages we'd stop here and just accept that nothing can be done. After all, it's a local nuisance and not a game changer for our code (although Tamas refers to it as a game-changer for his). In Lisp, we can do better. Thinking about this problem, I see at least 3 solutions with varying levels of elegance and portability. Surely, they may seem slightly inferior to having such a capability built directly into the language, but demanding to have everything built in is unrealistic, to say the least. Instead, having a way to build something in ourselves is the only future-proof and robust alternative. And this is what Lisp is known for. The first approach was mentioned by Tamas himself:
You can of course branch on the array element types and maybe even paper over the whole mess with sufficient macrology (which is what LLA ended up doing), but this approach is not very extensible, as, eventually, you end up hardcoding a few special types for which your functions will be "fast", otherwise they have to fall back to a generic, boxed type. With multiple arguments, the number of combinations explodes very quickly.
Essentially, rely on typecase-ing but use macros to blend it into the code in the most non-intrusive way, minimizing boilerplate. This is a straightforward path in Lisp, but it has its drawbacks for long-running projects that need to evolve over time. Still, it remains a no-brainer for custom one-offs. That's why, usually, few venture further to explore other alternatives. The other solution was mentioned in the Reddit discussion of the post:
Generics dispatching on class rather than type is an interesting topic. I've definitely sometimes wanted the latter so far in doing CL for non-scientific things. It is certainly doable to make another group of generics that do this using the MOP.
I.e. use the MOP to introduce type-based generic dispatch. I won't discuss it here but will say that similar things were tried in the past quite successfully: ContextL and layered functions are some of the examples. Yet, the MOP path is rather heavy and has portability issues (as the MOP is not in the standard, although the closer-mop project unifies most of the implementations). From my point of view, its best use is for serious and fundamental extensions of the CL object system, not for solving a local problem that may occur in some contexts but is not so pervasive. Also, I'd say that the Lisp approach that (almost) doesn't mix objects and types is, conceptually, the right one, as these two facilities solve different sets of problems.

There's a third — much simpler, clearer, and portable solution that requires minimal boilerplate and, in my view, is best suited for this level of problem: to use structs. Structs are somewhat underappreciated in the Lisp world; not a lot of books and study materials give them enough attention. And that is understandable, as there's not a lot to explain. But structs are handy for many problems, as they are a hassle-free and efficient facility that provides some fundamental capabilities.

In its basic form, the solution is obvious, although a bit heavy. We'll have to define wrapper structs for each parametric type we'd like to dispatch upon, for example, list-of-strings and list-of-tokens. This looks a little stupid, and it is, because what's the semantic value of a list of strings? That's why I'd go for sentence/string and sentence/token, which is a clearer naming scheme. (Or, if we want to mimic Julia, sentence<string>.)
(defstruct sent/str
  toks)
Now, from the method's signature, we will already see that we're dealing with sentences in the tagging process. And we will be able to spot when some other tagging algorithm operates on paragraphs instead of words: let's say, tagging parts of an email with such labels as greeting, signature, and content. Yes, this can also be conveyed via the name of the tagger, but, still, it's helpful. And it's also one of the hypothetical failure cases for a parametric type-based dispatch system: if we have two different kinds of lists of strings that need to be processed differently, we'd have to resort to similar workarounds in it as well.

However, if we'd like to distinguish between lists of strings and vectors of strings, as well as more generic sequences of strings, we'll have to resort to more elaborate names, like sent-vec/str, as a variant. It's worth noting, though, that, for the sake of producing efficient compiled code, only vectors of different types of numbers really make a difference. A list of strings or a list of tokens, in Lisp, uses the same accessors, so optimization here is useless, and type information may be used only for dispatch and, possibly, type checking. Actually, Lisp doesn't support type-checking of homogeneous lists, so you can't say :type (list string), only :type list. (Well, you can, actually, spell it with satisfies and a named predicate, as in the sketch after the struct definitions below, but what's the gain?) Yet, using structs adds more semantic dimensions to the code than just naming. They may store additional metadata and support simple inheritance, which will come in handy when we'd like to track sentence positions in the text and so on.
(defstruct sent-vec/tok
  (toks nil :type (vector tok)))

(defstruct (corpus-sent-vec/tok (:include sent-vec/tok))
  file beg end)
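As an aside, here's the sketch promised above: if you really want a checked homogeneous-list slot type, the portable way is a named predicate plus deftype (satisfies only accepts a function name). This is just an illustration, not part of CL-NLP; the names list-of-strings-p and list-of-strings are made up for the example:

(defun list-of-strings-p (list)
  "True if LIST is a proper list whose elements are all strings."
  (and (listp list)
       (every 'stringp list)))

(deftype list-of-strings ()
  '(and list (satisfies list-of-strings-p)))

;; now a slot could be declared as (toks nil :type list-of-strings)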
And structs are efficient in terms of both space consumption and speed of slot access.
So, now we can do the following:
(defmethod tag ((tagger ap-dict-postagger) (sent sent/str)) ...)
(defmethod tag ((tagger ap-dict-postagger) (sent sent/tok)) ...)
(defmethod tag ((tagger ap-dict-postagger) (sent sent-vec/tok)) ...)
We'll also have to defstruct each parametric type we'd like to use. As a result, with this approach, we can have the following clean and efficient dispatch:
(defgeneric tag (tagger sent)
  (:method (tagger (sent string))
    (tag tagger (tokenize *word-splitter* sent)))
  (:method (tagger (sent sent/str))
    (let ((off 0))  ; running character offset of the current token
      (tag tagger (make-sent/tok
                   :toks (map* ^(prog1 (make-tok :word %
                                                 :beg off
                                                 :end (+ off (length %)))
                                  (:+ off (1+ (length %))))
                               @sent.toks)))))
  (:method ((tagger pos-tagger) (sent sent/tok))
    (copy sent :toks (map* ^(copy % :pos (classify tagger
                                                   (extract-features tagger %)))
                           @sent.toks))))

CL-USER> (tag *pos-tagger* "This is a test.")
#S(SENT/TOK :TOKS (<This/DT 0..4> <is/VBZ 5..7> <a/DT 8..9>
                   <test/NN 10..14> <./. 14..15>))
Some of the functions used here (map* and copy), as well as the @ and ^ reader macros, come from my RUTILS library, which fills in the missing pieces of the CL standard library. Another advantage of structs is that they define a lot of things in the background: type-checking for slots, a readable print function, a constructor, a built-in copy-structure and more. In my view, this solution isn't any less easy to use than the statically-typed one (Julia's). There's a little additional boilerplate (the defstructs), which may even be considered to have a positive impact on the code's overall clarity. And yes, you have to write boilerplate in Lisp sometimes, although not much of it. Here's a fun quote on the topic I saw on twitter a few days ago:
Lisp is an embarrassingly concise language. If you’re writing a bunch of boilerplate in it, you need to read SICP & “Lisp: A Language for Stratified Design”.
P.S. There's one more thing I wanted to address from Tamas's post:
Now I think that one of the main reasons for this is that while you can write scientific code in CL that will be (1) fast, (2) portable, and (3) convenient, you cannot do all of these at the same time.
I'd say that this choice (or, rather, the need to prioritize one over the others) exists in every ecosystem. At least, looking at his Julia example, there's no word about portability (citing Tamas's own words about the language: "At this stage, code that was written half a year ago is very likely to be broken with the most recent release."), while convenience may manifest well for his current use case; but what if we need to implement, in the same system, features that deal with areas outside of numeric computing? I'm not so convinced. Or, speaking of Python, which is the go-to language for scientific computing. In terms of performance, the only viable solution is to implement the critical parts in C (or Cython). Portable? No. Convenient — likewise. Well, as a user, you get convenience, and speed, and portability (although pretty limited). But at what cost? I'd argue that developing the Common Lisp scientific computing ecosystem to a similar quality would have required only 10% of the effort that went into building numpy and scipy...

2018-09-30

ANN: flight-recorder - a robust REPL logging facility

Interactivity is a principal requirement for a usable programming environment. Interactivity means that there should be a shell/console/REPL or other similar text-based command environment. And a principal requirement for such an environment is keeping history. And not just keeping it, but doing it robustly:

  • recording history from concurrently running sessions
  • keeping unlimited history
  • identifying the time of the record and its context

This allows you to freely experiment and reproduce the results of successful experiments, to go back to an arbitrary point in time and take your work in another direction, and to keep DRY while performing common repetitive tasks in the REPL (e.g. initialization of an environment or context).

flight-recorder or frlog (when you need a distinct name) is a small tool that intends to support all these requirements. It grew out of frustration with how history is kept in SLIME, so it was primarily built to support this environment, but it can be easily utilized for other shells that don't have a good enough history facility. This is possible thanks to its reliance on the most common and accessible data-interchange facility: text-based HTTP.

frlog is a backend service that supports any client that is able to send an HTTP request.

The backend is a Common Lisp script that can be run in the following manner (probably, the best way to do it is inside screen):

sbcl --noprint --load hunch.lisp -- -port 7654 -script flight-recorder.lisp

It will print a bunch of messages that should end with the following line (modulo timestamp):

[2018-09-29 16:00:53 [INFO]] Started hunch acceptor at port: 7654.

The service appends each incoming request to the text file in markdown format: ~/.frlog.md.

The API is just a single endpoint - /frlog that accepts GET and POST requests. The parameters are:

  • text is the content (url-encoded, of course) of the record; it can, alternatively, be sent in the POST request's body (which is more robust)

Optional query parameters are:

  • title - used to specify that this is a new record: for console-based interactions there's usually a command and zero or more results - a command starts the record (and thus should be accompanied by the title; for SLIME interactions it's the current Lisp package and a serial number). A titled record is added in the following manner:
    ### cl-user (10) 2018-09-29_15:49:17
    
        (uiop:pathname-directory-pathname )
    
    If there's no title, the text is added like this:
    ;;; 2018-09-29_15:49:29
    
        #<program-error @ #x100074bfd72>
    
  • tag - if provided, it signals that the record should be made not to the standard .frlog.md file but to .frlog-<tag>.md. This makes it easy to log a specific group of interactions separately.

If the response code is 200, everything's fine.
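For instance, here's how a record could be posted from Lisp: a minimal sketch using drakma (not one of the bundled clients), assuming the service is running locally on the port shown above.

(ql:quickload :drakma)

;; post a titled record; the title marks the start of a new entry,
;; the text is the content of the record itself
(drakma:http-request "http://localhost:7654/frlog"
                     :method :post
                     :parameters '(("title" . "cl-user (1)")
                                   ("text" . "(uiop:getcwd)")))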

Currently, 2 clients are available:

  • a SLIME client flight-recorder.el that monkey-patches a couple of basic SLIME functions (just load it from Emacs if you have SLIME initialized)
  • and a tiny Lisp client frlog.lisp

P.S. To sum up: more and more, I've grown to appreciate simple (sometimes even primitive - the more primitive the better :) tools. flight-recorder seems to me to be just like that: it was very easy to hack together, but it solves an important problem for me and, I guess, for many others. And it's the modern "Unix way": small independent daemons, text-based formats, and HTTP instead of pipes...

P.P.S. frlog uses another tiny tool of mine - hunch - that I've already utilized in a number of projects but haven't described yet; it's a topic for another post. In short, it is a script to streamline running hunchentoot: it does all the basic setup and reduces the developer's work to just defining the HTTP endpoints.

P.P.P.S. I know, the name is, probably, taken and it's a rather obvious one. But I think it just doesn't matter in this case... :)

2018-01-31

Minimal Perfect Hash-Tables in Common Lisp

Recently, my twitter pal @ifesdjeen wrote a line that resonated with me: "Looks like it's easier for people to read 40 blog posts than a single whitepaper." And although he used it in a negative context, I recognized it as a very precise (and, actually, positive) description of what a research engineer does: read a whitepaper (or a dozen, for what it's worth) and transform it into working code and - as a possible byproduct - into a blog post that other engineers will understand and be able to reproduce. I've been in the business of reproducing papers for about 7 years now, and, I believe, everyone with the same experience will confirm that it's not a skill every engineer possesses. Not all papers can even be reproduced (because the experiment was just not set up correctly), and of those that, in principle, can be, only some are presented in a form that I can grasp well enough to be able to program them. And, after all, working code is the ultimate judge of your understanding.

But I digressed... :) This post is meant to explain (in simple "engineering" terms) the concept of minimal perfect hash-tables and how I've recently implemented one of their varieties to improve the memory efficiency of my language identification library WILD, uncovering, in the process, a more abstract concept behind it.

The Justification of Perfect Hash-Tables

Minimal perfect hash-tables are persistent data structures that solve 2 basic deficiencies of normal hash-tables: they guarantee constant (not just amortized) O(1) time for collision-free key-based access, and they require no reserved space (which for normal hash-tables may be in the range of 20-50% of their size). This comes at the cost of two restrictions: the table should be filled with a keyset known ahead of time, and the process of building one takes longer than for a usual hash-table (although, for many methods, it can still be bounded by amortized O(1) time per key).

From my point of view, the main advantage of perfect HTs is the possibility to substantially reduce memory usage, which is important in such use cases as storing big dictionaries (relevant to many NLP tasks). Moreover, the space requirements can be lowered even further if the whole keyset is stored in the table, i.e. there can be no misses. Under such constraints, unlike with normal hash-tables, which still require storing the keys alongside the values due to the need to compare them in case of hash collisions, in a perfect HT the keys can simply be omitted. Unfortunately, such a situation is rarely the case, but if some level of false positives is permitted, with the help of additional simple programmatic tricks, the memory usage for keys can also be reduced by orders of magnitude.

Hash-tables on their own are the trickiest among the basic data structures. But, in terms of complexity, they pale in comparison to minimal perfect hash-tables, which belong to the advanced algorithms family. One reason for that is that perfect hash-tables require more "moving parts", but the main one is, probably, that there's no common well-known algorithm for distributing keys in a perfect manner. And reading many perfect hash-table papers will scare away most programmers (at least it scared me). However, after some research and trial-and-error, I've managed to find one that presents a simple and scalable approach. So, I'm going to explain it in this post.

Hash-tables in SBCL

If you take a look at some particular hash-table implementation, your mental model of what a hash-table is may be quite seriously shaken. A straightforward open-addressing table will assume a vector of key-value pairs as the underlying concrete structure, filled as the table is modified. On a 64-bit system, it will require (8 × 1.5 + 16) × entries-count bytes + the total size of all keys + the total size of all values + a constant size of metadata. The 8 × 1.5 bytes in the formula is the storage needed to hold a pointer to a hash-table entry, including an average 33% extra overhead; the additional 16 bytes per entry will be spent on a cons cell (if we use this efficient although antiquated way to represent pairs in Lisp). It should be noted that, depending on the size of keys and values, the first term may be either a significant (up to half of the total size) or a negligible part of the table's memory profile.
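Just to make the first term of the formula concrete, here is a back-of-the-envelope calculation for a 1000-entry table of this textbook layout (illustrative numbers only, ignoring the keys and values themselves):

;; per-entry overhead of the textbook layout for 1000 entries:
;; a pointer slot with ~33% vector slack, plus a 16-byte cons cell
(* (+ (* 8 3/2) 16) 1000)
;; => 28000 bytes, not counting the keys and values themselves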

However, this is not how SBCL implements hash-tables. See for yourself. Let's create a random hash-table with approximately 1000 keys:


> (defparameter *ht*
    (pairs->ht (loop :repeat 1000
                     :collect (pair (fmt "~{~C~}"
                                         (loop :repeat (random 10)
                                               :collect (code-char (+ 65 (random 50)))))
                                    (random 1000)))
               :test 'equal))

And inspect its contents:


> (inspect *ht*)

The object is a STRUCTURE-OBJECT of type HASH-TABLE.
0. TEST: EQUAL
1. TEST-FUN: #
2. HASH-FUN: #
3. REHASH-SIZE: 1.5
4. REHASH-THRESHOLD: 1.0
5. REHASH-TRIGGER: 1024
6. NUMBER-ENTRIES: 844
7. TABLE: #(#{EQUAL
              "plCjU" 985
              "VVYZqKm[" 861
              "N\\" 870
              "fqfBdZP\\" 364
              "cHNjIM]Y" 804
              "p_oHUWc^" 630
              "CqGRaMH" 408
              "NecO" 636
              "QDBq" 380
              "M" 838
              ...
             }
            0 "plCjU" 985 "VVYZqKm[" 861 "N\\" 870 "fqfBdZP\\" 364 ...)
8. NEXT-WEAK-HASH-TABLE: NIL
9. %WEAKNESS: 0
10. NEXT-FREE-KV: 845
11. CACHE: 1688
12. INDEX-VECTOR: #(0 617 332 393 35 0 194 512 0 420 ...)
13. NEXT-VECTOR: #(0 72 0 0 253 499 211 0 349 488 ...)
14. HASH-VECTOR: #(9223372036854775808 1830284672361826498 3086478891113066655
                   24243962159539 2602570438820086662 2431530612434713043
                   4568274338569089845 3085527599069570347 2374133389538826432
                   3322613783177911862 ...)
15. LOCK: #
16. SYNCHRONIZED-P: NIL

As the fog of initial surprise clears, we can see that instead of a single vector it uses 4! The keys and values are placed in the one called TABLE (that starts with a reference to the hash-table object itself). Note that storing the entries as pairs is redundant both in terms of memory and access (additional dereferencing). That's why an obvious optimization is to put them directly in the backing array: so the keys and values here are interleaved. One more indirection that may be removed, at least in Lisp, is "unboxing" the numbers, i.e. storing them immediately in the array avoiding the extra pointer. This may be an additional specialized optimization for number-keyed hash-tables (to which we'll return later), but it is hardly possible with the interleaved scheme.

Perhaps the biggest surprise is that the entries of the TABLE array are stored sequentially, i.e. there are no gaps. If they were to be added at random positions, we'd expect the 844 keys and their corresponding values to be distributed uniformly over the 2048-element array. Instead, this randomness is transferred to the 1024-element integer INDEX-VECTOR: another level of indirection. Why? For the same reason yet another 1024-element array is used — a HASH-VECTOR, which stores the values of hashes for all the table keys: efficient resizing. Although it may seem that the biggest overhead of a hash-table is incurred due to reserved space for new keys, in fact, as we have now learned, resizing is a much heavier factor. This especially relates to the HASH-VECTOR: if the hashes of all keys were not stored, they would have to be recalculated anew each time the table got resized: an intolerable slowdown for larger tables. Having a separate INDEX-VECTOR is not so critical, but it provides two additional nice capabilities. Resizing the table is performed by adjusting the vectors' lengths (an optimized operation) without the need to redistribute the entries. Besides, unlike in theory, we can iterate this table in a deterministic order: the order in which keys were added. This comes in quite handy, although SBCL developers don't want to advertise it as a public API, as it may restrict potential future modifications to the data structure. There's also a NEXT-VECTOR that is used to optimize insertion.
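To tie these vectors together, here is a simplified sketch of the lookup path (my own illustration, not SBCL's actual code): INDEX-VECTOR maps a hash to the head of a chain of entry indices, NEXT-VECTOR links the entries of that chain, and TABLE stores the interleaved keys and values.

(defun sketch-gethash (key table index-vector next-vector test hash-fun)
  "A toy model of bucket lookup over the vectors described above."
  (let ((bucket (mod (funcall hash-fun key) (length index-vector))))
    (loop :for i := (aref index-vector bucket) :then (aref next-vector i)
          :until (zerop i)  ; 0 terminates the chain
          :when (funcall test key (aref table (* 2 i)))
            ;; keys and values are interleaved: key at 2i, value at 2i+1
            :do (return (values (aref table (1+ (* 2 i))) t))
          :finally (return (values nil nil)))))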

Overall, we can see that a production-optimized hash-table turns out to be much heavier than a textbook variant. And, indeed, these optimizations are relevant, as SBCL's hash-tables are quite efficient. In my experiments several years ago, they turned out to be 2-3 times faster than, for instance, the CCL ones. Yet, it's clear that the speed optimizations, as usual, come at the cost of storage deoptimization. To restore storage sanity, we could fall back to the basic hash-table variant, and it's a possible solution for some cases, although, in general, a mediocre one: it will be neither the fastest nor fully optimized.

Building an MPHT

Most of the space in SBCL's hash-table implementation is spent on metadata and keys, not on values. Yet, if we introduce a restriction that the table cannot be altered — no new values added after initial construction and no resizing possible — most of those optimizations and the connected storage become redundant. An ideally space-efficient data structure is an array, but it doesn't allow key-based access. In theory, minimal perfect hash-tables are arrays with a minimal amount of metadata overhead. Yet, the precise amount depends on the algorithm, and new research improving them is still going on. Overviewing all the existing approaches (most of which I don't fully understand) is beyond the scope of this post.

Besides, MPHTs require additional calculations at access time compared to normal hash-tables. So, if a hash-table is well-optimized, it will usually be faster than an MPHT.

And still, MPHTs require a non-negligible amount of additional space for metadata: for the algorithm that will be discussed here, it's around 2 bytes per key. The best algorithms in the literature claim to reduce this amount more than 4 times, to less than 4 bits per key. However, for now, I have picked the simpler approach, since this difference is not critical considering that every value in the table, on a 64-bit machine, occupies at least 16 bytes (a pointer plus the data), so an overhead of 2 bytes versus 0.5 bytes (which will probably be rounded up to 1 byte) is already negligible.

Now, let's think of how we can distribute the keys in a hash-table so that there are no collisions and the backing array has the same number of elements as the number of hash-table keys. Theoretically, as the set of keys is known before the creation of the table, we can find a hash function that produces such a distribution. Unfortunately, due to the Birthday paradox, it may be a very long search. The algorithms for MPHTs suggest ways of structuring it. A good algorithm should have at most O(n) complexity as, otherwise, it will be infeasible for large keysets, which are the main use case for perfect hash-tables.
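To get a feel for why a blind search is hopeless: the probability that a randomly chosen hash function maps n keys into n slots without a single collision is n!/n^n, which shrinks roughly like e^-n. A quick check in the REPL (my own illustration):

(let* ((n 20)
       (n-factorial (reduce '* (loop :for i :from 1 :to n :collect i))))
  (float (/ n-factorial (expt n n))))
;; => ~2.3e-8 already for just 20 keys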

The algorithm I will briefly describe now was suggested by Botelho and Ziviani. It is a 2-step process:

  • at the first stage, using a normal hash function (in particular, the Jenkins hash), all keys are nearly uniformly distributed into buckets, so that the number of keys in each bucket doesn't exceed 256. This can be done by setting the hash divisor to (ceiling (length keyset) 200 #| or slightly less |#);
  • next, for each bucket, a perfect hash function is constructed via a simple algorithm: for each key, two hash codes are calculated by one call to the Jenkins hash (this function outputs 3 hashes at once), and they are treated as vertices of a graph. If the graph happens to be acyclic (which can be ensured with high probability by a suitable hash divisor), it is possible to construct the desired function as a sum of 2 hashes. Otherwise, we change the Jenkins hash seed and try constructing a new graph until an acyclic one is obtained. In practice, this requires just a couple of tries;
  • the construction of the final hash function is described very clearly by Czech, Havas and Majewski who have proposed this method: it lies in performing a depth-first search on the graph, labelling the edges with unique numbers and deducing the corresponding number for each possible Jenkins hash value.
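Here is a sketch of that labelling step, under my own (hypothetical) graph representation: each vertex maps to a list of (neighbor . edge-index) pairs, where the edge index is the desired in-bucket index of the corresponding key, and m is the divisor. The invariant it establishes is that for every edge (u v) with index i, (mod (+ (aref g u) (aref g v)) m) equals i.

(defun label-graph (adjacency m)
  "ADJACENCY is a vector mapping each vertex to a list of (neighbor . edge-index)
conses; returns the vector of vertex labels g."
  (let ((g (make-array (length adjacency) :initial-element nil)))
    (dotimes (v (length adjacency) g)
      (unless (aref g v)            ; start a new connected component
        (setf (aref g v) 0)
        (labels ((visit (u)
                   (loop :for (w . i) :in (aref adjacency u)
                         :unless (aref g w)
                           :do (setf (aref g w) (mod (- i (aref g u)) m))
                               (visit w))))
          (visit v))))))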

In one of the possible labelings, each edge corresponds to a key and its unique index, and each vertex — to the value for each of the possible Jenkins hashes.

Now, the per-bucket hash-function can be reconstructed from an array of numbers (in the range 0-255) associated with each possible hash. The divisor of the hash is twice the number of keys in the bucket, though it can be any number above the number of keys: the greater the divisor, the more space is used for storing the numbers (the divisor is the length of an array) and the less time it takes to find an acyclic graph.

The algorithm works quite fast: on my laptop, it takes 8 seconds for the table of 725 359 character trigrams from a mid-size language identification model and 13 seconds for 1 236 452 words from the same model.

To sum up, this is how to find the index of an element (bytes argument) in our perfect hash-table:


(defun csindex (bytes cstab)
  (with ((mod (/ (1- (length @cstab.meta)) 2))  ; divisor of the top-level hash
         (hash (* (mod (jenkins-hash bytes) mod) 2))  ; determine the bucket
         (off (? @cstab.meta hash))
         (seed (? @cstab.meta (1+ hash)))  ; each bucket has its own Jenkins seed
         (mod2 (- (? @cstab.meta (+ 2 hash)) off))
         (b c (jenkins-hash2 bytes seed (* mod2 2)))
         (goff (* 2 off)))
    ;; bucket offset + in-bucket hash
    (+ off (mod (+ (? @cstab.gs (+ goff b))
                   (? @cstab.gs (+ goff c)))
                mod2))))

Note that, in the code for this article, I refer to perfect hash-tables as cstabs, for the reasons described at the end.

Efficiency In Practice

So, let's now examine the memory efficiency of this method. Thankfully, just recently the SBCL developers started working on a missing piece of the SBCL API that is critical for every algorithmic developer: a function to determine the size in memory occupied by an arbitrary object. As we know from the famous koan, "LISP programmers know the value of everything and the cost of nothing". Indeed, from a certain point of view, this applies to SBCL. Although now we have a rough tool at our disposal that patches this hole... ;)

Using this unofficial function, we can roughly calculate the space occupied by the character trigrams hash-table mentioned above:


> (let ((ht (? wild:*lang-detector* '3gs)))
    (+ (object-size ht)
       (object-size @ht.table)
       (object-size @ht.index-vector)
       (object-size @ht.hash-vector)
       (reduce '+ (map 'list
                       ;; the keys of the table are strings
                       ;; and values -- alists of a symbol and a float
                       ^(if (listp %)
                            (sum 'object-size %)
                            (object-size %))
                       @ht.table))))
102856432

100 MB!


> (let ((ct (build-const-table (? wild:*lang-detector* '3gs))))
    (+ (object-size ct)
       (object-size @ct.gs)
       (object-size @ct.meta)
       (reduce '+ (map 'list ^(sum 'object-size %)
                       @ct.data))))
53372880

I.e., we have reduced the memory usage almost by half, because all the metadata now occupies just 1479808 bytes, or 1.5 MB.

One critical decision that allows for such a drastic memory-use improvement is omitting the keys from the table. It should be noted that adding them back is trivial: (defstruct (keyed-pht (:include const-table)) (keys nil :type simple-vector) (test 'eql)), for which getting the value will work like this:


(defun csget (key cstab)
  (let ((idx (csindex (bytes key) cstab)))
    (when (call @cstab.test key (svref @cstab.keys idx))
      (values (svref @cstab.data idx)
              t))))

However, this will, basically, return us to the same ballpark as a textbook hash-table implementation in terms of memory usage, while losing in terms of speed. Yet, if we allow for some controlled level of false positives, there are a few algorithmic tricks that can be used to once again make the keys almost negligible.

The first one is really simple and straightforward: replace the vector of keys with a vector of their hashes. In particular, if we take a single byte of the hash, such an array will be 10-100 times smaller than the generic keys array and produce an FP-rate of around 1/256.
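A sketch of how that check could look on top of csget, with a hypothetical key-bytes slot holding one byte of each key's hash (not part of the actual const-table library):

(defun csget-approx (key cstab)
  "Like CSGET, but compares a single stored hash byte instead of the key,
accepting a small false-positive rate."
  (let ((idx (csindex (bytes key) cstab)))
    (when (= (aref @cstab.key-bytes idx)
             (ldb (byte 8 0) (jenkins-hash (bytes key))))
      (values (svref @cstab.data idx)
              t))))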

Another trick is to use a Bloom filter. For instance, a filter with a 0.1 FP-rate for all the trigrams from our language identification model will require just around 0.4 MB of storage, compared to 0.7 MB needed to store the 1-byte hashes and 30 MB needed to store just the keys. The disadvantage of a Bloom filter, however, is slower processing time: for the mentioned 10% FP-rate we'll need to perform 4 hash calculations, and if we'd like to reach the same ~0.4% rate as the 1-byte hash array, we'd have to increase the memory usage to 1 MB and perform 8 hash calculations.

Finally, it should be noted that the main motivator for this work was reducing the memory footprint of my language identification library, for, to be widely adopted, such a project should be very frugal in its memory usage: 10-30 MB is its maximum polite footprint. By switching to a perfect hash-table, we haven't reached that goal (at least, for this particular model), but there's also plenty of room for optimizing the memory usage of the values, which I will return to later.

Const-tables

The other motivator for this work, in general, was my interest in the topic of efficient "static" data structures. In this context, I feel that the notion of a perfect hash table doesn't fully capture the essential features of the class of data structures we're dealing with. First of all, the main distinguishing factor is static vs dynamic usage. A hash-table is thought of as a dynamic entity, while our data structure is primarily a static one. Next, hashing is, for sure, involved in constructing all these structures, but it's only part of a bigger picture. The way of structuring the metadata, in particular the index arrays, may differ greatly not only between perfect and usual hash-tables, but also within the ordinary hash-table class — across different implementations.

So, I decided to come up with a different name for this data structure — a const-table (or cstab). It defines a broad class of persistent data structures that allow constant-time key-based access and, ideally, store the keys efficiently. The implementation described here is released as a library with the same name. It is still in its infancy, so major changes are possible — and suggestions on its improvement are also welcome.

2017-04-17

Pretty-Printing Trees

  (or The Ugliest Code I've Ever Written)

In the last couple of days, I was ill and had to stay in bed, so I used this time to tidy up the work that had accumulated over the past year in cl-nlp. That was especially timely, considering the interest in using it expressed by some people whom I've met at recent Lisp-related events.

I've even assembled a rough checklist of the things that need to be finished to get it to v.1.0 and beyond.

Besides, after finishing the basic cleaning, I returned to one of the programming tasks that has racked my brain for a long time: tree pretty-printing. In NLP, we constantly have to deal with various kinds of parse trees, like the constituency or dependency ones, but the problem is that they are not easily visualized. And good visualization plays, at least for me, a critical role in effective debugging, ideation and programming. It's an essential part of the solid interactive experience that is one of the fundamental traits of Lisp development.

For instance, a constituency tree is usually presented as a Lisp list. Here's an infamous example from the Penn Treebank:

  (S 
    (NP-SBJ 
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,) 
      (ADJP 
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will) 
      (VP (VB join) 
        (NP (DT the) (NN board) )
        (PP-CLR (IN as) 
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) )

A dependency tree has several representations, all of which are not really intuitive to grasp. This is the Stanford format:

 amod(ideas-2, Colorless-0)
 amod(ideas-2, green-1)
 nsubj(sleep-3, ideas-2)
 root(sleep-3, sleep-3)
 advmod(sleep-3, furiously-4)
 punct(sleep-3, .-5)

And here's the CoNLL one:

0 Colorless _ _ ADJ 2
1 green _ _ ADJ 2
2 ideas _ _ NOUN 3
3 sleep _ _ NOUN 3
4 furiously _ _ ADV 3
5 . _ _ PUNCT 3

Also, Google's Parsey McParseface offers another - presumably, more visual - representation (using the asciitree lib). Still, it is not good enough, as it messes with the order of words in a sentence.

Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct

As you see, dependency trees are not trivial to visualize (or pretty-print) in ASCII. The authors of Spacy creatively approached solving this problem by using CSS in their displaCy tool.

However, it seems like overkill to bring a browser with you for such a small task. And it's also not very scalable.

I have, in fact, always been interested in creative ways of text-based visualization. So, I thought about ways to represent parse trees in ASCII.

With constituency ones, it's rather trivial:

> (pprint-tree '(TOP (S (NP (NN <This:0 0..4>))
                        (VP (VBZ <is:1 5..7>)
                            (NP (DT <a:2 8..9>)
                                (JJ <simple:3 10..16>)
                                (NN <test:4 17..21>)))
                        (|.| <.:5 22..23>))))
           TOP            
            :             
            S             
  .-----------:---------. 
  :          VP         : 
  :   .---------.       : 
 NP   :        NP       : 
  :   :   .----:-----.  : 
 NN  VBZ DT   JJ    NN  . 
  :   :   :    :     :  : 
This  is  a simple test . 

The dependencies are trickier, but I managed to find a way to show them without compromising the sentence word order:

> (pprint-deps '(<Colorless:0 0..9> <green:1 10..15> <ideas:2 16..21> <sleep:3 22..27> <furiously:4 28..37> <.:5 38..39>)
               '(amod(ideas-2, Colorless-0) amod(ideas-2, green-1) nsubj(sleep-3, ideas-2) root(sleep-3, sleep-3) advmod(sleep-3, furiously-4) punct(sleep-3, .-5)))
Colorless green     ideas      sleep     furiously . 
    ^       ^        .^         .^.          ^     ^
    :       `. amod .´:         :::          :     :
    `..... amod .....´:         :::          :     :
                      `. nsubj .´::          :     :
                                 :`. advmod .´     :
                                 :`.... punct .....´
                               root

And it looks pretty neat even for longer sentences:

We        hold these   truths to       be  self -       evident , that all      men are        created     equal , that they are        endowed      by  their    Creator    with certain unalienable Rights , that among     these are      Life         , Liberty   and the    pursuit     of     Happiness . 
 ^         .^.   ^       .^    ^       .^.   ^  ^         .^    ^   ^   ^       .^   ^           .^.         ^   ^   ^    ^   ^           .^.         ^.   ^        .^.        ^.    ^         ^        .^   ^   ^    ^.        ^   .^.        ^.         ^    ^.      ^   ^       .^.        ^.        ^     ^
 `. nsubj .´::   `. det .´:    `. aux .´::   :  `. punct .´:    :   :   `. det .´:   `. auxpass .´::         :   :   :    :   `. auxpass .´::         ::   `. poss .´::        ::    :         `. amod .´:   :   :    :`. pobj .´   :::        :`. punct .´    :`. cc .´   `. det .´::        :`. pobj .´     :
            :`... dobj ...´             ::   `. npadvmod .´:    :   :            :               ::`. advcl .´   :   :    :               :::         ::             ::        ::    `...... amod ......´:   :   :    :             :::        ::              ::                   :`. prep .´               :
            ::                          :`..... acomp .....´    :   :            `.. nsubjpass ..´::             :   :    :               :::         ::             ::        :`......... pobj .........´   :   :    :             :::        ::              :`...... conj .......´                         :
            :`......... advcl ..........´                       :   :                            ::`... punct ...´   :    :               :::         ::             :`. prep .´                             :   :    :             :::        :`.... conj ....´                                              :
            :`..................... punct ......................´   `........... mark ...........´::                 :    :               :::         :`... pobj ....´                                       :   :    :             ::`. attr .´                                                              :
            ::                                                                                    ::                 :    :               ::`. agent .´                                                      :   :    `... prep ....´:                                                                        :
            ::                                                                                    ::                 :    `.. nsubjpass ..´::                                                                :   `...... mark ......´:                                                                        :
            ::                                                                                    ::                 `....... mark .......´::                                                                :                       :                                                                        :
            ::                                                                                    ::                                       :`............................ punct .............................´                       :                                                                        :
            ::                                                                                    ::                                       :`........................................ advcl .........................................´                                                                        :
            ::                                                                                    :`................ advcl ................´                                                                                                                                                                  :
            :`...................................... ccomp .......................................´                                                                                                                                                                                                           :
            :`............................................................................................................................................ punct .............................................................................................................................................´
          root

However, writing the visualization code was one of the most intimidating programming tasks I've ever encountered. One explanation is that trees are most naturally processed in depth-first order, top-down, while the visualization requires a bottom-up, BFS approach. Another may be that pixel-perfect (or, in this case, character-perfect) display is always tedious. As far as I'm concerned, these are not sufficient explanations, but I couldn't find any other. The ugliest part of this machinery is the deps->levels function that prints the dependency relations in a layered fashion. The problem is to properly calculate the minimal space necessary to accommodate both tokens and dependency labels and to account for the different cases when a token does or doesn't have outgoing dependency arcs. In theory, it sounds pretty easy, but in practice, it turned out to be a nightmare.

And all of this assumes projective trees (non-intersecting arcs) and doesn't know how to show, on one level, two arcs going from one token in two directions. Finally, I still couldn't align the two trees (constituency and dependency) above and under the sentence. Here's the target:

           TOP            
            :             
            S             
  .----------------:--------------. 
  :               VP              : 
  :         .---------.           : 
 NP         :        NP           : 
  :         :   .----:---------.  : 
 NN        VBZ DT   JJ        NN  . 
This        is  a simple     test . 
  ^         .^. ^    ^        .^  ^
  `. nsubj .´:: :    `. amod .´:  :
             :: `.... det ....´:  :
             :`..... dobj .....´  :
             :`...... punct ......´
           root

and this is how it prints for now (one more challenge was to transfer additional offsets from dependencies into the constituency tree):

           TOP            
            :             
            S             
  .-----------:---------. 
  :          VP         : 
  :   .---------.       : 
 NP   :        NP       : 
  :   :   .----:-----.  : 
 NN  VBZ DT   JJ    NN  . 
This  is  a simple test . 
  ^         .^. ^    ^        .^  ^
  `. nsubj .´:: :    `. amod .´:  :
             :: `.... det ....´:  :
             :`..... dobj .....´  :
             :`...... punct ......´
           root

Well, the good news is that it is usable, but it still needs more work to be feature-complete. I wonder what I was doing wrong: maybe someone can come up with a clean and simple implementation of this functionality (in any language)? I consider it a great coding challenge, although it may require a week of your free time and a bunch of dead neurons to accomplish. But if you're willing to take it on, I'd be glad to see the results... :D

2017-01-02

(m8n)ware Open for Business

Today, I want to announce (m8n)ware (the name is an i18n-style abbreviation of "meditationware" with a dash of Lisp parens). This is a thing I've always wanted to build. After parting ways with Grammarly almost a year ago, I had some time to rest and think about my next move. And I couldn't let go of this thought, so I figured: you can always go work somewhere, but you don't get a lot of stabs at realizing your own ideas. Maybe two or three in a lifetime. I had tried this once already with fin-ack almost 8 years ago, and the concept behind it was, basically, the same — only the implementation differed.

In theory

What is (m8n)ware? It is a company aimed at solving problems in the area of cognition-related computing, which will be built as a distributed network of mostly Lisp research-oriented engineers. This sounds rather complex, so let me try to explain a couple of points:

  • Cognition-related computing is the best term I came to after a long thinking about the area of CS that includes various tasks related to cognition, intelligence, knowledge, and associated logic. The common marketing buzzword is Artificial Intelligence, but it has a negative history and is quite misleading. All computer programs implement some form of "artificial" intelligent behavior. The defining feature of cognition-related computing is that it requires some transformation of raw data into structured computer-processable information and back, which is similar to human cognitive functions that arguably do the same transformation for our own internal processing.
  • The gist of the distributed network notion is that the primary focus of (m8n)ware is building not a localized corporate-like structure, in which people are bound primarily by legal contracts and payment obligations, but a loosely coupled group of like-minded people who share the same values, interests, and approaches to technology. This organization will be seeking a perfect middle ground between a corporation and an open-source community.
  • Research-oriented engineer is another "middle-ground" term; it describes the main multidisciplinary role needed in this organization. We're not a scientific lab focused on fundamental research, nor are we an outsourcing shop that faithfully implements existing results according to a given spec. We're engineers in the sense that we deliver production-ready technology that may be useful to end users in a straightforward manner. And, at the same time, we're researchers because we wield the methodology and are ready to experiment in new areas that don't have satisfactory state-of-the-art solutions.

I don't believe in the VC mantra of "build a startup, get rich, change the world." First of all, I don't believe in changing the material world (which implies a conviction that you know better). I believe in changing yourself. Also, getting rich and doing something good (to the world) are not the goals that are always aligned. Moreover, they are usually in conflict. I'm not a businessman in the sense that money is not my ultimate goal. But I like to see things grow and develop, things that are bigger than myself. Thus I'm interested not in market share but in mind share.

Considering all of the above, (m8n)ware is not going to be a product company in a traditional sense. It will be a technology company that will create and disseminate knowledge-based services and products. Also, it will not aim at rapid growth, but rather at sustainable development.

In the previous post, I've listed my motivations for moving in this particular direction and explained my values. I can't say that it got overwhelmingly positive feedback, but, in general, the results were better than I expected :) Now, I have several clues on how to cross the most challenging chasm: scaling the operation from a single-person endeavor to a productive and sustainable group. Meanwhile, I was testing whether this approach may work in practice and doing market research. Now, I'm ready to go all in and devote at least the next year of my professional life to building this thing.

http://m8nware.com is officially live. If you're interested in cooperation as a client, partner or co-worker, please let me know...

In practice

There are several aspects that I'm betting on in (m8n)ware that are non-mainstream and may seem somewhat counter-intuitive. In this part, I want to provide more details about why and how, I think, this will work.

The first aspect is radical transparency. From this and the previous post, it should be clear that (m8n)ware originated and plans to continue functioning fully exposed to the outside world, not relying on any secret know-how or clever tricks. I don't plan to conceal what we're going to do and why. Neither to "fake it till you make it." Why will this work? First of all, my experience shows that in the current age of information overload, we're fighting primarily not for the purse but for the thoughts of our "customers" (in many possible markets: not only the one where you sell your products/services, but also the labor market, and the ecosystem of potential competitors, partners, and vendors). And this requires information sharing, not safekeeping and concealment. Secondly, in general, I'm not interested in competition — rather, I'd like to find a unique niche in the market that the company will serve in the best possible manner and that will be big enough to sustain it. The good news is that the AI market is currently growing very fast, and this trend will last at least for a couple more years. So demand is greater than supply, which means the competitive environment is not very harsh. Another thing is Lisp: no one in their right mind will bet a company on Lisp, so I'm not really worried about the competition in the labor market. :) The final point about openness is that I personally endorse it, and as this is the company that aims to be as close to my ideal as possible, it should endorse it too.

Although it's not a classic product company, it's not going to be a typical outsourcing one either. Yes, initially, it will provide primarily consulting services, but the idea is that, with time, the share of these services will decrease in favor of supporting more general-purpose tools and technology developed in-house. And to ensure the constant priority of this goal, we'll be doing such work from day one. Currently, I see it in the following manner: the time of all engineers will be split in some proportion between for-pay consulting and developing open-source/research projects for free, and, with time, as some of these projects become important to the company, it will start paying the people who develop them for this work as well. This is a frugal approach, but I advocate it from a personal perspective: working at my previous gigs, I'd have been eager to forfeit, say, 20% of my salary to be able to spend 20% of my time on open-source projects that matter to me personally. Actually, the percentage could be much bigger. Currently, I spend 50% of my time working on such projects and am quite happy with this. I deeply believe that such a balance is more appealing to many programmers (especially the kind of people I'd be willing to cooperate with) than the conventional approach.

Lisp again. From my experience working in the cognitive problems domain, I can definitely say that it's not about coding. For several reasons. The obvious one is that 80% of resources are spent on other parts: thinking/learning, working with data, experiments, documentation. (The remaining 20% are still critical, especially since most of the solutions are resource-demanding and the code is algorithm-heavy.) Then, there's the current technology situation: the days of backend-only solutions are, unfortunately, gone. A lot of problems require a heavy mobile or in-browser presence. And on the backend, thanks to microservices and other stuff, no one is developing in a single language, or even on a single platform, anymore. Finally, there's knowledge transfer. Programs may be not a bad medium to express concepts, but not the optimal one either: between scientific papers, blog posts, markdown documents, experiment notebooks, and production-optimized programs, there is no one-size-fits-all solution. All this creates conditions in which the choice of a programming language becomes much less of a constraint than it was just a few years ago. On the other hand, from the point of view of "internal" productivity (not concerned with integration into the bigger picture), Lisp has proven to be a great and rewarding environment, very well suited for research work. Plus, a great way to differentiate in the labor market... :)

Our value proposition

So, if you need to solve some cognitive computing problems, here's what (m8n)ware may offer.

  1. Provide small-scale consulting services: talk to your people and help them with their challenges, perform an audit, study feasibility of some serious project, help with gathering relevant data sets, etc.
  2. Develop a prototype solution for a particular problem, and deliver it as a set of data, documentation, and a working web-service to allow integration in your prototypes, testing in your environment, with your clients and data.
  3. Develop a turnkey solution and integrate it into your environment. This is rather tricky as we'll prefer to work in Lisp, and not every environment will be ready for this. We're also willing to compromise and develop some non-critical integration parts in other languages when necessary, provided that the core remains Lisp-based.

Why should you come to us instead of solving the problem on your own? The current situation in cognition-related computing is that such projects have high business value but are not easy to complete: they require not just engineering but a substantial, often prevailing, research component. Productive work in this area assumes a skill set of developers and managers that is different from conventional software development. Obviously, you want to develop this expertise in-house, but growing it, currently, is a slow and daunting process. You should definitely still do that for the long-term benefits, but this doesn't mean that you can say something along the lines of: in the next half a year I need to solve this complex AI problem, so I'll just hire a person/team in a couple of months and let them do it. It's a risky approach even in conventional software development, and in this field, it just doesn't work. The competition for AI researchers is insane and, moreover, if you're a regular company and not Google/Facebook or the latest-hottest startup, your chances of hiring and retaining top talent are, basically, nil. (Why will we be able to attract talented people while you won't? Because our particular focus — Cognitive+Tech+Distributed+Lisp — will allow us to appeal to a portion of the talent pool that is not happy in mainstream environments.)

Cognitive computing projects are risky and hard to predict. That's why for any serious long-term (longer than a couple of months) partnership we'll be dividing the work into reasonable chunks that will allow you to get at least part of the value at each checkpoint, see and assess progress, and pivot if your plans or conditions change.

We're open for business — write to info@m8nware.com if interested.

2016-12-31

Lessons from the Algorithmics («Алгоритмика») Course

On December 28, the second cohort of students finished the algorithms course at Projector. For now, they are "students" rather than full-fledged graduates, because the course is still very far from 100% of what it can and should become. But to reach those 100%, I need, among other things, to draw some intermediate conclusions. The first attempt, which ran in the spring and early summer, was officially positioned as a beta version, but the second one, in fact, also turned out to be a trial run, as the course was split into 2 difficulty levels and substantially reworked.

What can be said for sure is that the course is needed. This knowledge and these skills let a programmer move from the coder category to that of a full-fledged developer, capable of solving problems of any complexity. Such people are needed both by the Googles and Facebooks (which is why they pay so much attention to algorithms at interviews) and by small, ambitious product teams. And for oneself, too: it's one of the things that lets a programmer "spread their wings" and love their work, by making it more varied and far more rewarding. The number of people actively interested in the topic shows that the market shares this understanding. Although this knowledge is part of the basics of a university Computer Science curriculum, by no means everyone had a chance to master it: some came into the profession via a non-classical path, some were unlucky with their university and professors, and some simply skipped classes and now, having grown wiser, would like to catch up.

Another question is that different categories within this group are suited to different learning formats. For some it's online learning, for others the classical university, and only a few, objectively, are ready for the format of an intensive three-month course. Practice shows that about a third of the people who sign up and start attending such courses quickly realize this and drop out. As a result, slightly less than half of the participants made it to the end of the first two runs, and by no means all of them worked through all of the material successfully. That is, besides those who simply ended up in the wrong place and left right away, there is roughly the same number of people who could have achieved a good result but didn't. Why so? First of all, of course, because of shortcomings on my side.

Finally, while I have no doubts about the demand for the algorithms track as a whole, the need specifically for an advanced course has not been proven yet. Yes, the first enrollment was filled, but we made a key mistake: we didn't filter the applicants properly, because we hoped that those who weren't confident would go to the basic course. That worked only partially. The second point: people expect something more from advanced algorithms than just algorithms: specifically, they want machine learning. And that's understandable: everyone wants machine learning :) It was the concluding part of the program in the last iteration of the course (and went over very well, in my opinion), but the rest of the material suffered because of it, as it objectively didn't fit into the format. So there will be no more machine learning within Algorithmics Pro. That said, I have no doubt that it will appear as a separate course very soon, since demand for this area is currently off the charts.

So, in the end, what is Algorithmics Pro, and are there enough people who need it? It's a hard course and a hard question. Its core idea is to immerse practicing programmers in the problems of solving real algorithmic tasks: working with data at least gigabytes in size, or with real graphs with millions of nodes; understanding how modern databases, editors, and version control systems work; studying various optimization algorithms and applying them to problems from the real world; getting under the hood of a server and reducing the waiting time of requests in its queue. Unfortunately, so far I'm only feeling my way in this direction. On the one hand, not all of the practical problems within reach were lying on the surface (although now, after two runs, the topic has become much clearer to me). On the other hand, there wasn't enough readiness, engagement, and commitment on the part of the participants. Do we have many programmers who want to level up in algorithmic development, are at the required level, and are ready to devote a large chunk of their time to it over three months? That's the main question, and we don't know the answer yet. Such people definitely were present in my first two courses. Unfortunately, though, they could be counted on the fingers of one hand, when at least four hands would have been needed...

The main challenges of the course for me

The first one I have, in effect, described above. From the very first day of discussions, we said that this course should be as practical as possible, and I honestly tried to achieve that. But practicality has different aspects. One is practical work, i.e., in this context, plain programming. Unfortunately, I didn't manage to get every participant involved in it, although my estimates and past experience suggested that it would happen naturally. The second aspect of practicality is discussing (and implementing) real-world examples of where all this is used. I also tried to give this due attention: telling case studies in the lectures, giving assignments of this kind (although there were fewer of them than there could have been), and in the course project. Unfortunately, this part, in my view, rests on the first one, i.e. active programming, and without it this part stalled badly as well. This is problem number one, which I intend to actively address in the next run of the course.

It also comes down to every participant having a convenient environment for such work, and to that environment being the same for everyone, so that progress can be nudged along where needed. I started the first course by demonstrating code examples in Lisp. Not everyone liked that, although everyone had been honestly warned about it. So the second variant was more abstract: describing algorithms on the whiteboard without tying them to any particular language. This A/B test showed that it doesn't work that way: you need to have code at hand that you can poke at and play with, and that you can hand to someone who is stuck so they can move on. Given my own attachment to Lisp, and the fact that the language is well suited for implementing algorithms, I plan to keep insisting on its use. Why not Python or something else? First, many languages are not very suitable for studying algorithms at all. A striking example is JavaScript, which is too sloppy and lacks proper support for arithmetic and the necessary data structures. The other extreme is static languages, especially low-level ones, which, on the one hand, offer many optimization opportunities but, on the other, impose too many restrictions (in particular, a more cumbersome development process) and extra complexity. As for Python, it is more or less suitable, but I simply don't like it, all the more so since there are plenty of algorithms courses in Python already. As for Lisp specifically and its peculiarities: I think it's a good filter, which we lacked when enrolling people for the previous runs. In fact, getting a basic grasp of Lisp, at the level needed for this course, is not hard. And if a person doesn't have enough motivation and trust to do it, that says a lot about their future motivation to overcome difficulties during the course itself. As practice has shown, to quote one of the students: "The mere fact that a lecture costs 550 UAH does not yet provide enough of an incentive" to do the homework :(

Why pay at all?

An obvious and reasonable question that everyone who wants to take this course must answer for themselves is whether it is worth it, and why pay at all. After all, there's the internet, Wikipedia, and excellent online courses where you can learn the same things. And that is indeed true! Unlike online courses, offline courses cannot be free, since they have to cover the rent and other expenses, pay the instructor and staff decently, and bring some profit to the organizers. The freemium model used by Coursera and others doesn't apply to them either. And, in general, you have to pay for everything in life.

But looking at it from the practical side, the ROI of any education is the ratio of the result obtained to the money and time spent on achieving it. In theory, offline courses can win thanks to a higher average result and a smaller time investment. What can go into this better result?

  • first, as banal as it sounds, the "magic kick", i.e. external motivation to run this race from start to finish. The money invested is also part of that motivation, although, as practice shows, not a sufficient one. So far these courses have lacked the competitive element inherent in classical education, and this is another direction that needs a bit of work (I have some ideas)
  • second, the opportunity for personal interaction with the instructor and the other students. For me this is actually one of the main motivations for doing the course: the chance to work with programmers who are searching and want to grow in the profession. The paradox is that even though I'm paid decent money for this course, I still earn more at my day job. So the smaller income has to be compensated by something else, and for me that something else is the opportunity for co-creation with the course participants. This means we have to be on the same wavelength and, first of all, want to come to a class and make the most of it. Ideally, the connections formed during the course should be one of its main long-term assets after it ends
  • a comfortable environment for learning and communication, and a sense of belonging to a community. Projector is doing a very important job by building, around its venue, communities of professionals in design, product development, and programming (and, I think, other fields in the future)

Who should and who shouldn't take the algorithms courses at Projector

For me, this is actually the key question of the whole topic. Neither I nor Projector aim for mass scale and super-profits. First, that isn't sustainable and would fizzle out; second, you don't get any inner satisfaction from that kind of work. Between a group of 8-10 motivated people who know where they've come and why, and 20 casually curious ones, I choose the former, although the latter is, in fact, easier. The first two iterations of the course were a search: for the right format and for the audience that fits it.

My conclusion is the following: these courses are suited for those who

  • already have some programming experience (ideally, at least a couple of years)
  • have realized the value of algorithms for themselves and won't agonize over the eternal question of the Ukrainian student: "where is all this used in real life?" My answer to it: those who want to will find it. There is demand for algorithmic programmers, and although it is more of a niche, niches always offer both more interesting work and better pay
  • are ready (both psychologically and organizationally) to steadily devote at least 10 hours a week to these studies for 3 months and, even more importantly, to devote to them the main resource of their brain. In practice, this means that it won't be possible to work fully and intensively at the same time. As these 2 courses showed, the best period for taking part in this adventure is either a break between jobs or the last years of university. Those who try to work intensively as developers and study at the same time either give up on the course, or complain that their work starts to suffer, or take a vacation to catch up. It can also work out for those whose current job doesn't involve writing a lot of code. But if you've just changed jobs, are about to finish an important project, are expecting a child (yes, there have been such cases already :), or plan to leave for a vacation or a business trip in the middle, then this format is definitely not for you

2016-12-28

5 Steps to Grasping Modern ML

Recently, I've been teaching an advanced Algorithms course, which concluded with a short introduction to Machine Learning. Obviously, ML is its own track in the Computer Science curriculum but, nevertheless, there's a substantial overlap between these 2 disciplines: algorithms and ML. However, ML adds another dimension that is not usually considered in the world of algorithmic thinking.

Anyhow, this experience helped me formulate the minimal selection of concepts that need to be grasped in order to start practical ML work. An ML crash course, so to say.

As I've never seen such a compilation, I'd like to share it in this post. Here are the 5 essential steps to understanding modern ML.

Step 1. Understanding the ML problem formulation. kNN algorithm

The first thing one needs to realize is the difference between an ML problem and common programming problems. Here, training/test data and an objective function should be explained, alongside the 3 common "learning" approaches: supervised, unsupervised, and reinforcement. A widely used and good initial example is the Iris data set with the kNN algorithm.
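
To make the setup concrete, here's a minimal kNN classifier sketch in Common Lisp (a toy illustration of mine, not course material; the (features . label) data layout and the helper names are assumptions):

(defun euclidean-distance (xs ys)
  "Squared Euclidean distance; good enough for ranking neighbors."
  (reduce #'+ (mapcar (lambda (x y) (expt (- x y) 2)) xs ys)))

(defun knn-classify (k train features)
  "Return the majority label among the K samples in TRAIN closest to FEATURES.
TRAIN is a list of (features . label) pairs; features are lists of numbers."
  (let ((neighbors (subseq (sort (copy-list train) #'<
                                 :key (lambda (sample)
                                        (euclidean-distance (car sample) features)))
                           0 (min k (length train))))
        (counts (make-hash-table :test 'equal)))
    (dolist (sample neighbors)
      (incf (gethash (cdr sample) counts 0)))
    (let (best-label (best-count 0))
      (maphash (lambda (label count)
                 (when (> count best-count)
                   (setf best-label label best-count count)))
               counts)
      best-label)))

;; Toy usage with two Iris-like classes:
;; (knn-classify 3 '(((5.1 3.5) . setosa) ((4.9 3.0) . setosa)
;;                   ((7.0 3.2) . versicolor) ((6.4 3.2) . versicolor))
;;               '(5.0 3.4))
;; => SETOSA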

Step 2. Adding features and iterative training into the picture. Perceptron algorithm

The second step is introducing the concept of feature extraction, which allows approaching a problem from different angles. The Iris data set already has features, but initially they may be perceived as given. Iterative training is another common ML approach (although some popular algorithms, like kNN or Decision Trees, don't rely on it). The Perceptron is the simplest algorithm to explain (and it still remains in practical use), and it leads nicely to the next step.
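
Here's a sketch of the mistake-driven Perceptron update in Common Lisp (again, my own illustration rather than anything from the course; the data layout and function names are assumed):

(defun dot (xs ys)
  (reduce #'+ (mapcar #'* xs ys)))

(defun perceptron-train (samples n-features &key (epochs 10))
  "SAMPLES is a list of (features . label) pairs with labels +1 or -1.
Features are expected to include a constant 1 for the bias.
Returns the learned weight list."
  (let ((weights (make-list n-features :initial-element 0)))
    (loop repeat epochs do
      (dolist (sample samples)
        (destructuring-bind (features . label) sample
          (when (/= label (if (plusp (dot weights features)) 1 -1))
            ;; mistake-driven update: nudge the weights toward the correct label
            (setf weights (mapcar (lambda (w x) (+ w (* label x)))
                                  weights features))))))
    weights))

;; (perceptron-train '(((1 2 1) . 1) ((1 0 3) . -1)) 3)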

A good example task and data set for this part is the Brown Corpus and the problem of POS tagging. And there's a great post outlining its solution by Matthew Honnibal.

Step 3. Continuous vs discrete learning, gradient descent. Softmax algorithm

The obvious next step is transitioning from discrete perceptron learning to the continuous gradient descent used in Logistic Regression. Andrew Ng provides a lucid connection in Parts II & III of his tutorial on Linear Models. It also helps that Logistic Regression and Softmax are the basic building blocks of the Neural Networks that are to be discussed next. The example task for this step may remain the same POS tagging, although others, like the ones used by Andrew, may also be utilized.
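
A single full-batch gradient-descent step for Logistic Regression might look like this (a sketch under the same assumed (features . label) data layout, with labels 0/1 and a constant 1 feature for the bias):

(defun sigmoid (z)
  (/ 1 (+ 1 (exp (- z)))))

(defun logistic-gradient-step (weights samples learning-rate)
  "One full-batch gradient-descent update of WEIGHTS on SAMPLES."
  (let ((gradient (make-list (length weights) :initial-element 0)))
    (dolist (sample samples)
      (destructuring-bind (features . label) sample
        ;; the gradient of the cross-entropy loss is (prediction - label) * x
        (let ((error (- (sigmoid (reduce #'+ (mapcar #'* weights features)))
                        label)))
          (setf gradient (mapcar (lambda (g x) (+ g (* error x)))
                                 gradient features)))))
    (mapcar (lambda (w g) (- w (* learning-rate g)))
            weights gradient)))

;; Iterate until convergence:
;; (setf weights (logistic-gradient-step weights samples 0.1))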

Step 4. Learning graphs (aka neural nets), backprop. Feed-forward Neural Network algorithm

As soon as we understand gradient descent and logistic regression, it's rather easy to make the next step and form layers of such blocks, allowing the combined model to "learn" higher-level feature representations. This is where the Backprop algorithm for efficient training comes into play (which is, by the way, another example of a dynamic programming algorithm). Also in this part, it's possible to talk about vector representations of words and other highly contextualized objects (landmark positions in images, etc.). A great explanation of Backprop is presented in this post by Christopher Olah. Also, a good example data set here is MNIST.
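
To show how the "layers of logistic blocks" idea looks in code, here's a forward pass for a small feed-forward network (a sketch only: it reuses the sigmoid function from the previous snippet and deliberately leaves out the Backprop training step):

(defun layer-forward (weights inputs)
  "WEIGHTS is a list of per-neuron weight lists, each with the bias as its
last element; returns the list of sigmoid activations for the layer."
  (mapcar (lambda (neuron)
            (sigmoid (+ (reduce #'+ (mapcar #'* (butlast neuron) inputs))
                        (first (last neuron)))))
          weights))

(defun network-forward (layers inputs)
  "LAYERS is a list of such weight matrices, applied in order."
  (reduce (lambda (activations layer) (layer-forward layer activations))
          layers :initial-value inputs))

;; (network-forward (list '((0.5 -0.5 0.1) (0.3 0.8 -0.2))  ; hidden layer, 2 neurons
;;                        '((1.0 -1.0 0.0)))                ; output layer, 1 neuron
;;                  '(1.0 2.0))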

Step 5. Bias-variance tradeoff, regularization & ensembles. Random Forest algorithm

Finally, we should return to the beginning and revisit the learning problem, but with some practical experience already under our belt. This is where the essential bias-variance tradeoff and the common ways to tackle it should be discussed: regularization and ensembles. It's also a good place to introduce the Decision Tree algorithms and the ensemble methods based upon them (Random Forest and, maybe, others) as some of the most widely used practical approaches today.

"Winning the Bias-Variance Tradeoff" by Juila Evans may be a good introductory text on this.

Overall, due to the highly condensed nature of such a presentation, a lot of important things will barely be covered: for example, unsupervised learning, CV with its convolutions, and sequence models. However, I believe that with the obtained knowledge and a conceptual understanding of the mentioned basics, those parts may be grasped quite easily.

If this plan turns out to be helpful to you, or if some essential improvements are necessary, please leave your thoughts and comments...

2016-09-23

The Technology Company Case

What's a Technology Company?

I'm a programmer. Obviously, this means that I have to earn money and realize my talents working for some company that employs programmers (or on my own). It's worth noting that there are several kinds of such companies.

One is traditional enterprises, like banks or government agencies, that need programmers to automate their processes and improve output. Every company needs an accountant, and, likewise, nowadays every company needs a programmer.

There are also companies that provide software development and related services - the so-called consulting or outsourcing firms. They employ programmers to automate the work and improve the output of, mainly, the first breed of companies.

Then, there are also technology product companies, like Instagram or Apple, that employ engineers to build their products, services or media, which are then consumed by ordinary people.

Finally, there are truly technology companies that produce new technology that is used by all the previous three groups, as well as by the technology companies themselves. From the business standpoint, this technology may be supplied either in the form of on-the-spot consulting work, licensing or even separate products.

Every group has some percentage of technology work in its operation. This work, often called R&D, comprises the implementation of existing technology (the D in R&D) and the creation of new technology (the R). The share of the two differs substantially between the groups. Companies in the first group may depend on R&D for 1 to 10% of their work, with almost 0% of R in it; the second group is 90% R&D work, still with mere percents of R in it; the third group is just 30-50% R&D, and the share of R in it may rise to 10-20% but rarely more; and the last group should be 90% R&D with more than 50% R in it.

A technology company should be a thought leader in its sphere. This means not chasing fashions in our pop-culture-like industry but setting an example justified by technological excellence instead of marketing. This means building something that will last and have an impact for a substantially longer period of time than the ever-accelerating hype cycle. This means having an ultimate goal of solving hard technical problems and not chasing profits or market share. While product companies try to change the world by producing their innovative products that merely use technology, a technology company does that by producing technology that enables more innovative products. A closed vs an open approach.

10x Programmers

There's this popular meme of 10x programmers that constantly spurs discussion and flamewars among our peers. Is it just a fad? Who are those 10xers? Do they really exist?

Let's first consider this question from the perspective of other crafts and professions. Are there 10x painters? Well, if we compare painters' productivity by the number of pieces painted, it would be hard to tell. But if we think about price, clearly, there are even 1000x ones: an ordinary painter's work may cost $1000, and a famous masterpiece will be in the millions. If we consider the number of people reached, the same rule applies: maybe thousands will see the quality works of a common professional painter, and millions or even billions - the works of a master. But you may say that painting, unlike programming, is an art. What about carpentry? Well, I'd rather compare with professions that require mostly intellectual work. Are there 100x doctors? Surely, there are those who saved 100x more people by inventing a new surgical method or treatment. Lawyers? A person who writes a law impacts orders of magnitude more people than an ordinary counselor at some random firm. This list may be continued on and on.

I've compiled a book called "Interviews with 100x programmers". To some extent, the name was an exaggeration. But, as they say, every joke has some truth in it. In fact, I fully subscribe to the 10x programmer concept. Moreover, I consider that there are not only 10x ones but also 100x, 1000x... Definitely, there are hardly any 10x coders, i.e. people who produce 10x the amount of code a good professional programmer will create in the same timeframe. But there's much more to programming than merely writing program code.

To be an order of magnitude more productive means to solve problems an order of magnitude more complex than the ones considered accessible at a given point in time. Obviously, such problems exist, and there will, probably, always be an unlimited supply of them. Also, it should be clear from the short history of computing that there are some people capable of bringing a new perspective, coming up with approaches that allow solving such problems either in a much better way or just solve them at all. As Alan Kay, who's for sure one of such 100x programmers, has famously said: "A change in perspective is worth 80 IQ points."

Still, there's more to it than just solving harder problems. Another popular explanation given to the 10x thing is that such a programmer is the one who makes 10 other programmers 2x more productive. This, from my point of view, implies the one who is showing a better approach, in other words, a thought leader, and the one who implements this vision in some technology that other programmers use. In fact, we're productive in our work at our current level mostly thanks to such prolific programmers: every day I use Unix, Emacs, Lisp, git and other tools that were initially conceived and built by a handful of the 10x programmers. Their vision and impulse made thousands and even millions more productive.

Those 10x programmers are the ones I'd like to be around at work. And so, my ideal company is the one that attracts such persons. And although a significant percent of such people are loners, most of them are also highly motivated by the presence of similar colleagues.

So which one of the 4 company types mentioned above will such people choose?

The first one is mostly out of consideration because in it the programmers are not the primary value creators - on the contrary, they are often considered a cost center. I.e. they are just another service function, similar to an accountant or a janitor. Surely, there are exceptions to this rule, when the company leaders realize the potential that technological change bears for their company, which, basically, means that the firm is transitioning to type 3. Even in such a case, it's still a much less productive environment than a type 3 firm built with the right principles in mind from the start.

What about outsourcing companies? Their advantage is that programmers are their primary asset, which means that the company will be built around them, will have a substantial number of them, and will do a lot to attract and retain prominent people. The nature of the work, unfortunately, is usually a severely limiting factor here. First of all, in most cases, the customer doesn't really care about the technological excellence or innovative nature of the result. The projects are, in most cases, counter-innovative: the more mundane, reproducible, and ordinary the technological solution that achieves the desired result, the better. And it's quite reasonable from the business standpoint: innovation is risky. This means that, ultimately, such companies reward uniformity and interchangeability of their staff and their output, especially since it's much easier to manage and scale. Have I mentioned that managing programmers is very hard (the common metaphor is "herding cats")?

Now, let's look at product companies. Are they a heaven for 10x programmers? Well, a lot of such people flock there. One reason is that such companies understand the need for talented programmers: unlike the previous 2 types, they may and should face unique technological challenges, and, moreover, their leadership is able to recognize that (type 1 companies also face those challenges, but usually they just don't view them from the technology standpoint). Yet, a product company is only X% new technology and another (100-X)% other things. What is the value of X? Maybe it's 20-30% at Google or Facebook, and even less at smaller companies with fewer resources. Why? Because, as we discussed above, the ultimate goal of most such companies is making money by serving masses of customers. This requires huge marketing, sales, operations, and support "vehicles" that employ professionals to operate and programmers to build, maintain, and develop - but these offer quite few interesting technical challenges. Once again, this is the right thing from the business standpoint, especially if you have to earn more and more money each year and grow your market share. But a focus on earnings and market share means that technological excellence becomes secondary. Surely, the best of the leaders and managers realize its importance, but they have to make many trade-offs all the time.

That's why I have singled out "pure" technology companies. Such organizations are naturally inclined to make technical excellence their focus. There are, surely, counterexamples that are infected with the Silicon Valley "growth virus" and try to win the market as fast as possible with marketing, but that doesn't mean it always has to work that way. In my opinion, purely technological companies are the best place for 10x programmers, because they will not merely utilize their work for some other end goal but have a vested interest in amplifying its influence. They are not inclined to conceal their know-how and innovations as trade secrets, but will benefit from sharing and promoting them. They may also provide maximum freedom of choice: of approaches, tools, and supporting technologies, because their primary concern is not the effective scaling of the same ultimately repetitive work across many similar programmers but creating breakthroughs. Their dependence on such ultra-productive programmers is existential.

I don't consider myself a 10x programmer, but, surely, I'd like to reach such a level someday, and I also aspire to work alongside such people.

A Company I'd Build

All in all, being part of a technology company seems like the best choice for me, both in terms of potential impact and the possibility of having 10x programmer colleagues. Eventually, you either have to join one or create one yourself. For the last 5 years, I've been working in so-called AI, and my experience, both on the product company side and in individual consulting work, shows that demand for research-related technology expertise here is growing much faster than the supply. I see this as a chance for new technology companies to emerge and gather the few capable people in this field to amplify their impact. So I'm seriously considering starting a technology company, and I'm looking for like-minded people who share my values and vision to join forces.

If I were to start such a company, I'd build its foundation on a few things that really matter to me personally. Some principles or, as they used to call them, values. Unfortunately, the notion of "values" has somewhat lost its original meaning in the corporate world. When you see qualities such as effectiveness or adaptability cast as values, that's a sign of this misconception. Values are something you don't compromise on at all. Surely, it's pointless to compromise on any part of your professionalism (such as effectiveness), so professionalism is a default value not even worth discussing. Real "values", however, are those aspects of your work culture that run a real risk of conflicting with the things that are considered universally important. In business, those are profits, market share, and a favorable competitive position. So, being true to your values means not forfeiting them even if you're going to lose in those basic areas.

Here is a list of the values that I subscribe to:

  • Technological excellence should be a basic trait of any technology company. For me, an example of applying such value would be using Lisp as a starting point for most of the solutions despite the fact that the language is quite unpopular and underappreciated - my personal experience shows that it works very well, especially in the fields that are heavily knowledge-based. Another example is that in a technology company literally everyone should be technology-savvy: even the office manager should be programming at times.
  • Personalism is the main quality that a company has to support in its dealings with all the people it's interacting with: employees, customers, contractors and providers. This means, for example, striving to provide flexible and productive working conditions to each employee instead of trying to fit everyone in the same conditions (because management is hard). Overall, lack of management competency should never become a limiting factor. One manifestation of this is that a modern technology company should be built as a distributed organization from day 1.
  • Ahimsa is an ancient word meaning not harming anyone. It is a little bit more than our modern-day ethics, but it's worth it. Why create something if you know that it will cause misery and suffering to others? In effect, this means, for example, refusal to provide services to companies that are clearly unethical.
  • Radical openness. As they say, "information wants to be free." :) Maximal sharing and minimal secrecy make so many things much simpler. And in our lowest-common-denominator technology world, ultimately, the risk of competitors copying and abusing your work is much smaller than that of brilliant people not joining your cause because they just haven't heard of it.

So... If you're interested in solving complex AI challenges out of whatever part of the world you're living in, working with 10x programmers, using Lisp and other advanced technologies in the process - drop me a line, I'd be glad to chat.

2016-05-17

Improving Lisp UX One Form at a Time

At the recent ELS, I presented a lightning talk about RUTILS and how I see it as a way of "modernizing" CL, i.e. updating the basic language elements to be simpler, clearer, and more generic, thus improving the everyday user experience and answering the complaints of outsiders about "historical cruft" in the Lisp standard. Indeed, Lisp has a lot of unrecognizable names (like mapcar and svref) or just unnecessarily long ones (multiple-value-bind or defparameter), and out of the box it lacks a lot of things that many current programmers are used to: unified generic accessors, generators, literal syntax for defining hash-tables or dynamic vectors, etc. This may not be a problem for people working with the language on a regular basis (or if it is, they probably have a personal solution for it already), but it impedes communication with the outside world. I've paid extra attention to this recently as I was preparing code examples for the experimental course on algorithms, which I now teach using Lisp instead of pseudocode (actually, modulo the naming/generics issue, Lisp is a great fit for that).

Unfortunately, the lightning talk format is too short for a good presentation of this topic, so here's a more elaborate post, in which I want to show a few examples from the RUTILS library of using Lisp's built-in capabilities to introduce clear, uniform, and generic syntactic abstractions that may be used alongside the standard Lisp operators, as well as replace them in the cases when we want to get more concise and understandable code.

What's cool about this problem is that, in Lisp, besides the common way to extend the language with functions and methods (and even macros/templates, which find their way into more and more languages), there are several other approaches that allow tackling issues that can't be covered by functions or even macros. Those include, for instance, reader macros and aliasing. Aliasing is, actually, a rather simple idea (and can probably be implemented in other dynamic languages): duplicating the functionality of existing functions or macros under a new name. The idea for such an operator came from Paul Graham's "On Lisp", and it may be implemented in the following way (see a full implementation here):


(defmacro abbr (short long &optional lambda-list)
  `(progn
     (cond
      ((macro-function ',long)
       (setf (macro-function ',short) (macro-function ',long)))
      ((fboundp ',long)
       (setf (fdefinition ',short) (fdefinition ',long))
       ,(when lambda-list
          `(define-setf-expander ,short ,lambda-list
             (values ,@(multiple-value-bind
                           (dummies vals store store-form access-form)
                           (get-setf-expansion
                            (cons long (remove-if (lambda (sym)
                                                    (member sym '(&optional &key)))
                                                  lambda-list)))
                         (let ((expansion-vals (mapcar (lambda (x) `(quote ,x))
                                                       (list dummies
                                                             vals
                                                             store
                                                             store-form
                                                             access-form))))
                           (setf (second expansion-vals)
                                 (cons 'list vals))
                           expansion-vals))))))
      (t (error "Can't abbreviate ~a" ',long)))
     (setf (documentation ',short 'function) (documentation ',long 'function))
     ',short))

As you may have noticed, it is also capable of duplicating a setf-expander for a given function if the lambda-list is provided. Using abbr we can define a lot of shorthands or alternative names; it is heavily used in RUTILS to provide more than 50 of them, and we'll see some in this post. What this example shows is the malleability of Lisp, which allows approaching its own improvement from different angles depending on the problem at hand and the tradeoffs you're willing to make.
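
For illustration, here's how abbr can be used (call is one of the actual RUTILS aliases mentioned further below; mvb is just a hypothetical shorthand I'm using as an example):

(abbr mvb multiple-value-bind)  ; hypothetical alias, to show the macro case
(abbr call funcall)             ; CALL really is one of the RUTILS aliases

(mvb (quotient remainder) (floor 10 3)
  (call #'list quotient remainder))  ; => (3 1)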

Introducing generic element access

One of the examples of historic baggage in CL is the substantial variety of different methods for accessing elements of collections, hash-tables, structures, and objects, with no generic function unifying them. Not that other languages have a totally uniform accessor mechanism. Usually, there will be two or three general-purpose ways to organize it: dot notation for object field access, something square-bracketish for arrays and other collections, and some generic operator like get for all the other cases. And occasionally (e.g. in Python or C++) there are hooks to plug into the built-in operators. Still, it's a much smaller number than in Lisp, and, what's more important, it's sufficiently distinct and non-surprising.

In Lisp, actually, nothing prevents us from doing even better — both better than the current state and better than other languages — i.e. from having a fully uniform and extensible solution. At a first approximation, it's just a matter of defining a generic function that works on different container types and utilizes all the existing optimized accessor functions in its methods. This interface will be extensible for any container object. In RUTILSX (a part of RUTILS where any experiments are allowed) this function is called generic-elt:


(defgeneric generic-elt (obj key &rest keys)
  (:method :around (obj key &rest keys)
    (reduce #'generic-elt keys :initial-value (call-next-method obj key))))

One important aspect you can see in this definition is the presence of an :around method that allows chaining multiple accesses in one call, dispatching each one to the appropriate basic method via call-next-method. Thus, we may write something like (generic-elt obj 'children 0 :key) to access, for instance, an element indexed by :key in a hash-table that is the first element of a sequence stored in the slot children of some object obj.

The only problem with this function is its long name. Unfortunately, most of the good short element-access names, like elt and nth, are already taken by the Common Lisp standard, while for RUTILS I've adopted a religious principle of retaining full backward compatibility and not altering anything from the standard. This is a critical point: not redefining CL, but building on top of it and extending it!

Moreover, element access has two features: it's a very common operation, and it's also not a usual function that performs some computation, so ideally it should have a short but prominent look in the code. The perfect solution occurred to me at one point: introduce an alias ? for it. Lisp allows naming operations with any characters, and a question mark, in my opinion, matches the inner intent of this operation very well: query a container-like object using a certain key. With it, our previous example becomes very succinct and cool: (? obj 'children 0 :key).
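
To give a feel for how the basic methods plug into this interface, here are a couple of sketches of mine (the actual methods in RUTILSX may look different):

(defmethod generic-elt ((obj hash-table) key &rest keys)
  (declare (ignore keys))  ; chaining is handled by the :around method
  (gethash key obj))

(defmethod generic-elt ((obj vector) key &rest keys)
  (declare (ignore keys))
  (aref obj key))

;; (let ((ht (make-hash-table)))
;;   (setf (gethash :a ht) 1)
;;   (? ht :a))  ; => 1, with ? aliased to GENERIC-ELT as described above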

In addition to element reading, there's also element write access. This operation in Lisp, like in most other languages, has a unified entry point called setf. There's a special interface to provide specific "methods" for it based on the accessor function. Yet, what to do when an access function is polymorphic? Well, provide a polymorphic setter companion: (defsetf generic-elt generic-setf). Like generic-elt, generic-setf defers work to the already defined specific setters:


(defmethod generic-setf ((obj list) key &rest keys-and-val)
  (setf (nth key obj) (atomize keys-and-val)))

And it also supports key chaining, so you can write: (setf (? obj 'children 0 :key) new-value).
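
For completeness, here's what a hash-table method for generic-setf might look like (a sketch handling only the simple single-key case; the real RUTILSX version may differ):

(defmethod generic-setf ((obj hash-table) key &rest keys-and-val)
  ;; the last argument is the new value
  (setf (gethash key obj) (first (last keys-and-val))))

;; (setf (? ht :a) 42)  ; now works for hash-tables too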

Having this unified access functionality is nice and cool, but some people may still long for the familiar dot syntax for object slot access. We can't blame them: habits are a basis of good UX. Unfortunately, this is contrary to the Lisp way... But Lisp is a pro-choice and future-proof language: if you want something badly, even something not done in the usual ways, you can almost always find a clean and supported means of implementing it. And this case is not an exception. If you can tolerate a small addition — a @-prefix to the object reference (that's also an extra prominent indicator of something unusual going on) — when accessing its slots, you can define a reader macro that expands forms like @obj.slot into our (? obj 'slot) or the standard (slot-value obj 'slot). With it, we can write something like (? tokens @dep.govr.id), which is much more succinct and, arguably, readable than (elt tokens (slot-value (slot-value dep 'govr) 'id)).

Still, one issue remains unsolved in this approach: the preferred Lisp slot-access method is not via slot-value, but with an accessor method that is exported. And one of the reasons for it is that slot-names, which are usually short and can clash, are kept private to the package where they are defined. It means that in most cases @obj.slot will not work across packages. (Unlike the OO-languages in which every class is its own namespace, in Lisp, this function is not "complected" within the OO-system, and packages are a namespacing method, while objects serve for encapsulation and inheritance.)

There are two ways to tackle this problem. As I said, Lisp is future-proof: being thoroughly dynamic and extensible, CLOS defines a method that is called when there's a problem accessing an object's slot — slot-missing. Once again, we can define an :around method that will be a little smarter (?) and try to look up slot-name not only in the current package, but also in the class' original package.


(defmethod slot-missing :around
    (class instance slot-name (operation (eql 'slot-value)) &optional new-value)
  (declare (ignore new-value))
  (let ((class-package (symbol-package (class-name (class-of instance)))))
    (if (eql class-package (symbol-package slot-name))  ;; to avoid infinite looping
        (call-next-method)
        (if-it (find-symbol (string-upcase slot-name) class-package)
               (slot-value instance it)
               (call-next-method)))))

This is a rather radical way and comes at a cost: two additional virtual function calls (of the slot-missing method itself and an additional slot-value one). But in most cases it may be worth paying it for convenience's sake, especially since you can always optimize a particular call-site by changing the code to the most direct (slot-value obj 'package::slot) variant. By the way, using a slot accessor method is also costlier than plain slot-value, so we are compensating here somewhat. Anyway, it's cool to have all the options on the table: the beautiful slow and the ugly fast ones, which our backward-compatibility approach allows us. As usual, you can't have your cake and eat it too...

Though, sometimes, you can. :) If you think about it more, it becomes apparent that slot-value could have been implemented this way from the start: look up the slot name in the class's original package. As classes or structs are defined together with their slots, it is very rare, if not almost impossible, to see slot names that are not available in the package where their class is defined (you have to explicitly use a private name from another package when defining a class to pull off such a trick). So slot-value should always look for slot names in the class's package first. We can define a "smart" slot-value variant that does just that, and with our nice generic-elt frontend it can be easily integrated without breaking backward compatibility.


(defun smart-slot-value (object slot-name)
  (slot-value object
              (or (find-symbol (string-upcase slot-name)
                               (symbol-package (class-name (class-of object))))
                  slot-name)))

Unifying variable binding with with

Almost everything in functional variable definition and binding was pioneered by Lisp at some point, including the concept of destructuring. Yet the CL standard, once again, lacks unification in this area. There are at least 4 major constructs: let and let*, destructuring-bind and multiple-value-bind, plus a few specialized ones like with-slots or ppcre:register-groups-bind. Another thing to mention is that the parallel assignment behavior of plain let can also be achieved with destructuring-bind and multiple-value-bind. Overall, it just screams for unification in a single construct, and there have already been a few attempts to do that (like metabang-bind). In RUTILS, I present a novel implementation of generic bind that has two distinct features: a more plausible name — with — and a simple method-based extension mechanism. The implementation is very simple: the binding construct selection is performed at compile-time based on the structure of the clause and, optionally, the presence of special symbols in it:


(defmacro with ((&rest bindings) &body body)
  (let ((rez body))
    (dolist (binding (reverse bindings))
      (:= rez `((,@(call #'expand-binding binding rez)))))
    (first rez)))

A small number of methods covering the basic cases is defined:

  • the first one expands to let or multiple-value-bind, depending on the number of symbols in the clause (i.e. for multiple values you should have more than 2)
  • the second group triggers when the first element of the clause is a list and defaults to destructuring-bind, but has special behaviors for the 2 symbols ? and @, generating clauses for our generic element access and smart slot access discussed in the previous sections

(defun expand-binding (binding form)
  (append (apply #'bind-dispatch binding)
          form))

(defgeneric bind-dispatch (arg1 arg2 &rest args)
  (:method ((arg1 symbol) arg2 &rest args)
    (if args
        `(multiple-value-bind (,arg1 ,arg2 ,@(butlast args)) ,(last1 args))
        `(let ((,arg1 ,arg2)))))
  (:method ((arg1 list) (arg2 (eql '?)) &rest args)
    `(let (,@(mapcar (lambda (var-key)
                       `(,(first (mklist var-key))
                         (? ,(first args) ,(last1 (mklist var-key)))))
                     arg1))))
  (:method ((arg1 list) (arg2 (eql '@)) &rest args)
    (with-gensyms (obj)
      `(let* ((,obj ,(first args))
              ,@(mapcar (lambda (var-slot)
                          `(,(first (mklist var-slot))
                            (smart-slot-value ,obj ',(last1 (mklist var-slot)))))
                        arg1)))))
  (:method ((arg1 list) arg2 &rest args)
    `(destructuring-bind ,arg1 ,arg2)))

In a sense, it's a classic example of combining generic functions and macros to create a clean and extensible UI. Another great benefit of using with is reduced code nesting, which can become quite deep with the standard operators. Here's one of the examples from my codebase:

(with (((stack buffer ctx) @ parser)
       (fs (extract-fs parser interm))
       (((toks :tokens) (cache :cache)) ? ctx))
  ...)

And here's how it would have looked in plain CL:

(with-slots (stack buffer ctx) parser
  (let ((fs (extract-fs parser interm))
        (toks (gethash :tokens ctx))
        (cache (gethash :cache ctx)))
    ...))

Implementing simple generators on top of signals

One of my friends and a Lisp enthusiast, Valery Zamarayev, who's also a long-time Python user, once complained that the only thing he misses in CL coming from Python is generators. This feature is popular in many dynamic languages, such as Ruby or Perl, and even Java 8 has introduced something similar. Sure, there are multiple ways to implement lazy evaluation in Lisp, with many libraries for that, like SERIES, pygen or CLAZY, so we don't have to wait for another version of the spec (especially since it's not coming 8-)

In RUTILS I have discovered, I believe, a novel and very clean way to implement generators — on top of the signal system. The signal, or condition, facility is, by the way, one of the most underappreciated assets of Common Lisp, one that often comes to the rescue in seemingly dead ends of control flow implementation. And Kent Pitman's description of it is one of my favorite reads in Computer Science. Anyway, here's all you need to implement Python-style generators in Lisp:


(define-condition generated ()
  ((item :initarg :item :reader generated-item)))

(defun yield (item)
  (restart-case (signal 'generated :item item)
    (resume () item)))

(defmacro doing ((item generator-form &optional result) &body body)
  (with-gensyms (e)
    `(block nil
       (handler-bind ((generated (lambda (,e)
                                   (let ((,item (generated-item ,e)))
                                     ,@body
                                     (invoke-restart (find-restart 'resume))))))
         ,generator-form)
       ,result)))

The doing macro works just like dolist, but iterates over the generator form instead of an existing sequence. As you can see from this example, restarts are like generators in disguise. Or, to be more correct, they are a more general way to handle such functionality, and it takes just a thin layer of syntactic sugar to adapt them to a particular usage style.
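
To see it in action, here's a toy generator and its consumer (my own example, with hypothetical names):

(defun generate-squares (n)
  "Yield the squares of 0 .. N-1, one at a time."
  (dotimes (i n)
    (yield (* i i))))

;; (doing (sq (generate-squares 5) 'done)
;;   (print sq))
;; prints 0 1 4 9 16 and returns DONE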

And a few mischiefs

We have seen three different approaches to extending CL in order to accommodate new popular syntactic constructs and approaches. Lastly, I wanted to tread a little into the "danger zone" that may be considered unconventional or plain bad style by many lispers — modifying the syntax at the reader level. One thing that Clojure (following other dynamic languages before it) has proven, I believe, is the importance of shorthand literal notation for popular operations. The CL standard predates this understanding, although it does have specific print representations for various important objects, and even a special syntax for static arrays. Yet the language is really future-proof in this respect, because it provides a way to hook into the reader mechanism by modifying the readtables. This was further smoothed and packaged by the popular NAMED-READTABLES library, which allows treating readtables similarly to packages. In RUTILS I have defined several extended readtables that implement a few shortcuts which are used in literally every second function or macro I define in my code. These include:

  • a shorthand notation for zero-, one- or two-argument lambda functions: ^(+ % %%) expands into (lambda (% %%) (+ % %%))
  • a literal syntax for hash-tables: #h(equal "key" "val") will create an EQUAL hash-table with one key-value pair
  • a syntax for heredoc-strings: #/this quote (") shouldn't be escaped/# (which, unfortunately, doesn't always work smoothly in the REPL)

Overall, I have experimented a lot with naming — it was sort of my obsession in this work to find short and obvious names for new things, many of which substitute the existing functionality, under the constraint of not altering what's already in the standard. For this sake, I've ventured into non-character symbols and even the keyword package — a major offence, I reckon... And here are a few of the findings I wanted to share (besides ? and with, mentioned previously):

  • call is a new alias for funcall — I suppose in the 70's it was a really fun experience to call a function, hence the name, but now it's too clumsy
  • get#, set#, and getset# are aliases and new operations for #-tables (when you can't or won't use ? for that)
  • finally, the grandest mischief is := (alongside :+, :-, :*, :/), which is an alias for setf (and, you've guessed it, incf etc). The justification for this is that everyone is confused by the -f suffix, that setting a variable is a very important operation that we should immediately notice in our clean and functional code ;), and that := is a very familiar syntax for it, even used by some languages, such as Pascal or golang. It may be controversial, but it's super-convenient.

The only thing I have failed to find a proper renaming for so far is mapcar. It is another one of those emblematic operations that should be familiar to everyone, yet -car creates confusion. For now, I resist the temptation to rename map into map-into and make map smarter by using the first sequence's type for the result expression. However, there's no plausible alternative I have been able to find, even among the zoo of other languages' naming of this concept. Any thoughts?

PS. Those were a few prominent examples, but RUTILS, in fact, has much more to offer. A lot of stuff was borrowed from other utility projects, as well as implemented from scratch: anaphoric operators, the famous iter — a replacement for loop, Clojure-style threading macros, a new semantic pair data type to replace cons-cells, lots of utilities to work with the standard data structures (sequences, vectors, hash-tables, strings) making them truly first-class, iteration with explicit indices etc etc. With all that in the toolbox, there's now no ground to claim that Lisp is in any aspect inferior in terms of day-to-day UX compared to some other language, be it Haskell, Ruby or Clojure. Surely, I'm not talking about the semantic differences here.