2016-05-17

Improving Lisp UX One Form at a Time

At the recent ELS, I presented a lightning talk about RUTILS and how I see it as a way of "modernizing" CL, i.e. updating the basic language elements to be simpler, clearer and more generic. Thus improving the everyday user experience and answering the complaints of outsiders about "historical cruft" in the Lisp standard. Indeed, Lisp has a lot of unrecognizable names (like mapcar and svref) or just unnecessary long ones (multiple-value-bind or defparameter), and out-of-the-box it lacks a lot of things that many current programmers are used to: unified generic accessors, generators, literal syntax for defining hash-tables or dynamic vectors etc. This may not be a problem for the people working with the language on a regular basis (or if it is they probably have a personal solution for that already), but it impedes communication with the outside world. I'd paid extra attention to that recently as I was preparing code examples for the experimental course on algorithms, which I teach now using Lisp instead of pseudocode (actually, modulo the naming/generics issue, Lisp is a great fit for that).

Unfortunately, the lightning talk format is too short for a good presentation of this topic, so here's a more elaborate post, in which I want to show a few examples from the RUTILS library of using Lisp's built-in capabilities to introduce clear, uniform, and generic syntactic abstractions that may be used alongside the standard Lisp operators, as well as replace them in the cases when we want to get more concise and understandable code.

What's cool about this problem is that, in Lisp, besides a common way to extend the language with functions and methods (and even macros/templates, which find they way into more and more languages), there are several other approaches to the problem that allow to tackle issues that can't be covered by functions and even macros. Those include, for instance, reader macros and aliasing. Aliasing is, actually, a rather simple idea (and can be, probably, implemented in other dynamic languages): duplicating functionality of existing functions or macros with a new name. The idea for such operator came from Paul Graham's "On Lisp" and it may be implemented in the following way (see a full implementation here):


(defmacro abbr (short long &optional lambda-list)
  `(progn
     (cond
      ((macro-function ',long)
       (setf (macro-function ',short) (macro-function ',long)))
      ((fboundp ',long)
       (setf (fdefinition ',short) (fdefinition ',long))
       ,(when lambda-list
          `(define-setf-expander ,short ,lambda-list
             (values ,@(multiple-value-bind
                           (dummies vals store store-form access-form)
                           (get-setf-expansion
                            (cons long (remove-if (lambda (sym)
                                                    (member sym '(&optional &key)))
                                                  lambda-list)))
                         (let ((expansion-vals (mapcar (lambda (x) `(quote ,x))
                                                       (list dummies
                                                             vals
                                                             store
                                                             store-form
                                                             access-form))))
                           (setf (second expansion-vals)
                                 (cons 'list vals))
                           expansion-vals))))))
      (t (error "Can't abbreviate ~a" ',long)))
     (setf (documentation ',short 'function) (documentation ',long 'function))
     ',short))

As you may have noticed, it is also capable of duplicating a setf-expander for a given function if the lambda-list is provided. Using abbr we can define a lot of shorthands or alternative names, and it is heavily used in RUTILS to provide more than 50 alternative names; we'll see some of them in this post. What this example shows is the malleability of Lisp, which allows approaching its own improvement from different angles depending on the problem at hand and the tradeoffs you're willing to make.

Introducing generic element access

One of the examples of historic baggage in CL is a substantial variety of different methods to access elements of collections, hash-tables, structures, and objects with no generic function unifying them. Not to say that other languages have a totally uniform accessor mechanism. Usually, there will be two or three general-purpose ways to organize it: dot notation for object field access, something square-braketish for array and other collections access, and some generic operator like get for all the other cases. And occasionally (e.g. in Python or C++) there are hooks to plug into the built-in operators. Still, it's a much smaller number than in Lisp, and what's more important, it's sufficiently distinct and non-surprising.

In Lisp, actually, nothing prevents us from doing even better — both better than the current state and than other languages — i.e. from having a fully uniform and extensible solution. At first approximation, it's just a matter of defining a generic function that will work on different container types and utilize all the existing optimized accessor functions in its methods. This interface will be extensible for any container object. In RUTILSX (a part of RUTILS where any experiments are allowed) this function is called generic-elt:


(defgeneric generic-elt (obj key &rest keys)
  (:method :around (obj key &rest keys)
    (reduce #'generic-elt keys :initial-value (call-next-method obj key))))

One important aspect you can see in this definition is the presence of an :around method that allows to chain multiple accesses in one call and dispatch each one to an appropriate basic method via call-next-method. Thus, we may write something like (generic-elt obj 'children 0 :key) to access, for instance, an element indexed by :key in a hash-table that is the first element of a sequence that is the contents of the slot children of some object obj.

The only problem with this function is its long name. Unfortunately, most of good short element access names, like elt and nth are already taken in the Common Lisp standard, while for RUTILS I've adopted a religious principle to retain full backward compatibility and don't alter anything from the standard. This is a critical point: not redefining CL, but building on top of it and extending it!

Moreover, element access has two features: it's a very common operation and it's also not a usual function that does some computation, so ideally it should have a short but prominent look in the code. The perfect solution occurred to me at one point: introduce an alias ? for it. Lisp allows to name operations with any characters, and a question mark, in my opinion, matches very well the inner intent of this operation: query a container-like object using a certain key. With it, our previous example becomes very succinct and cool: (? obj 'children 0 :key).

Additionally to element reading, there's also element write access. This operation in Lisp, like in most other languages, has a unified entry point called setf. There's a special interface to provide specific "methods" for it based on the accessor function. Yet, what to do when an access function is polymorphic? Well, provide polymorphic setter companion. (defsetf generic-elt generic-setf). Like generic-elt, generic-setf defers work to already defined specific setters:


(defmethod generic-setf ((obj list) key &rest keys-and-val)
  (setf (nth key obj) (atomize keys-and-val)))

And it also supports key chaining, so you can write: (setf (? obj 'children 0 :key) new-value).

Having this unified access functionality is nice and cool, but some people may still linger for the familiar dot object slot access syntax. We can't blame them: habits are a basis of good UX. Unfortunately, this is contrary to the Lisp way... But Lisp is a pro-choice and future-proof language: if you want something badly, even something not in the usual ways, almost always you can, actually, find a clean and supported means of implementing it. And this case is not an exception. If you can tolerate an small addition — a @-prefix to the object reference (that's also an extra prominent indicator of something unusual going on) — when accessing its slots you can define a reader macro that will expand forms @obj.slot into our (? obj 'slot) or a standard (slot-value obj 'slot). With it, we can write something like (? tokens @dep.govr.id), which is much more succinct and, arguably, readable than (elt tokens (slot-value (slot-value dep 'govr) 'id)).

Still, one issue remains unsolved in this approach: the preferred Lisp slot-access method is not via slot-value, but with an accessor method that is exported. And one of the reasons for it is that slot-names, which are usually short and can clash, are kept private to the package where they are defined. It means that in most cases @obj.slot will not work across packages. (Unlike the OO-languages in which every class is its own namespace, in Lisp, this function is not "complected" within the OO-system, and packages are a namespacing method, while objects serve for encapsulation and inheritance.)

There are two ways to tackle this problem. As I said, Lisp is future-proof: being thoroughly dynamic and extensible, CLOS defines a method that is called when there's a problem accessing an object's slot — slot-missing. Once again, we can define an :around method that will be a little smarter (?) and try to look up slot-name not only in the current package, but also in the class' original package.


(defmethod slot-missing :around
    (class instance slot-name (operation (eql 'slot-value)) &optional new-value)
  (declare (ignore new-value))
  (let ((class-package (symbol-package (class-name (class-of instance)))))
    (if (eql class-package (symbol-package slot-name))  ;; to avoid infinite looping
        (call-next-method)
        (if-it (find-symbol (string-upcase slot-name) class-package)
               (slot-value instance it)
               (call-next-method)))))

This is a rather radical way and comes at a cost: two additional virtual function calls (of the slot-missing method itself and an additional slot-value one). But in most of the cases it may be worth paying it for convenience's sake, especially, since you can always optimize a particular call-site by changing the code to the most direct (slot-value obj 'package::slot) variant. By the way, using slot accessor method is also costlier than just slot-value, so we are compensating here somewhat. Anyway, it's cool to have all the options on the table: beautiful slow and ugly fast method that our backward-compatibility approach allows us. As usual, you can't have a cake and eat it too...

Though, sometimes, you can. :) If you think more of this it becomes apparent that slot-value could be implemented this way from the start: look up the slot name in the class'es original package. As classes or structs are defined together with their slots it is very rare if not almost impossible to see slot-names not available in the package where their class is defined (you have to explicitly use a private name from another package when defining a class to do such a trick). So, slot-value should always look for slot names in the class'es package first. We can define a "smart" slot-value variant that will do just that, and with our nice generic-elt frontend it can easily integrated without breaking backward-compatibility.


(defun smart-slot-value (object slot-name)
  (slot-value object
              (or (find-symbol (string-upcase slot-name)
                               (symbol-package (class-name (class-of instance))))
                  slot-name)))

Unifying variable binding with with

Almost everything in functional variable definition and binding was pioneered by Lisp at some point, including the concept of destructuring. Yet, the CL standard, once again, lacks unification in this area. There are at least 4 major constructs: let and let*, destructuring-bind and multiple-value-bind, and also a few specialized ones like with-slots or ppcre:register-groups-bind. One more thing to mention is that parallel assignment behavior of plain let can be implemented with destructuring-bind and multiple-value-bind. Overall, it just screams for uniting in a single construct, and already there have been a few attempts to do that (like metabang-bind). In RUTILS, I present a novel implementation of generic bind that has two distinct features: a more plausible name — with — and a simple method-based extension mechanism. The implementation is very simple: the binding construct selection is performed at compile-time based on the structure of the clause and, optionally, presence of special symbols in it:


(defmacro with ((&rest bindings) &body body)
  (let ((rez body))
    (dolist (binding (reverse bindings))
      (:= rez `((,@(call #'expand-binding binding rez)))))
    (first rez)))

A very short number of methods covering the basic cases are defined:

  • the first one expands to let or multiple-value-bind depending on the number of symbols in the clause (i.e. for multiple values you should have more than 2)
  • the second group triggers when the first element of the clause is a list and defaults to destructruing-bind, but has special behaviors for 2 symbols ? and @ generating clauses for our generic element access and smart slot access discussed in the previous section

(defun expand-binding (binding form)
  (append (apply #'bind-dispatch binding)
          form))

(defgeneric bind-dispatch (arg1 arg2 &rest args)
  (:method ((arg1 symbol) arg2 &rest args)
    (if args
        `(multiple-value-bind (,arg1 ,arg2 ,@(butlast args)) ,(last1 args))
        `(let ((,arg1 ,arg2)))))
  (:method ((arg1 list) (arg2 (eql '?)) &rest args)
    `(let (,@(mapcar (lambda (var-key)
                       `(,(first (mklist var-key))
                         (? ,(first args) ,(last1 (mklist var-key)))))
                     arg1))))
  (:method ((arg1 list) (arg2 (eql '@)) &rest args)
    (with-gensyms (obj)
      `(let* ((,obj ,(first args))
              ,@(mapcar (lambda (var-slot)
                          `(,(first (mklist var-slot))
                            (smart-slot-value ,obj ',(last1 (mklist var-slot)))))
                        arg1)))))
  (:method ((arg1 list) arg2 &rest args)
    `(destructuring-bind ,arg1 ,arg2)))
In a sense, it's a classic example of combining generic-functions and macros to create a clean and extensible UI. Another great benefit of using with is reduced code nesting that can become quite deep with the standard operators. Here's one of the examples from my codebase:

(with (((stack buffer ctx) @ parser)
       (fs (extract-fs parser interm))
       (((toks :tokens) (cache :cache)) ? ctx))
  ...)
And here's how it would have looked in plain CL:

(with-slots (stack buffer ctx) parser
  (let ((fs (extract-fs parser interm)))
        (toks (gethash :tokens ctx))
        (cache (gethash :cache ctx)))
    ...))

Implementing simple generators on top of signals

One of my friends and a Lisp enthusiast, Valery Zamarayev, who's also a long-time Python user, once complained that the only thing that he misses in CL from Python is generators. This feature is popular in many dynamic languages, such as Ruby or Perl, and even Java 8 has introduced something similar. Sure, there are multiple ways to implement lazy evaluation in Lisp with many libraries for that, like SERIES, pygen or CLAZY. We don't have to wait for another version of the spec (especially, since it's not coming 8-)

In RUTILS I have discovered, I believe, a novel and a very clean way to implement generators — on top of the signal system. The signal or condition facility is, by the way, one of the most underappreciated assets of Common Lisp that often comes to rescue in seemingly dead ends of control flow implementation. And Kent Pitman's description of it is one of my favorite reads in Computer Science. Anyway, here's all you need to implement Python-style generators in Lisp:


(define-condition generated ()
  ((item :initarg :item :reader generated-item)))

(defun yield (item)
  (restart-case (signal 'generated :item item)
    (resume () item)))

(defmacro doing ((item generator-form &optional result) &body body)
  (with-gensyms (e)
    `(block nil
       (handler-bind ((generated (lambda (,e)
                                   (let ((,item (generated-item ,e)))
                                     ,@body
                                     (invoke-restart (find-restart 'resume))))))
         ,generator-form)
       ,result)))

The doing macro works just like dolist, but iterating the generator form instead of an existing sequence. As you can see from this example, restarts are like generators in disguise. Or, to be more correct, they are a more general way to handle such functionality, and it takes just a thin layer of syntactic sugar to adapt them to a particular usage style.

And a few mischiefs

We have seen three different approaches to extending CL in order to accommodate new popular syntactic constructs and approaches. Lastly, I wanted to tread a little in the "danger zone" that may be considered unconventional or plain bad-style by many lispers — modifying syntax at the reader level. One thing that Clojure (following other dynamic languages before it), I believe, has proven is the importance of shorthand literal notation for popular operations. CL standard has predated this understanding: although it has specific print representations for various important objects, and even a special syntax for static arrays. Yet, the language is really future-proof in this respect, because it provides a way to hook into the reader mechanism by modifying the readtables. It was further smoothed and packaged by the popular NAMED-READTABLES library, which allows to treat readtables similar to packages. In RUTILS I have defined several extended readtables that implement a few shortcuts that are used literally in every second function or macro I define in my code. These include:

  • a shorthand notation for zero-, one- or two-argument lambda functions: ^(+ % %%) expands into (lambda (% %%) (+ % %%))
  • a literal syntax for hash-tables: #h(equal "key" "val") will create a EQUAL-hash-table with one key-value pair
  • a syntax for heredoc-strings: #/this quote (") shouldn't be escaped/# (which, unfortunately, doesn't always work smoothly in the repl)

Overall, I have experimented a lot with naming — it was sort of my obsession in this work to find short and obvious names for new things, many of which substitute the existing functionality, under the constraints of not altering what's already in the standard. For this sake, I've ventured into non-character symbols and even the keyword package — a major offence, I reckon... And here are a few of the findings I wanted to share (besides ? and with mentioned previously):

  • call is a new alias for funcall — I suppose, in the 70's it was a really fun experience to call a function hence the name, but now its too clumsy
  • get#, set#, and getset# are aliases and new operations for #-tables (when you can't or won't use ? for that)
  • finally, the grandest mischief is := (alongside :+, :-, :*, :/), which is an alias for setf (and, you've guessed it, incf etc). The justification for this is that everyone is confused about the -f, that setting a variable is a very important operation that we should immediately notice in our clean and functional code ;), and that := is a very familiar syntax for it even used by some languages, such as Pascal or golang. It may be controversial, but it's super-convenient.

The only thing I failed to find a proper renaming for so far is mapcar. It is another one of those emblematic operations that should be familiar to everyone, yet -car creates confusion. For now, I resist the temptation to rename map into map-into and make map smarter by using the first sequence's type for the result expression. However, there's no plausible alternative variant I was able to find even among the zoo of other language's naming of this concept. Any thoughts?

PS. Those were a few prominent examples, but RUTILS, in fact, has much more to offer. A lot of stuff was borrowed from other utility projects, as well as implemented from scratch: anaphoric operators, the famous iter — a replacement for loop, Clojure-style threading macros, a new semantic pair data type to replace cons-cells, lots of utilities to work with the standard data structures (sequences, vectors, hash-tables, strings) making them truly first-class, iteration with explicit indices etc etc. With all that in the toolbox, there's now no ground to claim that Lisp is in any aspect inferior in terms of day-to-day UX compared to some other language, be it Haskell, Ruby or Clojure. Surely, I'm not talking about the semantic differences here.

No comments:

Post a Comment