2012-11-04

cl-redis: Separation of Concerns in Library Design

TL;DR This article describes a "lispy" approach to implementing a spec, handling connections, and namespacing in the cl-redis library using special variables, generic functions, macros, packages, and restarts, and compares it to the object-oriented one.

Redis is a simple and powerful tool that has many different uses: coordination in a distributed system, message queue, cache, static database, dynamic database for ephemeral data, to list just a few. Seeing such potential, I created the cl-redis Lisp client back when Redis hadn't yet reached version 1.0. A couple of weeks ago version 2.6 was released, which, as usual, added a handful of commands (there are now 140 of them, more than a twofold increase since cl-redis was first released with 64) and a couple of small but sometimes incompatible changes to the communication protocol.

While upgrading the library to support 2.6, I implemented a couple of improvements to the overall design. They made me rethink its original premises and prompted me to write this article to summarize those points, as they may not be the most mainstream manifestation of the very mainstream and basic principle of separation of concerns.

Anticipating a rapid rate of change in Redis, from the beginning I decided to base the library on the following principles:

  • uniform declarative command definition, separated from the details of protocol implementation
  • a sort of TDD approach: adding each new command requires adding a test case for it (which can currently be extracted from the official docs)
  • implementing each command as a regular Lisp function, exported from the REDIS package, and prefixing each command's name to avoid potential symbol conflicts (for instance, get is at the same time a core Lisp function and a Redis command, so it gets defined as red-get in cl-redis)

Declarative Command Definition

Such an approach should scale very well with the addition of new commands. The only actions required should be: copying the definition from the spec, putting parentheses around it, copying a test from the spec, recording it, and running it. And protocol changes should have little or no effect on the command definitions. Those were the assumptions, and they worked out pretty well: it was relatively easy to handle all 76 commands added over time and to go through the transition from the old protocol to the new binary-safe one (which in the end prompted only one change at the interface level: removing an output spec from the command definition).

This is how a command definition looks in the Redis spec:

HMGET key field [field ...]

Available since 2.0.0.
Time complexity: O(N) where N is the number of fields being requested.
Returns the values associated with the specified fields in the hash stored at key.
...
Return value
Multi-bulk reply: list of values associated with the given fields, in the same order as they are requested.

And this is its representation in Lisp code:

(def-cmd HMGET (key field &rest fields) :multi
  "Get the values associated with the specified FIELDS in the hash
stored at KEY.")

The difference from defun is that a return "type" is specified (:multi in this case) and that there's no body: the code that handles communication is generated automatically.
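
To make the idea of a body-less definition more concrete, here is a minimal sketch of how such a macro might expand. It is not the actual cl-redis source: def-cmd-sketch and *sent* are hypothetical names, tell and expect are stubs that record the request and return a canned reply instead of talking to a server, and only required and &rest parameters are handled.

```lisp
;; Hypothetical stand-ins, not the cl-redis source: TELL records the request
;; instead of writing it to a socket, and EXPECT returns a canned reply.
(defvar *sent* nil "Requests 'sent' so far, most recent first.")

(defgeneric tell (cmd &rest args)
  (:method (cmd &rest args)
    (push (cons (string cmd) args) *sent*)))

(defgeneric expect (reply-type)
  (:method ((reply-type (eql :multi))) (list "v1" "v2"))
  (:method ((reply-type (eql :integer))) 1))

(defmacro def-cmd-sketch (cmd (&rest args) reply-type docstring)
  "Define RED-<CMD>, whose generated body only does the request/reply round trip."
  (let ((name (intern (concatenate 'string "RED-" (symbol-name cmd))))
        ;; Parameters before the first lambda-list keyword are the required ones.
        (required (loop for arg in args
                        until (member arg lambda-list-keywords)
                        collect arg))
        ;; The variable after &REST, or NIL if there is none.
        (rest-arg (second (member '&rest args))))
    `(defun ,name ,args
       ,docstring
       (apply #'tell ',cmd ,@required ,rest-arg)
       (expect ,reply-type))))

(def-cmd-sketch HMGET (key field &rest fields) :multi
  "Get the values associated with the specified FIELDS in the hash
stored at KEY.")
```

Calling (red-hmget "h" "f1" "f2") then records the request in *sent* and returns the canned multi-bulk reply. The point is that the definition itself says nothing about the wire format, so the protocol can change underneath it.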

But there were still some quirks. The biggest one was a small impedance mismatch between how Lisp handles function arguments and how Redis does. It should be said that among all programming languages Common Lisp has the richest function argument protocol, matched to some extent only by Python. At first glance at the Redis commands it seemed that Lisp would be able to accommodate all of them as is. Yet Redis' version of the protocol turned out to be more ad hoc, so some commands required additional pre-processing of arguments. For instance, the ZRANGE and ZREVRANGE commands have a WITHSCORES argument, which, if present, should be the string "WITHSCORES". This is something in between Lisp's &optional and &key arguments, and either choice required some pre-processing. My final choice was to go with &optional, but ensure that whatever non-nil value is provided is transformed into the proper string. Still, it was relatively easy to implement, because the Redis interaction protocol is implemented as 2 generic functions: tell for sending a request and expect for receiving the response. This provides the ability to decorate the methods with additional processing or to override them altogether for a specific command. In this case, slight pre-processing is added to tell:

(defmethod tell :before ((cmd (eql 'ZRANGE)) &rest args)
  (when (and (= 4 (length args))
             (last1 args))
    (setf (car (last args)) :withscores)))
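
As a variation on the same decoration idea (not what the library does), the normalization can also be written as an :around method that passes a fresh argument list to the next method. This avoids modifying the &rest list in place, which may share structure with the caller's list. The tell below is a stub that simply returns the words it would send.

```lisp
;; Stand-in for the real TELL: it returns the words that would be sent.
(defgeneric tell (cmd &rest args)
  (:method (cmd &rest args)
    (cons (string cmd) args)))

;; An :AROUND method builds a fresh argument list instead of modifying ARGS.
(defmethod tell :around ((cmd (eql 'zrange)) &rest args)
  (destructuring-bind (key start end &optional withscores) args
    (if withscores
        ;; Any non-NIL value becomes the literal string Redis expects.
        (call-next-method cmd key start end "WITHSCORES")
        (call-next-method cmd key start end))))
```

With this in place, (tell 'zrange "k" 0 -1 t) and (tell 'zrange "k" 0 -1 :withscores) both produce the same request, and other commands still go straight to the default method.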

There are some more involved cases, like the ZUNIONSTORE command, which imposes some restrictions on its arguments and also requires the insertion of the special keywords WEIGHTS and AGGREGATE:

(def-cmd ZUNIONSTORE (dstkey n keys &rest args &key weights aggregate) :integer
  "Perform a union in DSTKEY over a number (N) of sorted sets at KEYS
with optional WEIGHTS and AGGREGATE.")

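(defmethod tell ((cmd (eql 'ZUNIONSTORE)) &rest args)
  ;; ARGS holds only the command's own arguments (CMD is a separate parameter).
  (ds-bind (dstkey n keys &key weights aggregate) args
    ;; Check the restrictions Redis places on the arguments.
    (assert (integerp n))
    (assert (= n (length keys)))
    (when weights
      (assert (= (length keys) (length weights)))
      (assert (every #'numberp weights)))
    (when aggregate
      (assert (member aggregate '(:sum :min :max))))
    ;; Re-dispatch on the command name as a string, so that the default
    ;; method sends the flattened argument list.
    (apply #'tell (princ-to-string cmd)
           (cl:append (list dstkey n)
                      keys
                      (when weights (cons "WEIGHTS" weights))
                      (when aggregate (list "AGGREGATE" aggregate))))))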

Overall, 10 of the 140 Redis commands required some special handling.

Proper Encapsulation

The only drawback of the described solution, or rather just a consequence of it being implemented in Common Lisp, is the somewhat ugly format of Redis commands: red-incr definitely looks worse than r.incr. If the commands were named after their Redis equivalents (incr), it wouldn't be possible to import the whole REDIS package into your application because of name clashes with the COMMON-LISP package and inevitable conflicts with other packages: these names are just too common. This is where the objects-as-namespaces approach seems to be better than Lisp's packages-as-namespaces. But it shouldn't be so, as Lisp's approach implements proper separation of concerns without "complecting" things, to use Rich Hickey's parlance.

And it isn't: the solution is actually so simple that I am surprised I didn't think of it until the latest version of the library. It is to define two packages: the REDIS one with all the ordinary functions, and a special package just for Redis commands. I called it RED because it's a purely syntactic-sugar addition, so it should be short. The RED package should never be imported as a whole. This way we basically get the same thing as before, red:incr, but now it's done properly. You can import a single command, you can rename the package as you wish, etc. So this solution is actually more elegant than the object-oriented one (we don't have to entangle the commands with connection state), and we don't have to sacrifice the good parts described previously.
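
A minimal sketch of the two-package layout, under assumed names: the packages are called REDIS-SKETCH and RED-SKETCH here so that they can't clash with the real library, and the "connection" is just an in-memory hash table. The commands package uses no other package, not even COMMON-LISP, so its GET is a different symbol from cl:get.

```lisp
;; Hypothetical package names, chosen so that this can be loaded next to
;; the real library without touching its packages.
(defpackage :redis-sketch
  (:use :common-lisp)
  (:export #:*connection*))

(defpackage :red-sketch
  (:use)                      ; inherit nothing, so GET is a fresh symbol
  (:export #:get #:incr))

(in-package :redis-sketch)

(defvar *connection* (make-hash-table :test 'equal)
  "Stand-in for a server connection: an in-memory table.")

;; The commands are ordinary functions; only their names live in RED-SKETCH.
(defun red-sketch:get (key)
  (values (gethash key *connection*)))

(defun red-sketch:incr (key)
  (incf (gethash key *connection* 0)))

(in-package :cl-user)
```

Client code then writes (red-sketch:incr "counter") with the package prefix playing the role of the r. in r.incr, while cl:get stays untouched.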

Connection Handling

I also worked with a couple of other Redis libraries, written in Java and Python. Generally, they use a classic object-oriented approach: the commands are defined as methods of a Redis class, which also handles the state of the connection to the server. The resulting code looks pretty neat, it is well encapsulated and at first glance has no boilerplate even in Java:

Redis r = Redis.getClient();
Long id = r.incr("WORKER_ID_COUNTER");
r.sadd("WORKERS", id.toString());
r.hset("WORKER" + id, "queue", queueName);

There's an issue of not forgetting to return resources (close a connection), but in more advanced languages like Python it's solvable with a context manager:

with redis(port=3333) as r:
    id = r.incr("WORKER_ID_COUNTER")
    r.sadd("WORKERS", id)
    r.hset("WORKER%s" % id, "queue", queueName)

What can be simpler?

Yet, below the surface it suffers from a serious problem. In my work I've seen a lot of cases (I'd even say the majority) where the Redis connection should be persisted and passed from one function to another, because it's very inefficient to reopen it over and over again. Most of the object-oriented libraries use some form of connection pooling for that. In terms of usability, it's not great, but tolerable. The much greater problem, though, is handling connection errors: such long-living connections always break at some unpredictable point (a timeout or a network hiccup), which throws an exception. This exception (or rather two or three different types of exceptions) should be handled, by trying to reconnect, in all functions that use our client. And the context manager won't help here: this is one of the cases where it really is no match for the macro-based counterparts that it tries to mimic. Those connection errors also break the execution of the current function, and it's often not trivial to restart it. With connection pooling it's even worse, because a bad implementation (and I've seen one) will return broken connections to the pool, and they will have an action-at-a-distance effect on other parts of the code that happen to acquire them. So, in practice, connection pooling, which may seem like a neat idea, turns out to be a can of worms. This is one of the cases of excessive coupling that often arise in object-oriented languages. (This is not to say that connection pooling can't be useful in some cases: it can, but it should be restricted to those cases.)

In Lisp there are elegant ways to solve all these problems: as usual, it's my favourite Lisp tool, special variables, combined with macros and the condition system. A couple of times I've seen the criticism of the Lisp condition system that its restart facility isn't actually used in practice. Well, it may not be used extensively, but when it's needed, it becomes really indispensable, and this is one of those cases.

The solution is to introduce the notion of a current connection — a *connection* special variable. All Redis commands operate on it, so they don't have to pass the connection around. At the same time, different macros can be created as proper context-managers to alter the state of connection and react to its state changes via condition handlers and restarts.
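
A minimal sketch of the "current connection" idea, with hypothetical names throughout: fake-connection stands in for a real socket, and with-connection-sketch only illustrates the dynamic binding and the cleanup, not the library's actual macro.

```lisp
(defvar *connection* nil
  "The current connection, or NIL when not connected.")

;; Stand-in for a real socket-backed connection.
(defstruct fake-connection host port (open-p t))

(defmacro with-connection-sketch ((&key (host "127.0.0.1") (port 6379))
                                  &body body)
  "Bind *CONNECTION* for the dynamic extent of BODY and close it afterwards."
  `(let ((*connection* (make-fake-connection :host ,host :port ,port)))
     (unwind-protect (progn ,@body)
       ;; Runs on normal exit and on non-local exit alike.
       (setf (fake-connection-open-p *connection*) nil))))

(defun current-port ()
  "A 'command': note that it takes no connection argument."
  (fake-connection-port *connection*))
```

Because the binding is dynamic, any function called from the body sees the connection without receiving it as a parameter, and a nested with-connection-sketch shadows the outer connection only for the extent of its own body.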

So if you just work with Redis from the console, as I often do (it's more pleasant than working with the native client), you simply (connect) and issue the commands as is. A one-time job that is run inside a function can use with-connection, which is the analogue of a Python context manager. The with-pipelining macro will delay reading of Redis replies until all commands in its body are sent to the server, to save time. Such a trick is nothing special: although a Java library needs to create a special object that handles this strategy, in Ruby it is done simply with blocks (like: node.pipelined{ data.each{ |key| node.incr(key) }}).

But what can't be done gracefully in these languages is handling a connection hiccup. In the latest cl-redis, the with-persistent-connection macro is responsible for handling such situations:

(defmacro with-persistent-connection ((&key (host #(127 0 0 1))
                                            (port 6379))
                                      &body body)
  `(with-connection (:host ,host :port ,port)
     (handler-bind ((redis-connection-error
                     (lambda (e)
                       (declare (ignore e))
                       (warn "Reconnecting to Redis.")
                       (invoke-restart :reconnect))))
       ,@body)))

It doesn't work on its own and requires some support from the command-definition code, though very little: just one line that wraps the code of the command in (with-reconnect-restart ...). This is another macro, which intercepts all the possible failure conditions and adds a :reconnect restart to them; the restart tries to re-establish the connection once and then retries the body of the command. It's somewhat similar to retrying a failed transaction in a database system. So, for instance, all the side-effects of the command are performed twice in such a scenario. But it's a necessary evil if we want to support long-running operation on the server side.
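
Here is a rough sketch of what such a restart-providing macro could look like. It is written under assumptions, not copied from the library: connection-error-sketch, reconnect-sketch, flaky-command, and call-persistently are hypothetical, and the "command" fails exactly once to simulate a dropped connection.

```lisp
(define-condition connection-error-sketch (error) ())

(defvar *attempts* 0 "How many times the flaky command below has run.")

(defun reconnect-sketch ()
  "Stand-in for re-opening the socket."
  :reconnected)

(defmacro with-reconnect-restart-sketch (&body body)
  "Run BODY with a :RECONNECT restart that reconnects and retries BODY once."
  (let ((retry (gensym "BODY")))
    `(flet ((,retry () ,@body))
       (restart-case (,retry)
         (:reconnect ()
           :report "Reconnect and retry the command once."
           (reconnect-sketch)
           (,retry))))))

(defun flaky-command ()
  "Fails on the first call, succeeds afterwards."
  (if (= 1 (incf *attempts*))
      (error 'connection-error-sketch)
      :ok))

(defun call-persistently (thunk)
  "The policy, kept apart from the mechanism: on a connection error, reconnect."
  (handler-bind ((connection-error-sketch
                   (lambda (e)
                     (declare (ignore e))
                     (invoke-restart :reconnect))))
    (funcall thunk)))
```

Evaluating (call-persistently (lambda () (with-reconnect-restart-sketch (flaky-command)))) runs the command body twice, which is the doubled side effect mentioned above. In this simple form, a second failure during the retry would find no active :reconnect restart, so the handler's invoke-restart would itself signal a control-error.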

The Lisp condition system separates the error-handling code into 3 distinct phases: signalling a condition, handling it, and restarting the control flow. This is what sets it apart from the mainstream systems found in Java or Python, where the second and third parts are colocated. It may seem unnecessary until it's necessary, and then there's no other way to achieve the desired properties. Consider the case of pipelining: unlike an ordinary command sent in solitude, a pipelined command is actually part of a larger batch, so restarting just that single command after reconnection will not return the expected results for the whole batch. So the whole body of with-pipelining should be restarted. Thanks to this separation, it is possible. The trick is to check in the condition handler code whether we're in a pipeline and not react in such a case: the reaction will be performed outside of the pipeline.

And here's the whole with-pipelining macro implementation. Did I mention, that the pipelined context is also managed with a special variable (even 2 in this case)?.. ;)

(defmacro with-pipelining (&body body)
  `(if *pipelined*
       (progn
         (warn "Already in a pipeline.")
         ,@body)
       (with-reconnect-restart
         (let (*pipeline*)
           (let ((*pipelined* t))
             ,@body)
           (mapcar #'expect (reverse *pipeline*))))))
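
To show the whole-batch restart in isolation, here is a simplified sketch. It is an assumption about one way to get the effect, not the library's code: instead of checking in the handler, the per-command wrapper simply doesn't offer its own restart while *pipelined* is true, so the only :reconnect restart the handler can find is the one around the batch. All names are hypothetical, and reply queuing is left out.

```lisp
(define-condition connection-error-sketch (error) ())

(defvar *pipelined* nil "True while inside a pipeline.")
(defvar *log* nil "Commands 'sent', most recent first.")
(defvar *failures-left* 1 "How many more times SEND-SKETCH should fail.")

(defmacro with-batch-aware-restart (&body body)
  "Offer a :RECONNECT restart around BODY, unless an enclosing pipeline
already offers one for the whole batch."
  (let ((retry (gensym "BODY")))
    `(flet ((,retry () ,@body))
       (if *pipelined*
           (,retry)
           (restart-case (,retry)
             (:reconnect ()
               :report "Reconnect and retry."
               (,retry)))))))

(defun send-sketch (name)
  "A 'command'; sending :B fails while *FAILURES-LEFT* is positive."
  (with-batch-aware-restart
    (push name *log*)
    (when (and (eq name :b) (plusp *failures-left*))
      (decf *failures-left*)
      (error 'connection-error-sketch))
    name))

(defmacro with-pipelining-sketch (&body body)
  `(with-batch-aware-restart
     (let ((*pipelined* t))
       ,@body)))

(defun run-batch ()
  (handler-bind ((connection-error-sketch
                   (lambda (e)
                     (declare (ignore e))
                     (invoke-restart :reconnect))))
    (with-pipelining-sketch
      (send-sketch :a)
      (send-sketch :b)
      (send-sketch :c))))
```

When :b fails inside the batch, control unwinds to the restart around the pipeline and the whole body is sent again, so *log* ends up recording :a and :b twice before :c.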

In total, there's a proper separation of concerns: the package system ensures namespacing, the commands are just functions, which operate on the current connection, and connection-handling logic is completely separate from these functions.

(let ((redis:*echo-p* t))
  (redis:with-persistent-connection (:port 10000)
    (loop (process-job))))

(defun process-job ()
  (red:blpop "queue" 0)  ;; block until job arrives (0 = no timeout)
  (let* ((id (red:incr "id"))
         (results (do-something-involved))
         (result-key (fmt "result~A" id)))
    (redis:with-pipelining
      (dolist (result results)
        (red:lpush result-key result))
      (red:lpush "results" result-key))))

In this code we can independently and dynamically toggle debugging, use a persistent connection, and apply pipelining to some of the commands, while the commands themselves carry the bare minimum of information necessary for their operation.

A Note on Testing

In general I'm not a fan of TDD and similar approaches that put testing at the head of the design process, and I prefer a much less disciplined REPL-driven development. :) Yet this is one of the classic examples where testing really has its benefits: we have a clear spec to implement and there's an external way to test the implementation. Developing a comprehensive test suite covering all commands (except a couple of maintenance ones that just can't be easily tested) really sped up the whole process and made it much less error-prone. Literally every time I updated the library and added new tests, I saw some of the tests fail! In total, the test suite has accumulated around 600 cases for those 140 commands, and yet I haven't come up with a way to test complicated failure conditions on the wire, for which I had to resort to the REPL.

Afterword

In this article I wanted to showcase the benefits of combining some of Common Lisp's unique approaches to managing complexity and state: special variables (which have a context attached to them), macros, generic functions, and the condition-handling protocol. They provide several powerful non-mainstream ways to achieve the desired level of separation of concerns in the design of complex systems. In Lisp you should really keep them in mind and not constrain your solution to recreating common patterns from other languages.

Finally, I would like to acknowledge Kent Pitman and his essay "Condition Handling in the Lisp Language Family", which is one of the most profound articles on the topic of protocols and separation of concerns. Highly recommended. I think the example of cl-redis is a clear manifestation of one trait of the CL condition system: you don't need it until one day you do, and when you do, there's actually no other way to deal with the problem.
