2009-02-17

Sending SMTP mail with UTF-8 characters from Common Lisp

Today I explored the topic of sending a base64-encoded SMTP message and it turned out to be rather tricky. I discovered, that for this task (if you don't rely on Franz's infrastructure) effectively 4 libraries should be used. And as they are very scarcely documented, I decided to write this short description.

CL-SMTP


Initially I knew very little about SMTP protocol (except for HELO and EHLO). So I started with the plain CL-SMTP functionality of SEND-EMAIL [1]. The function is not documented and has no own errors, it just re-signals the errors of the underlying USOCKET library. That's why it required some effort on my part to understand, why the following code produces USOCKET:UKNOWN-ERROR [2]:
(cl-smtp:send-email "localhost" "noreply@our.domain.net" "test@gmail.com"
"subject"
"тест")

Explanation: Sending mail through the SMTP server on localhost from noreply@test.com to test@gmail.com with the body "тест".

It turned out, that the SMTP server just didn't accept the non-ascii characters in the body, because the default encoding is 7bit.
In the process I discovered the useful debugging feature of CL-SMTP: (setf cl-smtp::*debug* t) [3]. It will print the SMTP interaction log.

CL-MIME


So I asked Google and found this article by Hans Hübner, where he explains his enhancements of CL-SMTP (currently integrated in the codebase) and describes how to send attachments with it. But to properly apply his examples to my case I first had to learn a couple of things about MIME. In the example Hans uses multipart/mixed Content-type for sending a message with attachments. But it is not necessary for the simple task of sending a text message in UTF-8 charset. For that you can use text/plain Content-type and UTF-8 charset. But for non-ascii symbols to be accepted by the mail server they should be encoded (usually) in base64 (Content-encoding header). All this activities are handled with CL-MIME library. The library is quite self-explanatory so the lack of documentation doesn't hurt, except for a couple of moments.
First of all, properly formatted MIME text data is produced with the function PRINT-MIME [4], which takes the CLOS MIME object with the appropriately set fields. The problem is, that the generated data contains both MIME headers and the part, which should go into the message body. So the function's output can't be used as an argument to SEND-EMAIL, because the headers will go to the data section, and the mail-client won't consider them (which will result in decoded body). For this case (and other cases, when you need more control of the process of SMTP interaction) Hans has created a high-level macro WITH-SMTP-MAIL [5]. There's a little catch in it as well: unlike SEND-EMAIL it accepts the list of recipients (while the former — a sole recipient string).

CL-BASE64 & ARNESI


The second thing, which caused me most trouble, actually, was the tricky and once again undocumented handling of :CONTENT initarg of the MIME objects [6]. When you provide :ENCODING initarg, such as, primarily, :BASE64, the content part of the data, emitted with PRINT-MIME, will be subjected to the appropriate encoding (performed by CL-BASE64). The interesting thing is, that it will produce wrong output for UTF-8 strings. The proper argument format is an octet array. And you need a function to reduce the string to this format.

ARNESI is a useful library. It provides a lot of small utilities from different spheres. So I was glad to find out, that the needed function STRING-TO-OCTETS [7] is provided by it, because the lib was already utilized in my project.

It's worth mentioning, that if non-ascii characters are used inside MIME body, they can be sent as is. But, AFAIU, it's not so robust as in base64 encoded form.

Result


So the final code turned out to be like this:
(defun send-email (text &rest reciepients)
"Generic send SMTP mail with some TEXT to RECIEPIENTS"
(cl-smtp:with-smtp-mail (out "localhost" "noreply@fin-ack.com" reciepients)
(cl-mime:print-mime out
(make-instance 'cl-mime:text-mime
:encoding :base64 :charset "UTF-8"
:content (arnesi:string-to-octets text :utf-8))
t t)))


Lessons learned


  1. To send plain text ascii email use CL-SMTP:SEND-EMAIL

  2. If USOCKET:UKNOWN-ERROR is signaled, most probably, the arguments are not properly formatted

  3. For debugging (setf cl-smtp::*debug* t)

  4. To efficiently use MIME utilize CL-SMTP:WITH-SMTP-EMAIL in conjunction with CL-MIME:PRINT-MIME

  5. You need to supply an octet vector, not a string to CL-MIME:TEXT-MIME's :CONTENT initarg.

  6. To break a UTF-8 string into octets use ARNESI:STRING-TO-OCTETS

1 comment: