Functional Fun with Asyncio and Monads

Python 3.4+ provides excellent Asyncio library for asynchronous tasks scheduling and asynchronous I/O operations.   It’s similar to gevent, but here tasks are implemented by generator based coroutines.  Asynchronous I/O is useful for higher I/O loads, where it usually achieves better performance and scalability then other approaches (threads, processes). About a year ago I played with OCaml, where light weight threads/ coroutines and asynchronous I/O  approaches  are also very popular (Ocaml has same limitation for threading as Python – a global lock) and there were two great libraries – lwt and core async.  Both libraries use monads as a programming style to work with asynchronous tasks. In this article we will try to implement something similar on basis of asyncio library. While our solution will  probably not provide “pure” monads it’ll still be fun and we’ll learn something about asyncio.

Monad is a term from category theory, but mostly it’s know by it’s usage in functional programming (in Haskell monads play a key role in the language).  I never got into depth of monads theory, but from practical perspective they are quite similar to pipe operator in unix shell – they enable to chain operations together, where result of one operation is fed as input to another operation. For this purpose monads define two functions – unit – which turns some value into monadic type and bind, which unpacks value from monadic type, calls a function on it and returns function’s result packed as monadic type again. If this is not clear look around for various tutorial about monads (such tutorials were very popular several years ago).

Monads have cool application in asynchronous programming,  because we can use them to express a set of serial tasks – something we would write in shell like:
cat "something" | task1 | task2 | task3
This in  monadic terms can be written as: unit "something"  bind task1 bind task2  bind task3
If bind is represented as a pipe operator we can achieve same concise syntax as above.

For Javascript programers  this sounds quite similar to Promise –
and you are quite right – Promise is ‘kind of’ monad – see here.  Asyncio has similar construct called Future, which we will use later.

Python is multi-paradigm programming language, with a lot of functional features.  So let’s see if we can use monadic approach also with asyncio.  This could also serve as a small introduction into asyncio.

We start with this trivial example:

I assume output of this code would be obvious  (or left as exercise to a reader), more interesting for future comparisons  would be its running time:

Lets try now to implement the same with asyncio (using still python 3.4 syntax, 3.5 syntax  is indeed better, but 3.5 is not yet so common), first some common code:

and run those 3 trivial tasks in asyncio loop:

Same output, much more code, and it runs longer time:

Asyncio has indeed some overhead – tasks have to be scheduled and then picked by internal event loop. While it’s wasteful overhead for this trivial example, it’ll make more sense for complex cases  we’ll see later.

Now back to monads. Let’s create monadic wrapper for asyncio – Async class.  We can use constructor as unit function and our class also implements bind method. Since python is dynamic language, we can use Async for several types – plain values, callables and futures. We also add two overloaded  operators (| and >>) to represent the bind function.

Actual implementation of unit(constructor) and bind could be fairly simple, but for practical reasons we need to add more code to handle exceptions and cancellations – both have to be propagated through chained operations to the last one.

With our new class we can write (using >> operator):

This will add yet some small overhead to execution, but the code is definitely shorter:

Or we can do do the same with already defined coroutines and pipe operator (same semantic as >>, use whatever feels more convenient):

We can also modify our Async class for another great python3 library –  concurrent.futures, which creates asynchronous tasks in either thread or process pool:

With this class we can write:

Actual execution  (when thread pool is started) takes approximately same time as in asyncio:

We can try also process pool – here we have to consider one important limitation –  it can execute only serializable  (by pickle module) objects, so we cannot use lambda or local functions – only top level functions:

Process based execution has indeed much higher overhead:

All above examples were very simple – what about something more realistic – consider this task:  get some basic statistics about web servers usage.  For this task we  need to gather N random web sites, make a HEAD request to all of them and collect HTTP Server header from all responses.

Here is the solution in asyncio:

And result:

Apart of assurance that Apache still rules and some people are still using IIS (and surprise by  Cloudflare popularity), we see fairly compact, readable and powerful code thanks to asyncio.  There is one special thing to mention –  from perspective of connections handling this asyncio code  is highly parallel – so it basically tries to open all 1000 connections at once.  This can easily lead to “Too many open files” errors- as each socket represents one file handle. We can increase limit of open files per process, but better approach here is to limit number of concurrent request with semaphore.

We can rewrite that code with our Async monadic class:

Similar results, compact functional code …


None yet. Just impressed by asyncio and had some functional fun.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">