Asyncio Proxy for Blocking Functions

File operations and other IO operations can block asyncio loop and  unfortunately  python does not support true asynchronous disk operations (mainly due to problematic state of async disk IO in underlying os – aka linux – special library is need for true asynchronous disk operations  so normally select (or other IO event library) always reports file as ready to read and write and thus file IO operations block). Current solution is to run such operations in thread pool executor. There is asyncio wrapper library for file object – aiofiles, but there are also many blocking functions in other python modules – like os, shutil etc.  We can easily write wrappers for such methods, but it can be annoying and time consuming if we use many of such methods.   What about to write a generic proxy, which will assure that methods are executed in thread pool and use this proxy for all potentially blocking methods within the module. Continue reading Asyncio Proxy for Blocking Functions

Revival of Neural Networks

My actual master studies topic was AI (more then 20 years ago).   Artificial Neural Networks (ANNs) were already known and popular branch of AI and we had some introduction to basics of artificial neural networks (like perceptron, back propagation, etc.). Though it was  quite interesting topic, I had not seen many practical applications in those days.  Recently  I’ve chatted with old friend of mine,  who stayed in university and is involved in computer vision research, about recent advancements in AI and computer vision and he told me that most notable change in last years was that neural networks are being now used in large scale. Mainly due to increase in computing power neural networks now can are applied to many real world problems. Another hint about popularity of neural networks came from my former boss, who shared with me this interesting article –  about privacy issues related to machine learning. I’ve been looking around for a while and it looks like neural networks are becoming quite popular recently – especially architectures with many layers used in so called deep learning. In this article I’d like to share my initial experiences with TensorFlow, open source library (created by Google), which can be used to build modern, multi-layered neural networks. Continue reading Revival of Neural Networks

Next Adventure in Aurelia – Autocomplete Component

As I have written in this post I’m slowly getting into Aurelia Web UI framework. Recently I needed an autocomplete component.  My requirements were:

  • get suggestions from server via REST API (JSON payload)
  • simple, yet flexible (value can be calculated from received suggestions)
  • flexible ways to display suggestions (ideally provide a template to display suggestions values)
  • suggest values as one types matching starting letters (case insensitive and diacritics insensitive)
  • cache server responses for efficiency
  • can use both mouse and keys (arrows + enter) to select suggestion

I’m sure that existing autocomplete components like typeahead or JQuery UI Autocomplete would serve my purpose quite well. It’s fairly easy to integrate existing components from other frameworks into Aurelia (see this post for instance).  But I decided to create my own, using only Aurelia (and a bit of JQuery – but it can be easily rewritten in pure DOM API – JQuery is used just for convenience, because it’s anyhow  used in Aurelia skeleton app). I though it would be nice learning exercise (and it really was) and also we programmers (especially leisure programmers like myself) do like to reinvent the wheel, right? (But it’s not always bad idea – image world when only one type of wheel exists – by recreating existing solutions, improving them, changing them we can also achieve progress – Web UI frameworks themselves are good example of such progress).  I’d like to share my experiences further in this article. Continue reading Next Adventure in Aurelia – Autocomplete Component

WAMP Is WebSocket on Steroids

If you look for WAMP abbreviation over Internet, you will probably find that WAMP = Windows + Apache + MySQL + PHP – which was popular web stack some time ago (but who wants to run web server on Windows today?  And other components  now also have  viable alternatives).   But in this article I’d like to talk about WAMP = Web Application Messaging Protocol.  WAMP is available  as WebSocket subprotocol, but also can work on plain TCP or Unix domain sockets. Continue reading WAMP Is WebSocket on Steroids

Run Multiple Terminal Tabs with Python Virtualenv

Virtualenv is a must have for python development.  If your project is a complex beast consisting  of multiple services/components you want them see running  in different terminals  (ideally tabs of one terminal window).  Staring all terminal manually could be cumbersome. This simple script starts terminal tabs (in gnome-terminal) with activated virtual environments and eventually appropriate services/applications started:

None -ci argument –  interactive shell must be enforced to run command with virtual environment loaded.w

Also gnome terminal recently drop support for –title parameter, which enabled to set title to the tab (really do not understand why, because it was very useful).   So now our tabs will have same prompt.

This can be somehow fixed with modification of virtualenv activate script to include terminal escape sequence  shown below (thus we will see current terminal directory as tab title):


Functional Fun with Asyncio and Monads

Python 3.4+ provides excellent Asyncio library for asynchronous tasks scheduling and asynchronous I/O operations.   It’s similar to gevent, but here tasks are implemented by generator based coroutines.  Asynchronous I/O is useful for higher I/O loads, where it usually achieves better performance and scalability then other approaches (threads, processes). About a year ago I played with OCaml, where light weight threads/ coroutines and asynchronous I/O  approaches  are also very popular (Ocaml has same limitation for threading as Python – a global lock) and there were two great libraries – lwt and core async.  Both libraries use monads as a programming style to work with asynchronous tasks. In this article we will try to implement something similar on basis of asyncio library. While our solution will  probably not provide “pure” monads it’ll still be fun and we’ll learn something about asyncio. Continue reading Functional Fun with Asyncio and Monads

SQL or NoSQL – Why not to use both (in PostgreSQL)

NoSQL databases have become very popular in last years and there is a plenty of various options available. It looks like traditional relational databases (RDBMs) are almost not needed any more. NoSQL solutions are advertised as faster, more scalable and easier to use. So who would care about relations, joins, foreign keys and similar stuff (not talking about ACID properties, transactions, transaction isolation)? Who would,  if NoSQLs can make your life much easier. But there is a key insight about NoSQL databases – their wonderful achievements are possible because they made their life easier too is some aspects. But that comes with some price – would you be happy, if your bank will store your saving in MongoDb?

However there are many environments, where NoSQL databases shine – especially when there are huge amounts of simple data structures, which need to be scaled massively across the globe and where these data are not of much value – solutions like social networks, instant messaging etc. are not so much concerned about data consistency or data loss, because these data are basically valueless. (Their business model is just based on sharing absolutely trivial data, where one piece can be easily replaced with another and it does not matter if some pieces are lost. Consider – what will happen if whole Facebook will go away in one minute? Nothing! Few people will be pissed off because they think their online profile was cool, few sad that they cannot share their meaningless achievements with so called ‘friends’, but generally considered nothing special will happen and no real value will be lost. People will just switch to another provider and fill it’s database with tons of trivialities and will easily forget about data in their previous account).

I don’t want to create impression that NoSQL databases are useless, they are very good for certain scenarios (and we need to remember that NoSQL is rather broad category, it includes structured documents stores, key-value stores, object databases etc. – each one has it’s particular niche, where it excels), but relational databases are also good, actually very good. Relational model is fairly good abstraction of very many real world situations, data structures, entities, however we call them. And relational databases provide solid tools to works with them. So it make sense to use them in many cases. It might bit more difficult to start with relational database then with schema-less document store, but  in the long run it should pay off. And what is really nice it’s not about one or another solution, but we can use both and combine them smartly and inventively.
So enough of general mumbo jumbo – let’s get to my particular case – I’ve been looking for data store for my new project and considered to try MongoDb this time ( while in past I stuck to relational DBs), however finally decided for PostgreSQL (again) – and I’d like to share some tests, findings and thoughts. Continue reading SQL or NoSQL – Why not to use both (in PostgreSQL)

Starting with Aurelia – Pagination with Back and Sort

I do not like very much programming of User Interfaces (UIs) and frankly spoken I’m not very good at that, but alas sometimes UIs are necessary so I have to try my best. Many recent applications use web browser as UI, and  situation here is  quite messy ( see this nice article about JS Frameworks Fatigue).  Last time I was involved with web UIs I had utilized Backbone with Django based RESTful server.  Recently I’ve decided to rewrite MyBookshelf application with modern technonogies (it’s about 8 years old, which is something like prehistory considering changes in web development).  New architecture should be based on RESTful services and Single Page Application (SPA) relying on recent browser capabilities.   I’ve have been looking around and found that Backbone is already almost forgotten and we have two new stars on the stage – AngujarJS and React – I have very quickly looked at both and finally decide for another framework Aurelia. Continue reading Starting with Aurelia – Pagination with Back and Sort

Parsing PDF for Fun And Profit (indeed in Python)

PDF documents are ubiquitous in today’s world. Apart of common use cases of printing, viewing etc. we need sometimes do something specific with them- like convert tehm to other formats or extract textual content.  Extracting text from PDF document can be (surprisingly) hard task due to the purpose and design of PDF documents.  PDF is intended to represent exact visual representation of document ‘s pages down to the smallest details. And internal representation of document text is following this goal.  Rather the storing text in some logical units (lines, paragraphs, columns, tables …), text is represented as series of commands, which print characters (can be a single character, word, part of line, …) at exact position on the page with given font, font size, color, etc.   In order to reconstruct original text logical structure program  has to scan  all these commands and join together texts, which were probably forming same line or same paragraph.  This task can be pretty demanding and ambiguous –  mutual position of text boxes can be interpreted in various ways ( is this space between words too large because they are in different columns or line is justified to both ends?).

So the task of text extraction looks quite discouraging to try, luckily some smart guys have tried it already and left us with libraries that are doing pretty good job and we can leverage them. Some time ago I’ve created tool called PDF Checker, which does some analysis of PDF document content (presence, absence of some phrases,  paragraphs numbering, footers format etc.). I used there excellent Python PDFMiner library.   PDFMiner is a grea tool and it is quite flexible, but being all written in Python it’s rather slow.   Recently I’ve been looking for some alternatives,  which have Python bindings and provide functionality similar to PDFMiner.  In this article I describe some results of this search, particularly my experiences with libpoppler. Continue reading Parsing PDF for Fun And Profit (indeed in Python)

My Digital Bits And Pieces