WAMP Is WebSocket on Steroids

If you look up the WAMP abbreviation on the Internet, you will probably find that WAMP = Windows + Apache + MySQL + PHP, which was a popular web stack some time ago (but who wants to run a web server on Windows today? And the other components now also have viable alternatives). In this article, however, I’d like to talk about WAMP = Web Application Messaging Protocol. WAMP is available as a WebSocket subprotocol, but it can also work over plain TCP or Unix domain sockets. Continue reading WAMP Is WebSocket on Steroids
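To give a feel for what travels over the wire, here is a minimal sketch (not taken from the article) that builds two WAMP v2 messages in the JSON serialization, which a client negotiates as the WebSocket subprotocol `wamp.2.json`. Every WAMP message is a JSON array whose first element is the message type code (48 for CALL, 16 for PUBLISH). The function names and the `com.example.*` URIs are hypothetical, and a real client would normally use a library such as Autobahn|Python, which also handles the session handshake (HELLO/WELCOME) that this sketch leaves out.

```python
import json
from itertools import count

# WebSocket subprotocol name for WAMP v2 with JSON serialization
SUBPROTOCOL = "wamp.2.json"

# WAMP v2 message type codes (first element of every message array)
PUBLISH = 16  # [PUBLISH, Request|id, Options|dict, Topic|uri, Arguments|list]
CALL = 48     # [CALL, Request|id, Options|dict, Procedure|uri, Arguments|list]

# Request IDs only need to be unique within one session; a counter is enough here
_request_ids = count(1)


def call_message(procedure: str, *args) -> str:
    """Serialize a CALL of a remote procedure with positional arguments."""
    return json.dumps([CALL, next(_request_ids), {}, procedure, list(args)])


def publish_message(topic: str, *args) -> str:
    """Serialize a PUBLISH of an event to a topic with positional arguments."""
    return json.dumps([PUBLISH, next(_request_ids), {}, topic, list(args)])


# Usage (hypothetical URIs):
# call_message("com.example.add", 2, 3)
#   first call in a fresh session: '[48, 1, {}, "com.example.add", [2, 3]]'
# publish_message("com.example.news", "hello")
```

The router answers a CALL with a RESULT carrying the same request ID, which is how a client matches replies to outstanding calls.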

Run Multiple Terminal Tabs with Python Virtualenv

Virtualenv is a must-have for Python development. If your project is a complex beast consisting of multiple services/components, you want to see them running in different terminals (ideally in tabs of one terminal window). Starting all the terminals manually can be cumbersome. This simple script starts terminal tabs (in gnome-terminal) with activated virtual environments and, optionally, the appropriate services/applications started:
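As a rough sketch of the same idea in Python (not the script from this post), the functions below only assemble a `gnome-terminal` command line with one tab per virtualenv. They assume an older gnome-terminal that still accepts a per-tab `-e` command (newer releases deprecate `-e` in favour of `--`). The function names, paths and service commands are hypothetical. Each tab runs `bash -ci`, sources the virtualenv's `bin/activate`, optionally starts a service, and then execs a fresh interactive bash so the tab stays open with the environment still on `PATH`.

```python
import shlex
from typing import Iterable, List, Optional, Tuple


def tab_args(venv_dir: str, command: Optional[str] = None) -> List[str]:
    """gnome-terminal arguments for one tab with the virtualenv activated."""
    script = "source " + shlex.quote(f"{venv_dir}/bin/activate")
    if command:
        script += f" && {command}"  # start the service only if activation worked
    # Replace the shell with a fresh interactive bash so the tab stays open;
    # activate exports PATH and VIRTUAL_ENV, so the new shell inherits them.
    script += "; exec bash"
    # -ci: interactive shell running the given command string
    return ["--tab", "-e", "bash -ci " + shlex.quote(script)]


def gnome_terminal_command(tabs: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Full argv for one gnome-terminal window with a tab per (venv, command) pair."""
    argv = ["gnome-terminal"]
    for venv_dir, command in tabs:
        argv += tab_args(venv_dir, command)
    return argv


# Usage (hypothetical paths and commands):
# import subprocess
# subprocess.Popen(gnome_terminal_command([
#     ("/home/me/project/venv", "python server.py"),
#     ("/home/me/project/venv", None),  # just a shell with the venv active
# ]))
```

Building the argv as a list and quoting with `shlex.quote` keeps paths with spaces or shell metacharacters from breaking the nested `bash -ci` command string.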

Note the -ci argument: an interactive shell must be enforced to run the command with the virtual environment loaded.

Also, gnome-terminal recently dropped support for the --title parameter, which allowed setting the title of a tab (I really do not understand why, because it was very useful). So now all our tabs will have the same title.

This can be partly fixed by modifying the virtualenv activate script to include the terminal escape sequence shown below (so we will see the current working directory as the tab title):
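For illustration, assuming the standard xterm "OSC 0" sequence that gnome-terminal understands (`ESC ] 0 ; title BEL`), the snippet below builds and writes such a title from Python. The function names are hypothetical. In the activate script itself the same bytes would typically be printed from bash's `PROMPT_COMMAND` with `$PWD` as the title, so the tab title follows the current directory after each command.

```python
import os
import sys
from typing import Optional, TextIO


def title_sequence(title: str) -> str:
    """xterm OSC 0 escape sequence that sets the window/tab title to `title`."""
    # ESC ] 0 ; <text> BEL  (control characters inside the title would break it)
    return f"\033]0;{title}\007"


def set_terminal_title(title: Optional[str] = None, stream: TextIO = sys.stdout) -> None:
    """Write the title sequence; defaults to the current working directory."""
    stream.write(title_sequence(title if title is not None else os.getcwd()))
    stream.flush()
```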


Functional Fun with Asyncio and Monads

Python 3.4+ provides the excellent asyncio library for asynchronous task scheduling and asynchronous I/O operations. It’s similar to gevent, but here tasks are implemented as generator-based coroutines. Asynchronous I/O is useful for higher I/O loads, where it usually achieves better performance and scalability than other approaches (threads, processes). About a year ago I played with OCaml, where lightweight threads/coroutines and asynchronous I/O are also very popular (OCaml has the same limitation for threading as Python: a global lock), and there were two great libraries, lwt and core async. Both libraries use monads as a programming style for working with asynchronous tasks. In this article we will try to implement something similar on the basis of the asyncio library. While our solution will probably not provide “pure” monads, it’ll still be fun and we’ll learn something about asyncio. Continue reading Functional Fun with Asyncio and Monads
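To make the idea concrete, here is a minimal sketch, not the article’s implementation. It uses the `async`/`await` syntax of Python 3.5+ (and `asyncio.run` from 3.7) rather than the generator-based coroutines of 3.4, and the class name `Async`, its `unit` constructor, the example step `slow_len` and the use of `>>` (standing in for OCaml’s `>>=`, which Python cannot define as a binary operator) are our own assumptions. `unit` wraps a plain value, and `m >> f` waits for `m` and feeds its result into `f`, which must return another `Async`.

```python
import asyncio
from typing import Any, Awaitable, Callable


class Async:
    """Wraps an awaitable so that `m >> f` chains f onto m's result (like >>= in lwt)."""

    def __init__(self, awaitable: Awaitable[Any]) -> None:
        self._awaitable = awaitable

    @staticmethod
    def unit(value: Any) -> "Async":
        """Monadic 'return': an Async that simply produces `value`."""
        async def produce() -> Any:
            return value
        return Async(produce())

    def __rshift__(self, f: Callable[[Any], "Async"]) -> "Async":
        """Bind: wait for self, pass the result to f, then wait for f's Async."""
        async def bound() -> Any:
            result = await self._awaitable
            return await f(result)
        return Async(bound())

    def __await__(self):
        return self._awaitable.__await__()


def slow_len(text: str) -> Async:
    """Hypothetical asynchronous step: pretends to do I/O, then returns len(text)."""
    async def run() -> int:
        await asyncio.sleep(0.01)  # stands in for real asynchronous I/O
        return len(text)
    return Async(run())


async def main() -> int:
    # OCaml-ish reading: return "hello" >>= slow_len >>= fun n -> return (n * 2)
    return await (Async.unit("hello") >> slow_len >> (lambda n: Async.unit(n * 2)))


if __name__ == "__main__":
    print(asyncio.run(main()))  # 10
```

Because a coroutine can be awaited only once, each `Async` value here is single-use, which is one of the reasons such a construction is not a “pure” monad.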

SQL or NoSQL – Why not to use both (in PostgreSQL)

NoSQL databases have become very popular in recent years and there are plenty of options available. It looks as if traditional relational databases (RDBMSs) are hardly needed any more. NoSQL solutions are advertised as faster, more scalable and easier to use. So who would care about relations, joins, foreign keys and similar stuff (not to mention ACID properties, transactions and transaction isolation)? Who would, if NoSQL databases can make your life so much easier? But there is a key insight about NoSQL databases: their wonderful achievements are possible because, in some aspects, they made their own life easier too. And that comes at a price: would you be happy if your bank stored your savings in MongoDB?

However, there are many environments where NoSQL databases shine, especially where there are huge amounts of simple data structures which need to be scaled massively across the globe and where the data are not of much value. Solutions like social networks, instant messaging etc. are not so much concerned about data consistency or data loss, because their data are basically valueless. (Their business model is based on sharing absolutely trivial data, where one piece can easily be replaced with another and it does not matter if some pieces are lost. Consider what would happen if the whole of Facebook went away in one minute. Nothing! A few people would be pissed off because they think their online profile was cool, a few would be sad that they cannot share their meaningless achievements with so-called ‘friends’, but generally nothing special would happen and no real value would be lost. People would just switch to another provider, fill its database with tons of trivialities and easily forget about the data in their previous account.)

I don’t want to create the impression that NoSQL databases are useless. They are very good for certain scenarios (and we need to remember that NoSQL is a rather broad category: it includes structured document stores, key-value stores, object databases etc., and each one has its particular niche where it excels), but relational databases are also good, actually very good. The relational model is a fairly good abstraction of very many real-world situations, data structures, entities, whatever we call them. And relational databases provide solid tools to work with them, so it makes sense to use them in many cases. It might be a bit more difficult to start with a relational database than with a schema-less document store, but in the long run it should pay off. And what is really nice is that it’s not a choice of one or the other: we can use both and combine them smartly and inventively.
So enough of general mumbo jumbo, let’s get to my particular case. I’ve been looking for a data store for my new project and considered trying MongoDB this time (while in the past I stuck to relational DBs), but finally decided on PostgreSQL (again), and I’d like to share some tests, findings and thoughts. Continue reading SQL or NoSQL – Why not to use both (in PostgreSQL)

Starting with Aurelia – Pagination with Back and Sort

I do not like programming user interfaces (UIs) very much and, frankly speaking, I’m not very good at it, but alas, sometimes UIs are necessary, so I have to try my best. Many recent applications use the web browser as their UI, and the situation here is quite messy (see this nice article about JS Frameworks Fatigue). The last time I was involved with web UIs, I used Backbone with a Django-based RESTful server. Recently I decided to rewrite the MyBookshelf application with modern technologies (it’s about 8 years old, which is something like prehistory considering the changes in web development). The new architecture should be based on RESTful services and a Single Page Application (SPA) relying on recent browser capabilities. I have been looking around and found that Backbone is already almost forgotten and there are two new stars on the stage, AngularJS and React. I had a very quick look at both and finally decided on another framework, Aurelia. Continue reading Starting with Aurelia – Pagination with Back and Sort

Parsing PDF for Fun And Profit (indeed in Python)

PDF documents are ubiquitous in today’s world. Apart from the common use cases of printing, viewing etc., we sometimes need to do something specific with them, like converting them to other formats or extracting their textual content. Extracting text from a PDF document can be a (surprisingly) hard task due to the purpose and design of PDF documents. PDF is intended to represent the exact visual appearance of a document’s pages, down to the smallest details, and the internal representation of the document text follows this goal. Rather than storing text in some logical units (lines, paragraphs, columns, tables …), text is represented as a series of commands which print characters (a single character, a word, part of a line, …) at an exact position on the page with a given font, font size, color, etc. In order to reconstruct the original logical structure of the text, a program has to scan all these commands and join together the texts which probably formed the same line or the same paragraph. This task can be pretty demanding and ambiguous, because the mutual position of text boxes can be interpreted in various ways (is this space between words so large because they are in different columns, or because the line is justified to both ends?).
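The core of that reconstruction can be illustrated with a deliberately simplified heuristic (our own sketch, not how PDFMiner or poppler actually work): fragments whose baselines lie within a small tolerance are treated as one line, fragments within a line are ordered left to right, and a space is inserted only where the horizontal gap between neighbours exceeds a threshold. The `TextBox` type, the `join_lines` function and both tolerances are assumptions; real layout analysis also has to cope with columns, rotated text and font metrics, exactly the ambiguities described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBox:
    x0: float   # left edge of the fragment
    x1: float   # right edge of the fragment
    y: float    # baseline; in PDF coordinates y grows towards the top of the page
    text: str


def join_lines(boxes: List[TextBox], y_tol: float = 2.0, space_gap: float = 1.0) -> List[str]:
    """Group fragments into lines by baseline, then join each line left to right."""
    lines: List[Tuple[float, List[TextBox]]] = []
    # Top of the page first (largest y), then left to right
    for box in sorted(boxes, key=lambda b: (-b.y, b.x0)):
        if lines and abs(lines[-1][0] - box.y) <= y_tol:
            lines[-1][1].append(box)          # close enough: same line
        else:
            lines.append((box.y, [box]))      # start a new line
    result = []
    for _, line in lines:
        line.sort(key=lambda b: b.x0)
        text = line[0].text
        for prev, cur in zip(line, line[1:]):
            # A visible gap means a word break; touching fragments form one word
            text += (" " if cur.x0 - prev.x1 > space_gap else "") + cur.text
        result.append(text)
    return result
```

For example, fragments "Ne" and "xt" printed right next to each other become "Next", while "Hello" and "world" separated by a few points become "Hello world"; choosing `space_gap` too large or too small is precisely where the ambiguity of justified lines and columns shows up.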

So the task of text extraction looks quite discouraging; luckily, some smart guys have tried it already and left us with libraries that do a pretty good job and that we can leverage. Some time ago I created a tool called PDF Checker, which does some analysis of PDF document content (presence or absence of some phrases, paragraph numbering, footer format etc.). There I used the excellent Python PDFMiner library. PDFMiner is a great tool and it is quite flexible, but being written entirely in Python, it’s rather slow. Recently I’ve been looking for some alternatives which have Python bindings and provide functionality similar to PDFMiner. In this article I describe some results of this search, particularly my experiences with libpoppler. Continue reading Parsing PDF for Fun And Profit (indeed in Python)

Cython Is As Good As Advertised

I have been aware of Cython for a few years but never had a chance to really test it in practice (apart from a few dummy exercises). Recently I decided to look at it again and test it on my old project adecapcha. I was quite pleased with the results: I was able to speed up the program significantly with minimal changes to the code. Continue reading Cython Is As Good As Advertised

Openshift – Second Thoughts

Openshift Online still remains one of the most generous PaaS offerings on the market. With 3 free containers it’s a really good bargain. Recently I’ve modified a couple of my older applications (myplaces and iching) to run in Openshift.

Previously I created a pretty standard, simple Flask application and deployed it on Openshift. The process was quite straightforward, as described in this article. This time, however, the situation was different, because both applications are special. Continue reading Openshift – Second Thoughts

Farewell Django

Recently I’ve been reviving a 2-year-old Django application (myplaces), upgrading it from version 1.5.5 to the latest version 1.9, and I was very unpleasantly surprised how tedious it was. As Django evolved, some features got deprecated and removed and had to be replaced in the code. And it’s not only Django: other contributed libraries are evolving just as rapidly. In my application I was using django-rest-framework, which changed so significantly in version 3 that I cannot use it without basically rebuilding the whole application.

Some of the changes might be necessary, but many were just cosmetic changes in names (mimetype -> content_type, etc.), which I do not see as much of a value add. Even core Python still keeps a bit of naming fuss for the sake of backward compatibility (for instance, str.startswith and str.endswith made it into version 3, even though they are not in line with PEP 8, the Python style guide).

But it’s not only about interface changes between versions (there is a fair process for deprecating features, so when one follows development, it’s relatively easy to stay up to date); it’s mainly about the whole concept of Django. Django was created more than 10 years ago, when web development was focused on servers and everything happened there. But the situation has changed radically (as I wrote some time ago). Now a lot of things are happening in the browser and you can have complete applications running there (recently I discovered this cool application, which runs almost completely in the browser and just uses a stream of events from the server). Accordingly, servers are now used more to provide APIs to browser applications or to route real-time communication to/from/between browsers. Continue reading Farewell Django

My Digital Bits And Pieces