Python 3.4+ provides the excellent asyncio library for scheduling asynchronous tasks and performing asynchronous I/O. It’s similar to gevent, but here tasks are implemented as generator-based coroutines. Asynchronous I/O is useful for higher I/O loads, where it usually achieves better performance and scalability than other approaches (threads, processes). About a year ago I played with OCaml, where lightweight threads/coroutines and asynchronous I/O are also very popular (OCaml has the same limitation for threading as Python – a global lock), and there were two great libraries – lwt and core async. Both libraries use monads as a programming style for working with asynchronous tasks. In this article we will try to implement something similar on the basis of the asyncio library. While our solution will probably not provide “pure” monads, it’ll still be fun and we’ll learn something about asyncio. Continue reading Functional Fun with Asyncio and Monads
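To give a flavour of the monadic style we are after, here is a minimal sketch (written with the modern async/await syntax rather than the generator-based coroutines of Python 3.4; `fetch_number`, `double` and `bind` are illustrative names of mine, not part of asyncio):

```python
import asyncio

async def fetch_number():
    await asyncio.sleep(0)   # stand-in for real asynchronous I/O
    return 21

async def double(x):
    await asyncio.sleep(0)
    return x * 2

async def bind(awaitable, fn):
    # monadic bind: await the wrapped value, then feed it into the next async step
    return await fn(await awaitable)

print(asyncio.run(bind(fetch_number(), double)))  # 42
```

The `bind` helper is the essence of the pattern: it sequences two asynchronous computations while keeping each of them ignorant of the other.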
NoSQL databases have become very popular in recent years and there are plenty of options available. It looks like traditional relational databases (RDBMSs) are almost not needed any more. NoSQL solutions are advertised as faster, more scalable and easier to use. So who would care about relations, joins, foreign keys and similar stuff (not to mention ACID properties, transactions, transaction isolation)? Who would, if NoSQL can make your life so much easier. But there is a key insight about NoSQL databases – their wonderful achievements are possible because they made their own life easier too in some aspects. And that comes at a price – would you be happy if your bank stored your savings in MongoDB?
However there are many environments where NoSQL databases shine – especially where there are huge amounts of simple data structures which need to be scaled massively across the globe and where the data are not of much value – solutions like social networks, instant messaging etc. are not much concerned about data consistency or data loss, because these data are basically valueless. (Their business model is based on sharing absolutely trivial data, where one piece can easily be replaced with another and it does not matter if some pieces are lost. Consider – what would happen if the whole of Facebook went away in one minute? Nothing! A few people would be pissed off because they thought their online profile was cool, a few sad that they cannot share their meaningless achievements with so-called ‘friends’, but generally speaking nothing special would happen and no real value would be lost. People would just switch to another provider, fill its database with tons of trivialities and easily forget about the data in their previous account.)
I don’t want to create the impression that NoSQL databases are useless – they are very good for certain scenarios (and we need to remember that NoSQL is a rather broad category; it includes structured document stores, key-value stores, object databases etc. – each one has its particular niche where it excels) – but relational databases are also good, actually very good. The relational model is a fairly good abstraction of a great many real-world situations, data structures, entities, however we call them. And relational databases provide solid tools to work with them. So it makes sense to use them in many cases. It might be a bit more difficult to start with a relational database than with a schema-less document store, but in the long run it should pay off. And what is really nice is that it’s not a choice of one or the other – we can use both and combine them smartly and inventively.
So enough of general mumbo jumbo – let’s get to my particular case. I’ve been looking for a data store for my new project and considered trying MongoDB this time (while in the past I had stuck to relational DBs), but finally decided on PostgreSQL (again) – and I’d like to share some tests, findings and thoughts. Continue reading SQL or NoSQL – Why not to use both (in PostgreSQL)
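The combination meant here is a relational table with a schema-less document column – PostgreSQL offers this via its json/jsonb types. As a server-free stand-in, the same idea can be sketched with SQLite’s JSON1 functions from Python’s standard library (the `books` table and its contents are made up for the demo; a real setup would use PostgreSQL and a driver such as psycopg2):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# relational columns for the stable core, a JSON document for the flexible rest
con.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, meta TEXT)")
con.execute("INSERT INTO books (title, meta) VALUES (?, ?)",
            ("Dune", '{"genre": "sci-fi", "tags": ["classic"]}'))
# query relational and document parts together
row = con.execute(
    "SELECT title, json_extract(meta, '$.genre') FROM books"
).fetchone()
print(row)  # ('Dune', 'sci-fi')
```

The point is that fields you query and join on stay as proper columns, while loosely structured metadata lives in the document – no need to pick one model for everything.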
I do not like programming User Interfaces (UIs) very much and, frankly speaking, I’m not very good at it, but alas, sometimes UIs are necessary, so I have to try my best. Many recent applications use the web browser as the UI, and the situation here is quite messy (see this nice article about JS Frameworks Fatigue). Last time I was involved with web UIs I utilized Backbone with a Django-based RESTful server. Recently I’ve decided to rewrite the MyBookshelf application with modern technologies (it’s about 8 years old, which is something like prehistory considering the pace of change in web development). The new architecture should be based on RESTful services and a Single Page Application (SPA) relying on recent browser capabilities. I’ve been looking around and found that Backbone is already almost forgotten and we have two new stars on the stage – AngularJS and React. I looked very quickly at both and finally decided on another framework, Aurelia. Continue reading Starting with Aurelia – Pagination with Back and Sort
PDF documents are ubiquitous in today’s world. Apart from common use cases like printing and viewing, we sometimes need to do something specific with them – convert them to other formats or extract textual content, for example. Extracting text from a PDF document can be a (surprisingly) hard task due to the purpose and design of PDF documents. A PDF is intended to represent the exact visual appearance of a document’s pages down to the smallest details, and the internal representation of the document text follows this goal. Rather than storing text in logical units (lines, paragraphs, columns, tables …), text is represented as a series of commands which print characters (a single character, a word, part of a line, …) at exact positions on the page with a given font, font size, color, etc. In order to reconstruct the original logical structure, a program has to scan all these commands and join together texts which probably formed the same line or the same paragraph. This task can be pretty demanding and ambiguous – the mutual position of text boxes can be interpreted in various ways (is this space between words so large because they are in different columns, or because the line is justified to both ends?).
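The joining step can be illustrated with a toy sketch (the flat `(x, y, text)` tuples are my simplification – real libraries hand you much richer layout objects): bucket positioned text fragments into lines by their vertical coordinate, then order each line by the horizontal one:

```python
from collections import defaultdict

def join_fragments(fragments, y_tolerance=2.0):
    """fragments: list of (x, y, text); return reconstructed lines, top to bottom."""
    rows = defaultdict(list)
    for x, y, text in fragments:
        # fragments whose baselines differ by less than the tolerance share a bucket
        key = round(y / y_tolerance)
        rows[key].append((x, text))
    lines = []
    for key in sorted(rows, reverse=True):        # PDF y coordinates grow upwards
        lines.append(" ".join(t for _, t in sorted(rows[key])))
    return lines

frags = [(10, 700, "Hello"), (60, 700.5, "world"), (10, 680, "Second line")]
print(join_fragments(frags))  # ['Hello world', 'Second line']
```

Even this trivial version shows where the ambiguity comes from: the whole reconstruction hinges on tolerance thresholds that no single value gets right for every document.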
So the task of text extraction looks quite discouraging to try. Luckily some smart guys have tried it already and left us with libraries that do a pretty good job, and we can leverage them. Some time ago I created a tool called PDF Checker, which does some analysis of PDF document content (presence or absence of some phrases, paragraph numbering, footer format etc.). There I used the excellent Python PDFMiner library. PDFMiner is a great tool and quite flexible, but being written entirely in Python it’s rather slow. Recently I’ve been looking for alternatives which have Python bindings and provide functionality similar to PDFMiner. In this article I describe some results of this search, particularly my experiences with libpoppler. Continue reading Parsing PDF for Fun And Profit (indeed in Python)
I have been aware of Cython for a few years, but never had a chance to really test it in practice (apart from a few dummy exercises). Recently I decided to look at it again and test it on my old project adecapcha. I was quite pleased with the results: I was able to speed up the program significantly with minimal changes to the code. Continue reading Cython Is As Good As Advertised
Openshift Online still remains one of the most generous PaaS offerings on the market. With 3 free containers it’s a really good bargain. Recently I’ve modified a couple of my older applications (myplaces and iching) to run in Openshift.
Previously I had created a pretty standard and simple Flask application and deployed it on Openshift. The process was pretty straightforward, as described in this article. This time, however, the situation was different, because both applications are special. Continue reading Openshift – Second Thoughts
Recently I’ve been reviving a 2-year-old Django application (myplaces), upgrading from version 1.5.5 to the latest version 1.9, and I was very unpleasantly surprised how tedious it was. As Django evolved, some features got deprecated and removed and had to be replaced in the code. And it’s not only Django – contributed libraries are evolving just as rapidly. In my application I was using django-rest-framework, which changed so significantly in version 3 that I cannot use it without basically rebuilding the whole application.
Some of the changes might be necessary, but many were just cosmetic renames (mimetype -> content_type, etc.), which I do not see as adding much value. Even core Python still keeps a bit of naming fuss in favour of backward compatibility (for instance str.startswith and str.endswith made it to version 3, even though they are not in line with PEP 8 – the Python naming standard).
But it’s not only about interface changes between versions (there is a fair process to deprecate features, so when one follows development it’s relatively easy to stay up to date) – it’s mainly the whole concept of Django. Django was created more than 10 years ago, when web development was centred on servers and everything happened there. But the situation has changed radically (as I wrote some time ago). Now a lot is happening in the browser and you can have complete applications running there (recently I discovered this cool application, which runs almost completely in the browser, using just a stream of events from the server). Accordingly, servers are now used more to provide APIs to browser applications or to route real-time communication to/from/between browsers. Continue reading Farewell Django
Emails are still one of the most important means of electronic communication. Apart from everyday usage with a convenient client (like the superb Thunderbird), from time to time one might need to get message content out of the mailbox and perform some bulk action(s) with it. An example could be downloading all image attachments from your mailbox into some folder – this can easily be done manually for a few emails, but what if there are 10 thousand of them? Your mailbox is usually hosted on some server and you can access it via the IMAP protocol. There are many possible ways to achieve this, however most of them require downloading or synchronizing the full mailbox locally and then extracting the required parts from messages and processing them. This can be very inefficient indeed. Recently I had a need for an automated task like the one above – search messages in a particular IMAP mailbox, identify attachments of a certain type and name, download them and run a command with them, and after the command finishes successfully delete the email (or move it to another folder). Looking around I did not find anything suitable that would meet my requirements (Linux, command line, simple yet powerful). So having some experience with IMAP and Python, I decided to write such a tool myself. It’s called imap_detach, and you can check the details on its page. Here I’d like to present a couple of use cases for this tool in the hope they might be useful for people with similar email processing needs.
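The attachment-extraction part of such a task can be sketched with the standard library’s email package alone (this is my illustration of the general idea, not imap_detach’s code; the demo message is constructed in code rather than fetched over IMAP):

```python
from email.message import EmailMessage

def extract_attachments(msg, suffix=".jpg"):
    """Return (filename, bytes) for every attachment whose name ends with suffix."""
    found = []
    for part in msg.walk():
        fname = part.get_filename()
        if fname and fname.endswith(suffix):
            # decode=True undoes the transfer encoding (usually base64)
            found.append((fname, part.get_payload(decode=True)))
    return found

# build a demo message with one image attachment
msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("see attachment")
msg.add_attachment(b"\xff\xd8fake-jpeg", maintype="image",
                   subtype="jpeg", filename="photo.jpg")

print(extract_attachments(msg))  # [('photo.jpg', b'\xff\xd8fake-jpeg')]
```

In a real workflow the `msg` object would come from `imaplib` fetch results parsed with `email.message_from_bytes`; the walk-and-filter step stays the same.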
Sometimes you need to test a terminal application, which reads user input from the terminal and prints results to the terminal. Such tasks are very common in introductory programming courses. A simple testing tool can help here, and students can learn good practices – automatic testing – from the very beginning. I’ve been looking around and did not find anything simple enough to be used by a beginner while providing the basic actions – testing a program’s output and supplying input to it. So I created such a tool – simpletest. Continue reading Testing Terminal Apps
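The basic actions such a tool needs can be sketched on top of the standard subprocess module – run the program, feed it input, capture its output (the one-line interactive program here is just a stand-in for a student’s exercise; this is not simpletest’s actual implementation):

```python
import subprocess
import sys

def run_with_input(args, text_in):
    """Run a console program, feed its stdin, return its stdout."""
    proc = subprocess.run(args, input=text_in, capture_output=True,
                          text=True, timeout=10)
    return proc.stdout

# demo: a tiny interactive program that reads a name and greets it
program = "name = input(); print('Hello, ' + name)"
out = run_with_input([sys.executable, "-c", program], "World\n")
print(out.strip())  # Hello, World
```

A test then boils down to comparing the captured stdout against the expected text; the `timeout` guards against a program that blocks waiting for input the test never sends.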
From time to time one might need to write a simple language parser to implement some domain-specific language for an application. As always, the Python ecosystem offers various solutions – an overview of Python parser generators is available here. In this article I’d like to describe my experiences with the parsimonious package. For a recent project of mine (imap_detach – a tool to automatically download attachments from an IMAP mailbox) I needed simple expressions to specify which emails and which exact parts should be downloaded. Continue reading Writing Simple Parser in Python
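For a sense of scale, the expressions in question look roughly like `mime = "image/jpeg" & ! seen` (my mock-up, not the tool’s exact syntax). A grammar this small can even be hand-rolled as a recursive-descent parser-evaluator before reaching for a generator like parsimonious:

```python
import re

# hypothetical filter mini-language:
#   expr := term ('&' term)*
#   term := '!' term | NAME '=' STRING | NAME
TOKENS = re.compile(r'"[^"]*"|\w+|[&!=]')

def parse(text, env):
    """Parse and evaluate a filter expression against a dict of variables."""
    toks = TOKENS.findall(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def term():
        if peek() == "!":
            take()
            return not term()
        name = take()
        if peek() == "=":           # NAME = "literal" comparison
            take()
            return env.get(name) == take().strip('"')
        return bool(env.get(name))  # bare NAME tests truthiness

    def expr():
        value = term()
        while peek() == "&":
            take()
            value = term() and value
        return value

    result = expr()
    if pos != len(toks):
        raise ValueError("trailing input after expression")
    return result

print(parse('mime = "image/jpeg" & ! seen',
            {"mime": "image/jpeg", "seen": False}))  # True
```

A parser generator earns its keep once the grammar grows – precedence levels, parentheses, more operators – which is exactly where a PEG library like parsimonious keeps the grammar declarative instead of buried in nested functions.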