Starting with Aurelia – Pagination with Back and Sort

I do not much like programming User Interfaces (UIs), and frankly I’m not very good at it, but alas sometimes UIs are necessary, so I have to try my best. Many recent applications use the web browser as their UI, and the situation there is quite messy (see this nice article about JS Frameworks Fatigue). The last time I was involved with web UIs I used Backbone with a Django-based RESTful server. Recently I’ve decided to rewrite the MyBookshelf application with modern technologies (it’s about 8 years old, which is practically prehistory considering the changes in web development). The new architecture should be based on RESTful services and a Single Page Application (SPA) relying on recent browser capabilities. I’ve been looking around and found that Backbone is already almost forgotten and that we have two new stars on the stage: AngularJS and React. I looked at both very quickly and finally decided on another framework, Aurelia.

Parsing PDF for Fun And Profit (indeed in Python)

PDF documents are ubiquitous in today’s world. Apart from the common use cases of printing, viewing, etc., we sometimes need to do something specific with them, like converting them to other formats or extracting their textual content. Extracting text from a PDF document can be a (surprisingly) hard task because of the purpose and design of PDF. PDF is intended to give an exact visual representation of a document’s pages, down to the smallest details, and the internal representation of the document text follows this goal. Rather than storing text in logical units (lines, paragraphs, columns, tables, …), PDF represents text as a series of commands that print characters (a single character, a word, part of a line, …) at an exact position on the page with a given font, font size, color, etc. To reconstruct the original logical structure of the text, a program has to scan all these commands and join together the pieces of text that probably formed the same line or the same paragraph. This task can be pretty demanding and ambiguous: the relative position of text boxes can be interpreted in various ways (is this space between words so large because they are in different columns, or because the line is justified to both ends?).
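
To make the reconstruction problem concrete, here is a minimal sketch in plain Python. It is not how PDFMiner or libpoppler work internally; the `Fragment` type, the `group_into_lines` function and both thresholds are assumptions made up for illustration. It takes positioned text fragments, groups those with nearly the same vertical position into one line, and inserts a space wherever the horizontal gap between neighbours is larger than a threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """One positioned piece of text (hypothetical type, for illustration only)."""
    x: float      # left edge of the fragment on the page
    y: float      # vertical position (larger y = higher on the page)
    width: float  # horizontal extent of the fragment
    text: str


def group_into_lines(fragments: List[Fragment],
                     y_tolerance: float = 2.0,
                     space_gap: float = 1.5) -> List[str]:
    """Join fragments into lines of text.

    y_tolerance and space_gap are arbitrary guesses; real documents
    need values derived from the font size.
    """
    # Visit fragments from the top of the page to the bottom.
    ordered = sorted(fragments, key=lambda f: (-f.y, f.x))

    # Put fragments whose vertical positions are close into the same line.
    lines: List[List[Fragment]] = []
    for frag in ordered:
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    result = []
    for line in lines:
        # Within a line, read fragments from left to right.
        line.sort(key=lambda f: f.x)
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A large enough gap is interpreted as a space between words.
            if gap > space_gap:
                parts.append(" ")
            parts.append(cur.text)
        result.append("".join(parts))
    return result


fragments = [
    Fragment(x=10, y=100.0, width=20, text="Hel"),
    Fragment(x=30, y=100.5, width=10, text="lo"),
    Fragment(x=45, y=100.0, width=30, text="world"),
    Fragment(x=10, y=80.0, width=40, text="second"),
]
lines = group_into_lines(fragments)
print(lines)
```

The single `space_gap` threshold is exactly where the ambiguity shows up: the same gap can be a word space in a justified line or the gutter between two columns, and no fixed value is right for both.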

So the task of text extraction looks quite discouraging. Luckily, some smart people have already tackled it and left us with libraries that do a pretty good job, which we can leverage. Some time ago I created a tool called PDF Checker, which does some analysis of PDF document content (presence or absence of certain phrases, paragraph numbering, footer format, etc.). For it I used the excellent Python PDFMiner library. PDFMiner is a great tool and quite flexible, but being written entirely in Python it is rather slow. Recently I’ve been looking for alternatives that have Python bindings and provide functionality similar to PDFMiner. In this article I describe some results of this search, particularly my experiences with libpoppler.

Cython Is As Good As Advertised

I’ve been aware of Cython for a few years but never had a chance to really test it in practice (apart from a few dummy exercises). Recently I decided to look at it again and test it on my old project adecapcha. I was quite pleased with the results: I was able to speed up the program significantly with minimal changes to the code.

Openshift – Second Thoughts

Openshift Online remains one of the most generous PaaS offerings on the market. With 3 free containers it’s a really good bargain. Recently I’ve modified a couple of my older applications (myplaces and iching) to run in Openshift.

Previously I created a pretty standard, simple Flask application and deployed it on Openshift. The process was pretty straightforward, as described in this article. This time, however, the situation was different, because both applications are special.

Farewell Django

Recently I’ve been reviving a two-year-old Django application (myplaces), upgrading it from version 1.5.5 to the latest version 1.9, and I was very unpleasantly surprised by how tedious it was. As Django evolved, some features were deprecated and removed, and they had to be replaced in the code. And it’s not only Django: other contributed libraries are evolving just as rapidly. In my application I was using django-rest-framework, which changed so significantly in version 3 that I cannot use it without basically rebuilding the whole application.

Some of the changes might be necessary, but many were just cosmetic changes in names (mimetype -> content_type, etc.), which I do not see as adding much value. Even core Python still tolerates a bit of naming inconsistency for the sake of backward compatibility (for instance string.startswith and string.endswith made it into version 3, even though they are not in line with PEP 8, the Python naming standard).

But it’s not only about interface changes between versions (there is a fair deprecation process, so if you follow development it’s relatively easy to stay up to date); it’s mainly about the whole concept of Django. Django was created more than 10 years ago, when web development was centered on servers and everything happened there. But the situation has changed radically (as I wrote some time ago). Now a lot is happening in the browser, and you can have complete applications running there (recently I discovered this cool application, which runs almost completely in the browser and just uses a stream of events from the server). Accordingly, servers are now used more to provide APIs to browser applications or to route real-time communication to, from, and between browsers.