In Python, a newly created subprocess inherits file descriptors from its parent process, and these descriptors are left open – at least this was the default until Python 3.3. The subprocess.Popen constructor has a close_fds parameter (it defaults to False in Python 2.7), which controls whether inherited FDs are closed or not. Leaving FDs open for the child process can lead to many problems, as explained here and here. Continue reading Subtle evil of close_fds parameter in subprocess.Popen
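The effect of close_fds can be observed with a small experiment. The sketch below is not taken from the post; it assumes a POSIX system and Python 3.4 or later, where descriptors are non-inheritable by default, so the pipe end is marked inheritable explicitly to reproduce the older behaviour. The helper name child_can_use_fd is made up for illustration.

```python
import os
import subprocess
import sys

# Code run in the child: try to write to the descriptor number passed in argv.
CHILD_CODE = """
import os, sys
try:
    os.write(int(sys.argv[1]), b"hello")
except OSError:
    pass  # descriptor is not open (or not writable) in the child
"""


def child_can_use_fd(close_fds):
    """Return True if a child process can write to a pipe opened by the parent.

    Hypothetical helper for illustration only (POSIX, Python 3.4+ assumed).
    """
    read_end, write_end = os.pipe()
    # Since Python 3.4 new descriptors are non-inheritable; opt in explicitly
    # to mimic the old "everything is inherited" behaviour.
    os.set_inheritable(write_end, True)
    try:
        subprocess.call(
            [sys.executable, "-c", CHILD_CODE, str(write_end)],
            close_fds=close_fds,
        )
    finally:
        os.close(write_end)  # so that the read below sees EOF if nothing was written
    data = os.read(read_end, 16)
    os.close(read_end)
    return data == b"hello"


if __name__ == "__main__":
    print("close_fds=False:", child_can_use_fd(False))
    print("close_fds=True: ", child_can_use_fd(True))
```

With close_fds=False the child should be able to write into the parent's pipe; with close_fds=True the descriptor is closed before the child code runs.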
Category Archives: Programming
Opa – Mixed Impressions
Coming a little late to Opa (it looks like the real hype was a couple of years ago), I was still caught by this interesting new language. Opa is a cross-over between JavaScript (providing JS-like syntax) and OCaml (using many functional programming idioms from that language; the Opa compiler is also written in OCaml). Opa is used solely to program web applications, so Opa is both a language and a web framework. Opa compiles to JavaScript, which runs in a browser on the client side and in node.js on the server side. You write just one Opa code base and the compiler decides where the code should run.
I have spent some time looking into Opa recently, mainly working through some of the Opa tutorials and doing some small experiments myself, and I’d like to share my experiences and impressions. Continue reading Opa – Mixed Impressions
Ocaml, Ocsigen, I Ching and Web Applications
Recently, while reading the great Philip K. Dick novel The Man in the High Castle, I learned about the I Ching – an ancient Chinese philosophical, cosmological, but mainly divination text. I’m no big fan of divination, so in the case of the I Ching I would generally agree with this critical review. However, the divination procedure used with the I Ching is quite interesting: hexagrams actually represent one of the oldest binary codes. The idea that one’s fortune could be represented by 6 bits (actually it’s 12 bits, because for practical divination purposes each of the 6 lines can take one of 4 states) is quite amusing. So I decided to create an online I Ching application as an exercise to learn a bit more about the Ocsigen web framework. You can check the result of my effort here. Continue reading Ocaml, Ocsigen, I Ching and Web Applications
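The bit counting above can be made concrete with a short sketch. It is written in Python rather than OCaml and is not the encoding used by the application in the post; it assumes the traditional line values 6 (old yin), 7 (young yang), 8 (young yin) and 9 (old yang), and the function names are invented for illustration.

```python
# Traditional values of a cast line: 6 = old yin, 7 = young yang,
# 8 = young yin, 9 = old yang. Four states need 2 bits per line.
LINE_STATES = (6, 7, 8, 9)


def pack_casting(lines):
    """Pack six cast lines (bottom line first) into a 12-bit integer."""
    if len(lines) != 6:
        raise ValueError("a hexagram has exactly six lines")
    value = 0
    for position, line in enumerate(lines):
        value |= LINE_STATES.index(line) << (2 * position)
    return value


def unpack_casting(value):
    """Inverse of pack_casting: recover the six line values."""
    return [LINE_STATES[(value >> (2 * position)) & 0b11] for position in range(6)]


def hexagram_bits(lines):
    """Reduce a casting to the plain 6-bit hexagram: yang (7 or 9) = 1, yin = 0."""
    bits = 0
    for position, line in enumerate(lines):
        if line in (7, 9):
            bits |= 1 << position
    return bits


if __name__ == "__main__":
    casting = [7, 8, 9, 6, 7, 8]
    packed = pack_casting(casting)
    print(format(packed, "012b"), format(hexagram_bits(casting), "06b"))
```

The 6-bit value identifies one of the 64 hexagrams, while the 12-bit value also records which lines are changing.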
Plugins in OCAML with Dynlink library
I am slowly continuing to learn OCaml. As a training project I am working on a simplified Map-Reduce framework (utilizing the Core and Async libraries). Here I needed to plug selectable code (a map-reduce algorithm) into the main program. OCaml provides the Dynlink library, which can dynamically link either a byte-code or a native object/library into a running program. This can be utilized to create a simple plugin framework, as explained below. Continue reading Plugins in OCAML with Dynlink library
Streaming video file from BitTorrent P2P network
Although the BitTorrent (BT) protocol was not designed for media streaming, in practice it can be used, to a certain extent, to stream a video file from a P2P network. The key trick is to force sequential download in the BT client. Normally a BT client first selects the pieces that are least available in the swarm, which contributes to better distribution of the file; sequential download works against this, so it is not enabled in regular BT clients.
But if we force the BT client to download sequentially, cache incoming pieces and have enough incoming bandwidth from peers, we can stream the incoming video directly into a video player. It is indeed poor man's streaming, because it lacks advanced features like stream synchronization, stream seeking etc., but in many cases it works well enough. Continue reading Streaming video file from BitTorrent P2P network
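The difference between the two piece selection strategies can be sketched in a few lines of Python. This is a simplified illustration and not code from any real BT client: the availability list (how many peers offer each piece) and both function names are assumptions, and real clients add refinements such as random tie-breaking.

```python
def pick_rarest_first(availability, have):
    """Pick the missing piece offered by the fewest peers (ties: lowest index).

    availability[i] is the number of peers that have piece i;
    have is the set of piece indexes already downloaded.
    """
    candidates = [
        index
        for index, peers in enumerate(availability)
        if index not in have and peers > 0
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda index: (availability[index], index))


def pick_sequential(availability, have):
    """Pick the first missing piece that at least one peer can provide."""
    for index, peers in enumerate(availability):
        if index not in have and peers > 0:
            return index
    return None


if __name__ == "__main__":
    availability = [5, 1, 3, 2]  # hypothetical swarm state
    print(pick_rarest_first(availability, set()))  # piece 1, the rarest one
    print(pick_sequential(availability, set()))    # piece 0, the next one to play
```

Sequential selection keeps the player supplied with the next piece in playback order, at the cost of leaving rare pieces rare in the swarm.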
Ocaml performance – native code vs byte code
As noted in this post, I’ve been exploring the OCaml language. OCaml is a compiled language and can be compiled either to native code (for supported platforms) or to byte code, which is interpreted in the provided run-time environment. I was wondering what the performance difference between these two targets is. Continue reading Ocaml performance – native code vs byte code
Not Always PyPy Is Faster
PyPy is an alternative Python interpreter, which is known for its speed. However, it is not always faster than the ‘classic’ Python interpreter (called CPython here). For one small project of mine, PDF Checker, I tested PyPy hoping to speed up PDF document processing (basically parsing to extract text; the pdfminer library is used and document parsing takes the majority of the time). Below are the results of running the program on two different files in CPython and in PyPy (with and without JIT compilation):
File | CPython | PyPy | PyPy with JIT disabled |
Small PDF (110kB) | 1.1 s | 2.4 s | 2.5 s |
Big PDF (996kB) | 16.6 s | 10.9 s | 36.5 s |
Decoding Audio Captchas in Python
For better or worse, many sites now use CAPTCHAs to determine whether a visitor is a human or a computer program. A captcha presents a task, usually reading some distorted letters and typing them into a form. This is considered hard for a computer to do, so the user must be human. To improve accessibility, visual captchas are accompanied by audio captchas, where the letters are spelled out (usually with some background noise to make letter recognition more difficult). However, audio captchas are known to be easier to break. Inspired by this article [1], I created a Python implementation of audio captcha decoding using commonly available libraries and just a general knowledge of speech recognition technologies. The software is called adecaptcha and I tested it on a couple of sites, where I got 99.5% accuracy of decoded letters for one site and 90% accuracy for the other site (which has much more distorted audio). Continue reading Decoding Audio Captchas in Python
Running uWSGI for gevent enabled application
Gevent is a great library that uses greenlets (a Python coroutine library) to enable asynchronous I/O while providing an API that looks like a normal synchronous API, so it’s easier to use and understand. The async magic is done automatically by Gevent, which runs an event loop in the background and switches between coroutines as necessary.
This approach can be very useful for concurrent applications that spend a lot of time waiting for I/O, such as web applications, so Gevent is popular there. For such workloads it can enable higher concurrency while using fewer resources (a greenlet is much lighter than a thread or a process). Continue reading Running uWSGI for gevent enabled application
APEX Application to View Log Files
Oracle APEX keeps all data in the database and makes it easy to create different reports for tables or views. But what if we want to present something from outside the database, like text log files? How could this be done in APEX? For a regular web server it is a trivial task: usually a simple configuration change enables the web server to list a directory and serve any file from it (and that would probably be the easiest way to do it). But what if we need to integrate log browsing into an APEX application? Actually, there is a way to list and serve files even in APEX, if it is required. Continue reading APEX Application to View Log Files