Tag Archives: python

Streaming video file from BitTorrent P2P network

Although the BitTorrent (BT) protocol was not designed for media streaming, in practice it can be used, to a certain extent, to stream a video file from a P2P network. The key trick is to force sequential download in the BT client. Normally a BT client first selects the pieces that are least available in the swarm, which contributes to better distribution of the file; sequential download works against this, so it is not enabled in regular BT clients.

But if we force the BT client to download sequentially, cache incoming pieces, and have enough incoming bandwidth from peers, we can stream the incoming video directly into a video player. Admittedly it is poor man’s streaming, because it lacks advanced features like stream synchronization and stream seeking, but in many cases it works well enough.
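The difference between the two piece selection strategies can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: the function names (`rarest_first`, `sequential`, `download_order`, `playable_pieces`) and the tiny swarm are invented for illustration and are not taken from any real BT client.

```python
from collections import Counter


def rarest_first(missing, availability):
    """Pick the missing piece held by the fewest peers (ties: lowest index)."""
    return min(missing, key=lambda piece: (availability[piece], piece))


def sequential(missing):
    """Pick the missing piece with the lowest index, i.e. the next one to play."""
    return min(missing)


def download_order(num_pieces, peer_pieces, strategy):
    """Return the order in which pieces would be requested (toy model)."""
    # Count how many peers hold each piece.
    availability = Counter()
    for pieces in peer_pieces:
        availability.update(pieces)
    # Only pieces that at least one peer has can be requested.
    missing = {p for p in range(num_pieces) if availability[p] > 0}
    order = []
    while missing:
        if strategy == "sequential":
            piece = sequential(missing)
        else:
            piece = rarest_first(missing, availability)
        order.append(piece)
        missing.remove(piece)
    return order


def playable_pieces(downloaded):
    """Number of contiguous pieces from the start that a player could consume."""
    count = 0
    while count in downloaded:
        count += 1
    return count


# Hypothetical swarm: three peers, four pieces.
peers = [{0, 1, 2, 3}, {0, 1, 2}, {0, 3}]
rarest_order = download_order(4, peers, "rarest")
sequential_order = download_order(4, peers, "sequential")

# After two downloaded pieces, only the sequential order gives the player data.
print(rarest_order, playable_pieces(set(rarest_order[:2])))
print(sequential_order, playable_pieces(set(sequential_order[:2])))
```

In this model piece 0 is the most widely available, so rarest-first fetches it last and the player has nothing to show until the whole file is complete, while sequential order makes the beginning of the file playable immediately.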

Not Always PyPy Is Faster

PyPy is an alternative Python interpreter known for its speed. However, it is not always faster than the ‘classic’ Python interpreter (called CPython here). For one small project of mine, PDF Checker, I tested PyPy hoping to speed up PDF document processing (basically parsing to extract text; the pdfminer library is used, and document parsing takes the majority of the time). Below are the results of running the program on two different files, in the CPython interpreter and in PyPy (with and without JIT compilation):

File               CPython   PyPy     PyPy with JIT disabled
Small PDF (110kB)  1.1 s     2.4 s    2.5 s
Big PDF (996kB)    16.6 s    10.9 s   36.5 s
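To make the comparison easier to read, the timings can be expressed relative to CPython. The short sketch below only redoes the arithmetic on the numbers from the table; the dictionary layout and the helper name `relative_to_cpython` are illustrative assumptions.

```python
# Timings in seconds, copied from the table above.
timings = {
    "Small PDF (110kB)": {"CPython": 1.1, "PyPy": 2.4, "PyPy (JIT disabled)": 2.5},
    "Big PDF (996kB)": {"CPython": 16.6, "PyPy": 10.9, "PyPy (JIT disabled)": 36.5},
}


def relative_to_cpython(row):
    """Return each timing divided by the CPython timing (values above 1 are slower)."""
    base = row["CPython"]
    return {name: round(seconds / base, 2) for name, seconds in row.items()}


relative = {name: relative_to_cpython(row) for name, row in timings.items()}
for name, row in relative.items():
    print(name, row)
```

A ratio above 1 means the run took longer than CPython, a ratio below 1 means it was faster.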


Decoding Audio Captchas in Python

For good or bad, many sites now use CAPTCHAs to determine whether a visitor is a human or a computer program. A captcha presents a task, usually reading some distorted letters and typing them into a form. This is considered hard for a computer to do, so the user must be human. To improve accessibility, visual captchas are accompanied by audio captchas, where the letters are spelled out (usually with some background noise to make letter recognition more difficult). However, audio captchas are known to be easier to break. Inspired by this article [1], I created a Python implementation of audio captcha decoding using commonly available libraries and just a general knowledge of speech recognition technologies. The software is called adecaptcha, and I tested it on a couple of sites: I got 99.5% accuracy of decoded letters on one site and 90% accuracy on another (which has much more distorted audio).

Running uWSGI for gevent enabled application

Gevent is a great library that uses greenlets (a Python coroutine library) to enable asynchronous I/O while providing an API that looks like a normal synchronous API, so it is easier to use and understand. The async magic is done automatically by Gevent, which runs an event loop in the background and switches between coroutines as necessary.

This approach can be very useful for concurrent applications that spend a lot of time waiting for I/O, such as web applications, which is why Gevent is popular there. For certain types of workloads it can enable higher concurrency while using fewer resources (a greenlet is much lighter than a thread or a process).

Hiding Secret Message in Unicode Text

The art of hiding a secret message inside another, innocent-looking message is called steganography. It is an old discipline, where techniques like invisible ink and microdots have been used. With the rise of digital technologies, new possibilities for steganography appeared and attracted the interest of computer scientists and enthusiasts. A common approach is to hide secret information in multimedia files: pictures, music, videos and so on. The main advantages here are the omnipresence of media today, the significant size of media files (so there is enough space for additional information), and the nature of media formats, which often makes it possible to hide information in very clever ways (if you change the last bit of the color information for a pixel in an image, the change is undetectable by the human eye). But we can also hide secret messages in regular text, especially if we are using Unicode text encoding (which is now very common).
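One well-known way to hide data in Unicode text is to encode the secret as a run of invisible zero-width characters. The sketch below illustrates that idea only; it is an assumption for the sake of example and not necessarily the technique used in the full article. The function names `hide` and `reveal` are hypothetical, and the cover text is assumed not to contain these zero-width characters already.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here to encode bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here to encode bit 1


def hide(cover, secret):
    """Embed the UTF-8 bits of `secret` as zero-width characters inside `cover`."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # Insert after the first character so the payload sits inside the text.
    return cover[:1] + payload + cover[1:]


def reveal(text):
    """Collect the zero-width characters from `text` and decode them back to a string."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


stego = hide("Hello world", "hi")
print(len("Hello world"), len(stego))  # the hidden text is longer but looks the same
print(reveal(stego))
```

Most renderers display the result exactly like the cover text, although the extra characters are easy to detect by anyone who inspects the code points.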


Protecting Django Application Against Brute Force Password Guessing

When you bring your web application live, you can expect various types of attacks; one could be brute-force scanning of possible logins. As a standard means of prevention against such attacks, login should be temporarily disabled after some number of unsuccessful attempts. For Django, a nice package called django-lockout exists.

The main advantage of this package is that it keeps the history of unsuccessful login attempts in memory (using the Django cache system), so checks are very quick. django-lockout is fairly easy to implement; however, I’ve found one issue when it is used together with the Django admin site.
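The general idea of locking a login after repeated failures, with the counters kept in memory, can be sketched as follows. This is not django-lockout's API or implementation: the class `LoginThrottle`, its method names and the default thresholds are hypothetical, and a plain dictionary stands in for the Django cache.

```python
import time

MAX_ATTEMPTS = 5       # hypothetical threshold of failed attempts
LOCKOUT_SECONDS = 300  # hypothetical length of the counting/lock window


class LoginThrottle:
    """Count failed logins per key (e.g. username or IP) in an in-memory dict."""

    def __init__(self, max_attempts=MAX_ATTEMPTS, lock_seconds=LOCKOUT_SECONDS,
                 clock=time.monotonic):
        self.max_attempts = max_attempts
        self.lock_seconds = lock_seconds
        self.clock = clock
        self._failures = {}  # key -> (failure count, expiry time of the window)

    def is_locked(self, key):
        entry = self._failures.get(key)
        if entry is None:
            return False
        count, expires = entry
        if self.clock() >= expires:
            # The window has expired, so forget the old failures.
            del self._failures[key]
            return False
        return count >= self.max_attempts

    def record_failure(self, key):
        now = self.clock()
        count, expires = self._failures.get(key, (0, now + self.lock_seconds))
        if now >= expires:
            count, expires = 0, now + self.lock_seconds
        self._failures[key] = (count + 1, expires)

    def record_success(self, key):
        # A successful login clears the history for that key.
        self._failures.pop(key, None)


throttle = LoginThrottle(max_attempts=3, lock_seconds=60)
for _ in range(3):
    throttle.record_failure("alice")
print(throttle.is_locked("alice"), throttle.is_locked("bob"))
```

A login view would call `is_locked` before checking the password and `record_failure` or `record_success` afterwards; because everything is a dictionary lookup, the check adds very little overhead.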


Voronoi Diagrams

Some time ago I was looking for an algorithm that can generate ‘map-like’ pictures, e.g. a tessellation of a plane into a set of more or less random polygons. I found Voronoi diagrams, which give very nice pictures and have many useful properties.
The most common case of a Voronoi diagram is in a Euclidean plane, where we have a set of points (called seeds). The Voronoi diagram splits the plane into areas, called Voronoi cells, one around each seed, where any point inside an area is closer to that seed than to any other. The areas are then convex polygons (for the Euclidean metric). This definition is best illustrated by the picture below, the Voronoi diagram for 100 random points in the range 0-100; Voronoi cells are marked by red lines, blue points are seeds:

[Figure: voro-100, Voronoi diagram for 100 random points]
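The definition translates directly into a brute-force sketch: for every point of a grid, find the nearest seed. This is only an illustration of the definition, not an efficient algorithm and not necessarily how the picture above was produced; the function names and the three sample seeds are made up for the example.

```python
import math


def nearest_seed(point, seeds):
    """Return the index of the seed closest to `point` (Euclidean metric)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


def voronoi_grid(seeds, width, height):
    """Label every integer grid point with the index of its Voronoi cell."""
    return [[nearest_seed((x, y), seeds) for x in range(width)]
            for y in range(height)]


# Hypothetical seeds on a small 8 x 8 grid.
seeds = [(1, 1), (6, 2), (3, 6)]
grid = voronoi_grid(seeds, 8, 8)
for row in grid:
    print("".join(str(cell) for cell in row))
```

Each printed digit is the index of the seed that owns that grid point, so the boundaries between runs of different digits approximate the edges of the Voronoi cells.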


Web Clients Are Getting Thick

Remembering the days when client-server ruled the world, and then the days when everybody praised light web clients, where the whole user interface (UI) was prepared on the web server and every user action was communicated back to the server (this could lead to heavy network traffic; I’ve seen one mainstream ERP program where a change in one input, say a line item quantity, led to several megabytes being sent over the network), I’m quite amused to see how we’re returning to thick clients and passing more and more UI tasks back to user devices. This probably makes sense, taking into account the computing power available in user devices now (my mobile, with a dual-core 1.2 GHz ARM CPU, has approximately the same computing power as a reasonable server of ten years ago, a Sun V240 for instance) and the improvement of web browsers, especially their JavaScript engines. Normally the utilization of an average client machine would be very low unless the client is dealing with digital media, so using the computing power available there is an obvious step. Network bandwidth may now be a more precious resource than client computing cycles.