OpenSubtitles provides an easy-to-use API

While working on btclient, I was interested in the possibility of downloading subtitles for the video file being played. This seems to be a common option in many players. I’ve found that OpenSubtitles provides an XML-RPC remote API, which is very easy to use. With the help of the Python xmlrpclib module, it’s really a matter of minutes to create a simple working client.
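
As a rough illustration of how little code such a client needs, here is a minimal sketch using xmlrpc.client (the Python 3 name of xmlrpclib). The method names LogIn, SearchSubtitles and LogOut and the query fields are what I understand the XML-RPC API to expose; the endpoint URL, the user agent string, the anonymous login with empty credentials and the helper name search_subtitles are assumptions made for this example, not taken from my client.

```python
import xmlrpc.client

# Assumed endpoint of the XML-RPC API; check the current API documentation.
API_URL = "https://api.opensubtitles.org/xml-rpc"
# Placeholder: the service expects a user agent registered for your client.
USER_AGENT = "MyClient v0.1"


def search_subtitles(server, criteria, user_agent=USER_AGENT):
    """Log in, run one search and log out again.

    `server` is an xmlrpc.client.ServerProxy (or any object exposing the
    same methods); `criteria` is a dict such as
    {"query": "some movie", "sublanguageid": "eng"}.
    Returns a list of result dicts, empty when nothing matched.
    """
    # Empty user name and password request an anonymous session (assumption).
    login = server.LogIn("", "", "en", user_agent)
    token = login["token"]
    try:
        response = server.SearchSubtitles(token, [criteria])
        # "data" may be missing or falsy when there are no results.
        return response.get("data") or []
    finally:
        # Always release the session token, even if the search failed.
        server.LogOut(token)
```

With a real connection, the call would look like search_subtitles(xmlrpc.client.ServerProxy(API_URL), {"query": "some movie", "sublanguageid": "eng"}). Passing the server object in, rather than creating it inside the function, keeps the transport (proxy, retries) configurable.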

The real trick is finding the right subtitles, that is, ones synchronized with the video file. OpenSubtitles offers useful help here, the so-called moviehash: it is calculated as (file size) + (checksum of the first 64 kB of the file) + (checksum of the last 64 kB of the file). The hash is easy and quick to calculate and can be used to search for subtitles. Video players also upload the moviehash: after finishing playback of a video file with certain subtitles, a player can report the hash, which gives evidence that these subtitles matched the given video file.
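
The calculation can be sketched in a few lines. This assumes, as I understand the published algorithm, that each "checksum" is the sum of the 64 kB block read as little-endian unsigned 64-bit integers, and that the total wraps around at 64 bits and is written as 16 hex digits. The function names and the rejection of files shorter than 128 kB are choices made for this example.

```python
import io
import struct

CHUNK_SIZE = 64 * 1024  # 64 kB read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF


def _sum_uint64_le(data):
    """Sum a byte string as little-endian unsigned 64-bit integers."""
    count = len(data) // 8
    return sum(struct.unpack("<%dQ" % count, data[:count * 8]))


def movie_hash(fileobj):
    """Return the moviehash of a seekable binary file-like object."""
    fileobj.seek(0, io.SEEK_END)
    size = fileobj.tell()
    if size < 2 * CHUNK_SIZE:
        # Assumption for this sketch: too-small files are simply rejected.
        raise ValueError("file is too small for a moviehash")

    fileobj.seek(0)
    head = fileobj.read(CHUNK_SIZE)
    fileobj.seek(size - CHUNK_SIZE)
    tail = fileobj.read(CHUNK_SIZE)

    # file size + checksum of first block + checksum of last block,
    # truncated to 64 bits.
    total = (size + _sum_uint64_le(head) + _sum_uint64_le(tail)) & MASK_64
    return "%016x" % total
```

Because the function only needs seek, tell and read, it works on an open file (open(path, "rb")) as well as on any stream wrapper that offers the same three methods.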

Subtitles are not always found by moviehash, however. Other search options are then available: by tag (one of the tags is the file name) or by full-text query. The subtitles returned this way may not be synchronized, and it is then at the discretion of the user (or of an advanced client program) to choose the right one. One easy heuristic is to look for keywords: if the movie file name contains BluRay, BRRip or BDRip, then the subtitle file name should also contain one of these strings (based on the observation that Blu-ray rips usually have the same timing).
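
The keyword heuristic could look roughly like the sketch below. It only covers the three Blu-ray strings mentioned above; the function names, the case-insensitive substring match and the choice to reorder candidates rather than drop them are assumptions made for this example.

```python
# Keywords that mark a Blu-ray rip, compared case-insensitively.
BLURAY_TAGS = ("bluray", "brrip", "bdrip")


def has_tag(name, tags=BLURAY_TAGS):
    """True if the file name contains any of the given keywords."""
    lowered = name.lower()
    return any(tag in lowered for tag in tags)


def prefer_matching_source(video_name, subtitle_names, tags=BLURAY_TAGS):
    """Move subtitles that share the video's source keywords to the front.

    If the video name carries none of the keywords, the original order is
    kept. Otherwise matching subtitles come first; the sort is stable, so
    the server's order is preserved within each group.
    """
    if not has_tag(video_name, tags):
        return list(subtitle_names)
    return sorted(subtitle_names, key=lambda name: not has_tag(name, tags))
```

Reordering instead of filtering leaves the non-matching subtitles available as a fallback when no candidate carries the keyword.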

The moviehash can be easily calculated even in streaming clients (like btclient), provided the client supports seeking. The first and last 64 kB are sought and read before the video is played.

My implementation of the client, in Python, is available here. A couple of implementation notes:

  • the default xmlrpclib transport does not support an HTTP proxy, but this is easily fixed by using a transport based on urllib2
  • the servers are not very stable; they are often overloaded and return a 500 error for searches, so some retries are needed to get subtitles more reliably (internally, the client can retry in the download_if_not_exists function)
  • the moviehash is calculated from file-like objects (to enable calculation from streams)
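
To illustrate the retry note, here is a generic sketch of retrying an XML-RPC call when the server answers with a 5xx status. It is not the code of download_if_not_exists; the helper name call_with_retries, the default of three retries and the doubling delay are assumptions made for this example.

```python
import time
import xmlrpc.client


def call_with_retries(func, *args, retries=3, delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying when the server reports a 5xx error.

    xmlrpc.client raises ProtocolError for non-200 HTTP responses; its
    errcode attribute holds the HTTP status. Other errors, and 4xx
    statuses, are not retried. The wait doubles after every failure.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except xmlrpc.client.ProtocolError as exc:
            if exc.errcode < 500 or attempt == retries:
                raise
            sleep(delay * (2 ** attempt))
```

A search could then be wrapped as call_with_retries(server.SearchSubtitles, token, [criteria]), where server, token and criteria stand for an existing session.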
