While working on btclient, I became interested in the possibility of downloading subtitles for the video file being played. This seems to be a common option in many players. I found that opensubtitles.org provides an XML-RPC remote API, which is very easy to use. With the help of Python's xmlrpclib module, it's really a matter of minutes to create a simple working client.
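To give an idea of how little code such a client needs, here is a minimal sketch. It is not my actual implementation: it uses Python 3, where xmlrpclib is called xmlrpc.client, and the endpoint URL, the method names LogIn and SearchSubtitles, and the response fields (token, status, data) are written from memory of the API documentation, so treat them as assumptions. The helper names login and search_by_hash are made up for this example.

```python
import xmlrpc.client

# Assumed endpoint; check the opensubtitles.org API documentation.
API_URL = "https://api.opensubtitles.org/xml-rpc"


def login(server, username, password, user_agent):
    """Log in and return the session token (response fields are assumptions)."""
    result = server.LogIn(username, password, "en", user_agent)
    return result["token"]


def search_by_hash(server, token, movie_hash, size, language="eng"):
    """Search subtitles by moviehash; returns a list of result dicts."""
    query = {
        "moviehash": movie_hash,
        # Sent as a string, because XML-RPC integers are limited to 32 bits.
        "moviebytesize": str(size),
        "sublanguageid": language,
    }
    result = server.SearchSubtitles(token, [query])
    if not str(result.get("status", "")).startswith("200"):
        raise RuntimeError("search failed: %r" % result.get("status"))
    # An empty result may come back as a false value instead of a list.
    return result.get("data") or []
```

Usage would be something like `server = xmlrpc.client.ServerProxy(API_URL)`, then `token = login(server, user, password, agent)` and `search_by_hash(server, token, hash_value, file_size)`.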
The real trick is then finding the right subtitles, meaning ones that are synchronized with the video file. Opensubtitles provides useful help here, the so-called moviehash. This hash is calculated as (file size) + (checksum of the first 64kB of the file) + (checksum of the last 64kB of the file). The hash is easy and quick to calculate and can be used to search for subtitles. The moviehash is also uploaded by video players: when a player finishes playing a video file with certain subtitles, it can report the moviehash, thus giving evidence that these subtitles matched the given video file.
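The calculation can be sketched as follows. As far as I know, the "checksum" is simply the sum of the chunk read as 64-bit little-endian integers, with the total wrapped to 64 bits and written as 16 hex digits; please verify this against the official description before relying on it. The function name movie_hash is only for this example, and rejecting files shorter than 128kB is my own simplification.

```python
import io
import struct

CHUNK_SIZE = 64 * 1024          # 64kB read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF    # keep the running sum within 64 bits


def movie_hash(stream, size):
    """Compute the moviehash of a seekable binary file-like object.

    `size` is the total length in bytes; it is passed in because a
    streaming client usually knows it without reading the whole file.
    """
    if size < 2 * CHUNK_SIZE:
        raise ValueError("file is too small to hash")
    total = size
    for offset in (0, size - CHUNK_SIZE):
        stream.seek(offset)
        chunk = stream.read(CHUNK_SIZE)
        if len(chunk) != CHUNK_SIZE:
            raise IOError("short read at offset %d" % offset)
        # Interpret the chunk as unsigned 64-bit little-endian words.
        words = struct.unpack("<%dQ" % (CHUNK_SIZE // 8), chunk)
        total = (total + sum(words)) & MASK_64
    return "%016x" % total
```

Because only seek and read are used, the same function works on a regular file opened in binary mode, on an io.BytesIO, or on any stream wrapper that supports seeking.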
Subtitles are not always found by moviehash, however. In that case other search options are available: by tag (one of the tags is the file name) or by full-text query. The subtitles returned this way may not be synchronized, and it is then at the discretion of the user (or of an advanced client program) to choose the right one. One easy heuristic is to look for keywords: if the movie file name contains BluRay, BRRip or BDRip, then the subtitle file name should also contain one of these strings (based on the observation that BluRay rips usually have the same timing).
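A rough sketch of such a keyword heuristic could look like this. Only the BluRay group comes from the observation above; the function names and the idea of keeping several groups in a list are just assumptions for the example.

```python
# Each tuple holds keywords that are assumed to imply the same timing.
# Only the BluRay group is taken from the text; add more at your own risk.
RELEASE_GROUPS = [
    ("bluray", "brrip", "bdrip"),
]


def release_group(name):
    """Return the index of the keyword group found in `name`, or None."""
    lowered = name.lower()
    for index, group in enumerate(RELEASE_GROUPS):
        if any(keyword in lowered for keyword in group):
            return index
    return None


def prefer_matching_release(video_name, subtitle_names):
    """Order subtitle names so that those matching the video's group come first."""
    wanted = release_group(video_name)
    if wanted is None:
        return list(subtitle_names)
    # sorted() is stable, so the original order is kept within each half.
    return sorted(subtitle_names, key=lambda name: release_group(name) != wanted)
```

The function only reorders candidates and never drops any, so the user can still pick a different file if the guess is wrong.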
The moviehash can easily be calculated even in streaming clients (like btclient), provided the client supports seeking: the first and last 64kB are sought and read before the video is played.
My implementation of an opensubtitles.org client, in Python, is available here. A couple of implementation notes:
- the default xmlrpclib transport does not support an HTTP proxy, but this is easily fixed by using a transport based on urllib2
- opensubtitles.org servers are not very stable, are often overloaded and return a 500 error for searches, so some retries are needed to get subtitles more reliably; internally the client has a possibility of retries in
- the moviehash is calculated from file-like objects (to enable calculation from streams)
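The retries from the notes above can be sketched with a small wrapper like the one below. It is written for Python 3 (xmlrpc.client instead of xmlrpclib) and is not the code from my client; the name call_with_retries, the default number of retries and the growing delay are all assumptions made for the example.

```python
import time
import xmlrpc.client


def call_with_retries(func, *args, retries=3, delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying on HTTP 5xx responses and socket errors.

    `retries` is the number of extra attempts after the first one; the
    pause grows linearly (delay, 2 * delay, ...) between attempts.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except xmlrpc.client.ProtocolError as error:
            # ProtocolError carries the HTTP status; only server errors
            # (such as 500) are worth retrying.
            if error.errcode < 500 or attempt == retries:
                raise
        except OSError:
            # Connection resets, timeouts and similar network problems.
            if attempt == retries:
                raise
        sleep(delay * (attempt + 1))
```

A search would then be wrapped as `call_with_retries(server.SearchSubtitles, token, queries)`, where server is an xmlrpc.client.ServerProxy.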