Convert Dictionary for PocketBook eBook Reader

I’m a great fan of ebooks, and I recently replaced my old Hanlin with a new reader, the PocketBook 626. PocketBook is a European company that distributes PocketBook readers; the actual development of the devices is done by the Ukrainian company Obreey Products. Ukrainian and Russian programmers have always been very active in ebook technology (the FB2 format, CoolReader, FBReader, OpenInkPot (open-source firmware for ebook readers) and more), so it is no wonder they have been able to produce quite a nice device. I personally prefer it to the Kindle because of its wider format support (especially EPUB) and the broader possibilities for customizing the device (price-wise the two are basically similar).

One of my first tasks was to get more useful language dictionaries onto it (some are built in, but better ones are available). Many high-quality free dictionaries are available in stardict format (for instance here for Czech language dictionaries; other sites offer other dictionary formats as well).

Dictionary formats are actually quite a mess. There are many private formats (practically every ebook vendor tends to use its own) and few, if any, public standards. The already mentioned stardict was a fairly popular private format (the stardict application itself is abandoned, but you can use its successor GoldenDict, which supports the stardict format). A stardict dictionary is usually composed of three files: .ifo, .dict (or .dict.dz, the compressed definitions) and .idx. If the .ifo file is missing, it can easily be recreated, since it is a plain text file.
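
A .dict.dz file is gzip-compatible, so it can also be unpacked with Python's standard library when a tool insists on a plain .dict file. This is only a minimal sketch; the function name and the file names are illustrative and not part of any stardict tooling.

```python
import gzip
import shutil
from pathlib import Path


def decompress_dictzip(src: str) -> Path:
    """Unpack name.dict.dz to name.dict in the same folder; return the new path."""
    src_path = Path(src)
    if src_path.suffix != ".dz":
        raise ValueError(f"expected a .dz file, got {src_path.name}")
    dst_path = src_path.with_suffix("")  # drops only the trailing ".dz"
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

For example, decompress_dictzip("czech.dict.dz") would write czech.dict next to the original and leave the compressed file untouched.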

One of the standard formats is XDXF, which is basically XML with terms/keys and definitions. The XDXF site also hosts many dictionaries in this format. Here is a small sample of one entry in an XDXF file (the text can contain some basic HTML formatting, like bold, italics …):
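
The sample entry did not survive in this copy of the post, so what follows is an illustrative stand-in and not the original. It assumes the commonly seen XDXF layout in which each article is an <ar> element whose <k> child holds the term and the rest of the article is the definition. The header attributes and the dictionary content are made up for the example, and read_entries is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up two-entry dictionary; the second entry uses basic inline formatting.
SAMPLE_XDXF = """<xdxf lang_from="CZE" lang_to="CZE" format="visual">
  <full_name>Sample dictionary</full_name>
  <description>Illustrative two-entry dictionary</description>
  <ar><k>abdikace</k>resignation from a throne or office</ar>
  <ar><k>absence</k><b>absence</b>, <i>nonattendance</i></ar>
</xdxf>
"""


def read_entries(xdxf_text: str) -> dict:
    """Map each term to its definition text, with the markup stripped."""
    root = ET.fromstring(xdxf_text)
    entries = {}
    for ar in root.iter("ar"):
        key = ar.findtext("k")
        # Assumes <k> is the first thing inside <ar>, so the article's
        # flattened text starts with the key itself.
        full_text = "".join(ar.itertext())
        entries[key] = full_text[len(key):].strip()
    return entries


for term, definition in read_entries(SAMPLE_XDXF).items():
    print(term, "=>", definition)
```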

Detailed description of the XDXF format is here.

To work efficiently, dictionaries usually need two data structures: the term definitions (like the one above) and an index, which makes it possible to quickly find the right definition (I guess most implementations use some kind of binary tree to search efficiently). How these two are represented depends on the particular format (stardict keeps definitions in the .dict file and the index in the .idx file). Definitions are usually compressed somehow to save space (stardict uses dictzip).
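
To make the split between index and definitions concrete, here is a hedged sketch of reading a stardict-style index and looking up a word. It assumes the classic .idx layout as I understand it: each record is a NUL-terminated UTF-8 word followed by a 32-bit big-endian offset and a 32-bit big-endian size pointing into the uncompressed .dict data. It uses binary search over a sorted list rather than a tree, which is a simplification of mine, and the function names are illustrative.

```python
import bisect
import struct


def parse_idx(data: bytes):
    """Return (word, offset, size) tuples from raw .idx bytes, sorted by word."""
    entries = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\x00", pos)            # the word is NUL-terminated
        word = data[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(">II", data, end + 1)
        entries.append((word, offset, size))
        pos = end + 1 + 8                         # skip NUL and two 4-byte ints
    # Re-sort by plain string order so bisect works whatever collation
    # the file itself was written with.
    return sorted(entries)


def lookup(entries, dict_bytes: bytes, word: str):
    """Binary-search the index, then slice the definition out of the .dict data."""
    words = [w for w, _, _ in entries]
    i = bisect.bisect_left(words, word)
    if i == len(words) or words[i] != word:
        return None
    _, offset, size = entries[i]
    return dict_bytes[offset:offset + size].decode("utf-8")
```

The index stays small because it holds only words and positions, while the definitions can live in one large blob that is read at the given offset.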

You cannot rely on file extensions alone to determine the dictionary format, as they can be deceiving (for instance, Bookeen also uses .dict and .idx, but the format is different: the .idx is actually an sqlite3 database). To be sure, look into the files with a hexadecimal editor/viewer.
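
The same check can be scripted by looking at the first bytes of a file. The sketch below covers only three well-known signatures (SQLite 3 databases start with the text "SQLite format 3" plus a NUL byte, gzip and therefore dictzip files start with the bytes 1f 8b, and XML files often start with an XML declaration). The function names are hypothetical, and anything else is reported as unknown.

```python
def sniff_format(header: bytes) -> str:
    """Guess a container type from the first bytes of a file."""
    if header.startswith(b"SQLite format 3\x00"):
        return "sqlite3"
    if header.startswith(b"\x1f\x8b"):
        return "gzip"   # dictzip (.dict.dz) files also start this way
    if header.startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to recognise the signatures above."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

A plain stardict .idx has no signature of its own, so "unknown" does not mean the file is broken; it only means none of these three containers was recognised.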

PocketBook uses a custom dictionary format, to which we must convert the required dictionary before uploading it to the device. It has the extension .dic and combines definitions and index in one file. The only tool that can convert to this format is PocketBook’s own proprietary tool called DictionaryConverter (links are below), which runs only on Windows (or under Wine on Linux). It takes an XDXF dictionary as input and produces a .dic file.

To get an XDXF dictionary from existing dictionaries, we might need some other tool/converter. Here are a few I’ve tried:

  • linguae tool: a GUI tool written in Python; it worked well for me for converting from stardict format to XDXF. The site is in French, but the tool has an English interface too.
  • makedict: a command line tool, the official converter to XDXF format, written in C (so you might need to compile it locally). For me, however, the previous tool worked better (see below).
  • pyglossary: a GUI tool written in Python; it cannot convert to XDXF (but can convert from XDXF), though it can convert between many other formats.
  • penelope: a command line tool written in Python, useful for converting Bookeen or Kobo dictionaries (it cannot convert to XDXF, but can convert to stardict, which can be used as an intermediary format). Requires dictzip.

So the general approach for a PocketBook dictionary is: get the required dictionary in some format (ideally XDXF); if it is not XDXF, use some tool to convert it to that format; then use the DictionaryConverter tool to convert it to .dic and copy this file to your PocketBook. Here are the detailed steps that worked for me:

  1. We first need to convert the dictionary into XDXF format (an XML format for dictionary exchange). I used the linguae tool, a GUI tool written in Python that works fine on Linux. (I also tried the makedict tool, but it did not work well, because it does not support the sametypesequence=g format of dictionary entries; they have to be m, and then the formatting was ugly.)
    One small note on the stardict format: it may have a compressed dict file (.dict.dz), which some conversion tools do not support. Uncompress it with dictzip, or just rename it to .dict.gz and then gunzip it.
  2. Get PocketBook DictionaryConverter (newer version) or DictionaryConverter (the older version I used originally) (or here DictionaryConverter with the cs language definition).
  3. In DictionaryConverter you might need an additional definition of the “from” language. English, German and Russian (and Czech in the second file) are already available. For another language, copy the en directory to a new directory named after the appropriate two-letter language code (cs, es …). Edit the 3 files there (if necessary):
    collates.txt: the collation (order of search/results); it defines all characters that are considered equal for ordering (you can keep the English one as the default for Latin alphabets)
    keyboard.txt: I think it is not used by new firmware (you can switch to a predefined national keyboard), so just leave it as it is.
    morphems.txt: it is used to define the same words in different forms. If you started from the English one, edit it (leave the first two rows defining consonants and vowels, possibly modifying them for your language, but that is not essential), or leave it empty if you do not know how to use it.
  4. Run wine converter.exe <path_to_dict.xdxf> <lang_code>
    wine: the Windows compatibility layer; use it only on Linux, on Windows you can run converter.exe directly
    path to dict.xdxf: the dictionary created in step 1 (e.g. “../czech-cizi/dict.xdxf”)
    lang_code: the language code, i.e. the name of the directory with the language definition (e.g. cs)
    The program is quite particular about well-formed XML input and crashes if the file is malformed. Look at the program’s messages, fix the XML and try again.
  5. Copy the resulting .dic file (it will be in the same folder as dict.xdxf) to the system/dictionaries folder on the PocketBook.
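
Because the converter crashes on malformed XML (step 4), it can save time to check the XDXF file before running it. Below is a minimal sketch using Python's standard expat parser. The function name is hypothetical, and the check covers only general XML well-formedness, which may be stricter or looser than what DictionaryConverter itself accepts.

```python
import xml.parsers.expat


def first_xml_error(path: str):
    """Return (line, column, message) for the first XML error, or None if well formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as f:
            parser.ParseFile(f)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None
```

Calling first_xml_error("dict.xdxf") stops at the first problem, so after fixing that line the check has to be run again until it returns None.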

As a sample, here is the Czech Foreign Words dictionary (Slovnik Cizich Slov), or a Spanish dictionary, both created by this process.

6 thoughts on “Convert Dictionary for PocketBook eBook Reader”

  1. Hello,

    I was trying to create a dictionary the way you describe in the tutorial above. I do not know what I did wrong. I downloaded the Dictionary converter, but when I ran it, it only displayed a terminal with this statement:
    Usage: converter.exe DIC_FILE LANG_DIR
    ” press any key to continue”
    How should I continue?

    I am running Debian wheezy and Pocketbook 626 touch lux 2.

    thank you

    1. Obviously you did not supply the right command line parameters. Check that you supply the correct path to the xdxf file and the correct language definition name (it has to be a subdirectory of the directory from which you are running converter.exe).

  2. Hi,
    I was trying to convert xdxf file with convert.exe utility. What I got is:

    Loading collates…
    Loading morphems…
    Loading keyboard…
    Loading dictionary file…
    dict.xdxf, line 141344: unclosed xml tag

    then program crashes and Win 7 message box pops up.
    As I can see the version of convert.exe you refer to is quite old. Where can I get a new version of this utility?

    Thanks in advance.

  3. @John: According to the error message:
    >>>dict.xdxf, line 141344: unclosed xml tag<<<

    it seems that the converter hit an unclosed XML tag, i.e. a closing '>' is missing. Did you check the line mentioned above?
    I solved this problem by deleting the '<'.
