Convert Dictionary for PocketBook eBook Reader

I’m a great fan of ebooks, and I recently replaced my old Hanlin with a new reader, the PocketBook 626. PocketBook is a European company that distributes PocketBook readers; the actual development of the devices is done by the Ukrainian company Obreey Products. Ukrainian and Russian programmers have always been very active in ebook technology (the FB2 format, CoolReader, FBReader, OpenInkPot (open-source firmware for ebook readers) and more), so it’s no wonder they have been able to produce quite a nice device. I personally prefer it to the Kindle because of its wider format support (especially EPUB) and the broader possibilities for customizing the device (price-wise the two are basically similar).

One of my first tasks was to load more useful language dictionaries onto it (some are built in, but better ones are available). Many high-quality free dictionaries are available in stardict format (for instance here for Czech language dictionaries; other sites offer dictionaries in other formats as well).

Dictionary formats are actually quite a mess. There are many private formats (practically every ebook vendor tends to use its own) and few (if any) public standards. The already mentioned stardict was a fairly popular private format (the stardict application itself is abandoned, but you can use its successor Goldendict, which supports the stardict format). A stardict dictionary is usually composed of three files: .ifo, .dict (or .dict.dz, the compressed definitions) and .idx. If the .ifo file is missing, it can easily be recreated, since it’s a plain text file.
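To show how simple the .ifo file is, here is a minimal Python sketch that reads one into a dictionary. It assumes the usual layout (a signature line followed by key=value lines); the sample values and the function name `parse_ifo` are made up for illustration.

```python
# Sample .ifo content; the values are invented for this example.
SAMPLE_IFO = """StarDict's dict ifo file
version=2.4.2
wordcount=3
idxfilesize=42
bookname=My Test Dictionary
sametypesequence=m
"""

def parse_ifo(text):
    """Parse the key=value lines of a stardict .ifo file into a dict."""
    info = {}
    # The first line is only the file signature, so skip it.
    for line in text.splitlines()[1:]:
        if "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = value.strip()
    return info

info = parse_ifo(SAMPLE_IFO)
print(info["bookname"], info["sametypesequence"])
```

The sametypesequence value is worth checking before conversion, because some converters handle only certain entry types.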

One of the standard formats is XDXF, which is basically XML with terms/keys and definitions. The XDXF site also offers many dictionaries in this format. Here is a small sample of one entry in an XDXF file (the text can contain some basic HTML formatting, such as bold or italics):

<ar><k>Example</k>
(n.) One or a portion taken to show the character or quality of the whole; a sample; a specimen.</ar>

A detailed description of the XDXF format is here.
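As a rough illustration, entries like the one above can be read with Python's standard XML parser. This is only a sketch: the wrapping `<xdxf>` element in the sample and the function name `read_entries` are my own, and real XDXF files carry more structure (meta information, nested markup) than is handled here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xdxf lang_from="ENG" lang_to="ENG" format="visual">
<ar><k>Example</k>
(n.) One or a portion taken to show the character or quality of the whole; a sample; a specimen.</ar>
</xdxf>"""

def read_entries(xml_text):
    """Yield (key, definition) pairs for every <ar> element."""
    root = ET.fromstring(xml_text)
    for ar in root.iter("ar"):
        key_el = ar.find("k")
        if key_el is None:
            continue
        key = "".join(key_el.itertext()).strip()
        # The definition is all text inside <ar> except the key itself.
        parts = [ar.text or ""]
        for child in ar:
            if child.tag != "k":
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        yield key, "".join(parts).strip()

for key, definition in read_entries(SAMPLE):
    print(key, "=>", definition)
```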

To work efficiently, dictionaries usually need two data structures: term definitions (like the one above) and an index, which makes it possible to find the right definition quickly (I guess most implementations use some kind of binary tree to search efficiently). How these two are represented depends on the particular format (stardict keeps definitions in the .dict file and the index in the .idx file). Definitions are usually compressed somehow to save space (stardict uses dictzip).
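The idea of "index plus definitions" can be sketched in a few lines of Python. This is not the actual stardict or PocketBook layout, just an illustration under my own assumptions: the index is a sorted list of terms searched with binary search (a simpler relative of a binary tree), and each index entry points to an offset and length inside one block of definition data.

```python
import bisect

def build(entries):
    """Return (keys, spans, blob) built from a dict of term -> definition."""
    keys, spans, chunks, pos = [], [], [], 0
    for term in sorted(entries):
        data = entries[term].encode("utf-8")
        keys.append(term)
        spans.append((pos, len(data)))  # where this definition lives in the blob
        chunks.append(data)
        pos += len(data)
    return keys, spans, b"".join(chunks)

def lookup(term, keys, spans, blob):
    """Binary-search the sorted keys, then slice the definition out of the blob."""
    i = bisect.bisect_left(keys, term)
    if i == len(keys) or keys[i] != term:
        return None
    offset, size = spans[i]
    return blob[offset:offset + size].decode("utf-8")

keys, spans, blob = build({"sample": "a specimen", "example": "a sample"})
print(lookup("example", keys, spans, blob))
```

Keeping the index separate from the definitions means only the small index has to be searched; the larger definition data is touched once, at a known offset.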

You cannot rely only on file extensions to determine the dictionary format, because they can be deceiving (for instance Bookeen also uses .dict and .idx, but the format is different: its .idx is actually an sqlite3 database). To be sure, look into the files with a hexadecimal editor/viewer.
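Instead of opening a hex editor, the first few bytes can also be checked with a small script. The sketch below recognises only three signatures I am sure of: the SQLite 3 header string, the gzip magic bytes (dictzip files are gzip-compatible) and text that looks like XML. The function names are my own, and anything else is reported as unknown.

```python
def sniff_bytes(head):
    """Guess a file type from its first bytes."""
    if head.startswith(b"SQLite format 3\x00"):
        return "sqlite3"
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # also matches dictzip (.dz) files
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    text = head[3:] if head.startswith(b"\xef\xbb\xbf") else head
    if text.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"

def sniff(path):
    """Read the first 16 bytes of a file and guess its type."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))
```

For example, `sniff("dictionary.idx")` returning "sqlite3" would suggest a Bookeen-style index rather than a stardict one.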

PocketBook uses a custom dictionary format, to which we must convert the required dictionary before uploading it to the device. It has the extension .dic and combines definitions and index in one file. The only tool that can convert to this format is PocketBook’s own private tool called DictionaryConverter (links are below), which runs only on Windows (or under Wine on Linux). It takes an XDXF dictionary as input and produces a .dic file.

To get an XDXF dictionary from existing dictionaries, we might need some other tool/converter. Here are a few I’ve tried:

  • linguae tool: a GUI tool written in Python; it worked well for me for converting from stardict format to XDXF. The site is in French, but the tool has an English interface too.
  • makedict: a command line tool, the official converter to XDXF format, written in C (so you might need to compile it locally). However, the previous tool worked better for me (see below).
  • pyglossary: a GUI tool written in Python; it cannot convert to XDXF (but can convert from XDXF), and it converts between many other formats.
  • penelope: a command line tool written in Python, useful for converting Bookeen or Kobo dictionaries (it cannot convert to XDXF, but can convert to stardict, which can be used as an intermediary format). Requires dictzip.

So the general approach for a PocketBook dictionary is: get the required dictionary in some format (ideally XDXF); if it is not XDXF, use some tool to convert it to that format. Then use the DictionaryConverter tool to convert it to .dic and copy that file to your PocketBook. Here are the detailed steps that worked for me:

  1. We first need to convert the dictionary into XDXF format (an XML format for dictionary exchange). I used the linguae tool, a GUI tool in Python that works fine on Linux (I also tried the makedict tool, but it did not work well, because it does not support the sametypesequence=g format of dictionary entries; it has to be m, and then the formatting was ugly).
    One small note on the stardict format: it may have a compressed dict file (.dict.dz), which some conversion tools do not support. Uncompress it with dictzip, or just rename it to .dict.gz and then gunzip it.
  2. Get PocketBook DictionaryConverter (newer version) or DictionaryConverter (older version I used originally)  (or here DictionaryConverter with cs language definition)
  3. In DictionaryConverter you might need an additional definition of the “from” language. English, German, Russian (and Czech in the third file) are already available. For another language, copy the en directory to a new directory and name it with the appropriate two-letter language code (cs, es …). Edit the 3 files there (if necessary):
    collates.txt: the collation (order of search/results). It defines all characters that are considered equal for ordering (you can keep the English default for Latin alphabets).
    keyboard.txt: I think it’s not used by new firmware (you can switch to a predefined national keyboard), so just leave it as it is.
    morphems.txt: it defines the different forms of the same word. If you started from the English base, edit it (leave the first two rows defining consonants and vowels, possibly modifying them for your language, though that is not essential), or leave it empty if you do not know how to use it.
  4. Run wine converter.exe <path_to__dict.xdxf> <lang_code>
    wine: the Windows compatibility layer, needed only on Linux; on Windows you can run the converter directly
    path to dict.xdxf: the dictionary created in the previous step (e.g. “../czech-cizi/dict.xdxf”)
    lang_code: the language code (the name of the directory with the language definition, e.g. cs)
    The program is quite particular about well-formed XML input and crashes if the file is malformed. Look for program messages, fix the XML and try again.
  5. Copy the resulting .dic file (it will be in the same folder as dict.xdxf) to the system/dictionaries folder on the PocketBook (in Windows you might need to change File Explorer settings to see the system folder: View tab / Options / View tab / Advanced Settings, uncheck Hide protected operating system files)
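Since the converter crashes on malformed input, it can save time to check the XDXF file with a regular XML parser before step 4. Below is a minimal Python sketch; the function name `first_xml_error` is my own. It tests only generic XML well-formedness, so the converter may still reject a file that passes this check.

```python
import xml.etree.ElementTree as ET

def first_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

print(first_xml_error("<ar><k>Example</k>a sample</ar>"))
print(first_xml_error("<ar><k>Example</k>a sample"))
```

To check a real file, read it in binary mode and pass the content in: `first_xml_error(open("dict.xdxf", "rb").read())`.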

As samples, here is the Czech Foreign Words dictionary (Slovnik Cizich Slov) and a Spanish dictionary, both created by this process.

29 thoughts on “Convert Dictionary for PocketBook eBook Reader”

  1. Hello,

    I was trying to create a dictionary the way you described in the tutorial above. I do not know what I did wrong. I downloaded the Dictionary converter, but when I ran it, it only displayed a terminal with the statement:
    Usage: converter.exe DIC_FILE LANG_DIR
    “press any key to continue”
    How should I continue?

    I am running Debian wheezy and Pocketbook 626 touch lux 2.

    thank you

    1. Obviously you did not supply the right command line parameters. Check that you supply the correct path to the xdxf file and the correct language definition name (it has to be a subdirectory of the directory from which you are running converter.exe).

  2. Hi,
    I was trying to convert an xdxf file with the convert.exe utility. What I got is:

    Loading collates…
    Loading morphems…
    Loading keyboard…
    Loading dictionary file…
    dict.xdxf, line 141344: unclosed xml tag

    Then the program crashes and a Win 7 message box pops up.
    As far as I can see, the version of convert.exe you refer to is quite old. Where can I get a new version of this utility?

    Thanks in advance.

  3. @John: According to the error msg:
    >>>dict.xdxf, line 141344: unclosed xml tag<<<

    it seems that the script hit an unclosed xml tag, i.e. a ‘>’ is missing. Did you check the line mentioned above?
    I solved this problem by deleting the ‘<’.

    1. Hi John and Esoj,

      We ran into the same issue, “unclosed xml tag” error on an otherwise well formed and valid XML.

      What I found now was that the error appears when an XML tag (the start or end tag itself, not the content) crosses a 4096 character boundary. For example < is character 4095, / is character 4096, letter i is 4097 and > is 4098.

      I think the converter processes the file line by line, and each line in 4096 character blocks. The blocks don’t have to be well formed, so it’s not a problem if <i> is in one block and the matching </i> is in the next. But in the case I mentioned only </ is in the first block, and i> is in the next.
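A small Python sketch can list the tags that would straddle such a boundary, so they can be inspected before running the converter. It follows the hypothesis described above (per-line blocks of 4096 characters) and assumes the count is in characters rather than bytes, which is not confirmed. The names `BLOCK` and `split_tags` are made up for this example.

```python
import re

BLOCK = 4096  # block size reported above; counting characters is an assumption
TAG = re.compile(r"<[^<>]*>")

def split_tags(lines, block=BLOCK):
    """Yield (line_number, column, tag) for tags that cross a block boundary."""
    for number, line in enumerate(lines, start=1):
        for match in TAG.finditer(line):
            first, last = match.start(), match.end() - 1
            # The tag is split when its first and last character
            # fall into different blocks of the same line.
            if first // block != last // block:
                yield number, first + 1, match.group()

# Reproduces the case above: "<" is character 4095, ">" is character 4098.
line = "x" * 4094 + "</i>\n"
print(list(split_tags([line])))
```

For a real file, the open file object can be passed directly, since iterating over it yields lines: `split_tags(open("dict.xdxf", encoding="utf-8"))`.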

    1. If you mean converting back from a PocketBook dic file to txt or something else, I don’t know of a way. As far as I know nothing exists: PocketBook .dic is proprietary, and the only program that works with it converts into this format, not the other way round.

  4. Hi, thanks for your work. I cannot find the /system/dictionaries/ folder on my PocketBook Basic 3. It seems to be a hidden folder? I am having difficulties finding it. When I plug the reader into a USB port on my PC, all I get is the books and the download folder and other preinstalled books. Where is it? What must I do? I have ticked the boxes in my Windows 8 control panel to show hidden files, but nothing has worked.

    1. If it does not exist, create it. It’s just for your custom dictionaries, so it may not exist if you do not have any.

    2. It is possible to see the system folder and all other hidden folders just by typing the path after your PocketBook drive name. For example F://system

  5. Hello,
    Ok, I now have the dict.xdxf file of the English Dictionary. Do I paste that somewhere? I have the pocket book basic 3. What next?
    Thanks.

    1. Now follow the steps above, starting from step 2. I hope they are explained relatively clearly. If not, look for other guides on the net.

  6. Hi, I downloaded some xdxf dictionaries from the site you linked (dicts.info) and I’m trying to convert them with the tool, but it returns “this file is not in XDXF format”, even though it clearly is and I’m quite sure I passed all the arguments correctly. Why does this happen? Thanks in advance!

    1. I don’t know. I was just using this tool, which was written by some guys from PocketBook. The source code is unfortunately not available, so it’s hard to say what’s wrong. Try some conversion program to convert to another format and back to XDXF; maybe that helps?

      1. The 2 lines got lost, making another attempt with entities:

        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE xdxf SYSTEM "xdxf_strict.dtd">

  7. Hi, I followed the guides and could create and compile a custom dictionary. Copied to the device (Touch Lux 3) and tested on a book successfully. There was 1 error though: the title of the dictionary was missing. It only appeared as an empty line in the dictionary overview and the selection box that comes up from the definition popup window. Has anyone seen this error?

    1. I have never seen this problem before. I guess the title might already be missing in the XDXF? As it’s XML, it could easily be added there manually.

      1. This is the XML I used:

        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE xdxf SYSTEM "xdxf_strict.dtd">
        <xdxf lang_from="ENG" lang_to="ENG" format="logical" revision="DD">
            <meta_info>
                <title>My Dictionary</title>
                <full_title>My Test Dictionary</full_title>
                <description>A dictionary for testing the converter.</description>
                <publisher>Me</publisher>
                <authors>
                    <author>Me</author>
                    <author>Myself</author>
                    <author>I</author>
                </authors>
                <file_ver></file_ver>
                <creation_date>01-01-2001</creation_date>
                <last_edited_date>01-01-2011</last_edited_date>
                <dict_edition>2</dict_edition>
                <publishing_date>01-01-2011</publishing_date>
                <dict_src_url>http://localhost/</dict_src_url>
            </meta_info>
            <lexicon>
                <ar>
                    <k>often</k>
                    <def>
                        <def>
                            <deftext>definition of often</deftext>
                        </def>
                    </def>
                </ar>
            </lexicon>
        </xdxf>

        Titles are there in several tags, but something might still be missing…

        1. The error is connected to the XDXF version. The title doesn’t show up if the source file is in revision 33 with xdxf_strict.dtd. It does appear if the source file uses xdxf_lousy.dtd. Some more experimentation (or finding the documentation…) is still needed to determine exactly which versions/revisions of XDXF are supported, but at least now we know what’s going on.

    2. Try this:
      1. Open a converted file that has the required name in a hex viewer.
      2. Open the dictionary *.dic file in a hex editor.
      3. Starting at offset 0x40, replace the unreadable name with the required one, byte by byte.
      There is a limit on the length of the name: a maximum of 31 characters (other data already starts at offset 0x80). The name must be terminated with two zero bytes (at the latest at offsets 0x7e and 0x7f).
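The same patch can be sketched in Python instead of a hex editor. Only the offsets (0x40 to 0x80) and the two terminating zero bytes come from the description above. The text encoding is an assumption: 31 characters plus a two-byte terminator in a 64-byte field suggests two bytes per character, so the sketch uses UTF-16 little-endian, which is unverified. The function name `patch_title` is hypothetical, and it should only be tried on a copy of the .dic file.

```python
TITLE_START, TITLE_END = 0x40, 0x80  # title field offsets from the steps above

def patch_title(data, title, encoding="utf-16-le"):
    """Return a copy of the .dic bytes with the title field overwritten.

    The encoding is a guess (see the text above), not a documented fact.
    """
    if len(data) < TITLE_END:
        raise ValueError("file is too short to contain a title field")
    raw = title.encode(encoding)
    room = TITLE_END - TITLE_START - 2  # keep two zero bytes as terminator
    if len(raw) > room:
        raise ValueError("title is too long for the title field")
    # Zero-fill the rest of the field so the name is always terminated.
    field = raw.ljust(TITLE_END - TITLE_START, b"\x00")
    return data[:TITLE_START] + field + data[TITLE_END:]
```

A possible use: read the whole file with `open(path, "rb").read()`, call `patch_title(data, "My Dictionary")`, and write the result to a new file, leaving the original untouched.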

  8. Hi,
    thank you for your article!
    By any chance do you know how to convert .xdxf format into .xml or .csv format that can be used for WikiMedia import?
    Thank you!

  9. I followed your instructions exactly, and when I press Enter, the command prompt simply closes and nothing happens. Windows 10.
