imap_detach Tool – Download Email Attachments

imap_detach is a command-line tool that lets you automatically download email attachments (or particular email parts) from your mailbox via the IMAP protocol. The tool is fairly universal and can be used with various IMAP servers for various tasks. It is written in Python.

How it works

To work efficiently with this tool you need at least a basic understanding of two email technologies: MIME and IMAP.

MIME defines the structure of an email message and enables the message to have multiple parts. The message and each of its parts have a header, which defines its type (the Content-Type field) and how it is attached to the message (the Content-Disposition field, which also defines the file name of the attachment). The content type defines the media type of a particular message part: for message text it can be text/plain or text/html, for a PDF document it’s application/pdf, for a PNG image it’s image/png, etc. A content type consists of two parts, type and subtype, separated by a slash. Content types are registered with IANA for every media type. A list of all content types is here.

An email message following the MIME standard can have multiple parts, which can be nested, so message parts form a tree structure. For the purpose of nesting parts, MIME defines special content types: multipart. Below is a sample MIME message structure:

[Figure: sample MIME message structure]
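To make the tree structure concrete, here is a minimal sketch using Python’s standard email package. It is not taken from imap_detach; the message content, the file name invoice.pdf and the helper describe are made up for illustration. It builds a small multipart message and lists every part with its content type, disposition and file name.

```python
from email.message import EmailMessage

# Build a small sample message: a text body with an HTML alternative
# plus one PDF attachment (all content here is made up for illustration).
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("Plain text body")
msg.add_alternative("<p>HTML body</p>", subtype="html")
msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                   subtype="pdf", filename="invoice.pdf")


def describe(part, depth=0):
    """Return one line per part: indentation, content type, disposition, file name."""
    lines = ["{}{} disposition={} filename={}".format(
        "  " * depth, part.get_content_type(),
        part.get_content_disposition(), part.get_filename())]
    if part.is_multipart():
        # multipart/* parts are containers; recurse into their children
        for child in part.iter_parts():
            lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(msg)))
```

The indentation in the printed lines mirrors the nesting: a multipart/mixed root holding a multipart/alternative container (with the two text variants) and the PDF attachment.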

IMAP is a protocol for communication with a mailbox server, i.e. the server where your emails are stored. An IMAP server organizes emails into folders (Inbox, Sent, Drafts ...; you can also easily create your own folders) and lets you access particular folders. The IMAP protocol is fairly complex and provides many advanced operations on messages; for our purpose these three are of special importance: search, MIME message structure parsing and download of individual MIME message parts.

Similar tools or scripts for downloading attachments often take a simplified approach: they download all messages from a folder, then parse each message locally, extract the required parts and save them. This approach is inefficient, because much more data is downloaded than is finally needed.

Our tool uses advanced features of IMAP to fetch only the data that is necessary to download the required parts of email messages. It is driven by a simple filter expression, described in detail in the next section, which lets you specify the attachments you’d like to download. For example, we can specify:
! seen & from ~= "john.doe" & mime = "application/pdf"
This expression basically says: get me all attached PDF files that were sent by john.doe (from whatever domain) and that I have not seen yet.

This is how the tool gets the relevant attachments / message parts using the filter expression:

  1. Convert the filter expression into IMAP search keywords. IMAP search is less powerful than our expressions; in particular, many servers cannot search on part attributes like content type, file name etc. So IMAP search can only provide a superset of the messages we are interested in. For the sample filter expression above, the IMAP search looks like (NOT SEEN FROM "john.doe"), so it provides message IDs for all unseen messages from john.doe.
  2. For all messages identified by the IMAP search, download the message structure (BODYSTRUCTURE), which provides the necessary details about message parts.
  3. Apply the filter expression again to all parts. Now we have all the details needed to identify relevant parts; for our sample expression we select just PDF attachments.
  4. Download all matching parts.
  5. Optionally, perform an IMAP action on each processed message, such as deleting it, moving it to another folder or marking it as seen.
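The IMAP side of steps 1, 2 and 4 can be sketched with Python’s standard imaplib module. This is an illustrative sketch, not the tool’s actual code: the function names are hypothetical, error handling is minimal, and parsing of the BODYSTRUCTURE response (needed for step 3) is left out.

```python
import imaplib


def find_candidates(conn, criteria):
    """Step 1: run IMAP SEARCH and return matching message numbers as strings."""
    typ, data = conn.search(None, criteria)
    if typ != "OK":
        raise RuntimeError("SEARCH failed")
    return (data[0] or b"").decode().split()


def fetch_structure(conn, msg_num):
    """Step 2: fetch only the BODYSTRUCTURE response, not the message body."""
    typ, data = conn.fetch(msg_num, "(BODYSTRUCTURE)")
    if typ != "OK":
        raise RuntimeError("FETCH BODYSTRUCTURE failed")
    return data[0]  # raw bytes; parsing the parenthesised list is omitted here


def fetch_part(conn, msg_num, section):
    """Step 4: download a single part; BODY.PEEK does not set the Seen flag."""
    typ, data = conn.fetch(msg_num, "(BODY.PEEK[{}])".format(section))
    if typ != "OK":
        raise RuntimeError("FETCH part failed")
    return data[0][1]  # payload is still transfer-encoded (e.g. base64)


# Usage against a real server (host and credentials are placeholders):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user", "password")
# conn.select("INBOX", readonly=True)
# for num in find_candidates(conn, '(NOT SEEN FROM "john.doe")'):
#     print(fetch_structure(conn, num))
```

Only the small BODYSTRUCTURE response and the selected sections travel over the network, which is where the saving over downloading whole messages comes from.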

Note that the compliance level of IMAP server implementations differs. Our tool has been tested mostly against dovecot, which provides the best IMAPv4 compliance level. Tests with other servers showed some differences (for instance, Gmail subject search works only for full words). It’s left to users to explore the peculiarities of a particular IMAP server, and you are always welcome to log an issue on github.

Usage

Warning: If used with some arguments (--delete, --move) the tool can significantly modify your mailbox, so be careful!

The command to run is detach.py; detach.py -h shows the help message, which for convenience is also reproduced in the Help section.

The connection to the server must be specified with the --host, --user and --password arguments; --host is just a server name, or host:port if a non-standard port is used. By default the connection uses SSL encryption; a plain, non-encrypted connection can be enforced with the --no-ssl argument. There is also the --insecure-ssl argument, which switches off the check of the SSL certificate (useful mainly for testing).
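For readers who want to see what these connection options roughly correspond to in Python, here is a sketch using the standard imaplib and ssl modules. The helper names are hypothetical and this is not the tool’s own code; the default ports used are the standard ones (993 for IMAP over SSL, 143 for plain IMAP).

```python
import imaplib
import ssl


def parse_host(host, use_ssl=True):
    """Split 'host' or 'host:port'; fall back to the standard IMAP ports."""
    name, sep, port = host.partition(":")
    if sep:
        return name, int(port)
    return name, 993 if use_ssl else 143


def make_ssl_context(insecure=False):
    """Default context verifies the certificate; 'insecure' turns checks off."""
    ctx = ssl.create_default_context()
    if insecure:
        ctx.check_hostname = False  # must be disabled before verify_mode
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def connect(host, user, password, use_ssl=True, insecure=False):
    """Open and authenticate a connection (needs a reachable server)."""
    name, port = parse_host(host, use_ssl)
    if use_ssl:
        conn = imaplib.IMAP4_SSL(name, port, ssl_context=make_ssl_context(insecure))
    else:
        conn = imaplib.IMAP4(name, port)
    conn.login(user, password)
    return conn
```

Disabling certificate checks removes protection against man-in-the-middle attacks, which is why such an option is best kept for testing.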

The --folder argument defines which folder(s) are searched. If it is not specified, the default is INBOX. You can specify one or more folders (with multiple --folder arguments). Patterns are also supported: * (matches all characters except /), ** (matches all characters) and ? (matches one character).
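The pattern rules can be illustrated with a small translation to a regular expression. This is a sketch of the semantics as described above, not the tool’s implementation; the function names are hypothetical, and it assumes that ? matches any single character.

```python
import re


def folder_pattern_to_regex(pattern):
    """Translate a folder pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # ** crosses folder levels
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # * stays within one folder level
            i += 1
        elif pattern[i] == "?":
            parts.append(".")        # assumption: any single character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def folder_matches(pattern, folder):
    """True if the whole folder name matches the pattern."""
    return folder_pattern_to_regex(pattern).fullmatch(folder) is not None
```

With these rules, INBOX/* matches INBOX/Work but not INBOX/Work/2015, while INBOX/** matches both.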

If you specify --threads x, the download of email parts runs in x separate threads (each with its own connection). This can significantly speed up the download if there are many messages (but could be overkill for a few messages).
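The idea of one connection per thread can be sketched with a thread pool and thread-local storage. This is an assumption about how such a design could look, not a description of the tool’s internals; download_all, connect and fetch_part are hypothetical names supplied by the caller, and closing the connections is omitted for brevity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def download_all(tasks, connect, fetch_part, threads=4):
    """Process tasks in a thread pool; each worker thread opens one connection."""
    local = threading.local()

    def worker(task):
        # Create the connection lazily, once per worker thread, because a
        # single IMAP connection should not be shared between threads.
        if not hasattr(local, "conn"):
            local.conn = connect()
        return fetch_part(local.conn, task)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the input tasks
        return list(pool.map(worker, tasks))
```

Each extra thread costs a login and a folder selection, which is why this pays off only when there are many parts to fetch.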

The filter argument specifies a logical expression that identifies the message parts to be downloaded. An expression consists of literals, comparisons, variables and logical operators.

Literals:

Type      Format                     Examples
string    enclosed in double quotes  "some text"
integer   number[kMG]                1234  5k  12M
date      YYYY-MM-DD                 2015-11-21
datetime  YYYY-MM-DD HH:MM           2015-11-21 17:04
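As an illustration of how such literals could be turned into values, here is a short sketch. It is not the tool’s parser: the function names are hypothetical, and it assumes that k, M and G are binary multipliers (1024-based), which the text above does not state.

```python
import re
from datetime import datetime

# Assumption: suffixes are powers of 1024; the tool may use powers of 1000.
MULTIPLIERS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_integer(text):
    """Parse literals such as 1234, 5k or 12M into a plain integer."""
    match = re.fullmatch(r"(\d+)([kMG]?)", text)
    if not match:
        raise ValueError("not an integer literal: {!r}".format(text))
    return int(match.group(1)) * MULTIPLIERS[match.group(2)]


def parse_date(text):
    """Parse a date or datetime literal, e.g. 2015-11-21 or 2015-11-21 17:04."""
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not a date literal: {!r}".format(text))
```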

Comparisons (variable operator literal):

Operator  Meaning                Applies to                             Example
=         equals                 any type; for strings it ignores case  mime = "image/png"
~=        contains               strings only, case insensitive         subject ~= "test"
^=        starts with            strings only, case insensitive         subject ^= "Re:"
$=        ends with              strings only, case insensitive         name $= ".pdf"
<         less than              integers or dates                      date < 2015-11-21
>         greater than           integers or dates                      size > 2M
<=        less than or equal     integers or dates                      date <= 2015-11-21
>=        greater than or equal  integers or dates                      size >= 2M
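The semantics of these operators can be sketched in a few lines of Python. This is only an illustration of the table above with hypothetical names (compare, STRING_OPS, ORDER_OPS), not the tool’s evaluator; literals are assumed to be already parsed into Python values.

```python
import operator

# String operators ignore case, as described in the table above.
STRING_OPS = {
    "=": lambda value, literal: value.lower() == literal.lower(),
    "~=": lambda value, literal: literal.lower() in value.lower(),
    "^=": lambda value, literal: value.lower().startswith(literal.lower()),
    "$=": lambda value, literal: value.lower().endswith(literal.lower()),
}

# Ordering operators apply to integers and dates.
ORDER_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def compare(value, op, literal):
    """Evaluate 'variable operator literal' for an already resolved variable."""
    if isinstance(value, str):
        return STRING_OPS[op](value, literal)
    return ORDER_OPS[op](value, literal)
```

For example, compare("report.PDF", "$=", ".pdf") is true because string comparisons ignore case.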

Variables:
Various variables with information related to the email header or the email part are available. These variables can also be used as substitution variables in the output file name or in the command run on the part content.

Name      Type         Source        Description
subject   string       Email header  subject of the email
from      string       Email header  email address of the sender
sender    string       Email header  email address of the sender; can be different from "from"
to        string       Email header  email addresses of recipients, separated by comma
cc        string       Email header  email addresses of CC recipients, separated by comma
bcc       string       Email header  email addresses of BCC recipients, separated by comma
date      datetime     Email header  date when the email arrived in the mailbox, i.e. the receipt date; ignores time zone
year      integer      Email header  year when the email arrived in the mailbox
month     integer      Email header  month (1-12) when the email arrived in the mailbox
day       integer      Email header  day (1-31) when the email arrived in the mailbox
answered  boolean      IMAP flag     email has the Answered flag; it’s set by the client when answering the email
seen      boolean      IMAP flag     email has the Seen flag; it has been opened by some email client
flagged   boolean      IMAP flag     email has the Flagged flag; clients often call it a “starred” email
deleted   boolean      IMAP flag     email has the Deleted flag; it has been deleted by a client, but not yet expunged from the folder
recent    boolean      IMAP flag     email has the Recent flag; this is the first session in which this email is available
draft     boolean      IMAP flag     email has the Draft flag
flags     string list  IMAP flag     all flags available, even non-standard ones; you can test them as a string, and the test succeeds if the string matches any flag
mime      string       Email part    content type of this part: type/subtype
attached  boolean      Email part    is it an attachment (Content-Disposition is attachment)?
size      integer      Email part    approximate size of this part in bytes (inexact because we know only the encoded size; for base64 the difference can be max 3 bytes due to padding, but for quoted-printable it can be bigger)
name      string       Email part    file name of the attachment (or empty if this part is not an attachment)
section   string       Email part    number of the part in the email structure, like 1, 2.1, 1.2.1 etc.
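The section variable follows the IMAP part-numbering scheme: children of the top-level multipart are numbered 1, 2, ..., and parts nested in another multipart get a dotted suffix. The sketch below shows this for a message structure given as nested (content_type, children) tuples, which is a made-up representation for illustration; the function names are hypothetical and nested message/rfc822 parts are not handled.

```python
def _number_children(part, prefix):
    """Yield (section, content_type) for all descendants of a multipart."""
    _content_type, children = part
    for index, child in enumerate(children, start=1):
        section = "{}{}".format(prefix, index)
        yield section, child[0]
        # Parts nested in a multipart child get a dotted suffix: 1.1, 1.2, ...
        yield from _number_children(child, section + ".")


def number_sections(root):
    """Return a list of (section, content_type) pairs for a whole message."""
    content_type, children = root
    if not children:
        # A message that is not multipart has a single part, numbered 1.
        return [("1", content_type)]
    return list(_number_children(root, ""))


sample = ("multipart/mixed", [
    ("multipart/alternative", [("text/plain", []), ("text/html", [])]),
    ("application/pdf", []),
])
for section, content_type in number_sections(sample):
    print(section, content_type)
```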

Logical operators:

Simple expressions (comparisons or standalone boolean variables) can be combined with logical operators:
! – not
& – and
| – or
Operator priority, from highest to lowest, is ! & |; parentheses ( and ) can be used to define the evaluation order explicitly.
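To show what this priority means in practice, here is a small recursive-descent evaluator restricted to standalone boolean variables. It is a sketch written for this explanation, not the parser used by the tool; comparisons are left out to keep it short, and the names tokenize and evaluate are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]+|[!&|()])")


def tokenize(text):
    """Split an expression into variable names and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected character at position {}".format(pos))
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, variables):
    """Evaluate a boolean expression; ! binds tightest, then &, then |."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():            # lowest priority
        result = parse_and()
        while peek() == "|":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():           # highest priority, also handles ( ... )
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            result = parse_or()
            if take() != ")":
                raise ValueError("expected )")
            return result
        return bool(variables[take()])

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token {!r}".format(tokens[pos]))
    return result
```

With seen true, flagged false and deleted true, the expression ! seen & flagged | deleted evaluates to true, because it is read as ((! seen) & flagged) | deleted, while ! seen & (flagged | deleted) evaluates to false.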

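Example of a filter expression (the same one used in the How it works section):
! seen & from ~= "john.doe" & mime = "application/pdf"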

After the filter, we have to define what happens to the downloaded email parts. There are three options:

  • save the part to the file named by the --file argument (with possible variable replacements, explained below)
  • run the command specified by the --command argument; the part content is passed to the command as standard input
  • both: save the file first and then run the command on it (use both the --file and --command arguments); the command must then contain a file name variable.

Both the command and the output file arguments can contain variables in curly brackets: {variable}. The same variables as for the filter can be used, so for instance {subject} is replaced by the email subject, {name} by the attachment file name (the original name specified in the email), {date} by the received date etc. Very simple combinations of variables are also possible: {name|subject} expands to name if available, else to subject; {name+subject} expands to name joined with subject by the ‘_’ character. Both can be combined, so we can write {name|subject+section}.
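The substitution rules can be sketched as follows. This is an illustration, not the tool’s code: the function expand is hypothetical, and it assumes that in a combined expression such as {name|subject+section} the | alternatives are resolved first and the results are then joined with ‘_’ (the text above does not spell out the grouping). Empty pieces are skipped when joining, which is also an assumption.

```python
import re


def expand(template, variables):
    """Replace {a|b+c} style placeholders using values from 'variables'."""

    def replace(match):
        pieces = []
        for term in match.group(1).split("+"):
            names = [name.strip() for name in term.split("|")]
            # First non-empty alternative wins; missing names count as empty.
            value = next((str(variables[n]) for n in names if variables.get(n)), "")
            if value:
                pieces.append(value)
        return "_".join(pieces)

    return re.sub(r"\{([^{}]+)\}", replace, template)


# Made-up values for illustration
print(expand("/tmp/{name|subject+section}",
             {"name": "", "subject": "Invoice", "section": "2"}))
```

With an empty name the placeholder falls back to the subject, so the call above expands to /tmp/Invoice_2.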

Additionally, --command can contain the following variables related to the already expanded output file name:

file_name       full output file name with path
file_base_name  output file name without path and without extension
file_dir        directory component of the output file name
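How these three values relate to one path can be shown with the standard posixpath module (used here instead of os.path so that the result does not depend on the operating system). The function file_variables is a hypothetical helper for illustration, not part of the tool.

```python
import posixpath


def file_variables(path):
    """Derive the three file-related variables from an expanded output path."""
    directory, file_name = posixpath.split(path)
    base, _extension = posixpath.splitext(file_name)
    return {
        "file_name": path,        # full name with path
        "file_base_name": base,   # no directory, no extension
        "file_dir": directory,    # directory component only
    }


print(file_variables("/tmp/out/invoice.pdf"))
```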

The --delete-file argument deletes the file after it has been processed by the command (so it makes sense only when --file and --command are used together).

After the message parts are processed, we can apply an action to the whole email message containing these parts: delete the message (--delete), move it to another folder (--move) or mark it as seen (--seen). These three arguments are the only ‘unsafe’ arguments, i.e. the only ones that modify the mailbox.

To follow the progress of the tool and debug potential issues, several other arguments are available:
--test            dry run; just prints which message parts would be downloaded based on the filter, but does not download them
--verbose         prints basic messages about the message parts being downloaded
--debug           prints debug information
--log-file        messages are printed to this file instead of standard error output
--no-imap-search  implementations of the IMAP SEARCH command vary by server; this option skips IMAP SEARCH, gets all messages in the folder and filters them locally

Help

detach.py -h

License

GNU General Public License v3 (GPLv3)

Download

Download source from github.

Install with pip:

Support

Have a problem with the tool? The best solution is to go to the code and fix it yourself, then send a pull request on github. This way you’ll make sure that it’s fixed and you can also help others.

If you cannot fix it, you are welcome to post an issue on the github project pages.
