imap_detach is a command-line tool that automatically downloads email attachments (or particular email parts) from your mailbox via the IMAP protocol. The tool is fairly universal and can be used with various IMAP servers for various tasks. It is written in Python.
How it works
To work efficiently with this tool you need at least a basic understanding of two email technologies: MIME and IMAP.
MIME defines the structure of an email message and enables the message to have multiple parts. The message and each of its parts have a header, which defines its type (the Content-type field in the header) and how it is attached to the message (the Content-disposition field in the header, which also defines the file name of the attachment). The content type defines the media type of a particular message part: for message text it can be text/plain or text/html, for a PDF document it is application/pdf, for a PNG image it is image/png, etc. A content type consists of two parts, a type and a subtype, separated by a slash. Content types are registered by IANA, which maintains the official list of all registered types.
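To see the type, subtype and disposition in practice, here is a small sketch that reads these header fields from a single hand-written MIME part. It uses Python's standard email package, not imap_detach itself, and the part (a tiny base64-encoded PDF called report.pdf) is made up for illustration.

```python
from email import message_from_string
from email.policy import default

# One MIME part as it could appear inside a raw message (illustrative content).
RAW_PART = (
    'Content-Type: application/pdf; name="report.pdf"\n'
    'Content-Disposition: attachment; filename="report.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0xLjQK\n"
)

part = message_from_string(RAW_PART, policy=default)

print(part.get_content_type())         # full type/subtype
print(part.get_content_maintype())     # type, before the slash
print(part.get_content_subtype())      # subtype, after the slash
print(part.get_content_disposition())  # how the part is attached
print(part.get_filename())             # file name from Content-Disposition
```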
An email message following the MIME standard can have multiple parts, which can be nested, so the message parts form a tree structure. For nesting parts, MIME defines a special family of content types: multipart (for example multipart/mixed or multipart/alternative). Below is a sample MIME message structure:
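As an illustration, the sketch below uses Python's standard email package (not part of imap_detach) to build a typical message: text/plain and text/html alternatives plus a PDF attachment. It then lists the tree using IMAP-style part numbers such as 1, 1.1 and 2, which is the same form the tool's section variable takes. The message content is made up. For simplicity the sketch only numbers the sub-parts of a multipart message; a non-multipart message has a single section, 1.

```python
from email.message import EmailMessage

def walk_sections(part, prefix=""):
    """Yield (section number, content type) for every nested sub-part."""
    for index, child in enumerate(part.iter_parts(), start=1):
        section = f"{prefix}.{index}" if prefix else str(index)
        yield section, child.get_content_type()
        if child.is_multipart():
            # multipart children are containers for further parts
            yield from walk_sections(child, section)

msg = EmailMessage()
msg["Subject"] = "Monthly report"
msg["From"] = "john.doe@example.com"
msg.set_content("Report attached.")                            # text/plain
msg.add_alternative("<p>Report attached.</p>", subtype="html")  # becomes multipart/alternative
msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                   subtype="pdf", filename="report.pdf")       # becomes multipart/mixed

print(msg.get_content_type())
for section, content_type in walk_sections(msg):
    indent = "  " * (section.count(".") + 1)  # deeper sections are indented more
    print(f"{indent}{section} {content_type}")
```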
IMAP is a protocol for communicating with a mailbox server, i.e. the server where your emails are stored. An IMAP server organizes emails into folders (Inbox, Sent, Drafts …; you can also easily create your own folders) and lets you access individual folders. IMAP is fairly complex and provides many advanced operations on messages. For our purpose, three of them are of special importance: search, MIME message structure parsing and download of individual MIME message parts.
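These three operations correspond to standard IMAP commands: SEARCH, FETCH BODYSTRUCTURE and FETCH BODY[section]. The sketch below shows them with Python's standard imaplib module. It illustrates the protocol and is not imap_detach's own code. The search keys and section number are example values. It uses BODY.PEEK so that fetching a part does not mark the message as seen, and it does not parse the BODYSTRUCTURE response.

```python
import imaplib

def search_and_fetch(conn, criteria, section):
    """Search the selected folder, then fetch structure and one part per hit.

    conn: a logged-in imaplib.IMAP4 or IMAP4_SSL object with a folder selected.
    criteria: IMAP search keys, e.g. ["UNSEEN", 'FROM "john.doe"'].
    section: IMAP part number to download, e.g. "2" or "1.2".
    Returns {message number: (raw BODYSTRUCTURE response, part bytes)}.
    """
    # 1. Search: the server returns the matching message sequence numbers.
    typ, data = conn.search(None, *criteria)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"SEARCH failed: {data!r}")
    results = {}
    for num in data[0].split():
        # 2. Structure: describes every part without downloading any content.
        _, structure = conn.fetch(num, "(BODYSTRUCTURE)")
        # 3. Download one part; the PEEK variant leaves the \Seen flag alone.
        _, part = conn.fetch(num, f"(BODY.PEEK[{section}])")
        results[num] = (structure[0], part[0][1])
    return results
```

In practice you would create the connection with imaplib.IMAP4_SSL(host), call login() and select(folder) on it, and then pass it to search_and_fetch.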
Similar tools or scripts for downloading attachments often take a simplified approach: they download all messages from a folder, parse each message locally, and extract and save the required parts. This approach is inefficient, because much more data is downloaded than is actually needed.
Our tool exploits advanced IMAP features to fetch only the data that is necessary to download the required parts of email messages. It uses a simple filter expression, described in detail in the next section, which lets you easily specify which attachments you'd like to download. For example:
! seen & from ~= "john.doe" & mime = "application/pdf"
This expression says: get me all attached PDF files sent by john.doe (from any domain) that I have not seen yet.
This is how the tool gets the relevant attachments / message parts using the filter expression:
- Convert the filter expression into IMAP search keys. However, IMAP search is less powerful than our expressions; in particular, many servers cannot search on part attributes such as content type or file name. So IMAP search can only provide a superset of the messages we are interested in. For the sample filter expression above, the IMAP search looks like (NOT SEEN FROM "john.doe"), so it returns the message IDs of all unseen messages from john.doe.
- For all messages found by the IMAP search, download the message structure (BODYSTRUCTURE), which provides the necessary details about the message parts.
- Apply the filter expression again, this time to every part. Now all the details needed to identify the relevant parts are available; for our sample expression, just the PDF attachments are selected.
- Download all matching parts.
- Optionally, perform an IMAP action on each processed message, such as deleting it, moving it to another folder or marking it as seen.
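Steps 1 and 3 can be sketched in Python as follows. This is a simplified illustration, not imap_detach's actual code. It assumes the filter has already been parsed into a flat list of AND-ed conditions, each a (variable, operator, value) tuple. Only a few variables are mapped to IMAP search keys, and everything else is left to the local check. The sketch emits UNSEEN, which is equivalent to NOT SEEN.

```python
# Hypothetical pre-parsed form of: ! seen & from ~= "john.doe" & mime = "application/pdf"
FILTER = [
    ("seen", "is", False),
    ("from", "~=", "john.doe"),
    ("mime", "=", "application/pdf"),
]

def to_imap_search(conditions):
    """Step 1: keep only the conditions IMAP SEARCH can express.

    Part-level conditions (mime, name, size, ...) are dropped, so the
    server returns a superset of the messages we actually want.
    """
    keys = []
    for var, op, value in conditions:
        if var == "seen" and op == "is":
            keys.append("SEEN" if value else "UNSEEN")
        elif var in ("from", "subject") and op == "~=":
            # IMAP string search keys are substring matches
            keys.append(f'{var.upper()} "{value}"')
    return keys or ["ALL"]

def part_matches(conditions, variables):
    """Step 3: check every condition locally against one part's variables."""
    for var, op, value in conditions:
        actual = variables[var]
        if op == "is":
            ok = actual == value
        elif op == "~=":
            ok = value.lower() in actual.lower()
        elif op == "=":
            ok = actual.lower() == value.lower()
        else:
            raise ValueError(f"operator {op!r} is not covered by this sketch")
        if not ok:
            return False
    return True

# Made-up variables for the parts of one message found by the search
# (step 2 would take mime and section from BODYSTRUCTURE).
parts = [
    {"seen": False, "from": "John.Doe@example.com", "mime": "text/plain", "section": "1"},
    {"seen": False, "from": "John.Doe@example.com", "mime": "application/pdf", "section": "2"},
]

print(to_imap_search(FILTER))
print([p["section"] for p in parts if part_matches(FILTER, p)])
```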
Note that the compliance level of IMAP server implementations varies. Our tool has been tested mostly against Dovecot, which in our experience provides the best IMAPv4 compliance. Tests with other servers showed some differences (for instance, Gmail subject search works only for whole words). It is left to users to explore the peculiarities of a particular IMAP server, and you are always welcome to log an issue on GitHub.
Usage
Warning: with some arguments (--delete, --move) the tool can significantly modify your mailbox, so be careful!
The command is called detach.py; detach.py -h shows the help message, which for convenience is reproduced in the Help section below.
The connection to the server must be specified with the --host, --user and --password arguments. --host is just a server name, or host:port if a non-standard port is used. By default the connection uses SSL encryption; a plain, unencrypted connection can be enforced with the --no-ssl argument. There is also an --insecure-ssl argument, which switches off the SSL certificate check (useful mainly for testing).
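The sketch below shows what these connection options involve, using Python's standard imaplib and ssl modules. It is not the tool's own code. The default ports 993 (IMAPS) and 143 (plain IMAP) are the standard IANA ports, and the sketch assumes the tool falls back to them when no port is given. Host names are placeholders.

```python
import imaplib
import ssl

def parse_host(host, use_ssl=True):
    """Split 'server' or 'server:port'; default to port 993 (SSL) or 143 (plain)."""
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return name, int(port)
    return host, 993 if use_ssl else 143

def make_ssl_context(insecure=False):
    """Default context verifies the certificate and host name."""
    context = ssl.create_default_context()
    if insecure:
        # What an option like --insecure-ssl has to do: skip all certificate checks.
        context.check_hostname = False  # must be switched off before CERT_NONE
        context.verify_mode = ssl.CERT_NONE
    return context

def connect(host, user, password, use_ssl=True, insecure=False):
    """Open and log in to an IMAP connection (SSL by default)."""
    name, port = parse_host(host, use_ssl)
    if use_ssl:
        conn = imaplib.IMAP4_SSL(name, port, ssl_context=make_ssl_context(insecure))
    else:
        conn = imaplib.IMAP4(name, port)
    conn.login(user, password)
    return conn
```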
The --folder argument defines which folder(s) are searched. If it is not specified, the default is INBOX; you can specify one or more folders (with multiple --folder arguments). Patterns are also supported: * (matches any characters except /), ** (matches any characters) and ? (matches one character).
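The pattern rules can be expressed as a translation to regular expressions, as in the sketch below. It is a hypothetical illustration, not the tool's matcher. It assumes / is the folder hierarchy separator (some IMAP servers use . instead) and that ? may match any single character.

```python
import re

def folder_pattern_to_regex(pattern):
    """Translate '**' (anything), '*' (anything but '/') and '?' (one char)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append(".")
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(out) + r"\Z")  # the whole name must match

def select_folders(pattern, folders):
    """Return the folder names matching the pattern, in their original order."""
    regex = folder_pattern_to_regex(pattern)
    return [folder for folder in folders if regex.match(folder)]
```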
If you specify --threads x, message parts are downloaded in x separate threads, each with its own connection. This can significantly speed up the download when there are many messages, but could be overkill for just a few.
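A thread pool with one IMAP connection per worker thread is one way to implement this, and the sketch below takes that approach. imaplib connection objects should not be shared between threads. make_connection is a hypothetical factory that returns a logged-in connection with the folder selected. This is an assumption-based illustration, not the tool's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def download_in_threads(jobs, threads, make_connection):
    """Download (message number, section) jobs with one connection per thread."""
    local = threading.local()
    opened = []
    lock = threading.Lock()

    def fetch(job):
        if not hasattr(local, "conn"):
            # first job in this worker thread: open its private connection
            local.conn = make_connection()
            with lock:
                opened.append(local.conn)
        num, section = job
        _, data = local.conn.fetch(num, f"(BODY.PEEK[{section}])")
        return job, data[0][1]

    try:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            return dict(pool.map(fetch, jobs))
    finally:
        for conn in opened:  # close every per-thread connection
            conn.logout()
```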
The filter argument specifies a logical expression that identifies the message parts to be downloaded. An expression consists of literals, comparisons, variables and logical operators.
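Literals:
Literal | Format | Examples |
string | enclosed in double quotes | "some text" |
integer | digits with an optional k, M or G suffix | 1234, 5k, 12M |
date | YYYY-MM-DD | 2015-11-21 |
datetime | YYYY-MM-DD HH:MM | 2015-11-21 17:04 |
Dates and datetimes are entered without quotes.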
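The sketch below parses literals of these four kinds into Python values. It reads the time part as hours and minutes, as in the 17:04 example. The binary size multipliers (1k = 1024) are an assumption of this sketch, since imap_detach may use decimal ones. This is not the tool's parser.

```python
import re
from datetime import date, datetime

# Assumed binary multipliers for size suffixes.
SUFFIXES = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_literal(text):
    """Turn a filter literal into a str, int, date or datetime."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]                      # string
    match = re.fullmatch(r"(\d+)([kMG]?)", text)
    if match:                                  # integer with optional suffix
        return int(match.group(1)) * SUFFIXES[match.group(2)]
    try:                                       # datetime: YYYY-MM-DD HH:MM
        return datetime.strptime(text, "%Y-%m-%d %H:%M")
    except ValueError:
        pass
    # date: YYYY-MM-DD (raises ValueError for anything else)
    return datetime.strptime(text, "%Y-%m-%d").date()
```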
Comparisons (variable operator literal):
Operator | Meaning | Applies to | Example |
= | equals | any type; case-insensitive for strings | mime = "image/png" |
~= | contains | strings only, case-insensitive | subject ~= "test" |
^= | starts with | strings only, case-insensitive | subject ^= "Re:" |
$= | ends with | strings only, case-insensitive | name $= ".pdf" |
< | less than | integers or dates | date < 2015-11-21 |
> | greater than | integers or dates | size > 2M |
<= | less than or equal | integers or dates | date <= 2015-11-21 |
>= | greater than or equal | integers or dates | size >= 2M |
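The semantics in this table map directly onto a small dispatch table of Python functions, as the sketch below shows. It illustrates the documented behaviour (string operators ignore case, = is universal) and is not the tool's code.

```python
import operator

def _case_insensitive(test):
    """Wrap a string test so both sides are lower-cased before comparing."""
    return lambda value, literal: test(value.lower(), literal.lower())

def _equals(value, literal):
    # '=' works for any type, but ignores case for strings
    if isinstance(value, str) and isinstance(literal, str):
        return value.lower() == literal.lower()
    return value == literal

OPERATORS = {
    "=": _equals,
    "~=": _case_insensitive(lambda value, literal: literal in value),
    "^=": _case_insensitive(str.startswith),
    "$=": _case_insensitive(str.endswith),
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

def compare(value, op, literal):
    """Evaluate 'variable op literal' for an already looked-up variable value."""
    return OPERATORS[op](value, literal)
```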
Variables:
Various variables with information from the email header or the email part are available. These variables can also be used as substitution variables in the output file name or in the command run on the part content.
Variable | Type | Source | Description |
subject | string | Email header | subject of the email |
from | string | Email header | email address of the sender |
sender | string | Email header | email address of the sender; can be different from from |
to | string | Email header | email addresses of recipients, separated by commas |
cc | string | Email header | email addresses of CC recipients, separated by commas |
bcc | string | Email header | email addresses of BCC recipients, separated by commas |
date | datetime | Email header | date when the email arrived in the mailbox, i.e. the receipt date; time zone is ignored |
year | integer | Email header | year when the email arrived in the mailbox |
month | integer | Email header | month (1-12) when the email arrived in the mailbox |
day | integer | Email header | day (1-31) when the email arrived in the mailbox |
answered | boolean | IMAP Flag | Email has the Answered flag, set by the client when answering the email |
seen | boolean | IMAP Flag | Email has the Seen flag: it has been opened by some email client |
flagged | boolean | IMAP Flag | Email has the Flagged flag, often called a "starred" email by clients |
deleted | boolean | IMAP Flag | Email has the Deleted flag: it has been deleted by a client but not yet expunged from the folder |
recent | boolean | IMAP Flag | Email has the Recent flag: this is the first session in which the email is available |
draft | boolean | IMAP Flag | Email has the Draft flag |
flags | string list | IMAP Flag | All flags, even non-standard ones. Test them as a string; the test succeeds if the string matches any flag. |
mime | string | Email part | Content type of this part (type/subtype) |
attached | boolean | Email part | Is this part an attachment (Content-disposition is attachment)? |
size | integer | Email part | Approximate size of this part in bytes (inexact because only the encoded size is known: for base64 the difference can be at most 3 bytes due to padding, but for quoted-printable it can be bigger) |
name | string | Email part | File name of the attachment (empty if this part is not an attachment) |
section | string | Email part | Number of the part in the email structure, like 1, 2.1, 1.2.1 etc. |
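The boolean flag variables and the flags list can be derived from an IMAP FETCH (FLAGS) response. The sketch below does this with imaplib.ParseFlags from Python's standard library. It is a hypothetical illustration; in particular, case-insensitive matching for the flags test is an assumption of the sketch.

```python
import imaplib

# IMAP system flags behind the boolean filter variables.
SYSTEM_FLAGS = {
    "answered": "\\Answered",
    "seen": "\\Seen",
    "flagged": "\\Flagged",
    "deleted": "\\Deleted",
    "recent": "\\Recent",
    "draft": "\\Draft",
}

def flag_variables(fetch_response):
    """Build the flag variables from one FETCH (FLAGS) response line (bytes)."""
    flags = [flag.decode() for flag in imaplib.ParseFlags(fetch_response)]
    variables = {name: flag in flags for name, flag in SYSTEM_FLAGS.items()}
    variables["flags"] = flags  # every flag, including non-standard keywords
    return variables

def flags_contain(flags, text):
    """The test succeeds if the string matches any flag (case-insensitive here)."""
    return any(flag.lower() == text.lower() for flag in flags)
```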
Logical operators:
Simple expressions (comparisons or standalone boolean variables) can be combined by logical operators:
! – not
& – and
| – or
Operator precedence, from highest to lowest, is !, &, |; brackets ( and ) can be used to define the evaluation order explicitly.
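To make the precedence rules concrete, here is a small recursive-descent evaluator in which each precedence level has its own function. It is a sketch, not the tool's parser. To keep it short, it handles only standalone boolean variables, not comparisons.

```python
import re

# Tokens: single-character operators and brackets, or variable names.
TOKEN = re.compile(r"\s*(?:([!&|()])|([A-Za-z_]\w*))")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens

def evaluate(text, variables):
    """Evaluate !, & and | (in that precedence order) over boolean variables."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def or_expr():                # lowest precedence: |
        value = and_expr()
        while peek() == "|":
            take()
            rhs = and_expr()      # always parse the right side before combining
            value = value or rhs
        return value

    def and_expr():               # middle precedence: &
        value = not_expr()
        while peek() == "&":
            take()
            rhs = not_expr()
            value = value and rhs
        return value

    def not_expr():               # highest precedence: !
        if peek() == "!":
            take()
            return not not_expr()
        return primary()

    def primary():                # a bracketed expression or a variable
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        take()
        if token == "(":
            value = or_expr()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            take()
            return value
        if token in ("!", "&", "|", ")"):
            raise SyntaxError(f"unexpected {token!r}")
        return bool(variables[token])

    result = or_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return result
```

For example, ! deleted & seen | flagged is evaluated as ((! deleted) & seen) | flagged.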
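Example filter expression:
subject ~= "test" & date > 2015-10-21 & ( ! deleted | ! seen)
This selects parts of emails whose subject contains "test", received after 2015-10-21, that are either not deleted or not seen.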
After the filter, you have to define what happens to the downloaded email parts. There are three options:
- save the part to a file named by the --file argument (with possible variable substitutions, explained below);
- run the command specified by the --command argument; the part content is passed to the command on standard input;
- save the file first and then run the command on it (use both the --file and --command arguments); the command must then contain a file name variable.
Both the command and the output file arguments can contain variables in curly brackets: {variable}. The same variables as in the filter can be used, so for instance {subject} is replaced by the email subject, {name} by the attachment file name (the original name given in the email), {date} by the received date, etc. Very simple combinations of variables are also possible: {name|subject} expands to name if it is available, otherwise to subject, and {name+subject} expands to name joined with subject by the '_' character. Both can be combined, so we can write {name|subject+section}.
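The expansion rules can be sketched with a regular expression substitution, as below. This is not the tool's implementation. The sketch assumes + is split before |, so {name|subject+section} means "name, or subject if name is empty" joined with section. File-name sanitising is not handled.

```python
import re

FIELD = re.compile(r"\{([^{}]+)\}")

def expand(template, variables):
    """Expand {var}, {a|b} (first non-empty value) and {a+b} (joined by '_')."""
    def first_available(alternatives):
        for name in alternatives.split("|"):
            value = variables.get(name.strip())
            if value not in (None, ""):
                return str(value)
        return ""

    def replace(match):
        # assumption: '+' separates groups, '|' separates alternatives in a group
        return "_".join(first_available(group) for group in match.group(1).split("+"))

    return FIELD.sub(replace, template)
```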
Additionally, --command can contain the following variables related to the already expanded output file name:
Variable | Meaning |
file_name | full output file name with path |
file_base_name | output file name without path and without extension |
file_dir | directory component of output file name |
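These three variables correspond to standard path operations, as the sketch below shows with Python's os.path. The sketch takes the expanded output path as given and does not make it absolute. pdftotext appears only as an illustrative external command in the template.

```python
import os.path

def file_variables(file_name):
    """Derive the extra --command variables from an expanded output file name."""
    return {
        "file_name": file_name,
        "file_base_name": os.path.splitext(os.path.basename(file_name))[0],
        "file_dir": os.path.dirname(file_name),
    }

variables = file_variables("/tmp/attachments/report.pdf")
# Illustrative command template; any external tool could be used here.
command = "pdftotext {file_name} {file_dir}/{file_base_name}.txt".format(**variables)
print(command)
```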
The --delete-file argument deletes the file after it has been processed by the command (so it only makes sense when --file and --command are used together).
After the message parts are processed, an action can be applied to the whole email message containing them: delete the message (--delete), move it to another folder (--move) or mark it as seen (--seen). These three are the only 'unsafe' arguments, i.e. the only ones that modify the mailbox.
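In IMAP terms, these actions come down to setting flags and copying messages. The sketch below uses Python's imaplib and is not the tool's code. It implements a move portably as COPY followed by the \Deleted flag. Servers that support the MOVE extension (RFC 6851) can do it in one command. Call conn.expunge() once after all messages are processed, because expunging renumbers the remaining messages.

```python
import imaplib

def apply_action(conn, num, action, target=None):
    """Apply a mailbox-modifying action to one message (by sequence number).

    conn: a logged-in imaplib connection with the folder selected read-write.
    """
    if action == "seen":
        conn.store(num, "+FLAGS", "\\Seen")
    elif action == "delete":
        # only flags the message; it disappears after EXPUNGE
        conn.store(num, "+FLAGS", "\\Deleted")
    elif action == "move":
        typ, data = conn.copy(num, target)
        if typ != "OK":
            raise imaplib.IMAP4.error(f"COPY failed: {data!r}")
        conn.store(num, "+FLAGS", "\\Deleted")  # remove the original later
    else:
        raise ValueError(f"unknown action {action!r}")
```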
To follow the progress of the tool and debug potential issues, several other arguments are available:
--test: dry run; prints which message parts would be downloaded based on the filter, but does not download them
--verbose: prints basic messages about the message parts being downloaded
--debug: prints debug information
--log-file: messages are written to this file instead of the standard error output
--no-imap-search: implementations of the IMAP SEARCH command vary by server; this option skips IMAP SEARCH, fetches all messages in the folder and filters them locally
Help
detach.py -h
usage: detach.py [-h] -H HOST -u USER [-p PASSWORD] [--no-ssl] [--insecure-ssl]
                 [--folder FOLDER] [-f FILE_NAME] [-c COMMAND] [-t THREADS]
                 [-v] [--debug] [--log-file LOG_FILE] [--test] [--seen]
                 [--delete] [--delete-file] [--move MOVE] [--timeit]
                 [--no-imap-search] [--unsafe-imap-search] [--version]
                 filter

positional arguments:
  filter                Filter for mail parts to get, simple expression with
                        variables comparisons ~=, = , > etc. and logical
                        operators & | ! and brackets

optional arguments:
  -h, --help            show this help message and exit
  -H HOST, --host HOST  IMAP server - host name or host:port
  -u USER, --user USER  User name
  -p PASSWORD, --password PASSWORD
                        User password
  --no-ssl              Do not use SSL, use plain unencrypted connection
  --insecure-ssl        Use insecure SSL - certificates are not checked
  --folder FOLDER       mail folder(s), can specify more, or pattern with ?, *
                        or **, default is INBOX
  -f FILE_NAME, --file-name FILE_NAME
                        Pattern for outgoing files - supports {var}
                        replacement - same variables as for filter
  -c COMMAND, --command COMMAND
                        Command to be executed on downloaded file, supports
                        {var} replacement - same variables as for filter, if
                        output file is not specified, data are sent via
                        standard input
  -t THREADS, --threads THREADS
                        Download message parts in x separate threads
  -v, --verbose         Verbose messaging
  --debug               Debug logging
  --log-file LOG_FILE   Log is written to this file (otherwise it's stdout)
  --test                Do not download and process - just show found email
                        parts
  --seen                Marks processed messages (matching filter) as seen
  --delete              Deletes processed messages (matching filter)
  --delete-file         Deletes downloaded file after command
  --move MOVE           Moves processed messages (matching filter) to
                        specified folder
  --timeit              Will measure time tool is running and print it at the
                        end
  --no-imap-search      Will not use IMAP search - slow, but will assure exact
                        filter match on any server
  --unsafe-imap-search  Advanced IMAP search features, requiring full IMAPv4
                        compliance, which are supported only by some servers
                        (dovecot)
  --version             show program's version number and exit

Variables for filter: answered attached bcc cc date day deleted draft flagged flags from mime month name recent section seen sender size subject to year
Operators for filter: = ~= (contains) ^= (starts with) $= (ends with) > < >= <=
Date(time) format: YYYY-MM-DD or YYYY-MM-DD HH:SS - enter without quotes
Additional variables for command: file_name file_base_name file_dir
License
GNU General Public License v3 (GPLv3)
Download
Download the source from GitHub.
Install with pip:
pip install imap_detach
Support
Have a problem with the tool? The best solution is to go to the code and fix it yourself, then send a pull request on GitHub. That way you make sure it gets fixed, and you also help others.
If you cannot fix it, you are welcome to post an issue on the GitHub project page.