Download Email Attachments Automagically

Email is still one of the most important means of electronic communication. Apart from everyday use with a convenient client (like the superb Thunderbird), from time to time one might need to get message content out of a mailbox and perform some bulk action on it. An example would be downloading all image attachments from your mailbox into a folder: this is easy to do manually for a few emails, but what if there are ten thousand of them?

Your mailbox is usually hosted on a server and you can access it via the IMAP protocol. There are many possible ways to achieve this, but most of them require downloading or synchronizing the full mailbox locally and then extracting the required parts from the messages and processing them. This can be very inefficient.

Recently I needed an automated task like the one above: search messages in a particular IMAP mailbox, identify attachments of a certain type and name, download them and run a command on them, and after the command finishes successfully, delete the email (or move it to another folder). Looking around, I did not find anything suitable that would meet my requirements (Linux, command line, simple yet powerful). So, having some experience with IMAP and Python, I decided to write such a tool myself. It’s called imap_detach, and you can check the details on its page. Here I’d like to present a couple of use cases for this tool, in the hope that they might be useful for people with similar email processing needs.

Let’s start with a simple example:

detach.py -H imap.example.com -u user -p password  -f ~/tmp/attachments/{year}/{from}/{name} -v 'attached'

This will download all attachments from all emails in the user’s inbox and save them in subdirectories, grouped first by year and then by sender. If there are many emails, it can take quite some time. In some cases you might notice error messages complaining that the output file is a directory, which means that the attachment does not have any name defined within the email.

The next example resolves this by naming the output file with the more sophisticated {name|subject+section} replacement (| serves as ‘or’ and + joins two variables, so if the attachment does not have a name, we use the subject and section as the file name, which can look like “Important message_2.1”).

We can also try adding the --threads argument, which enables concurrent download of attachments in separate threads:

detach.py -H imap.example.com -u user -p password  -f ~/tmp/attachments/{year}/{from}/{name|subject+section} -v --threads 5 'attached'
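To make the {name|subject+section} replacement more concrete, here is a minimal Python sketch of how such a template could be resolved. It is not the tool’s actual code: the function name expand_template, the values dictionary and the "_" joiner (guessed from the “Important message_2.1” example) are all assumptions made for illustration.

```python
import re


def expand_template(template, values, joiner="_"):
    """Expand {a|b+c} placeholders (hypothetical helper, not imap_detach code).

    '|' picks the first alternative whose variables are all non-empty,
    '+' joins several variables with `joiner` (an assumed separator).
    """
    def resolve(match):
        for alternative in match.group(1).split("|"):
            parts = [str(values.get(name.strip()) or "")
                     for name in alternative.split("+")]
            if all(parts):
                return joiner.join(parts)
        return ""  # no alternative could be filled in

    return re.sub(r"\{([^{}]+)\}", resolve, template)


# An attachment without a name falls back to subject and section.
unnamed = {"year": 2015, "name": None,
           "subject": "Important message", "section": "2.1"}
path = expand_template("{year}/{name|subject+section}", unnamed)
```

With a named attachment the first alternative wins, so the same template would simply yield the year followed by the attachment name.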

In my tests with my Gmail mailbox, concurrent download with 5 threads was 3.7 times faster than single-threaded download (~1200 files, ~450 MB).
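The speed-up comes from overlapping network waits. The following sketch shows the general technique with Python’s standard thread pool. It is only an illustration under stated assumptions: fetch_part is a hypothetical stand-in that sleeps instead of talking to an IMAP server, and nothing here claims to mirror how the tool organizes its threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_part(part_id):
    """Hypothetical stand-in for fetching one attachment over the network."""
    time.sleep(0.05)  # simulated network latency
    return part_id, b"data-%d" % part_id


def download_all(part_ids, threads=5):
    """Fetch all parts with a pool of worker threads and return {id: bytes}."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(fetch_part, part_ids))


results = download_all(range(8), threads=5)
```

Because the workers spend most of their time waiting for the server, several downloads can be in flight at once even though Python threads do not run Python code in parallel.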

But we are not limited to email attachments; all email parts are available to us. How about getting all the plain text parts and putting them into one big file, which we can later use for some analysis:

detach.py -H imap.example.com -u user -p password -v -c "cat >> /home/you/tmp/emails.txt" -v 'mime="text/plain"'

We can be more specific about which messages to get. For instance, say we are interested only in junk messages from this year:

detach.py -H imap.example.com -u user -p password -v -c "cat >> /home/you/tmp/junk{year}.txt" -v 'mime="text/plain" & year=2015 & flags="Junk"'

Text parts of an email can use different character encodings (for instance, for the Czech language we can have iso-8859-2, windows-1250 or UTF-8). The tool solves this by re-encoding text to UTF-8, so all text in the output file is in this charset.
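The re-encoding step can be sketched in a few lines of Python. This is a simplified illustration, not the tool’s implementation: the helper name to_utf8 and the latin-1 fallback for unknown or wrong charsets are assumptions.

```python
def to_utf8(raw, charset=None, fallback="latin-1"):
    """Decode bytes using the charset declared for the part, return UTF-8 bytes.

    Hypothetical helper: falls back to `fallback` (an assumed choice) when
    the declared charset is missing, unknown or does not fit the data.
    """
    try:
        text = raw.decode(charset or "ascii")
    except (LookupError, UnicodeDecodeError):
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")


sample = "Příliš žluťoučký kůň"
from_iso = to_utf8(sample.encode("iso-8859-2"), "iso-8859-2")
from_win = to_utf8(sample.encode("cp1250"), "cp1250")
```

Both variables end up holding the same UTF-8 bytes, which is why parts in different charsets can be appended to a single output file.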

Similarly, we can look at messages in other folders, say the folder Spam and all its sub-folders, look only for text in the first sub-part of the email message (that should be the text of the email), and get only emails whose subject starts with “Re:”:

detach.py -H imap.example.com -u user -p password -v -c "cat >> /home/you/tmp/spam.txt" -v --folder "Spam**" 'mime="text/plain" & (section="1" | section~="1.") & subject^="re:"'

And what about finding all links in your mailbox (with a bit of quote escape madness):

detach.py -H imap.example.com -u user -p password -v -c 'grep -ioP '\''<a .*?href=["'\'\\\'\''][^"'\'\\\'\'']+["'\'\\\'\'']'\'' | sed -r '\''s/.*href=["'\'\\\'\'']([^"'\'\\\'\'']+)["'\'\\\'\'']/\1/i'\'' >> /home/you/tmp/links.txt' 'mime="text/html"'
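If the shell quoting gets too painful, the same extraction can be done with Python’s standard html.parser module. The sketch below is an alternative to the grep/sed pipeline, not something the tool provides; LinkCollector and extract_links are names made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of all <a> tags (tag and attribute names are
    lowercased by HTMLParser, so <A HREF=...> is handled too)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the list of link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


links = extract_links(
    '<p><a href="http://a.example/x">x</a> <A HREF=\'http://b.example\'>y</A></p>'
)
```

Assuming the command given to -c receives the message part on standard input (as the cat examples suggest), a small wrapper script could call extract_links(sys.stdin.read()) and print one link per line.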

Or using a fairly complex filter:

detach.py -H imap.example.com -u user -p password -v -f "/home/you/tmp/{name|subject+section}" -v '(mime="application/pdf" & ! from ~= "bill" & ! cc~="james" & size>100k & size<1M) | (mime="image/png" & ! name^="bi" & from~="bill") | (mime^="image" & (name$="gif" | from~="matt"))'
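To read such filters more easily, it helps to think of each comparison as a small string test. The sketch below spells out one plausible reading of the operators: = as equality, ^= as “starts with” (as in the “Re:” example above), and, as assumptions inferred only from the examples, ~= as “contains” and $= as “ends with”, all case-insensitive. The match function is hypothetical and is not the tool’s parser.

```python
def match(value, op, pattern):
    """Evaluate one comparison; semantics are assumptions, see the lead-in."""
    value, pattern = (value or "").lower(), pattern.lower()
    if op == "=":
        return value == pattern
    if op == "~=":
        return pattern in value
    if op == "^=":
        return value.startswith(pattern)
    if op == "$=":
        return value.endswith(pattern)
    raise ValueError("unknown operator: %r" % op)


# A made-up message part used only for illustration.
part = {"mime": "image/gif", "name": "Logo.GIF", "from": "matt@example.com"}

# Mirrors the last clause of the filter: mime^="image" & (name$="gif" | from~="matt")
selected = match(part["mime"], "^=", "image") and (
    match(part["name"], "$=", "gif") or match(part["from"], "~=", "matt")
)
```

In this reading, & and | behave like Python’s and/or, and ! negates the comparison that follows it.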

And there are many more possibilities; check the details on the tool’s home page.

27 thoughts on “Download Email Attachments Automagically”

  1. I am using Python 2.7. I went to your GitHub.
    I did a pip install.

    I cannot seem to use it. Could you write the steps for how I can use it on Windows 10?

    Normally I run scripts using python xyz.py

    This is not working for me

    1. This is not working for me

      That’s a pretty broad statement. What exactly is not working? On Linux it installs a runnable script detach.py to /usr/bin. I’m not sure what happens on Windows because I’m not using it. So look where it gets installed (I think it’s something like C:/Python27/Scripts) and run it with the full path. My primary platform is Linux and I never tested on Windows, so I cannot guarantee it’ll work there.

  2. Thank you very much for that fantastic script!!

    It works fine and fulfills all my needs!

    One question: I could not work out how to mark a mail as read or delete it after it has been processed.

    Could you please explain it in an example command line?

    This would help me a lot!!

    Thank you!!

  3. Wow, please forget about my previous question!

    I asked too quickly. I have found it out by myself (RFM) :).

    But there is another thing I can’t figure out.

    How do I tell the program to process only unread messages?

    Thank you once again!

    1. Use the seen variable in the filter. You can learn more details by following the link at the bottom of the article.

      1. Thank you!

        I tried for a few hours to use “seen” but I couldn’t make it work.

        detach.py -H xxxx:993 -u xxx.com -p xxxxxxx -f /var/opt/processing/{to}/{name} --seen --log-file /var/log/detach.log -v 'attached'

        This is how far I got, but I am having trouble with the syntax of the filtering. I always end up with syntax errors and so on…

        One example would help me a lot. In the above line I would like to process only unseen mails.

        Thank you!!!

    1. You can specify the folder to use with the --folder parameter, but the tool has no way to download from all folders in one go. However, you can create a shell script that loops through a list of known folders.
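The loop over known folders could also be written in Python instead of shell. This is a minimal sketch that only uses options shown in this article (-H, -u, -p, --folder, -f, -v); the helper names, the folder list you pass in and the assumption that detach.py returns a non-zero exit code on failure are all hypothetical.

```python
import subprocess


def build_command(host, user, password, folder, target, filter_expr="attached"):
    """Build the detach.py argument list for one folder."""
    return ["detach.py", "-H", host, "-u", user, "-p", password,
            "--folder", folder, "-f", target, "-v", filter_expr]


def run_for_folders(folders, **connection):
    """Run detach.py once per folder; the exit-code check is an assumption."""
    for folder in folders:
        result = subprocess.run(build_command(folder=folder, **connection))
        if result.returncode != 0:
            print("detach.py failed for folder", folder)


# Example argument list for a single folder (nothing is executed here).
example = build_command("imap.example.com", "user", "password",
                        "Spam", "/tmp/attachments/{name|subject+section}")
```

Passing the arguments as a list avoids a second layer of shell quoting around the filter expression.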

  4. I tried to execute the code below but got an error message saying there is a syntax error in imap

    detach.py -H imap.gmail.com -u python37@gmail.com -p Python3.7 -f ~/tmp/attachments/{2018}/{kk@ao.com/{name.subject_section}  -v --threads 5 'attached'.

    1. The file name is clearly incorrect: even the brackets are not balanced (and some stray characters got there by a copy & paste error). In the file name, {x} is a placeholder to be replaced by a variable’s value, not a place for a filter!
      If you need to filter mails, add the condition to the filter expression (the last argument), so the command line should look more like:

      detach.py -H imap.gmail.com -u python37@gmail.com -p Python3.7 -f ~/tmp/attachments/{name|subject+section} -v --threads 5 'attached & year=2018 & from="kk@ao.com"'

      See the link at the end of this article for more details about this tool.

      1. Thank you for your immediate response.
        Since I’m new to Python, I still find it difficult to understand. I have now modified the command as per your reply; please see it below. What happens is that once it gets executed, a Notepad file opens with this content:

        #!d:\kk\python_installation\python.exe
        from imap_detach.cmd import main

        if __name__ == '__main__':
            main()

        Command:
        detach.py -H imap.gmail.com -u python37@gmail.com -p Python3.7 -f ~/tmp/attachments/{2018}/{kk2@co.com} -v 'attached'

  5. Hi
    I am trying to grab all new attachments from a mail domain with the following command:

    ./detach.py -H domain -u username -p 'password' -f './New Attachments/{name}' -v 'attached & ! seen' --seen --insecure-ssl

    But I get “UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 26: ordinal not in range(128)” if I have “attached” in the filter. I hope someone can help me 🙂

    1. Please log an issue at https://github.com/izderadicka/imap_detach/issues
      Put the complete error stack trace there, and also the Python version.
      The problem is some non-ASCII character somewhere in the email, maybe in an attachment name? (Try to narrow down the cause with more detailed filtering.)
      As a possible workaround, try Python 3 if you are using 2 (Python 3 uses UTF-8 as the default encoding), or try to enforce UTF-8 encoding as described here: https://stackoverflow.com/a/7892892/1720030

  6. Hi,
    first of all, a great script.
    I have a question:
    I tried your script to download an IMAP sub-folder in these ways (the name of the folder is “Scarto”):

    1)
    detach.py -H SERVER -u USER -p PWD --folder "INBOX\Scarto" -f ./cbfservizi/{year}/{from}/{name} -v 'attached'
    I have this error: No folders patching ['INBOX\\Scarto']

    2)
    detach.py -H SERVER -u USER -p PWD --folder "INBOX/Scarto" -f ./cbfservizi/{year}/{from}/{name} -v 'attached' (with a slash before the sub-folder name)
    I have this error:
    Processing folder INBOX/Scarto
    Runtime Error
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/imap_detach/imap_client.py", line 186, in main
        process_folder(c, pool, folder, imap_filter, charset, eval_parser, opts)
      File "/usr/lib/python2.7/site-packages/imap_detach/imap_client.py", line 233, in process_folder
        selected=c.select_folder(folder)
      File "/usr/lib64/python2.7/site-packages/imapclient/imapclient.py", line 474, in select_folder
        self._command_and_check('select', self._normalise_folder(folder), readonly)
      File "/usr/lib64/python2.7/site-packages/imapclient/imapclient.py", line 1174, in _command_and_check
        self._checkok(command, typ, data)
      File "/usr/lib64/python2.7/site-packages/imapclient/imapclient.py", line 1180, in _checkok
        self._check_resp('OK', command, typ, data)
      File "/usr/lib64/python2.7/site-packages/imapclient/imapclient.py", line 1060, in _check_resp
        raise self.Error("%s failed: %s" % (command, to_unicode(data[0])))
    error: select failed: Client tried to access nonexistent namespace. (Mailbox name should probably be prefixed with: INBOX.) (0.001 + 0.000 secs).

    Can you help me?

    Best

    Cristiano Aluffi

    1. I’m not exactly sure what the problem could be. Are you sure that the folder path is correct (is Scarto really under INBOX)? What IMAP server are you connecting to? If the problem remains, try logging an issue on GitHub, where it’s more visible.

      1. Hi and thank you for the reply.
        The server is imaps.pec.aruba.it
        The folder name Scarto is OK.
        Can you help me?
        Cristiano

        1. Not sure, but you can log an issue on GitHub and I’ll look at it then. In the ticket please specify: Python version, program version, program command line, detailed error stack trace, debug log and, if possible, a test IMAP account to demonstrate the problem on.
          I.

  7. Running on Windows I get this error:

    detach.py -H server.com -u user@name.com -p password -v attached --debug
    DEBUG:imap_client:SSL status is True
    DEBUG:imap_client:IMAP filter:
    ERROR:imap_client:Runtime Error
    Traceback (most recent call last):
      File "c:\python37\lib\site-packages\imap_detach\imap_client.py", line 164, in main
        c=IMAP_client_factory(host,port,use_ssl=ssl)
      File "c:\python37\lib\site-packages\imap_detach\utils.py", line 75, in IMAP_client_factory
        ssl_context=imapclient.create_default_context()
      File "c:\python37\lib\site-packages\imapclient\tls.py", line 109, in create_default_context
        context.load_verify_locations(cadata=certs)
      File "c:\python37\lib\site-packages\backports\ssl\core.py", line 654, in load_verify_locations
        self._ctx.load_verify_locations(cafile, capath)
      File "c:\python37\lib\site-packages\OpenSSL\SSL.py", line 776, in load_verify_locations
        _raise_current_error()
      File "c:\python37\lib\site-packages\OpenSSL\_util.py", line 54, in exception_from_error_queue
        raise exception_type(errors)
    OpenSSL.SSL.Error: []

    Any idea why?

    1. Wrong OS? 🙂 I never tested on Windows, so I’m not sure how SSL works there in Python.
      From the error it looks like a problem with the SSL/TLS library while loading CA certificates; maybe the CA certificates are in the wrong directory, in the wrong format, etc. The problem lies in the IMAP library, which calls the SSL library.

  8. Hi, first, thank you for this code, which could help other noobs with email twisting.

    Unfortunately I cannot use it; I get

    radikal_loulou@radikal-loulou:~$ sudo detach.py -h
    Traceback (most recent call last):
      File "/usr/local/bin/detach.py", line 2, in <module>
        from imap_detach.cmd import main
      File "/usr/local/lib/python2.7/dist-packages/imap_detach/cmd.py", line 7, in <module>
        from imap_detach.mail_info import DUMMY_INFO
      File "/usr/local/lib/python2.7/dist-packages/imap_detach/mail_info.py", line 2, in <module>
        from imap_detach.utils import decode, email_decode, lower_safe
      File "/usr/local/lib/python2.7/dist-packages/imap_detach/utils.py", line 4, in <module>
        import imapclient
      File "/usr/local/lib/python2.7/dist-packages/imapclient/__init__.py", line 10, in <module>
        from .imapclient import *
      File "/usr/local/lib/python2.7/dist-packages/imapclient/imapclient.py", line 26, in <module>
        from . import tls
      File "/usr/local/lib/python2.7/dist-packages/imapclient/tls.py", line 20, in <module>
        raise ImportError("backports.ssl is not installed")
    ImportError: backports.ssl is not installed

    I tried to reinstall imapclient with pip install but it doesn’t help…
    Any idea?

    Thanks in advance

      1. Thank you for this quick response. I get:

        radikal_loulou@radikal-loulou:~$ pip install backports.ssl
        DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won’t be maintained after that date. A future version of pip will drop support for Python 2.7.
        Requirement already satisfied: backports.ssl in /usr/local/lib/python2.7/dist-packages (0.0.9)

        It seems that it’s already installed. I did check the compatibility, but as a non-native English speaker and beginner programmer, I don’t understand the subtleties of compatibility with Python 2.7 or not.

        Do I need to upgrade to the latest Python?

  9. I’m actually using this on 2.7, so that should not be the problem; however, using 3.5+ will probably eliminate it, because backports are not used there. Try from backports import ssl in the interpreter you are using to see if it is really installed there.

  10. Can you help me run the script?
    I get an error on every try:

    [kpv@m1 ~]$ detach.py -H mail*** -u *** -p *** --insecure-ssl -f ~/tmp/attachments/{year}/{from}/{name} -v 'attached'
    Runtime Error
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/imap_detach/imap_client.py", line 164, in main
        c=IMAP_client_factory(host,port,use_ssl=ssl)
      File "/usr/lib/python2.7/site-packages/imap_detach/utils.py", line 75, in IMAP_client_factory
        ssl_context=imapclient.create_default_context()
    AttributeError: 'module' object has no attribute 'create_default_context'

    Thanks in advance

  11. This script is exactly what I was looking for. I’m not an experienced programmer, though I usually get things running.
    When using your script with Python 3.7.4 under Windows 10 I keep getting this error: “Invalid syntax of filter: Rule ‘expr’ didn’t match at ”mime=application/pd’ (line 1, column 1).”
    My instructions (modified from your site): detach.py -H imap.gmx.net -u name@provider -p pwd -v -f "C:\Users\Name\Desktop\MailTest\123.pdf" -v --folder "Junk**" 'mime="application/pdf"
    I’ve tried other instructions from your site as well, but I just keep getting error messages. I’m running out of options. Help would be much appreciated.

    1. The error is clearly about filter expression parsing. I never tested on Windows, so I’m not sure whether the Windows shell behaves the same here, but you need to ensure that the whole filter expression is in single quotes; in your example the ending single quote is missing. And string literals must be surrounded by double quotes. Don’t trust the examples from the site completely, as I could also have made a typo; try different filters.
      You can also try it on Linux to see if it is a Windows issue (then you’re on your own, as I can help with Linux only).
