How are attachments stored in Zarafa
Email can be used to transport different kinds of data in the form of attachments. These can be anything you like, such as photos, documents or music. When it comes to Zarafa server configuration an administrator has two choices: storing attachments on disk or storing them in the database. While keeping attachments in the database might seem a good idea from a data management perspective (keeping everything together), this is not considered best practice: it can make the database grow rapidly in size and could lead to a performance problem or backups that take too long.
Zarafa has opted for the sensible default of storing all attachments on disk. Configuration of this feature is handled in the configuration file '/etc/zarafa/server.cfg' and looks like this:
# Where to place attachments. Value can be "database" or "files". attachment_storage = files
The path where the attachments are stored can also be found in this configuration file:
# When attachment_storage is "files", use this path to store the files attachment_path = /var/lib/zarafa/attachments
When yo take a look in this directory (find -ls), you'll see a list of attachments like this:
13636244 4 drwx------ 12 root root 4096 Mar 5 15:09 /var/lib/zarafa/attachments 13637400 4 drwx------ 22 root root 4096 Aug 26 2014 /var/lib/zarafa/attachments/8 13637429 20 drwx------ 2 root root 20480 Feb 13 16:06 /var/lib/zarafa/attachments/8/13 13665895 4 -rw-r--r-- 1 root root 322 Feb 13 15:29 /var/lib/zarafa/attachments/8/13/161738.gz 13669294 4 -rw-r--r-- 1 root root 322 Feb 13 16:02 /var/lib/zarafa/attachments/8/13/165138 13669694 4 -rw-r--r-- 1 root root 323 Feb 13 16:06 /var/lib/zarafa/attachments/8/13/165538.gz
Each of these files (with or without gz-extensin) represents a file that was originally attached to an email. However, the separate files give you no information on what type of file or even its original name it was. So how can you find out what the original file name was?
Attachments in MAPI
Every attachment has a number for a filename, which does not seem to have any relation with the original filename. This is correct: the number represents a MAPI property. Every item in MAPI is stored as a property and can be referred to by its number - and Zarafa uses MAPI for storage.
In the example above there are several attachments, including one with property number 161738. From the file name on disk you can see that it has been compressed with GZIP (the ".gz.' extension). The number of the attachment (161738) is stored in the Zarafa MySQL database in the table "singleinstances".
As described, every attachment has a property number for its data. There is also a property number for the file name - file names are stored separately, so we can have several names for one file to reduce the storage usage. Each property has a 'value' and a 'tag'. The value represents the contents, which in case of attachments this could be the file name or even the complete attachment (if you have chosen to store attachments in the database). The tag determines what type the value is. A tag with a value of '0x3707' means: this is a filename. A tag '0x0015' represents an expiry time stamp and '0x002A' is the recipient time stamp.
In the database we have a table 'hierarchy', which is used to connect different objects to each other: it connects file names to attachments, recipients to emails, etc. We can use this information to get the details on all types of MAPI objects, such as attachment file names.
Getting the attachment details
To get from the file name to the number describing the file name of the attachment on disk, we need to do some SQL magic. You need to join the 'properties' table, the 'hierarchy' table and the 'singleinstances' table. This may sound complicated, but it is actually pretty simple!
Fire-up your mysql-client, connect to the Zarafa database and enter the following:
SELECT CONCAT(singleinstances.instanceid % 10, "/", FLOOR(singleinstances.instanceid / 10) % 20, "/", singleinstances.instanceid, "\t", properties.val_string) FROM singleinstances, hierarchy, properties WHERE hierarchy.type=7 AND hierarchy.id=singleinstances.hierarchyid AND hierarchy.id=properties.hierarchyid AND properties.tag=0x3707;
Keep in mind: this query may severely affect performance while it is running!
The result looks like this:
... | 1/13/150131 afexlogo.gif| | 3/13/150133 Diary.pdf | | 4/13/150134 Scarlett.jpg| | 5/13/150135 kw.png | | 5/18/165585 inline.txt | ...
To verify, I execute:
root@system:/home/folkert# zcat /var/lib/zarafa/attachments/4/13/150134 > /tmp/Scarlett.jpg root@system:/home/folkert# file /tmp/Scarlett.jpg /tmp/Scarlett.jpg: JPEG image data, JFIF standard 1.02, comment: "Scarlett Johansson"
So if you ever can't find an email with a certain attachment, this may be a way to at least recover the attachment!