Once we’d fixed up all our message IDs (see part 1), we let imapsync loose on the first few beta testers, who noticed that the dates on many of their older messages were wrong, sometimes hilariously so. This was pretty confusing since we weren’t touching the dates as part of the migration; how had they changed?
Fortunately the observed dates were a clue: many of them were the date of the previous migration from an even older mail system. Years ago, during that migration, we had managed to set each message’s date on the IMAP server to the day it was migrated rather than the day it was delivered. Because many mail clients order messages by the Date: header, nobody noticed until they switched to the new system, which uses the IMAP server’s timestamp. It definitely had to be fixed, and fortunately we already had a nice platform to extend with this functionality.
Date math is notoriously fiddly; doing the simple and obvious thing is almost always wrong. Fortunately the Python datetime library encapsulates most of the complexity, allowing us to make a few assumptions about the timezone (that it’s local to the box this is running on) and let it handle the rest:
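    import os
    import datetime
    import dateutil.parser
    import dateutil.tz

    # fullpath and message (the parsed email) come from the surrounding lint script.
    mtime = os.stat(fullpath).st_mtime
    date_mtime = datetime.datetime.fromtimestamp(mtime, dateutil.tz.gettz())
    date_header = None
    # Prefer Delivery-date and fall back to Date.
    for hname in ["Delivery-date", "Date"]:
        if hname in message:
            date_header = dateutil.parser.parse(message[hname])
            if not date_header.tzinfo:
                # No zone in the header: assume the local zone of this box.
                date_header = date_header.replace(tzinfo=dateutil.tz.gettz())
            break
    if date_header is not None:
        delta = abs(date_header - date_mtime)
        if delta > datetime.timedelta(days=2):
            # timestamp() honors the header's own UTC offset.
            new_mtime = date_header.timestamp()
            os.utime(fullpath, (new_mtime, new_mtime))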
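The header lookup and the two-day comparison can also be sketched as a pair of small standalone functions. This is only an illustration, not the code we ran: it swaps dateutil for the standard library's `email.utils.parsedate_to_datetime`, and the names `header_date`, `needs_fix`, and the `fallback_tz` parameter are made up for the example. It also assumes UTC for headers without a usable zone, where the script above assumes the local zone.

```python
import datetime
import email
from email.utils import parsedate_to_datetime


def header_date(message, fallback_tz=datetime.timezone.utc):
    """Return an aware datetime from Delivery-date or Date, or None."""
    for name in ("Delivery-date", "Date"):
        value = message.get(name)
        if value is None:
            continue
        try:
            parsed = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Unparseable header: try the next candidate.
            continue
        if parsed.tzinfo is None:
            # Assumption: treat zone-less dates as being in fallback_tz.
            parsed = parsed.replace(tzinfo=fallback_tz)
        return parsed
    return None


def needs_fix(header_dt, stored_epoch, tolerance=datetime.timedelta(days=2)):
    """True if the stored timestamp is further than tolerance from the header."""
    stored = datetime.datetime.fromtimestamp(stored_epoch, datetime.timezone.utc)
    return abs(header_dt - stored) > tolerance


if __name__ == "__main__":
    msg = email.message_from_string(
        "Date: Fri, 01 Jan 2010 12:00:00 +0500\nSubject: hi\n\nbody\n")
    dt = header_date(msg)
    print(dt, needs_fix(dt, 0))
```

Comparing aware datetimes, as `needs_fix` does, sidesteps most of the timezone arithmetic: both sides carry their own offset, so the subtraction is done in absolute time.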
We merged this into the maildir lint script from the previous post, did the requisite testing to make sure it was behaving as expected, and ran it on the maildirs of our intrepid beta testers. That presented us with another problem: you can’t modify the date of a message via IMAP without deleting it and re-adding it to the server. We elected to simply bulk-delete the mailboxes (with another Python script that’s four lines of deleting things and two dozen lines of sanity checking) and re-run the whole synchronization process from scratch.
There are tens of thousands of messages per mailbox so we let it run over a weekend, came back Monday and…the dates were still wrong. In fact the dates on the synchronized messages were unchanged from the first attempt even though the mtimes were correct. A quick trip to djb’s original Maildir specification demonstrated just how wrong we had been. Not only does the mtime not record the delivery timestamp, mtime isn’t used in the spec at all. Instead, the first component of the message’s filename is its date, in seconds since the epoch. A few facepalms and a quick modification to the script later and we were ready for the third try:
    import re

    # The first dot-separated component of a Maildir filename is the
    # delivery time in seconds since the epoch.
    date_maildir = datetime.datetime.fromtimestamp(
        float(os.path.basename(fullpath).split('.')[0]), dateutil.tz.gettz())
    if abs(date_header - date_maildir) > datetime.timedelta(days=2):
        head, tail = os.path.split(fullpath)
        new_file = os.path.join(head,
                                re.sub(r"^\d+\.",
                                       str(int(date_header.timestamp())) + '.',
                                       tail))
        if len(new_file) != len(fullpath):
            # If someone is running this in 2286 I sincerely apologize.
            raise Exception("Sanity check failed for renaming %s to %s" % (fullpath, new_file))
        os.rename(fullpath, new_file)
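For anyone who wants to try the filename surgery without touching a real maildir, here is a minimal sketch of just the string handling. The helper names `maildir_timestamp` and `retimed_name` and the sample filename are invented for this illustration; it assumes the filename starts with a run of digits followed by a dot, as in the snippet above.

```python
import re

# Leading epoch seconds followed by a literal dot.
_TS_PREFIX = re.compile(r"^(\d+)\.")


def maildir_timestamp(name):
    """Return the leading epoch seconds of a Maildir filename, or None."""
    match = _TS_PREFIX.match(name)
    return int(match.group(1)) if match else None


def retimed_name(name, new_epoch):
    """Return name with its leading timestamp replaced by new_epoch."""
    if _TS_PREFIX.match(name) is None:
        raise ValueError("no leading timestamp in %r" % name)
    new_name = _TS_PREFIX.sub("%d." % new_epoch, name, count=1)
    # Same sanity check as the script: a changed length means the
    # timestamp did not have the expected number of digits.
    if len(new_name) != len(name):
        raise ValueError("length changed renaming %r to %r" % (name, new_name))
    return new_name


if __name__ == "__main__":
    sample = "1262329200.M123P456.mailhost:2,S"  # hypothetical filename
    print(maildir_timestamp(sample))
    print(retimed_name(sample, 1000000000))
```

Keeping the rename logic as a pure function on strings makes it easy to check against a handful of filenames before letting `os.rename` anywhere near tens of thousands of messages.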
We again blew away the migrated mailboxes, re-linted the maildirs and re-synchronized them. The dates were correct on the other side this time so we declared victory on this small slice of the migration.