Mining Mail

Tuesday, December 3rd, 2002

One convenient source of a massive corpus of data suitable for data mining is the mass of e-mail that arrives at our system every day.

E-mail is a surprisingly interesting data format. It contains a lot of structured, regular data in the form of mail headers, which is easy for a computer to parse. Unfortunately, the utility of the mail headers is pretty hit and miss.

Simon Cozens, Mining Mail, The Perl Journal, December 2002
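
To make the point about headers concrete, here is a minimal sketch of pulling a few header fields out of a message. It uses Python's standard email package rather than the Perl the article itself would use, and the sample message, the addresses, and the summarize_headers helper are all made up for illustration. Since the usefulness of headers is hit and miss, the sketch treats every field as optional.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical sample message; the names and addresses are invented.
RAW_MESSAGE = """\
From: Alice Example <alice@example.com>
To: bob@example.org
Subject: Lunch on Thursday?
Date: Tue, 03 Dec 2002 10:15:00 +0000
Message-ID: <1234@example.com>

Are you free for lunch on Thursday?
"""


def summarize_headers(raw):
    """Return a small dict of header fields from a raw message string.

    Illustrative helper, not taken from the article. Missing headers
    fall back to empty strings or None instead of raising.
    """
    msg = message_from_string(raw)

    # parseaddr splits "Display Name <address>" into its two parts.
    name, address = parseaddr(msg.get("From", ""))

    # The Date header is optional in practice, so guard before parsing.
    date_header = msg.get("Date")
    sent = parsedate_to_datetime(date_header) if date_header else None

    return {
        "sender_name": name,
        "sender_address": address,
        "subject": msg.get("Subject", ""),
        "sent": sent,
        "has_reply_to": "Reply-To" in msg,
    }


if __name__ == "__main__":
    for field, value in summarize_headers(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```

Header lookups on the parsed message are case-insensitive, so "subject" and "Subject" find the same field. Real mail is messier than this sample: dates can be malformed and senders can be forged, so anything mined from headers is best treated as a hint rather than a fact.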