Working at an electronic discovery services provider has taught me a few lessons over the years. One is that eDiscovery consultants like to break things. Set someone loose on a data processing project and BOOM, there is an anguished cry from the back room: “Why can’t I get the !@*&!! software to do what I need?” It’s enough to bring a man to his knees when he is trying to make a Friday evening client deadline.
Testing software with a standardized data set is an excellent way to break software in a methodical manner. Up until now, most electronic discovery service providers have tested with the old Enron data set, and many have also assembled their own internal data sets. The Enron data set, while commonly used, has become rather quaint and antiquated. It consists almost entirely of emails and contains exactly zero attachments.
What the world of eDiscovery needs is a standardized set of modern files, the kind we encounter every day: Microsoft Office documents; graphics files such as Photoshop, Illustrator and JPEG; CAD files; audio files; various Mac files. You get the idea.
We’re looking for a few good files, and if you are reading this, you can help! Elluma has just developed an upload utility at http://files.edrm.net where you can upload any kind of user-generated file and help us build a new, modern corpus of files.
If you register and upload, you will be able to download anything from the current collection.
Bring us your WORST files, the ones that cause conniptions in the back office. Bring us the files that process properly, too. Right now we are looking for any kind of copyright-free file that does not contain PII (personally identifiable information). If you are an eDiscovery consultant or an electronic discovery services provider, you can help us generate and validate a modern, standardized file set that will be available to everyone and will address many of the shortcomings of the Enron data set.
Contribute your files at http://files.edrm.net.