Elluma Blog

Social Media Challenges for eDiscovery

by Eric Robi on June 18th, 2013

Social media platforms such as Facebook, Twitter, and YouTube used to be personal tools for keeping up with friends. But now it is estimated that 72% of corporations are utilizing social media technology in some manner, and predictions are that adoption will expand within companies to improve internal collaboration and communication.

While the increased use of social media can lead to increased sales and communications, the eDiscovery preservation risk grows exponentially. Whenever litigation involves content from a social media site, there are special challenges that need to be addressed. This is also true of “non-content” data, or information about someone’s activities that might be relevant.

Consider the scenario of a company’s Facebook site, which regularly posts marketing information about new products. A disgruntled employee creates a fictitious person to log into those sites and post negative reviews of the products or company. Does the company have the right to find out the true identity of the poster? If discovered, can the poster be fired, and if there is ensuing litigation, what content is discoverable? Courts are increasingly addressing these issues of privacy and free speech – often siding with the employer’s or litigant’s right to discovery.

However, in addition to the constitutional issues, there are very practical challenges to preserving social media data in the event of litigation.

Unlike documents or e-mail stored on a local server, most social media content is stored on the provider’s servers. Without consent from the user, access to the data usually requires a court order. Given the transient nature of the data, actions should be taken quickly to preserve information before it is purged. Even after access is granted, care needs to be taken that specialized tools are used to properly collect and preserve the data.

Attempts to take a “self-collection” approach by using screen captures or web “crawlers” can easily lead to missing critical data or metadata or, worse, to damaging or overwriting data. Plus, there is the continuing challenge of maintaining chain of custody to avoid challenges later in court. Whatever scenario is presented, the timing and techniques for collecting and preserving data often require specialized eDiscovery technology and skills.
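
One common building block for chain of custody is to record a cryptographic hash of each item at the moment it is collected, so that anyone can later confirm the item has not changed. The sketch below is only an illustration of that idea: the function names, the record fields, and the sample post are assumptions made for the example, and it is not a substitute for a forensic collection tool.

```python
import hashlib
from datetime import datetime, timezone

def custody_record(item_id, content, collected_by):
    """Return a manifest entry for one collected item (content is bytes)."""
    return {
        "item_id": item_id,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "collected_by": collected_by,
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
    }

def verify(record, content):
    """True if the content still matches the hash recorded at collection."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

# Hypothetical exported post; a real collection would also capture metadata.
post = b'{"author": "example_user", "text": "Great product!"}'
entry = custody_record("POST-0001", post, "analyst_a")
```

Calling `verify(entry, post)` later returns `True` only if the bytes are identical to what was collected; any alteration, even a single added space, changes the hash.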

As more companies adopt social media for marketing and internal communication, the risks associated with preserving data for eDiscovery increase. Plans by a company to use social media in any capacity should also include policies, procedures, and plans for collection and preservation of the data. Consulting with an eDiscovery expert can help develop those policies and procedures. More importantly, when confronted with social media eDiscovery, timing, tools and techniques are critical to meet the preservation challenge, and often require help from an eDiscovery specialist.

————–
[1] McKinsey Global Institute, “The social economy: Unlocking value and productivity through social technologies.”
[2] Although a distinction is often made between private and public information, courts are increasingly recognizing there is no personal privacy when material is relevant to litigation. See Romano v. Steelcase Inc., 907 N.Y.S.2d 650 (Sept. 21, 2010); McMillen v. Hummingbird Speedway, Inc., No. 113-2010 CD (C.P. Jefferson Sept. 9, 2010).

What computer forensics tells us about the Boston bombers

by Eric Robi on April 19th, 2013
Dzhokhar Tsarnaev, Boston bombing suspect

Law enforcement officials investigating the Boston bombing have been poring over clues in an all-out effort to find the people responsible for an attack that has absolutely gripped Boston. Surviving suspect Dzhokhar Tsarnaev and his dead brother Tamerlan have left behind evidence that computer forensic analysts are now searching for clues as to motives, co-conspirators, targets and other threats.

Already, a YouTube account has been identified which contains playlists named “Terrorists” and “Islam”. The videos in the “Terrorists” playlist cannot be viewed because the accounts associated with those videos have been terminated. The FBI or local police will have served an emergency production demand on YouTube to access the Tsarnaev account and probably other related accounts to pull subscriber information such as IP addresses used to log in, email addresses and subscribers. With an IP address, computer forensic analysts can work from the internet service provider’s records to locate a suspect’s physical street address. An email address can be subpoenaed for additional evidence which may point to co-conspirators.

CNN has verified the Twitter account twitter.com/J_tsar as belonging to Dzhokhar Tsarnaev. The IP address used to log into this account can also be used to verify a physical street address and, potentially, what type of cell phone and which carrier were used to access the account.

Most important, however, are the computers seized at the Tsarnaevs’ home. These computers contain a wealth of data for computer forensic analysts to go through with a fine-tooth comb. Browser history can reveal which websites were visited, perhaps showing research into bomb-making techniques or visits to extremist websites. Emails and instant messages are being analyzed to find out if the brothers plotted alone or were directed by others.

As the investigation unfolds, digital forensic services provided by Federal and local authorities will undoubtedly provide valuable information about how the Boston bombing suspects planned their devastating attacks and perhaps their motives behind them.

You Really Got a Hold on Me (Or At Least You Should)

by Eric Robi on March 26th, 2013

Duty to Preserve Creates Potential Sanctions for Companies Who Don’t

Last year’s Conference on Preservation Excellence released a Signature Paper from a panel of eDiscovery experts recommending best practices in the field.

The paper referenced a 2012 survey showing 55% of those companies interviewed did not issue or track legal holds. This should be an alarming number for eDiscovery practitioners, for it shows companies are ignoring the potential liability implicit in the duty to preserve electronically stored information (ESI).

Litigation hold

Although the duty to preserve information relevant to impending litigation arises from early common law, the unique challenges of preserving ESI began to emerge as more companies stored information on computers. Since the first Zubulake[1] opinion in 2003, courts have repeatedly warned companies that the duty to preserve ESI arises when a party “reasonably anticipates” litigation. The duty includes not just suspending the company’s document retention policies, but also implementing a litigation or legal hold. The Pension Committee[2] opinion in 2010 held that a legal hold should be in writing, with attorney oversight of the process and ongoing monitoring.

So, What Exactly Does a Legal Hold Look Like?

Experienced eDiscovery experts will be the first to caution that, when it comes to legal holds, “One size does not fit all.” The steps involved in preserving ESI depend largely upon the issues in the litigation, the company’s computer systems, the people who manage them, and the individuals most likely to “own” the relevant data. Reviewing both the Signature Paper and the 2010 Sedona Conference® Commentary on Legal Holds produces a valuable overview of tasks and responsibilities to consider:

1. Have a procedure in place.  This, of course, is the ideal situation – planning before it happens.  It is a good idea to pull together a team who will coordinate eDiscovery responses with representatives from various groups: IT, HR, legal, and any other special interest groups who might have a stake in litigation.  Establish lines of communication and policies for what needs to happen when a legal hold is “triggered”. Work with the IT organization to develop comprehensive “data maps” of all the systems and places where data is stored.

2. Determine what will trigger a legal hold.  It’s generally understood that “reasonable anticipation” of litigation establishes the duty to preserve.  In real life situations, many events and factors are involved – consult the Sedona Commentary for examples.  In most cases, have legal counsel review the information and make a determination if a legal hold should be instituted.

3. Determine the scope of preservation.  Identify key custodians and repositories of data relevant to the issues in the matter. Use the data maps prepared in advance to determine where the data is physically stored, and who is managing the system where it resides. Call upon the eDiscovery team to coordinate communication between who has the data and who is managing it.

4. Issue a written directive.  Besides notifying the key custodians, a written legal hold should notify IT and anyone else in (or outside) the organization responsible for the data.  Suspend data retention policies that could automatically destroy data.

5. Follow up and monitor the effort. Here is where many organizations get into trouble. Litigation can take years to resolve, and the data needs to be preserved throughout the life of the litigation, and potentially during appeal. IT systems change, backup programs are upgraded, and personnel move around or leave. Make sure there are ways to monitor the hold to ensure data is not lost down the road. Courts won’t accept a lackadaisical approach: “we tried” is not enough. You need to keep an ongoing vigil, or you will get into trouble[3].
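
The tracking and follow-up steps above can be pictured as a simple record of who has been notified and who has acknowledged the hold. The sketch below is a hypothetical illustration only: the `LegalHold` class, its fields, and the sample custodians are assumptions for the example, and real legal hold tools track far more (reminders, releases, system-level holds).

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class LegalHold:
    matter: str
    issued: date
    custodians: set
    acknowledged: dict = field(default_factory=dict)  # custodian -> date

    def acknowledge(self, custodian, when):
        """Record that a notified custodian confirmed the written hold."""
        if custodian not in self.custodians:
            raise KeyError(custodian)
        self.acknowledged[custodian] = when

    def outstanding(self):
        """Custodians who still need follow-up."""
        return sorted(self.custodians - set(self.acknowledged))

# Hypothetical matter and custodians.
hold = LegalHold("Matter 001", date(2013, 3, 1), {"alice", "bob", "it_admin"})
hold.acknowledge("alice", date(2013, 3, 2))
needs_follow_up = hold.outstanding()
```

Here `needs_follow_up` lists the custodians who have not yet confirmed, which is the list someone has to keep chasing for the life of the matter.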

eDiscovery Experts Can Help

Clearly, this outline is intended as only an overview of points to consider.  Each situation is unique and more complex – with different legal issues, various key parties, and unusual systems or circumstances that will need to be addressed.  The best approach is to consult with an eDiscovery expert who can help you identify the steps, build an eDiscovery Response Team, develop data maps, and guide you in creating internal policies and procedures.  It doesn’t have to be perfect – just reasonable and in good faith. It also makes sense to have the eDiscovery expert work with you to review the numerous legal hold technologies available to determine if any are appropriate and cost-effective for your organization.

With the right team, proper procedures, and follow-up, you can be sure you really got a hold on it.

 


[1] Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003)

[2] Pension Committee of the University of Montreal Pension Plan v. Banc of America Securities, LLC, 685 F. Supp. 2d 456 (S.D.N.Y. 2010). (Subsequently amended by Judge Scheindlin.)

[3] See Apple Inc. v. Samsung Electronics Co., Ltd., No. C 11-1846, 2012 WL 3042943 (N.D. Cal. July 25, 2012), in which the court noted both sides’ lack of preservation, and cautioned that ongoing monitoring is critical to avoid sanctions.

Computer Assisted Review – Good or bad CARRMa?

by Eric Robi on March 12th, 2013

Speaking the Same Language Will Help Practitioners Adopt CAR

The eDiscovery think tank that brought us the illustrative (and ubiquitous) Electronic Discovery Reference Model (EDRM) has developed a visual aid designed to guide the industry in its adoption of Computer Assisted Review (CAR). Released in December 2012 by the group’s EDRM Search project, the first draft is called the Computer Assisted Review Reference Model, or CARRM.

Computer Assisted Review Reference Model - CARRM

In addition to providing a visualization of the steps involved in CAR, the chart opens the discussion for standardization of terms and guidelines for best practices. This bodes well for a fast-paced industry where new technology continues to evolve and drive our practice, but our terminology is often ad hoc. Especially when you see promises of reduction in review times of 50 to 90 percent compared to linear review, you know your clients will want you to embrace this type of eDiscovery solution as quickly as possible.

This draft already provides valuable direction by suggesting one standard name and acronym. The technology is known variously as Technology-Assisted Review (TAR), Predictive Coding (PC), and Intelligent Review Technology (IRT), so agreeing on a single term will go a long way to promote the adoption of CAR.

As with the EDRM, the CARRM graphic depicts stages in the process that ultimately will produce relevant, responsive documents for production. Emphasizing a planning and education approach for the first three stages, the model addresses the concern that, unless properly thought out, CAR technology alone will not produce desired results. CAR is not a self-directing automaton. Critical input from expert humans provides the subjective coding decisions that train the system for what is relevant. Hence the preliminary stages of setting goals, establishing “protocol” or rules of review, then educating the reviewers: all are critical steps before “feeding the machine”.

Likewise, enforcing precise procedures for human interaction and quality control goes a long way to address legal issues of defensibility. The circular process depicts the iterative steps taken by an e-Discovery consulting team to determine if the system is producing meaningful results. It starts with predicting how the system should behave and then determining if the results stack up to those expectations. Using random samples, the reviewers “test” the system with documents known to be unresponsive. If the results are below an agreed-upon threshold, this indicates the “training” of the system was successful. The process continues in this loop until the team determines there are diminishing returns of relevant documents.
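
The sampling check in this loop can be illustrated with a short sketch. It follows one common reading of the test: draw a random sample from the documents the system set aside as unresponsive, have a human review them, and compare the fraction found responsive against the agreed threshold. The function name, the 5% threshold, and the toy data are assumptions for illustration and are not part of CARRM.

```python
import random

def sample_error_rate(predicted_unresponsive, is_responsive, sample_size, seed=0):
    """Review a random sample of documents the system set aside as
    unresponsive and return the fraction a human finds responsive."""
    rng = random.Random(seed)
    sample = rng.sample(predicted_unresponsive, sample_size)
    misses = sum(1 for doc_id in sample if is_responsive(doc_id))
    return misses / sample_size

# Hypothetical data: 1,000 documents the system ranked as unresponsive,
# of which a human reviewer would judge 20 to be responsive after all.
discard_pile = list(range(1000))
actually_responsive = set(range(20))

rate = sample_error_rate(discard_pile, lambda d: d in actually_responsive, 200)
THRESHOLD = 0.05  # assumed value; the parties would agree on this in advance
training_ok = rate <= THRESHOLD
```

If `training_ok` is false, the team would code more documents, retrain, and sample again, which is the loop the model depicts.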

Keep in mind this model is in draft stage, and the group welcomes input either by e-mail to mail@edrm.net or posts on the EDRM site.  Whatever final model is adopted, this first draft goes a long way towards establishing common guidelines, best practices, and one acronym  for an evolving technology.  And that’s a good thing.

In the interest of full disclosure I should mention that I’m an active participant in the EDRM on the Data Set and Metrics projects.

Top 5 Trends in eDiscovery for 2013

by Eric Robi on March 4th, 2013

2013 is shaping up to be another dynamic year for eDiscovery. Some issues are jointly driven by technology and court opinions – a continuing trend which foretells more involvement by the judiciary. Some are the natural evolution of an industry maturing into its second decade, as organizations come to grips with the impact of eDiscovery. Predictions can be risky, but here are the top 5 eDiscovery topics eDiscovery experts and solutions providers will be watching this year:

 1. Big Data

eDiscovery professionals deal with big data on a daily basis.

This is not just an issue that needs an eDiscovery solution. Managing, analyzing, and preserving data has always been a challenge to organizations. With the exponential increase in the creation of Electronically Stored Information (ESI), and the preponderance of “unstructured” data, companies continue to grapple with this challenge called Big Data. E-mails, web sites, online transactions, social media, graphics, music, voice, videos, with lots of duplication: it all adds up to a mass of information. Data experts in 2011 projected there were 1.8 trillion gigabytes (1.8 zettabytes) of data in the world. With data doubling every two years, the sheer size of data will continue to be a major challenge for eDiscovery professionals trying to manage ESI this year.
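
To put the doubling claim in concrete terms, here is a small back-of-the-envelope calculation. It simply extrapolates the two figures quoted above (1.8 zettabytes in 2011, doubling every two years); the function name is made up for the example and the output is an illustration of the arithmetic, not a forecast.

```python
def projected_zettabytes(base_zb, base_year, year, doubling_years=2.0):
    """Exponential growth: the total doubles every `doubling_years`."""
    return base_zb * 2 ** ((year - base_year) / doubling_years)

# Extrapolating from the 2011 estimate quoted in the text.
projection = {year: projected_zettabytes(1.8, 2011, year) for year in (2011, 2013, 2015)}
```

On those assumptions the total would be about 3.6 zettabytes in 2013 and 7.2 zettabytes in 2015.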

2. Information Management Platforms

In response to the Big Data challenge, technology companies are touting enterprise platforms that provide the ability to search, flag, and protect data before it is subject to eDiscovery. Some refer to it as moving the focus “upstream” on the EDRM chart to the Information Management box where data is created and stored. The concept is to create data in an environment that automatically categorizes and captures critical aspects of the information contained in the unit. Several companies are offering solutions that purport to address these issues. Although not offered as just an eDiscovery solution, these large systems could emerge as the answer to the multi-pronged challenges of leveraging corporate intelligence, as well as capturing data that needs to be preserved. But don’t expect a rush to these solutions, as they come with a hefty price tag, and eDiscovery risks are often assigned lower priorities than other business initiatives. That is, until the litigation fire drill hits.

3. Cloud Computing

Corporations will continue to move to data services over the internet (the cloud) because it makes business sense. Faced with the opportunity to lower IT capital costs, as well as implement fast, flexible applications, and free up IT resources, it is a bottom-line decision. Software as a Service (SaaS) is often the driving force as companies implement e-mail, CRM, and document collaboration solutions online.

Yet, the cloud presents unique challenges for eDiscovery professionals. Because the cloud is not just data storage but a dynamic interface between the company and outside data networks, the line of demarcation becomes, um, cloudy (Sorry). Having a third party manage their data does not relieve a company of the responsibility to preserve a particular data set. Courts have determined that data is within a company’s “control” if it has a contract with a third-party provider. So the evolving challenge to in-house eDiscovery experts and their eDiscovery solution providers will be how to work with their third-party providers to anticipate litigation holds, preservation and collection.

4. Social Media

It is clear that social media is discoverable. However, there are technical and constitutional challenges for any eDiscovery professional who wants to preserve or collect content from social media providers. Although most case law revolves around an individual litigant’s duty to preserve and produce social media content, this still presents issues for companies. For example, using social media sites for marketing or advertising clearly creates the duty to preserve. Likewise, an employer could be called upon to produce information they gleaned from a site to make employment decisions. As more companies start using social media to reach their customers and constituents, we should see more courts this year clarifying the rules for accessing and preserving that data.

5. TAR, CAR, Predictive Coding

No list would be complete without including the latest development in human/machine coordination that promises to greatly increase review rates and accuracy: Technology Assisted Review (TAR) or Computer Assisted Review (CAR) or Predictive Coding. Using the decisions of an expert reviewer, the system applies a statistical modeling approach to rank unreviewed documents according to how responsive they are likely to be. The obvious advantage is the ability to process masses of documents very quickly compared to linear review. With last February’s ground-breaking opinion by Magistrate Judge Andrew Peck in the Da Silva Moore case, there is now judicial direction that, when the parties agree to use predictive coding, and the process is transparent to both parties, this is a defensible approach to reviewing large amounts of documents. Subsequent opinions have followed this lead. With so much to gain in productivity, the adoption of TAR, CAR or Predictive Coding by more eDiscovery providers will likely be the biggest topic to follow this year. Now, if we can only agree on what we call it, that would be a major accomplishment this year.
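
To make the idea of ranking by likely responsiveness concrete, here is a toy sketch. It uses word-level log-odds in the style of naive Bayes, which is only one of many statistical approaches and is not the method of any particular product; the function names, the four-document seed set, and the document IDs are all invented for the example.

```python
import math
from collections import Counter

def train(coded_docs):
    """coded_docs: (text, is_responsive) pairs coded by the expert reviewer.
    Returns per-word log-odds weights with add-one smoothing."""
    counts = {True: Counter(), False: Counter()}
    for text, label in coded_docs:
        counts[label].update(text.lower().split())
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) + len(vocab) for label, c in counts.items()}
    return {
        w: math.log((counts[True][w] + 1) / totals[True])
        - math.log((counts[False][w] + 1) / totals[False])
        for w in vocab
    }

def score(weights, text):
    """Higher score = more likely responsive; unseen words contribute nothing."""
    return sum(weights.get(w, 0.0) for w in text.lower().split())

# Hypothetical seed set coded by the expert, and unreviewed documents.
seed_set = [
    ("merger pricing agreement draft", True),
    ("pricing call with competitor", True),
    ("lunch menu for friday", False),
    ("parking garage closed friday", False),
]
unreviewed = {
    "DOC-1": "friday lunch plans",
    "DOC-2": "draft pricing agreement attached",
    "DOC-3": "garage parking pass",
}
weights = train(seed_set)
ranked = sorted(unreviewed, key=lambda d: score(weights, unreviewed[d]), reverse=True)
```

With this seed set the pricing-agreement document rises to the top of `ranked`, so reviewers would see it first. Real systems train on far larger samples and are validated with the kind of sampling discussed in the CARRM post.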

 

[1] Columbia Pictures Indus. et al. v. Bunnell, No. CV 06-1093FMCJCX, 2007 WL 2080419 (C.D. Cal. May 29, 2007).

[2] EEOC v. Simply Storage Management, LLC, 270 F.R.D. 430 (S.D. Ind. 2010).

[3] Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP); upheld in Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (S.D.N.Y. April 26, 2012).

[4] See In re Actos (Pioglitazone) Products Liability Litigation, 6:11-md-2299 (W.D. La. July 27, 2012).

e-Discovery Evolutionary Scale

by Eric Robi on January 31st, 2013

In 2012 Elluma Discovery interviewed litigators at many law firms throughout Los Angeles, California, ranging in size from 2 to over 1,000 attorneys. I have always suspected that there is a veritable smorgasbord of approaches to dealing with discovery, but I wanted to get some real-life data. You see, working at an electronic discovery services provider, I process and analyze evidence ranging from paper to native files to cell phones and, increasingly, video and databases.

As the founder of a 10-year-old e-discovery services provider, I fight on the frontlines to tame the always-evolving adversary of electronic evidence. I wanted to find out what types of evidence litigators most commonly encounter and their approaches to tackling it. What I found surprised me.

How do litigators approach discovery?
My goal was to understand real-life processes used by real-life litigators at a wide variety of law firms. I should mention up front that my approach was not designed to withstand statistical scrutiny. Although anecdotal, I believe what I found accurately represents the state of litigation in Los Angeles. Through our conversations my colleagues and I discovered that firms array themselves along an evolutionary scale of discovery techniques ranging from pure paper to primarily electronic. How would you rank your firm on this scale?

I learned there are five primary rungs on the e-Discovery Evolutionary Scale.

e-Discovery Evolutionary Scale

  1. Paper only
  2. Scanned paper
  3. Scanned and OCRed paper with Acrobat
  4. Scanned and OCRed paper with a legal review tool
  5. Native documents with a legal review tool

1. Paper only

Somewhat surprisingly, a large number of firms of fewer than 50 attorneys receive and produce boxes of 8.5×11” paper. These firms typically do not specify or negotiate for alternate forms of production. Many firms view it as a tactical advantage to ‘paper’ the opposing firm in hopes of running up their costs. This type of firm typically does not have much experience with document review on a computer.

I encountered firms of up to 100 attorneys that have recently assembled document reviewers in conference rooms and used sticky notes to mark and separate important documents. In some instances the review process lasted months and involved dozens of attorneys. I can only imagine how difficult it would be to keep track of all the paper and how difficult it would be to correlate key events or create chronologies of activity. Some firms scanned in documents only as a ‘backup’, but did nothing with the resulting electronic files.

2. Scanned paper

A little farther along the evolutionary scale I met with several firms who said that they had become overwhelmed with paper and now reviewed discovery electronically. I found quite a number of firms that scanned in the discovery they received and saved the results as Acrobat (PDF) files. Attorneys would then review the discovery on their computer. Some used dual monitors and would view documents on one screen while typing a brief on the other screen.

While this approach dramatically reduced the amount of paper a firm has to deal with, many attorneys complained that they were unable to search the discovery. Many said that they believed they were spending less time reviewing discovery, but felt that they could be more efficient if they were able to search across the document sets.

Roughly half of the firms I interviewed fell into these first two categories.

3. Scanned and OCRed paper with Acrobat

Ray Kurzweil, inventor of OCR, commonly used in eDiscovery

Quite surprisingly, a much smaller number of firms apply OCR to their scanned paper documents. OCR (Optical Character Recognition) is a software process developed by Ray Kurzweil to make typewritten words searchable. Kurzweil introduced the technology commercially in 1978 and ultimately sold it to Xerox. Today, scanning a document is somewhat akin to making an electronic photocopy of a document. However, instead of making a copy of a piece of paper, you end up with a computer file, usually a PDF. Like a photograph, the resulting file is not searchable. Running an OCR process on a document makes the text in the document searchable.

I found that firms who OCRed their discovery often searched the documents using a free tool such as Adobe Acrobat Reader. Litigators who recently gained the ability to search their documents electronically reported greatly increased efficiency. Frequently attorneys told us that they spent far less time reviewing irrelevant documents and were able to locate documents central to the case in far less time. Litigators said this was particularly important when reviewing discovery provided to them by their own clients.
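
Once OCR has produced text for each scanned page, searching across a whole document set becomes a simple lookup. The sketch below shows the idea with a small word index; it assumes the OCR text has already been extracted by some other tool, and the document IDs, sample text, and function names are hypothetical.

```python
from collections import defaultdict

def build_index(ocr_text_by_doc):
    """Map each lowercase word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in ocr_text_by_doc.items():
        for word in text.lower().split():
            index[word.strip(".,;:()\"'")].add(doc_id)
    return index

def search(index, *terms):
    """Return the IDs of documents that contain every search term."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

# Hypothetical OCR output keyed by scan ID.
ocr_text = {
    "SCAN-0001": "Invoice for consulting services, paid in full.",
    "SCAN-0002": "Second notice: invoice is overdue.",
    "SCAN-0003": "Meeting notes from the quarterly review.",
}
index = build_index(ocr_text)
overdue_invoices = search(index, "invoice", "overdue")
```

This is the kind of cross-document search that attorneys reviewing image-only PDFs said they were missing.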

Next article
In the next article, we’ll look at law firms who:

1. Scan and OCR paper with a legal review tool
2. Review native documents with a legal review tool

Federal Circuit’s Model Order on E–Discovery Doesn’t Make Things Easier for Judge in Software Patent Case

by Eric Robi on January 16th, 2013

The District Court for the Northern District of California recently granted a software company’s motion to compel production of documents and privilege logs and also granted in part its motion to compel discovery of electronic data. The court said that computer forensics data should be sorted out by the parties and would be governed by the Federal Circuit’s Model Order on E–Discovery in Patent Cases.

eDiscovery

The defendant Microstrategy Inc. wanted Vasudevan Software, Inc. (“VSI”) to produce certain computer forensics data.  It also sought a protective order to prevent VSI from seeking excessive ediscovery of its data, which VSI had requested.

Microstrategy claimed that these documents were “non-privileged, relevant, responsive documents” within VSI’s custody or control, but that VSI had failed to provide either the communications between VSI employees and its attorneys or a detailed privilege log of those communications. VSI said that all of the computer forensics data at issue were protected by the attorney-client or work-product privileges and that a detailed privilege log was therefore unwarranted.

Microstrategy also argued that VSI’s requests violated the Model Order because they constituted a fishing expedition that went beyond both the Model Order’s parameters and what was necessary for the underlying litigation. Judge Paul S. Grewal wrote that the Federal Circuit’s Model Order attempted to simplify the process of discovery in cases of computer forensics data recovery, and established a default method of splitting costs among parties. For requests falling within the limits set in the Model Order (or the court’s modifications), the producing party bears the cost; for requests falling outside of those limits, the requesting party must pay.

Judge Grewal commented, “Unfortunately, despite being a topic fraught with traps for the unwary, the parties invite the court to enter this morass of search terms and discovery requests with little more than their arguments. The inquiry is difficult when neither party has provided particularly helpful details…” To wit, Microstrategy concluded that the terms were overbroad and would require burdensome discovery. VSI, on the other hand, argued only that its terms were relevant to the underlying litigation and that the limited information provided illustrated that the requests were not burdensome. Prior to making a determination as to the reasonableness of VSI’s request, the court, and the parties, needed more information. Judge Grewal ordered Microstrategy to run a search using each of VSI’s terms against the designated five custodians. The parties were then ordered to meet and confer on the results.
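
The kind of information the court asked for, how many documents each search term hits for each custodian, is straightforward to tabulate. The sketch below is a simplified illustration: the custodian names, messages, and terms are invented, and it uses plain case-insensitive substring matching rather than the search syntax of any review platform.

```python
from collections import Counter

def hit_counts(mailboxes, terms):
    """Count, per (custodian, term), how many messages contain the term."""
    counts = Counter()
    for custodian, messages in mailboxes.items():
        for message in messages:
            text = message.lower()
            for term in terms:
                if term.lower() in text:  # simple substring match
                    counts[(custodian, term)] += 1
    return counts

# Hypothetical custodians, messages, and search terms.
mailboxes = {
    "custodian_a": ["Pricing update for the dashboard", "Lunch on Friday?"],
    "custodian_b": ["Dashboard license renewal", "dashboard pricing and license terms"],
}
report = hit_counts(mailboxes, ["dashboard", "license"])
```

A report like this gives both sides concrete numbers to discuss at a meet and confer, instead of bare assertions that terms are overbroad or not burdensome.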

Microstrategy did not sufficiently show how naming a specific company executive as custodian undermined this objective.  As a result, its motion for a protective order for those emails was denied.  However, VSI was required to produce a log of the communications it asserted were potentially responsive to Microstrategy’s requests but were privileged.  VSI’s refusal to provide this log was unreasonable; however, Microstrategy’s request for item-by-item logs was also unreasonable.  VSI was ordered only to provide categorical logs, grouping documents by type and indicating how each of those categories is privileged.

Judge Grewal finally said that he didn’t want to sort out all of the search terms, custodians, and hit counts in a case of computer forensics data recovery.  To ensure that the parties achieved more fully the collaboration expected in the Model Order, counsel were instructed to meet and confer in person before any future discovery disputes were brought to the judge.

Vasudevan Software, Inc. v. Microstrategy Inc., No. 11–cv–06637–RS–PSG, Slip Copy, 2012 WL 5637611 (N.D.Cal. November 15, 2012)

How did the FBI trace CIA Director David Petraeus’ emails?

by Eric Robi on November 19th, 2012

Former CIA Director David Petraeus and his lover Paula Broadwell learned that every move you make, every step you take, leaves an online trace. I would expect that someone in Petraeus’ position would surely know that the FBI’s crack computer forensic analysts can trace an email using an IP address. However, even those in the top spy game are apparently not immune from the garden-variety misjudgments which have defined the current Washington scandal thus far.

The Wall Street Journal reports that Jill Kelley informed the FBI about harassing emails she began receiving in her inbox. The FBI then traced those emails, almost certainly by starting with a subpoena to Google. The government is afforded certain extended subpoena powers not available to civil computer forensic analysts. With a criminal subpoena Google will provide the government with the contents of emails. Civil forensic analysts, on the other hand, have to contend with a much slower subpoena process that results only in IP addresses used to log in to an account.

Now, simply having an IP address and the emails is not sufficient to prove that Paula Broadwell sent the emails to Jill Kelley. To trace the email to the source, the FBI would also have had to subpoena the service provider used by Broadwell to correlate her identity with the IP address.

Once FBI analysts had traced the email to Broadwell, it was then a matter of obtaining a warrant to monitor Broadwell’s email accounts. According to the New York Times, computer forensic analysts at the FBI were able to link together other email accounts accessed by the same IP address Broadwell had used to send the harassing emails.
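
Linking accounts by shared IP address is, at its core, a grouping exercise over login records. The sketch below illustrates the idea only: the record format, account names, and function names are invented for the example, the IP addresses come from ranges reserved for documentation, and a shared IP address (a hotel or café network, for instance) is a lead to investigate rather than proof of common ownership.

```python
from collections import defaultdict

def accounts_by_ip(login_records):
    """Group account names by the IP addresses they logged in from."""
    by_ip = defaultdict(set)
    for account, ip in login_records:
        by_ip[ip].add(account)
    return by_ip

def linked_accounts(login_records, known_account):
    """Accounts that share at least one login IP with the known account."""
    linked = set()
    for accounts in accounts_by_ip(login_records).values():
        if known_account in accounts:
            linked |= accounts
    linked.discard(known_account)
    return sorted(linked)

# Hypothetical login records: (account, IP address).
records = [
    ("account_one@example.com", "192.0.2.10"),
    ("account_two@example.com", "192.0.2.10"),
    ("account_two@example.com", "198.51.100.7"),
    ("unrelated@example.com", "203.0.113.5"),
]
leads = linked_accounts(records, "account_one@example.com")
```

Here `leads` contains only the second account, because it is the only one that logged in from an address the known account also used.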

With the knowledge of the first email account at hand, it was not difficult for the FBI to locate other email accounts used jointly by Broadwell and Petraeus.

In an effort to escape detection, Broadwell and Petraeus utilized an old al-Qaeda terrorist trick. However, since the email account was already under surveillance, it didn’t do them much good. The cat was already out of the bag with the initial email exchanges.

Rather than transmitting emails to the other’s inbox, they composed at least some messages and instead of transmitting them, left them in a draft folder or in an electronic “dropbox,” the official said. Then the other person could log onto the same account and read the draft emails there. This avoids creating an email trail that is easier to trace. Washington Post

While this trick certainly eliminated emails sent back and forth between Petraeus and Broadwell, it did provide the FBI with the IP address used by Petraeus. Undoubtedly their computer forensic analysts then traced that IP address back to him, and the rest, as they say, is history.

We’re looking for a few good files

by Eric Robi on October 31st, 2012

Working at an electronic discovery services provider has taught me a few lessons over the years. One is that eDiscovery consultants like to break things. Set someone loose on a data processing project and BOOM, there is an anguished cry from the back room. “Why can’t I get the !@*&!! software to do what I need?” It’s enough to bring a man to his knees trying to make a Friday evening client deadline.

Testing software with a standardized data set is an excellent way to break software in a methodical manner. Up until now most electronic discovery service providers have been conducting their testing using the old Enron data set. Many have assembled their own internal data sets. The Enron data set, while commonly used, has become rather quaint and antiquated. It consists almost entirely of emails and contains exactly zero attachments.

What the world of eDiscovery needs is a standardized set of modern files. Files which we encounter every day. Files like Microsoft Office, graphics files such as Photoshop, Illustrator and JPEG. CAD files, audio files, various Mac files – you get the idea.

We’re looking for a few good files, and if you are reading this you can help! Elluma just developed an upload utility at http://files.edrm.net where you can upload any kind of user-generated file and help us build a modern new corpus of files.

If you register and upload, you will be able to access and download anything from the current collection.

Bring us your WORST files. Files that cause conniptions in the back office. Bring us the files that process properly. Right now we are just looking for any kind of copyright-free files that do not contain any PII (personally identifiable information). If you are an eDiscovery consultant or an electronic discovery services provider, you can help us generate and validate a modern, standardized file set that will be available to everyone and addresses many of the issues with the Enron data set.

Contribute your files at http://files.edrm.net.