E-Discovery Revisited: A Broader Perspective for IR Researchers
Jack G. Conrad, Thomson R&D
ICAIL 07 / DESI Workshop
June 4, 2007
EDD Outline
• EDD: The Big Picture
• Motivations
• Background
• EDD interactions: the “dance” of the litigants
• The complete EDD pipeline
• Alternative view of the enabling technologies
June 4, 2007 Thomson Research & Development
EDD: The Big Picture
• Electronic Data Discovery (EDD)
  – Context: Practical Research & TREC
  – Motivations:
    § (1) Recent characterization of the state of the art in EDD
    § (2) Informational materials available for participants in forums like TREC
EDD: The Big Picture
• Electronic Data Discovery (EDD)
  – Some 300-500 companies currently offer some form of EDD software or services
  – Several offer complete services across the E-Discovery spectrum
  – Kroll Ontrack
    § Recently acquired Engenium (Semetric), the concept search engine company
  – LexisNexis (LN)
    § Acquired Applied Discovery in the recent past, and also offers a full spectrum of EDD services
  – The EDD performance bar is constantly being raised
  – Essential need to share diverse perspectives in the field with next-generation researchers
    § What is the “dance of the litigants”? … the complete EDD pipeline? … possible interactions of the enabling technologies?
Source of EDD Survey Responses
• The Socha-Gelbmann Report, 2005
  – In total, 240 consumers/providers of EDD software/services were contacted
    § 139 expressed interest in participating
    § 72 of those were surveyed via spreadsheet or phone interview
    § 3 of the final spreadsheets did not contain enough info to be used
  – Conducted among 69 E-Discovery consumers & providers
    § 24 consumers; 45 providers
  – Consumers
    § A cross-section of Am Law 200 law firms + large U.S. companies
  – Providers
    § A broad-based collection of software & service providers who market their offerings as E-Discovery tools or services
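The survey counts above fit together arithmetically: the 72 surveyed respondents minus the 3 unusable spreadsheets give the 69 responses used, which split into 24 consumers and 45 providers. A minimal Python check of that funnel follows; the variable names are illustrative only and do not come from the report.

```python
# Survey funnel figures as stated on the slide (Socha-Gelbmann Report, 2005).
contacted = 240   # consumers/providers contacted
interested = 139  # expressed interest in participating
surveyed = 72     # surveyed via spreadsheet or phone interview
unusable = 3      # final spreadsheets without enough info

usable = surveyed - unusable      # responses actually used
consumers, providers = 24, 45     # stated breakdown of the usable responses

# The stated breakdown should account for every usable response.
assert usable == consumers + providers

print(f"usable responses: {usable}")
print(f"share of those contacted: {usable / contacted:.1%}")
```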
E-Discovery: Areas of Industry Strength
E-Discovery: Areas of Industry Weakness
EDD Scenarios: “the dance of the litigants”
• Employment Discrimination: Party A vs. Company B (David vs. Goliath)
• Securities Fraud: Gov’t vs. Company C
• Intellectual Property: Company D vs. Company E
Diagram labels: EDD resources (marked for each scenario)
The EDD Work Flow Model
Identification (relevant content and its scope)
  – Breadth and depth of discoverable materials established
E-Discovery Pipeline
• Data Entry & Scanning
  – Hard copy media converted (e.g., OCR) or audio records transcribed
• Data Gathering: Preservation & Collection
  – Electronically stored info. is preserved from multiple sources
• Media Restoration (data trans. to a std. media)
  – Data transferred from original or intermediate media to uniform media for analysis
• Data Processing (filtering, format conversion)
  – Vetting performed to reduce volume of data (incl. filtering, deduping, clustering, etc.)
• Online Review: Hosting & Searching
  – Primary review stage. Data transferred to dedicated repository
  – Searching based upon sources, dates, orig. file types, key words, etc.
• Production & Delivery
  – Delivery of reports to clients, systems, in diff. formats & media
• E-Discovery Consulting (throughout process)
  – Advice to clients on strategies & procedures for conducting E-Discovery processing
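The vetting step named in the work flow (filtering and deduping to reduce data volume) can be illustrated with a small Python sketch. This is only one simple way to do it, under stated assumptions: documents are plain dictionaries with hypothetical `id`, `date`, and `text` fields, filtering is by date range, and deduping catches only exact duplicates after whitespace and case normalization. Near-duplicate detection and clustering, which the slide also mentions, are not covered.

```python
import hashlib
from datetime import date


def vet(documents, start, end):
    """Keep documents dated within [start, end], dropping exact duplicates.

    Duplicates are detected by hashing the text after collapsing whitespace
    and lowercasing (an assumed normalization, not taken from the slides).
    """
    seen = set()
    kept = []
    for doc in documents:
        # Filtering: discard anything outside the date range of interest.
        if not (start <= doc["date"] <= end):
            continue
        # Deduping: hash the normalized text and skip repeats.
        normalized = " ".join(doc["text"].split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept


# Hypothetical sample collection, for illustration only.
docs = [
    {"id": "a", "date": date(2006, 3, 1), "text": "Quarterly results attached."},
    {"id": "b", "date": date(2006, 3, 2), "text": "quarterly   results attached."},
    {"id": "c", "date": date(2004, 1, 15), "text": "Lunch on Friday?"},
    {"id": "d", "date": date(2006, 5, 9), "text": "Revised forecast for Q2."},
]

kept = vet(docs, date(2006, 1, 1), date(2006, 12, 31))
print([d["id"] for d in kept])
```

Here document "c" falls outside the date range and "b" duplicates "a" after normalization, so only "a" and "d" move on to review.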
The EDD Work Flow Model
Identification (scope, depth of information)
Proposed extended scope of text ‘retrieval’ task (i.e., including filtering, organizing & report generation)
E-Discovery Pipeline
• Data Entry & Scanning
• Data Gathering: Preservation & Collection
• Media Restoration (data trans. to a std. media)
• Data Processing (filtering, format conversion)
• Online Review: Hosting & Searching
• Production & Delivery
• E-Discovery Consulting (throughout process)
E-Discovery Technology Pyramid
• IV: Reporting
  – Fourth Tier, analyzing: consolidating & summarizing; production
• III: Searching & Navigating
  – Third Tier, organizing: classifying or clustering; tagging & linking
• II: Indexing
  – Second Tier, vetting: filtering, deduping, handling similar doc-objects
• I: Hosting
  – Foundation, collecting: identification, conversion, migration
Additional E-Discovery Challenges
• Workflow Support
• Process Efficiencies
  – Per Step
  – Overall
• Tool Integration
• Ease of Use
  – For Customers
  – For Support
• High Value-to-Cost Ratio
  – Added value through advanced technologies
• A TREC-like forum has much potential to contribute here
  – Both within and beyond the context of IR