An ER-Flow Diagram for Big Data
Carlos Ordonez, Sikder Tahsin Al-Amin, Ladjel Bellatreche
Motivation
• There is no data model for big data: just processing and feature engineering, with no data(base) design
• The ER diagram is fundamental and is the foundation of many diagrams in software engineering
• Big data work produces tons of temporary data sets and source code
• Existing temporary data sets, files, and source code are difficult to understand and reuse
• Analysts require a bird's-eye view of big data
Our proposed ER-Flow Diagram
• Modern UML notation: cleaner
• Data-oriented view
• Considers data flow in multi-step processing
• Minimal changes: flow direction and entity labeling
• Automatic construction is feasible from source code (reverse engineering) and existing entities
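As a rough illustration of the idea above (entities plus a flow direction and an entity label), the following is a minimal Python sketch, not the authors' implementation. The class names `Entity` and `ERFlowDiagram`, the `kind` label values, and the sample entity names are all hypothetical, chosen only for this example. It assumes Python 3.9 or later for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Entity:
    name: str
    attributes: list[str]
    primary_key: list[str] = field(default_factory=list)
    kind: str = "source"  # hypothetical entity label: "source" or "derived"


@dataclass
class ERFlowDiagram:
    entities: dict[str, Entity] = field(default_factory=dict)
    flows: set[tuple[str, str]] = field(default_factory=set)  # (input, output)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def add_flow(self, source: str, target: str) -> None:
        # A flow is a directed edge: `target` is computed from `source`.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both ends of a flow must be known entities")
        self.flows.add((source, target))

    def processing_order(self) -> list[str]:
        # Order entities so that every input appears before its outputs.
        sorter = TopologicalSorter({name: set() for name in self.entities})
        for source, target in self.flows:
            sorter.add(target, source)
        return list(sorter.static_order())


# Hypothetical two-step pipeline: two source entities feed one derived entity.
diagram = ERFlowDiagram()
diagram.add_entity(Entity("customer", ["cust_id", "name"], ["cust_id"]))
diagram.add_entity(Entity("sales", ["sale_id", "cust_id", "amount"], ["sale_id"]))
diagram.add_entity(
    Entity("sales_by_customer", ["cust_id", "total"], ["cust_id"], kind="derived")
)
diagram.add_flow("customer", "sales_by_customer")
diagram.add_flow("sales", "sales_by_customer")
order = diagram.processing_order()
```

Storing flows as directed edges between entities keeps the structure close to a plain ER diagram, and the topological order gives one possible reading of the multi-step processing from sources to derived data sets.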
ER-Flow diagram example
Pros and cons
• +: data and flow are shown together; ER diagrams can work with any data (beyond relational databases); intuitive; data-oriented; construction can be partially automated
• -: PKs and the equivalent of FKs may have to be discovered; the diagram may not scale to hundreds of entities (but entities can be grouped into subject areas or namespaces); developer variable and file names may not be intuitive; entity resolution on source code is unexplored; semantics and abstraction are not well captured. We believe future research can tackle these issues.
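The point that PKs may have to be discovered can be made concrete with a small sketch. The code below is one simple possibility, assuming a data set is available as a list of rows with hashable values: it tests which small column combinations are unique in a sample. The function name `candidate_keys`, the `max_size` parameter, and the sample rows are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def candidate_keys(rows, max_size=2):
    """Return minimal column combinations whose values are unique across rows."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            values = [tuple(row[c] for c in combo) for row in rows]
            if len(set(values)) == len(values):
                found.append(combo)
    return found


# Hypothetical sample of a temporary data set with no declared key.
rows = [
    {"store_id": 1, "day": "mon", "amount": 10},
    {"store_id": 1, "day": "tue", "amount": 10},
    {"store_id": 2, "day": "mon", "amount": 30},
]
keys = candidate_keys(rows)
```

On this sample no single column is unique, while both `("store_id", "day")` and `("day", "amount")` are. The second is an accident of the small sample, which is why uniqueness in the data can only suggest a key and an analyst still has to confirm it.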