Unstructured Information, Information Audit / Workflow and Discovery
Peter Fox
Xinformatics 4400/6400
Week 10, April 14, 2015

Contents
• Information Audit
• Unstructured Information

Businessdictionary.com
• Analysis and evaluation of a firm's information system (whether manual or computerized) to detect and rectify blockages, duplication, and leakage of information.

Objective?
• The objectives of this audit are to improve accuracy, relevance, security, and timeliness of the recorded information.

What is an information audit?
• An information audit is a process that determines the current information environment within an organization by identifying and mapping:
  – What information is currently available?
  – Where does the information live?

Results/format (e.g.)
• The results of an information audit are twofold. First, there is a detailed report, which covers:
  – What information do staff acquire? Where from? At what cost? How is it used?
  – What information do staff create? What happens to it? Where does it go?

Results/format (e.g.)
  – What information is stored and why? What purpose will it serve?
  – What information is passed on or delivered? To whom? For what purpose? In what form?

Results/format (e.g.)
  – Is there a gap, or a match, between that which is available and that which is needed?
  – What are the skills and responsibilities of the people who carry out these tasks?
  – What equipment and tools do they have available (hardware, software, filing cabinets, web sites, etc.)?

Results/format (e.g.)
  – Are there any control documents, such as policy statements, guidelines, service level agreements, procedures, manuals?
  – Is any of the information (produced, acquired, processed, re-delivered, or stored) superfluous to needs?
  – Are any of the information-handling activities nonproductive?

Results/format (e.g.)
• There is also a detailed flow chart:
  – A visual map that shows the areas, processes, functions and activities through which information passes, clarifying gaps or fault-lines that need to be plugged, and bottlenecks and overflows that need to be unblocked
• Sound familiar?
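The flow chart described on this slide can be treated as a small directed graph. The sketch below is only an illustration under stated assumptions: the node names, the flows, and the two heuristics (a "bottleneck candidate" receives more flows than it passes on, a "dead end" receives information and never forwards it) are hypothetical and not part of the lecture.

```python
from collections import defaultdict

# Hypothetical flows recorded during an audit:
# (source, destination, information item). All names are made up.
FLOWS = [
    ("field staff", "shared drive", "survey forms"),
    ("lab", "shared drive", "instrument logs"),
    ("vendor", "shared drive", "price lists"),
    ("shared drive", "analyst", "survey forms"),
    ("analyst", "report archive", "quarterly report"),
]


def summarize_flows(flows):
    """Count flows per node and flag likely trouble spots (heuristics only)."""
    incoming = defaultdict(int)
    outgoing = defaultdict(int)
    for source, destination, _item in flows:
        outgoing[source] += 1
        incoming[destination] += 1
    nodes = set(incoming) | set(outgoing)
    # Bottleneck candidate: receives more flows than it passes on.
    bottlenecks = sorted(n for n in nodes if incoming[n] > outgoing[n] > 0)
    # Dead end: receives information but never passes it on.
    dead_ends = sorted(n for n in nodes if incoming[n] > 0 and outgoing[n] == 0)
    return bottlenecks, dead_ends


bottlenecks, dead_ends = summarize_flows(FLOWS)
print("bottleneck candidates:", bottlenecks)
print("dead ends:", dead_ends)
```

A real audit would attach volumes, costs, and owners to each flow; counting edges is just the simplest way to make gaps and bottlenecks visible.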

How to use?
• An information audit can be used as a baseline for making major improvements to the business processes of an organization.
• It is extremely helpful in identifying, buying, and implementing enterprise systems: finance systems, portfolio management systems, document management systems, learning and knowledge management systems, etc.

Remember the use case doc?
Developed for NASA TIWG

Event/application
Developed for NASA TIWG

Remember
• It never hurts to know what you have
• Build it into the routine and do not leave it as an after-thought (yep, just like documenting your code!)
• To help
  – MIF: Management Information Format (supported by MSDN applications, like SMS)
• http://myitforum.com/myitforumwp/2014/07/09/what-is-a-management-information-formatmif-file/

Sources and uses of unstructured information
• Audio, video, graphics, social media messages, etc.: whatever falls outside the purview of traditional databases

Data<->Information<->Knowledge
• Where is the structure?
[Figure: diagram relating Data, Information, and Knowledge; labels: Experience, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]

Informatics
• Oh, wait: people structure information!
• Cognitive processes
  – Semiotics
  – Mental representation
  – Intuition
  – Expertise
• But not in the same way computers can!

So what happens?
• What if a structured representation of fundamentally unstructured information is useless?
  – Why would it be?
• What role does visual representation play in structuring information?
Hint: [figure not preserved]

More than 10 years ago…
• Unstructured Information Management Architecture (UIMA) from IBM
  – “Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, and so on) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies.
  – IBM's Unstructured Information Management Architecture (UIMA) is an architectural and software framework that supports creation, discovery, composition, and deployment of a broad range of analysis capabilities and the linking of them to structured information services, such as databases or search engines.
  – The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently-developed components, and with which they can build and deploy UIM applications.”

From way back…

Data<->Information<->Knowledge
• Future?
[Figure: diagram relating Data, Information, and Knowledge; labels: Experience, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]

Reading for this week
• http://en.wikipedia.org/wiki/Information_audit
• http://www.librijournal.org/pdf/2003-1pp2338.pdf
• UIMA: http://www.ibm.com/developerworks/data/downloads/uima/
• SPAR: http://tw.rpi.edu/web/inside/ideas/SPAREvaluation

Logical Collections
• The primary goal of a management system is to abstract the physical collection into logical collections. The resulting view is a uniform, homogeneous collection.
• Note the analogy with logical models and information integration, so EARLY ON:
  – Identify naming conventions and organization
  – Align cataloguing and naming to facilitate search, access, use (who uses?)
  – Provide **contextual** information

Physical Handling
• Map between physical and logical.
• Where and whom does it come from?
  – Is there a transfer into a physical form?
  – Is it backed up, archived, cached? …
  – What formats?
  – Naming conventions: do they change?
• Note analogy to physical models
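The map between logical and physical can be sketched as a small catalog. This is a minimal illustration, not a real management system: the class names, the `role` values, and all paths are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCopy:
    location: str  # where the bytes live (hypothetical path)
    fmt: str       # file format
    role: str      # "primary", "backup", "archive", or "cache"


@dataclass
class LogicalItem:
    name: str      # stable logical name, independent of file naming
    context: str   # contextual information for whoever finds it later
    copies: list = field(default_factory=list)


class Catalog:
    """Maps logical names to their physical copies."""

    def __init__(self):
        self._items = {}

    def register(self, name, context):
        self._items[name] = LogicalItem(name, context)

    def add_copy(self, name, location, fmt, role="primary"):
        self._items[name].copies.append(PhysicalCopy(location, fmt, role))

    def resolve(self, name, role="primary"):
        """Return the physical locations behind a logical name."""
        return [c.location for c in self._items[name].copies if c.role == role]

    def unprotected(self):
        """Logical items that have no backup or archive copy."""
        return sorted(
            name for name, item in self._items.items()
            if not any(c.role in ("backup", "archive") for c in item.copies)
        )


catalog = Catalog()
catalog.register("survey-2015", context="Spring field survey, raw responses")
catalog.add_copy("survey-2015", "/data/raw/svy_2015_v2.csv", "csv")
catalog.add_copy("survey-2015", "/backup/svy_2015_v2.csv", "csv", role="backup")
catalog.register("site-photos", context="Photographs of survey sites")
catalog.add_copy("site-photos", "/media/photos/", "jpeg")
```

Users ask for `survey-2015`; the catalog answers where it is, in what format, and whether it is backed up. If a file is renamed, only the catalog entry changes.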

Interoperability Support

Security
• Access authorization and change verification. This is the basis of trusting your information.
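As one possible illustration of the two ideas on this slide, the sketch below pairs a permission lookup (access authorization) with a content fingerprint (change verification). The user names and the permission table are hypothetical; SHA-256 from Python's standard `hashlib` is simply one common choice of hash.

```python
import hashlib

# Hypothetical permission table: user -> allowed actions.
PERMISSIONS = {
    "alice": {"read", "write"},
    "bob": {"read"},
}


def is_authorized(user, action):
    """Access authorization: is this user allowed to perform this action?"""
    return action in PERMISSIONS.get(user, set())


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content: bytes, recorded_fingerprint: str) -> bool:
    """Change verification: compare current content with a stored fingerprint."""
    return fingerprint(content) != recorded_fingerprint


original = b"station,temp\nA,12.5\n"
recorded = fingerprint(original)           # stored when the item is registered
tampered = b"station,temp\nA,21.5\n"
print(has_changed(original, recorded))     # content is untouched
print(has_changed(tampered, recorded))     # content was altered
```

The fingerprint only detects change; it does not say who changed the content or why, which is where authorization records and ownership come in.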

Ownership
• Who is responsible for quality and meaning?

Metadata
• Recall metadata are data about data.
• Metainformation?

Persistence
• Deployment of mechanisms to counteract technology obsolescence.

Discovery
• Ability to identify useful relations and information inside the collection
• More on this later in this class

Dissemination
• Mechanisms to make interested parties aware of changes and additions to the collections.
• Do you rely on information retrieval? The Web?

Summary of Information Management
• Creation of logical collections
• Physical handling
• Interoperability support
• Security support
• Ownership
• Metadata collection, management and access
• Persistence
• Knowledge and information discovery
• Dissemination and publication

Note for your project writeup!
• Information management! Cover the 9 areas.

Information Workflow
• What is a workflow?
• Why would you use it?
• Key considerations for information, cf. data
• Some pointers to workflow systems

What is a workflow?
• General definition: “series of tasks performed to produce a final outcome” (taxes?)
• An information workflow involves people, but you potentially want to:
  – Automate jobs that a person traditionally performed manually
  – Process large volumes of information faster than one could do by hand
• NB difference from data workflows: it reaches out to encompass the user (e.g. ‘unrecorded actions’)

Background: Business Workflows
• Example: planning a trip
• Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.
• Each task may depend on outcome of previous task
  – Days you reserve the hotel depend on days of the flight
  – If hotel has shuttle service, may not need to rent a car
• Prior information, experience, preferences…
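The trip example can be written down as a dependency graph and ordered automatically. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the task names and the `hotel_has_shuttle` flag are illustrative assumptions drawn from the slide, not a real booking API.

```python
from graphlib import TopologicalSorter


def trip_tasks(hotel_has_shuttle):
    """Map each task to the set of tasks whose outcome it depends on."""
    tasks = {
        "book flight": set(),
        # Hotel dates depend on the flight dates.
        "reserve hotel": {"book flight"},
    }
    if not hotel_has_shuttle:
        # The rental car is only needed when the hotel offers no shuttle.
        tasks["arrange rental car"] = {"reserve hotel"}
    return tasks


def plan(tasks):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


print(plan(trip_tasks(hotel_has_shuttle=False)))
print(plan(trip_tasks(hotel_has_shuttle=True)))
```

Note that the shuttle decision is made before the plan is built; a fuller workflow system would evaluate such conditions while the workflow runs, once the hotel is actually known.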

Tripit.com?

What about information workflows?
• Perform a set of transformations/operations on information source(s)
• Examples
  – Generating images from raw data
  – Identifying areas of interest from a large information source (e.g. word cloud)
  – Classifying a set of objects
  – Querying a web service for more information on a set of objects
  – Many others…
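A set of transformations applied to an information source can be sketched as a chain of small functions. The example below computes the term counts that a word cloud would be drawn from; the step names, the stopword list, and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter
from functools import reduce

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in"}


def tokenize(text):
    """Step 1: split raw text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def drop_stopwords(words):
    """Step 2: remove words that carry little meaning."""
    return [w for w in words if w not in STOPWORDS]


def count_terms(words):
    """Step 3: count how often each remaining word occurs."""
    return Counter(words)


def run_workflow(source, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, source)


text = "The audit of the information and the information flow"
counts = run_workflow(text, [tokenize, drop_stopwords, count_terms])
print(counts.most_common(2))
```

Because each step is a separate function, a step can be swapped, tested, or reused in another workflow without touching the rest.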

More on Workflows
• Can process many information types:
  – Archives
  – Web pages
  – Streaming/real time
  – Images
  – Semiotic systems
• Robust workflows depend on formal (conceptual and logical) models of the flow of information among components
• May be simple and linear or very complex

Challenges
• Questions:
  – What are some challenges for users in implementing workflows?
  – What are some challenges to executing these workflows?
  – What are limitations of writing a program?
• Mastering a programming language
• Visualizing workflow
• Sharing/exchanging workflow
• Formatting issues
• Locating datasets, services, or functions

Workflow Management Systems

Benefits of Workflows
• Documentation of aspects of analysis
• Visual communication of analytical steps
• Ease of testing/debugging
• Reproducibility
• Reuse of part or all of workflow in a different project

Additional Benefits
• Integration of and between multiple computing environments
• ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies
• System functionality to assist with information integration of heterogeneous components and sources

Why not just use a script?
• Script does not specify low-level task scheduling and communication
• May be platform-dependent
• Can’t be easily reused
• May not have sufficient documentation to be adapted for another purpose

Why can a GUI be useful?
• No need to learn a programming language
• Visual representation of what workflow does
• Allows you to monitor workflow execution
• Enables user interaction (though not necessarily collaboration)
• Facilitates sharing of workflows

Some workflow systems
• Kepler
• SCIRun
• Sciflo
• Triana
• Taverna
• Pegasus
• Some commercial tools:
  – Windows Workflow Foundation
  – Mac OS X Automator
• http://www.isi.edu/~gil/AAAI08Tutorial.Slides/5-Survey.pdf
• http://www.isi.edu/~gil/AAAI08Tutorial.Slides/
• See reading for this week

Discovery
• How does someone find your information?
• How would you provide discovery of
  – collections
  – files
  – ‘bits’
• How would you find ->

Discovery
• Search (Federated Search)
• Helped by
  – Folksonomies (user contributed)
  – Intelligent Agents
  – Search Engines
  – Taxonomies
• Find photos of Kim
  – Boy or girl?

Use cases
• Find a sound recording of a swallow.
• Excuse me?

Use cases
• Find a sound recording of an African Swallow
• Find a sound recording of a bird that sounds like an African Swallow
• Media types: how can you discover them?

Use cases
• Find the movie that Jeanne Tripplehorn first starred in / that was her most successful / in which she was lead actress
• Has anyone gene sequenced a mouse?
• Find images of primary productivity in the North Atlantic
• Discovery can often involve information integration (or is it *almost always*?)

Three level ‘metadata’ solution for DATA
[Figure: Earth Sciences Virtual Database, spanning Data Discovery to Data Integration]
• Level 1: Data registration at the discovery level, e.g. volcano location and activity
• Level 2: Data registration at the inventory level, e.g. list of datasets, times, products
• Level 3: Data registration at the item detail level, e.g. access to individual quantities
• Ontology-based data integration using scientific workflows
• A data warehouse where the schema heterogeneity problem is solved; schema-based integration
A. K. Sinha, Virginia Tech, 2006

Three level ‘metadata’ solution?
[Figure: the same three-level diagram, spanning Information Discovery to Integration]
• Level 1: Registration at the discovery level, e.g. find the upper-level entry point to a source
• Level 2: Registration at the inventory level, e.g. list of datasets, using the logical organization
• Level 3: Registration at the item detail level, i.e. annotation, e.g. tagging
• Integration using mapping management
• Catalog/index
• Schema-based integration
A. K. Sinha, Virginia Tech, 2006

Information discovery
• What makes discovery work?
  – Metadata
  – Logical organization
  – Attention to the fact that someone would want to discover it
  – It turns out that file types are a key enabler of, or inhibitor to, discovery
  – Result ranking using a *tuned* algorithm
• What does not work?
  – Result ranking algorithms that depend on unconventional information types (icon, index, symbol)

Federated search
• “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• Key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
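The idea of searching several sources at once and merging the results can be sketched with in-memory stand-ins for the databases. Everything below is hypothetical: the source names, the records, and the shared `keywords` field, which plays the role of the consistent search metadata the slide calls for. It does not use Z39.50 or any real catalog protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources that all expose the same "keywords" metadata field.
SOURCES = {
    "library catalog": [
        {"title": "Bird song atlas", "keywords": {"bird", "audio"}},
        {"title": "Volcano field guide", "keywords": {"volcano", "geology"}},
    ],
    "sound archive": [
        {"title": "African swallow call", "keywords": {"bird", "audio", "swallow"}},
    ],
    "image bank": [
        {"title": "Swallow in flight", "keywords": {"bird", "image", "swallow"}},
    ],
}


def search_source(name, records, keyword):
    """Search one source; each hit is labelled with where it came from."""
    return [
        {"source": name, "title": r["title"]}
        for r in records
        if keyword in r["keywords"]
    ]


def federated_search(keyword, sources=SOURCES):
    """Query every source concurrently and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_source, name, records, keyword)
            for name, records in sources.items()
        ]
        merged = [hit for future in futures for hit in future.result()]
    # Sort so the merged result does not depend on which source answered first.
    return sorted(merged, key=lambda hit: (hit["title"], hit["source"]))


for hit in federated_search("swallow"):
    print(hit["source"], "->", hit["title"])
```

If one source called the field `tags` and another `subjects`, the single query would no longer work across all of them, which is why consistent metadata fields are the key.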

Smart search
• Semantically aware search, e.g. http://noesis.itsc.uah.edu, http://eie.cos.gmu.edu (Water -> Semantic Search)
• Faceted search, e.g. mspace (http://mspace.fm), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s)
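Faceted search narrows a collection by picking values for metadata fields and shows how many items remain under each value. The sketch below is a generic illustration with made-up items and facet names; it is not how mspace, Exhibit, or S2S are implemented.

```python
from collections import Counter

# Hypothetical collection; "media" and "topic" are the facets.
ITEMS = [
    {"title": "African swallow call", "media": "audio", "topic": "birds"},
    {"title": "European swallow call", "media": "audio", "topic": "birds"},
    {"title": "North Atlantic productivity map", "media": "image", "topic": "oceans"},
    {"title": "Swallow in flight", "media": "image", "topic": "birds"},
]


def filter_items(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count the remaining items under each value of a facet."""
    return Counter(item[facet] for item in items)


birds = filter_items(ITEMS, topic="birds")
print(facet_counts(birds, "media"))               # choices left after picking a topic
print(filter_items(birds, media="audio"))         # drill down one more level
```

The counts are what make the interface "faceted": after each selection the user sees which further choices would still return results.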

NOESIS

Faceted search
logd.tw.rpi.edu

Summary - discovery
• Useful to write a few discovery use cases to drive how your design is developed
• Evolution of your role in facilitating discovery and what/how others implement access to your information

Reading for this week
• Is retrospective

Check in for Project Assignment
• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment
• Or a new use case, development, etc.

What is next
• Today: project group meetings/check in
• April 21: Information Quality, Uncertainty and Bias
• April 28: course summary (written part of group project due)
• May 5: final project presentations (BE ON TIME, i.e. 5-10 mins BEFORE 9 AM)
  – Be prepared to be asked (and answer) questions