Document Collections cs 5984 Information Visualization Chris North

Where are we?
• Multi-D
• 1-D
• 2-D
• Hierarchies/Trees
• Networks/Graphs
• Document collections
• 3-D
• Design Principles
• Empirical Evaluation
• Java Development
• Visual Overviews
• Multiple Views
• Peripheral Views

Structured Document Collections
• Multi-dimensional
  • author, title, date, journal, …
• Trees
  • Dewey Decimal
• Networks
  • web, citations

Envision
• Ed Fox, et al.
• Multi-D
• similar to Spotfire

Unstructured Document Collections
• Focus on Full Text
• Examples:
  • digital libraries, encyclopedia
  • Web, homepages, photo collections
• Tasks:
  • Search, keyword
  • Browse
  • Themes, subjects, topics, library coverage
  • Size, distributions

Visualization Strategies
• Cluster Maps (today)
• Keyword Query (today)
• Relationships
• Reduced representation
• User controlled layout

Cluster Map
• Create a “map” of the document collection
• Similar documents near
• Dissimilar documents far
• “Grocery store” concept

Document Vectors

         “aardvark”   “banana”   “chris”   …
Doc 1        1            2          0
Doc 2        2            1          0
Doc 3        0            0          3

• Similarity between pair of docs = [formula missing in source]
• Lay out documents in a 2-D map by similarity
  • similar to spring model for graph layout …
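The similarity formula on this slide did not survive, so the following is only a sketch under an assumption: it uses cosine similarity, a common choice for term-count vectors, which may not be the measure used in the lecture. The vectors are the ones in the slide's table; the function name `cosine_similarity` is illustrative.

```python
import math

# Term counts copied from the slide's table; column order is
# "aardvark", "banana", "chris".
doc_vectors = {
    "Doc 1": [1, 2, 0],
    "Doc 2": [2, 1, 0],
    "Doc 3": [0, 0, 3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors.

    Assumption: cosine similarity stands in for the slide's lost formula.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Doc 1 and Doc 2 share vocabulary; Doc 1 and Doc 3 share none.
sim_12 = cosine_similarity(doc_vectors["Doc 1"], doc_vectors["Doc 2"])
sim_13 = cosine_similarity(doc_vectors["Doc 1"], doc_vectors["Doc 3"])
```

Under this measure, documents that share terms score near 1 and documents with no terms in common score 0, which is the ordering a cluster map needs to place similar documents near and dissimilar documents far.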

Cluster Algorithms
• Partition clustering: Partition into k subsets
  • Pick k seeds
  • Iteratively attract nearest neighbors
• Hierarchical clustering: Dendrogram
  • Group nearest-neighbor pair
  • Iterate
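The two families above can be sketched roughly as follows. This is a minimal illustration, not the algorithm of any system named in these slides: the partition version is a k-means-style stand-in that takes the first k points as seeds, and the hierarchical version merges by centroid distance. Both choices, and all function names, are assumptions made for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def partition_cluster(points, k, iterations=10):
    """Partition clustering: pick k seeds, then repeatedly attract each
    point to its nearest cluster center and recompute the centers."""
    centers = [tuple(p) for p in points[:k]]  # assumption: first k points are seeds
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ended up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


def hierarchical_cluster(points):
    """Hierarchical clustering: group the nearest pair of clusters, iterate.
    Returns the merge history, which is the dendrogram read bottom-up."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(
            pairs,
            key=lambda ij: math.dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Usage with the term-count vectors from the Document Vectors slide.
slide_vectors = [(1, 2, 0), (2, 1, 0), (0, 0, 3)]
clusters = partition_cluster(slide_vectors, k=2)
merges = hierarchical_cluster(slide_vectors)
```

The partition result depends on the seeds and on k, while the merge history has no k to choose but costs more per step; either way the outcome is driven by the distance measure, which is one reason the later slide warns that the algorithm becomes (too) critical.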

Kohonen Maps
• Xia Lin, “Document Space”
  • samal, ying
• http://faculty.cis.drexel.edu/sitemap/index.html

Themescapes, Cartia
• PNL
• Mountain height = Cluster size

WebSOM
• http://websom.hut.fi/websom/

Map.net
• http://maps.map.net/start

Cluster Map
• Good:
  • Map of collection
  • Major themes and sizes
  • Relationships between themes
  • Scales up
• Bad:
  • Where to locate documents with multiple themes?
    » Both mountains, between mountains, …?
  • Relationships between documents, within documents?
  • Algorithm becomes (too) critical

Keyword Query
• Keyword query, Search engine
• Rank ordered list
• “Information Retrieval”
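A rank-ordered list can be illustrated with a very small sketch. The scoring here (raw count of query terms in each document) is an assumption chosen for brevity, not the ranking function of any particular search engine, and the three-document collection is a hypothetical toy that mirrors the counts on the Document Vectors slide.

```python
def rank_documents(query, documents):
    """Return (doc_id, score) pairs, best match first.

    Assumption: score = number of occurrences of the query terms.
    Documents with no matching term are left out of the list.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Sort by descending score, breaking ties by document id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical toy collection.
documents = {
    "doc1": "aardvark banana banana",
    "doc2": "aardvark aardvark banana",
    "doc3": "chris chris chris",
}
results = rank_documents("banana", documents)
no_hits = rank_documents("zebra", documents)
```

A query term that appears nowhere in the collection returns an empty list, and a term that appears everywhere returns every document, so the same few lines exhibit both ends of the mega-hit, zero-hit problem.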

Tilebars
• Hearst, “Tilebars”
  • reenal, xueqi
• http://elib.cs.berkeley.edu/tilebars/

VIBE
• Korfhage, http://www.pitt.edu/~korfhage/interfaces.html
• Documents located between query keywords using spring model
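One way to picture the spring model is to pin each query keyword at a fixed screen position and attach the document to every keyword with a spring whose stiffness equals the document's weight for that keyword. With zero-rest-length springs, the resting point is the weighted average of the keyword positions. The sketch below assumes exactly that; whether VIBE computes positions this way is not stated on the slide, and the anchor coordinates, weights, and the name `vibe_position` are illustrative.

```python
def vibe_position(anchors, weights):
    """Place a document at the weighted centroid of the keyword anchors.

    anchors: keyword -> (x, y) screen position (hypothetical layout)
    weights: keyword -> how strongly the document matches that keyword
    Returns None when the document matches none of the keywords.
    """
    total = sum(weights.get(k, 0) for k in anchors)
    if total == 0:
        return None
    x = sum(weights.get(k, 0) * anchors[k][0] for k in anchors) / total
    y = sum(weights.get(k, 0) * anchors[k][1] for k in anchors) / total
    return (x, y)


# Three query keywords pinned at the corners of a triangle.
anchors = {"aardvark": (0.0, 0.0), "banana": (1.0, 0.0), "chris": (0.5, 1.0)}

# A document that matches "aardvark" and "banana" equally lands midway
# between those two anchors.
pos = vibe_position(anchors, {"aardvark": 1, "banana": 1})
```

A document that matches only one keyword sits directly on that keyword's anchor, and moving an anchor drags along every document attached to it.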

VR-VIBE

Keyword Query
• Good:
  • Reduces the browsing space
  • Map according to user’s interests
• Bad:
  • What keywords do I use?
  • What about other related documents that don’t use these keywords?
  • No initial overview
  • Mega-hit, zero-hit problem

Assignment
• Thurs: Document Collections
  • Bederson, “Image Browsing”
    » Rui, anusha
  • Card, “Web Book and Web Forager”
    » mrinmayee, ming
• Demo your hw 3: tues or thurs

Next Week
• Tues: 3-D data
  • Kniss, “Interactive Volume Rendering with Direct Manip”
    » xueqi, mahesh
• Thurs: Workspaces
  • Robertson, “Task Gallery”
    » supriya, varun
  • Upson, “AVS”
    » christa, jun
• Thanksgiving break
• Tues 27: Debates
  • Kobsa, “Empirical comparison of comm infovis systems”
    » kunal, zhiping

Upcoming Sched
• Tues: 3-D data
• Thurs: Workspaces
• Thanksgiving break
• Tues 27: Debates
• Thurs 29: How (not) to lie with visualization
• Dec: project presentations
• Dec 7: CHI 2-pagers due, student posters due