The Intersection of Language Algorithms and Design Marti
- Slides: 90
The Intersection of Language, Algorithms, and Design. Marti Hearst, UC Berkeley. UCSD Design@Large, October 31, 2018
• Measure what they can, not what they should
• Do not adjust agilely to error
• Not answerable, secret formula
• Create their own distorted reality
“Good” Model: baseball stats
WMD: US News College Rankings
In the realm of language and text analysis …
How often do we have the designs we want versus those our algorithms can (easily) make?
In 2007 I was puzzled:
WHY DO THEY LOOK LIKE THIS?
So I did an investigation
Tag Cloud Order: Surprise!
• 7 interviewees DID NOT REALIZE alphabetical ordering
• “What order are tags shown in?”
  – hadn’t thought about it
  – don’t think about tag clouds that way
  – random
  – ordered by semantic similarity
• This result was also found by Wattenberg et al. 2008
Main Reasons For Using: • To signal the presence of tags on the site • An inviting way to get people interacting with the site • A good way to get the gist of the site • Easy to implement
New Perspective: Tag Clouds are Social! • It’s not about the “information”! • Self-reflection • Showing off topics to others, socially. • Probably a fad.
Ten years later …
Word Size Variations in Word Clouds are Problematic: Jonathan Schwabish, http://www.allanalytics.com/author.asp?section_id=3072
What is this a summary of?
Answer: Hamlet’s famous “to be or not to be” soliloquy. But you couldn’t tell. Why not?
Yang et al. EuroVis 2018
FEINBERG ON WORDLES
“The commonly used trick of scaling by the square root of the word’s weight (to compensate for the fact that words have area, not just length) simply makes a Wordle look boring.”
“There’s not much evidence that [tag clouds are] all that useful for navigation or other interactive tasks. … Once I decided to build a system for viewing text rather than tags, it seemed superfluous to have the words do anything other than merely exist on the page. I decided I would design something primarily for pleasure.”
“Color means absolutely nothing in Wordle.” It is used for contrast and aesthetics.
Some of Wordle’s success is due to its “one-paste/one-click instant gratification.”
Feinberg, Ch 3, Beautiful Visualization, 2010
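The square-root trick Feinberg mentions can be sketched as follows. This is a minimal illustration, not Wordle's actual code: the function name `font_sizes`, the point-size range, and the three example weights are all assumptions made up for the sketch.

```python
import math

def font_sizes(weights, min_pt=10.0, max_pt=72.0, scale="linear"):
    """Map word weights to font sizes (hypothetical helper, not Wordle's code)."""
    # With "sqrt", a word's drawn AREA (roughly size squared) tracks its weight;
    # with "linear", its HEIGHT tracks its weight and big words look even bigger.
    transform = math.sqrt if scale == "sqrt" else (lambda w: w)
    scaled = {word: transform(w) for word, w in weights.items()}
    lo, hi = min(scaled.values()), max(scaled.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all weights are equal
    return {
        word: min_pt + (s - lo) / span * (max_pt - min_pt)
        for word, s in scaled.items()
    }

# Invented weights, chosen only to make the contrast visible.
weights = {"big": 16, "mid": 4, "small": 1}
linear = font_sizes(weights, scale="linear")
root = font_sizes(weights, scale="sqrt")

for word in weights:
    print(f"{word:6s} linear={linear[word]:5.1f}pt  sqrt={root[word]:5.1f}pt")
```

Under square-root scaling the mid-weight word is drawn larger relative to the extremes, so the size contrast is flatter. That is the "boring" look the quote refers to.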
OTHER STUDIES
Feinberg, Wattenberg, and Viegas 2009 surveyed 4,306 Wordle users and found:
• 50% did not understand what font size indicated
• 57% wrote the text they visualized
• Color “often” interpreted as having meaning
Other studies find:
• Varying font size detrimental to understanding statistics
• Font size can guide visual search for certain tasks, but users prefer search boxes for word lookup tasks
• Column layouts or bar charts are better for recognizing frequencies of values
Why Are They Used Generally? • Word clouds are easy to make. • Word clouds are visually engaging. • Word clouds are commonly used.
WHY THIS MATTERS Word Clouds continue to be used as evidence in scientific settings.
Presented at Vis 2018: The word cloud “shows a summary of tweets.” Urban Space Explorer: A Visual Analytics System for Urban Planning, Karduni et al., IEEE CG&A 2018
Presented at Vis 2018: “We see this distribution covers a variety of ethnic surnames, perhaps giving insight into how immigrants migrated after coming to Ellis Island.” Name Profiler Toolkit, Wang et al., IEEE CG&A 2018
Presented at ACL 2018 for 28 seconds: “Here we find differences among the words in large letters. We find for example, learning networks and embeddings being heavily represented in ACL 2018 titles.”
WHY? We wouldn’t plot numerical axes incorrectly. Why is it ok to show text in this way?
Why Are They Used in Science? • Word clouds are easy • Word clouds are visually engaging. • Word clouds are commonly used. • There is no alternative with the same properties. • Training in usability is generally lacking. • Also …
• Almost any text outcome can look ok: people are great at making up associations among words.
• It’s hard to conjure what isn’t there: people are really bad at noticing what is missing from text collections.
www.randomlists.com
Let’s return to this example:
What are alternatives?
These are accurate, but do not make the words as prominent. An approximate alternative …
New work. Goal: retain the engaging aspect of word clouds, while imparting some useful semantic information. Hearst et al., An Evaluation of Semantically Grouped Word Cloud Designs, under review, TVCG
HYPOTHESIS Organizing the words both semantically and visually will improve understanding while retaining engagement.
Evaluating Word Clouds
• Most papers are vague about this
  – “gist”, “summary”, “navigate”, “see trends”
• Most evaluations do not assess these; instead:
  – Identify the largest word
  – Identify a given word
• How to evaluate more deeply?
A NEW TASK: Given a set of words, identify the category. Words: menu, waiter, dishes, tablecloth, bill. Category: restaurant.
Compare Effects of Spacing and Color
Chart: average score (out of 5)
Hypothesis:
• Standard Wordle worst
• Color + space best
• Mixed color, but with coherent color assignments, falls in between
Results were consistent with the hypothesis. 88% preferred the column layout for the task.
• White Space Separation
• Color Mapped, Spatial Jumble
• Color Mapped, Spatial Organized
All views had larger font size variation than the prior study.
Chart: average score (out of 5)
The hypothesis was that the column layout would outperform Spatial Organized. The column layout scored best; Spatial Organized was significantly better than Spatial Jumble (Wordle) but not significantly different from Column.
• 90% preferred color column for “task”
• 56% preferred color column for “visually pleasing”, with the rest split between the other two
Which of the following do you prefer?
(Figure: four layout options, labeled a to d)
Key Findings
• Visually grouped layouts, compared to ungrouped layouts, are more effective in time-constrained category understanding tasks.
• Visual grouping can be achieved by separating categories via white space or color or both.
• For analytical tasks, layouts with white space tend to be preferred over spatially arranged groupings.
• These results hold for semantically distinct categories.
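A white-space-separated column layout of the kind these findings favor can be sketched in plain text. This is an illustration only, not the layout code used in the study: the function name `column_layout`, the "weather" category, and every weight are invented, and the hard part (assigning words to categories) is assumed to be already done.

```python
from itertools import zip_longest

def column_layout(groups, col_width=14):
    """Render one text column per category, heaviest words first (hypothetical helper)."""
    columns = []
    for category, words in groups.items():
        ranked = sorted(words, key=words.get, reverse=True)
        columns.append([category.upper()] + ranked)
    # Pad shorter columns with empty cells so every row has one cell per column.
    rows = zip_longest(*columns, fillvalue="")
    return "\n".join(
        "".join(cell.ljust(col_width) for cell in row).rstrip() for row in rows
    )

# Categories are assumed given; weights here are made up for the example.
groups = {
    "restaurant": {"menu": 5, "waiter": 3, "bill": 2},
    "weather": {"rain": 4, "cloud": 1},
}
layout = column_layout(groups)
print(layout)
```

Rendering is the easy step. The semantic grouping passed in as `groups` is the part that is not easy to automate.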
Why Isn’t This Done Now? It is not easy to make word clouds this way. In general, it is not easy to make language algorithms work well.
Creating Their Own Reality As Cathy O’Neil says, algorithms can create their own reality. This is what I believe happens with cheap and easy text analysis algorithms.
Kaye et al. "Nokia internet pulse: a long term deployment and iteration of a twitter visualization." CHI'12 Ext. Abst.
TheMail, Viegas et al., CHI 2006
Creating Their Own Reality: Pulling out popular words is often misleading. The popular words often reflect concepts for which there are no other words. And it can create a feedback loop: this is popular, therefore important. It is also a substitute for truly identifying the underlying meaning.
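The point about concepts with no other words can be made concrete with a small frequency count. This is a sketch under stated assumptions: the stopword list, the function name `word_counts`, and the four-sentence text are all invented for the illustration.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "to", "and", "is", "in", "that", "it"}

def word_counts(text):
    """Count lowercase word tokens, skipping stopwords (hypothetical helper)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

# Every sentence is about speed, but speed is expressed with four different words.
text = "The car is fast. The car is quick. The car is speedy. The car is rapid."
counts = word_counts(text)

print(counts.most_common(1))  # the single most frequent non-stopword
print(counts["fast"], counts["quick"], counts["speedy"], counts["rapid"])
```

In this toy text the idea of speed occurs four times, the same as "car", but its count is split across four synonyms, so "car" is the only word a frequency-sized display would make prominent.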
How often do we have the designs we want versus those our algorithms can (easily) make?
Remember old web search? (Sometime before 2004)
We use what the text analysis gives us We wanted it to do question answering. It couldn’t. We used what it gave us anyhow. And we learned keywordese.
(2018)
Organizing Search Results
Text Clustering
• Fully automated
• Difficult to interpret
• Inconsistent
• Uneven coverage
• Cannot combine
Faceted Navigation
• Some manual aspects
• Understandable
• Consistent
• Even coverage
• Can mix and match
Clustering (The Hope)
Clustering (The Reality)
The Idea of Facets
• Facets are a way of labeling data
  – A kind of Metadata (data about data)
  – Can be thought of as properties of items
• Facets vs. Categories
  – Items are placed INTO a category system
  – Multiple facet labels are ASSIGNED TO items
Create Faceted Metadata
Assign Facets to Items
Advantages of the Approach
• Understandable categories
  – Helps explore the collection
  – Evokes a feeling of “browsing the shelves”
• Helps avoid feelings of being lost.
• Can’t end up with empty result sets.
Hearst, IEEE Data Eng Bltn 2000; Hearst et al. CACM 2002; Yee et al. CHI 2003
Advantages of the Approach
• Systematically integrates search results:
  – reflects the structure of the info architecture
  – retains the context of previous interactions
• Gives users control and flexibility
  – Over order of metadata use
  – Over when to navigate vs. when to search
• Allows integration with advanced methods
  – Collaborative filtering, predicting users’ preferences
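The mechanics of faceted navigation can be sketched in a few lines. This is a simplified illustration, not the implementation from the cited systems: the item collection, the facet names, and the helpers `refine` and `facet_counts` are all hypothetical.

```python
from collections import Counter

# A made-up collection: each item is ASSIGNED several labels per facet.
ITEMS = [
    {"name": "minestrone",
     "facets": {"cuisine": {"Italian"}, "course": {"starter"},
                "ingredient": {"beans", "pasta"}}},
    {"name": "lasagna",
     "facets": {"cuisine": {"Italian"}, "course": {"main"},
                "ingredient": {"pasta", "cheese"}}},
    {"name": "dal",
     "facets": {"cuisine": {"Indian"}, "course": {"main"},
                "ingredient": {"lentils"}}},
]

def refine(items, selections):
    """Keep the items that carry every selected (facet, label) pair."""
    return [
        item for item in items
        if all(label in item["facets"].get(facet, set())
               for facet, label in selections)
    ]

def facet_counts(items, facet):
    """Count how many of the given items carry each label of one facet."""
    return Counter(
        label for item in items for label in item["facets"].get(facet, set())
    )

results = refine(ITEMS, [("cuisine", "Italian")])
ingredient_counts = facet_counts(results, "ingredient")

print([item["name"] for item in results])
print(dict(ingredient_counts))
```

Because the interface offers only labels counted from the current results, every label shown matches at least one item. In this sketch, that is why a click cannot lead to an empty result set, and selections from different facets can be mixed in any order.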
However … you have to build up faceted metadata, and that takes some external knowledge and a bit of work.
Eighteen Years Later
• Faceted Navigation is still flourishing in the most high-profile ecommerce sites.
• Text Clustering only appears… in research projects!
• Now there is a replacement unsupervised algorithm called LDA, and it is being used by social scientists to produce what I believe are questionable results.
In Summary
• Measure what they can, not what they should
• Do not adjust agilely to error
• Not answerable, secret formula
• Create their own distorted reality
How do our text analysis algorithms differ?
Experiment 1: Wordles vs Semantic Grouping
Wordles had a far lower performance score than any of the WordZones layouts. There were interactions between font size and white space widths. 86% of participants preferred a column layout with a single font.