
The Intersection of Language, Algorithms, and Design
Marti Hearst, UC Berkeley
UCSD Design@Large, October 31, 2018

Weapons of Math Destruction (WMDs), after Cathy O’Neil:
• Measure what they can, not what they should
• Do not adjust agilely to error
• Not answerable; secret formula
• Create their own distorted reality
“Good” model: baseball stats
WMD: US News College Rankings

In the realm of language and text analysis …

How often do we have the designs we want versus those our algorithms can (easily) make?

In 2007 I was puzzled:

WHY DO THEY LOOK LIKE THIS?

So I did an investigation

Tag Cloud Order: Surprise!
• 7 interviewees DID NOT REALIZE alphabetical ordering
• “What order are tags shown in?”
 – hadn’t thought about it
 – don’t think about tag clouds that way
 – random
 – ordered by semantic similarity
• This result was also found by Wattenberg et al. 2008

Main Reasons For Using:
• To signal the presence of tags on the site
• An inviting way to get people interacting with the site
• A good way to get the gist of the site
• Easy to implement

New Perspective: Tag Clouds are Social!
• It’s not about the “information”!
• Self-reflection
• Showing off topics to others, socially.
• Probably a fad.

Ten years later …

Word Size Variations in Word Clouds are Problematic: Jonathan Schwabish, http://www.allanalytics.com/author.asp?section_id=3072

What is this a summary of?

Answer: Hamlet’s famous “to be or not to be” soliloquy. But you couldn’t tell. Why not?

Yang et al., EuroVis 2018

FEINBERG ON WORDLES
“The commonly used trick of scaling by the square root of the word’s weight (to compensate for the fact that words have area, not just length) simply makes a Wordle look boring.”
“There’s not much evidence that [tag clouds are] all that useful for navigation or other interactive tasks. … Once I decided to build a system for viewing text rather than tags, it seemed superfluous to have the words do anything other than merely exist on the page. I decided I would design something primarily for pleasure.”
“Color means absolutely nothing in Wordle.” It is used for contrast and aesthetics.
Some of Wordle’s success is due to its “one-paste/one-click instant gratification.”
Feinberg, Ch. 3, Beautiful Visualization, 2010
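Feinberg's first remark concerns how a word's weight is mapped to its font size. The Python sketch below is a minimal illustration, not Wordle's actual code. It contrasts linear scaling with the square-root trick. It assumes the weight is something like a word count. The point sizes `min_pt` and `max_pt` are hypothetical parameters.

```python
import math

def font_size(weight: float, max_weight: float, min_pt: float = 10.0,
              max_pt: float = 60.0, sqrt_scale: bool = False) -> float:
    """Map a word's weight (e.g., its count) to a font size in points.

    min_pt and max_pt are hypothetical bounds chosen for illustration.
    With sqrt_scale=True the size grows with the square root of the weight,
    so the word's area (roughly proportional to size squared) tracks the
    weight more closely than its height does.
    """
    ratio = weight / max_weight  # in (0, 1] for 0 < weight <= max_weight
    if sqrt_scale:
        ratio = math.sqrt(ratio)
    return min_pt + (max_pt - min_pt) * ratio

# A word with a quarter of the top weight:
print(font_size(25, 100))                   # 22.5 (linear)
print(font_size(25, 100, sqrt_scale=True))  # 35.0 (square root)
```

Square-root scaling compresses the range of sizes. Low-weight words get noticeably bigger, which is the flattening Feinberg describes as "boring."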

OTHER STUDIES
Feinberg, Wattenberg, and Viegas 2009 surveyed 4,306 Wordle users and found:
• 50% did not understand what font size indicated
• 57% wrote the text they visualized
• Color “often” interpreted as having meaning
Other studies find:
• Varying font size detrimental to understanding statistics
• Font size can guide visual search for certain tasks, but users prefer search boxes for word lookup tasks
• Column layouts or bar charts are better for recognizing frequencies of values

Why Are They Used Generally?
• Word clouds are easy to make.
• Word clouds are visually engaging.
• Word clouds are commonly used.

WHY THIS MATTERS
Word clouds continue to be used as evidence in scientific settings.

Presented at Vis 2018: the word cloud “shows a summary of tweets”
Urban Space Explorer: A Visual Analytics System for Urban Planning, Karduni et al., IEEE CG&A 2018

Presented at Vis 2018: “We see this distribution covers a variety of ethnic surnames, perhaps giving insight into how immigrants migrated after coming to Ellis Island.”
Name Profiler Toolkit, Wang et al., IEEE CG&A 2018

Presented at ACL 2018 (on screen for 28 seconds): “Here we find differences among the words in large letters. We find for example, learning networks and embeddings being heavily represented in ACL 2018 titles.”

WHY?
We wouldn’t plot numerical axes incorrectly. Why is it OK to show text in this way?

Why Are They Used in Science?
• Word clouds are easy.
• Word clouds are visually engaging.
• Word clouds are commonly used.
• There is no alternative with the same properties.
• Training in usability is generally lacking.
• Also …

Almost any text outcome can look OK: people are great at making up associations among words.
It’s hard to conjure what isn’t there: people are really bad at noticing what is missing from text collections.
www.randomlists.com

Let’s return to this example:

What are alternatives?

These are accurate, but do not make the words as prominent. An approximate alternative …

New work
Goal: retain the engaging aspect of word clouds, while imparting some useful semantic information.
Hearst et al., An Evaluation of Semantically Grouped Word Cloud Designs, under review, TVCG

HYPOTHESIS
Organizing the words both semantically and visually will improve understanding while retaining engagement.

Evaluating Word Clouds
• Most papers are vague about this
 – “gist”, “summary”, “navigate”, “see trends”
• Most evaluations do not assess these; instead:
 – Identify the largest word
 – Identify a given word
• How to evaluate more deeply?

A NEW TASK
Given a set of words, identify the category.
Example: menu, waiter, dishes, tablecloth, bill → restaurant

Compare Effects of Spacing and Color

[Chart: average score (out of 5)]
Hypothesis: standard Wordle worst; color + space best; mixed color, but with coherent color assignments, falls in between.
Results were consistent with the hypothesis. 88% preferred the column layout for the task.

[Figure panels: White Space Separation; Color Mapped, Spatial Jumble; Color Mapped, Spatial Organized]
All views had larger font size variation than in the prior study.

[Chart: average score (out of 5)]
Hypothesis: the column layout would outperform Spatial Organized.
The column layout scored best; Spatial Organized (S.O.) was significantly better than Spatial Jumbled (Wordle) but not significantly different from Column.
90% preferred the color column layout for the “task”. 56% preferred the color column layout as “visually pleasing”, with the rest split between the other two.

Which of the following do you prefer?

[Figure: four layouts, labeled a, b, c, d]

Key Findings
• Visually grouped layouts, compared to ungrouped layouts, are more effective in time-constrained category understanding tasks.
• Visual grouping can be achieved by separating categories via white space or color or both.
• For analytical tasks, layouts with white space tend to be preferred over spatially arranged groupings.
• These results hold for semantically distinct categories.
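The grouping step behind a column layout can be sketched in a few lines of Python. This is a hypothetical illustration, not the layout code from the study. It assumes each word already carries a category label and a weight, and finding those categories is the hard part. Font sizing, color assignment, and spatial packing are left out.

```python
from collections import defaultdict

def group_into_columns(words):
    """Group (word, category, weight) triples into one column per category.

    Columns are ordered by total weight, heaviest first; words within a
    column are ordered by weight, with ties broken alphabetically.
    """
    columns = defaultdict(list)
    for word, category, weight in words:
        columns[category].append((weight, word))
    ordered = sorted(columns.items(),
                     key=lambda item: sum(weight for weight, _ in item[1]),
                     reverse=True)
    return [(category,
             [word for _, word in sorted(entries, key=lambda e: (-e[0], e[1]))])
            for category, entries in ordered]

# Hypothetical input: the words, category labels, and weights are made up
words = [("menu", "restaurant", 3), ("waiter", "restaurant", 5),
         ("goal", "sports", 4), ("referee", "sports", 2),
         ("bill", "restaurant", 1)]
for category, column in group_into_columns(words):
    print(f"{category:>10}: {', '.join(column)}")
```

Each category gets its own column, separated by white space. A renderer could also give each column its own color, matching the two grouping cues in the findings.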

Why Isn’t This Done Now?
It is not easy to make word clouds this way. In general, it is not easy to make language algorithms work well.

Creating Their Own Reality
As Cathy O’Neil says, algorithms can create their own reality. This is what I believe happens with cheap and easy text analysis algorithms.

Kaye et al., "Nokia internet pulse: a long term deployment and iteration of a twitter visualization." CHI ’12 Ext. Abst.

TheMail, Viegas et al., CHI 2006

Creating Their Own Reality
Pulling out popular words is often misleading. It often reflects the concepts for which there are no other words. And it can create a feedback loop: this is popular, therefore important. It is also a substitute for truly identifying the underlying meaning.
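The sketch below shows the kind of cheap "popular words" extraction this slide criticizes. It is a minimal Python illustration and does not come from any particular tool. It runs on the opening line of the Hamlet soliloquy from the earlier example. The stopword list is a small hypothetical one of the kind many tools apply by default.

```python
import re
from collections import Counter

def popular_words(text, k=5, stopwords=frozenset()):
    """Return the k most frequent lowercase word tokens not in stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(k)

line = "To be, or not to be, that is the question"
print(popular_words(line, k=2))
# [('to', 2), ('be', 2)]

# Hypothetical stopword list; many default lists drop words like these
stop = {"to", "be", "or", "not", "that", "is", "the"}
print(popular_words(line, k=2, stopwords=stop))
# [('question', 1)]
```

Neither output conveys what the line is about. Raw counts surface function words. Removing them leaves almost nothing, because the meaning lives in exactly the words that get filtered out.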

How often do we have the designs we want versus those our algorithms can (easily) make?

Remember old web search? (Sometime before 2004)

We use what the text analysis gives us
We wanted it to do question answering. It couldn’t. We used what it gave us anyhow. And we learned keywordese.

(2018)

Organizing Search Results

| Text Clustering | Faceted Navigation |
|---|---|
| Fully automated | Some manual aspects |
| Difficult to interpret | Understandable |
| Inconsistent | Consistent |
| Uneven coverage | Even coverage |
| Cannot combine | Can mix and match |

Clustering (The Hope)

Clustering (The Reality)

The Idea of Facets
• Facets are a way of labeling data
 – A kind of metadata (data about data)
 – Can be thought of as properties of items
• Facets vs. categories
 – Items are placed INTO a category system
 – Multiple facet labels are ASSIGNED TO items
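The data model behind "labels assigned to items" can be sketched in Python. This is a minimal, hypothetical version, and the recipe collection and facet names are made up for illustration. Each item carries a set of labels for each facet. A selection keeps only the items that carry every selected label.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    # Facet name -> set of labels assigned to this item (it may carry several)
    facets: dict[str, set[str]] = field(default_factory=dict)

def matches(item: Item, selections: dict[str, str]) -> bool:
    """True if the item carries every selected label (AND across facets)."""
    return all(label in item.facets.get(facet, set())
               for facet, label in selections.items())

def filter_items(items: list[Item], selections: dict[str, str]) -> list[Item]:
    return [item for item in items if matches(item, selections)]

# Hypothetical toy collection: each recipe gets labels from several facets
recipes = [
    Item("gazpacho", {"cuisine": {"Spanish"}, "course": {"soup"},
                      "ingredient": {"tomato", "cucumber"}}),
    Item("paella", {"cuisine": {"Spanish"}, "course": {"main"},
                    "ingredient": {"rice", "saffron"}}),
    Item("minestrone", {"cuisine": {"Italian"}, "course": {"soup"},
                        "ingredient": {"tomato", "bean"}}),
]

soups_with_tomato = filter_items(recipes, {"course": "soup", "ingredient": "tomato"})
print([r.name for r in soups_with_tomato])  # ['gazpacho', 'minestrone']
```

In a single category tree, gazpacho would sit in one place, either under Spanish or under soups. With facets, the same item is reachable by cuisine, course, or ingredient, in any order.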

The Idea of Facets

Create Faceted Metadata

Assign Facets to Items

Advantages of the Approach
• Understandable categories
 – Helps explore the collection
 – Evokes a feeling of “browsing the shelves”
• Helps avoid feelings of being lost.
• Can’t end up with empty result sets.
Hearst, IEEE Data Eng. Bulletin 2000; Hearst et al., CACM 2002; Yee et al., CHI 2003
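One common way to get the "can't end up with empty result sets" property is to show counts next to each facet label and hide labels whose count is zero. The Python sketch below shows that approach. It assumes the counts are computed only over items that match the current selections. The collection and facet names are hypothetical.

```python
from collections import Counter

# Hypothetical toy collection: recipe name -> {facet: set of labels}
RECIPES = {
    "gazpacho": {"cuisine": {"Spanish"}, "course": {"soup"},
                 "ingredient": {"tomato", "cucumber"}},
    "paella": {"cuisine": {"Spanish"}, "course": {"main"},
               "ingredient": {"rice", "saffron"}},
    "minestrone": {"cuisine": {"Italian"}, "course": {"soup"},
                   "ingredient": {"tomato", "bean"}},
}

def facet_preview(collection, selections):
    """Return (matching item names, per-facet label counts) for the selections.

    Counts are computed only over items that already match, so a label with
    zero matches never appears; choosing any label that is shown therefore
    leads to a non-empty result set.
    """
    matching = [name for name, facets in collection.items()
                if all(label in facets.get(f, set())
                       for f, label in selections.items())]
    counts: dict[str, Counter] = {}
    for name in matching:
        for facet, labels in collection[name].items():
            counts.setdefault(facet, Counter()).update(labels)
    return matching, counts

names, counts = facet_preview(RECIPES, {"course": "soup"})
print(names)                            # ['gazpacho', 'minestrone']
print(counts["ingredient"]["tomato"])   # 2
print("rice" in counts["ingredient"])   # False: no soup uses rice, so it is hidden
```

Under this design, the interface only offers choices that already have results behind them. That is how users avoid dead ends.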

Advantages of the Approach
• Systematically integrates search results:
 – reflects the structure of the info architecture
 – retains the context of previous interactions
• Gives users control and flexibility
 – Over order of metadata use
 – Over when to navigate vs. when to search
• Allows integration with advanced methods
 – Collaborative filtering, predicting users’ preferences

However … you have to build up faceted metadata, and that takes some external knowledge and a bit of work.

Eighteen Years Later
• Faceted navigation is still flourishing in the most high-profile e-commerce sites.
• Text clustering only appears… in research projects!
• Now there is a replacement unsupervised algorithm called LDA. And it is being used by social scientists to produce what I believe are questionable results.

In Summary

• Measure what they can, not what they should
• Do not adjust agilely to error
• Not answerable; secret formula
• Create their own distorted reality
How do our text analysis algorithms differ?

Experiment 1: Wordles vs Semantic Grouping

Wordles had far lower performance scores than any WordZones layout. There were interactions between font size and white space widths. 86% of participants preferred a column layout with a single font.