Uncorking the Varietals: Social Tagging, Folksonomies & Controlled Vocabularies
Margaret Maurer
Head, Catalog and Metadata
Kent State University Libraries and Media Services
April 2008

In wine making - What is a Varietal?
- A wine made from a single, named grape variety.
  - Cabernet Sauvignon wines are made from cabernet sauvignon grapes
  - Chardonnay wines are made from chardonnay grapes

In information seeking – on the Web or in the catalog
- Access and identification systems may be controlled by librarians: controlled vocabularies
- Access and identification systems may be dynamically generated by users: social tagging, folksonomies
- These are different varieties of access and identification systems

This presentation
- Controlled vocabularies
- Social tagging
- Folksonomies
- My recommendations
First we’ll talk about the cabernet sauvignons – the controlled vocabs

Purpose of a controlled vocabulary
- To create sets of objects
- To serve as a bridge between the searcher’s language and the author’s language
- To provide consistency
- To improve precision and recall

Characteristics of a controlled vocabulary
- Features a single, authorized form of heading
- Often features a syndetic structure of cross-references
- Rests on the belief that successful use of the catalog depends on the quality of the individual records

The authority record structure
- Records the standardized form
- Ensures the gathering together of records via that access point
- Enables standardized catalog records
- Documents decisions taken
- Records all other heading forms and provides links from them to the standardized form

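To make the authority record structure concrete, it can be sketched as an authorized heading plus its variant forms, with a lookup that routes any known form to the authorized one. This is only a minimal sketch, not a MARC implementation: the names `AuthorityRecord`, `build_lookup`, and `resolve` are hypothetical, and the sample heading is included purely as illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorityRecord:
    """One authorized heading plus the variant forms that point to it."""
    authorized: str
    variants: List[str] = field(default_factory=list)
    note: str = ""  # free-text place to document the decision taken


def build_lookup(records: List[AuthorityRecord]) -> Dict[str, str]:
    """Map every known form (authorized or variant) to the authorized form."""
    lookup: Dict[str, str] = {}
    for record in records:
        lookup[record.authorized.casefold()] = record.authorized
        for variant in record.variants:
            lookup[variant.casefold()] = record.authorized
    return lookup


def resolve(lookup: Dict[str, str], heading: str) -> Optional[str]:
    """Return the authorized form for a heading, or None if it is unknown."""
    return lookup.get(heading.casefold())


# Illustrative sample data only (an assumption, not taken from the slides).
records = [
    AuthorityRecord(
        authorized="Twain, Mark, 1835-1910",
        variants=["Clemens, Samuel Langhorne, 1835-1910"],
        note="Illustrative entry only.",
    )
]
lookup = build_lookup(records)
print(resolve(lookup, "clemens, samuel langhorne, 1835-1910"))
```

Because every variant resolves to one form, all records using that access point gather together regardless of which form the searcher typed.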
Benefits of controlled vocabularies
- Promotes discovery generally
- Promotes discovery when the aboutness of something has nothing to do with words in the resource or its representation
  - Imaginative literature (genre headings)
  - Humanities
- Promotes pre-coordinated displays that expand access: http://cinema.library.ucla.edu

Benefits when combined with keyword searching
- Keywords hook into strings of terms most efficiently
- Users can be routed by pre-coordinated strings

Controlled vocabularies support faceted catalogs
- Encore
- Evergreen
- Endeca
- WorldCat Local
- All provide hyperlinks to authorized headings

Weaknesses of controlled vocabularies
- The artificially controlled language is not necessarily natural language (Cookery, anyone?)
- Subject searches are the most problematic for users
- It may work better in theory than in practice
- It is costly to perform necessary maintenance
- Many administrators see the cost as outweighing the benefits

Library of Congress Subject Headings (LCSH)
- Has a long and well-documented history
- Commonly used
- Is contained in millions of bibliographic records
- Strong institutional support from LC

More benefits of LCSH
- The rich vocabulary covers most subjects
- It imposes synonym and homograph control
- There are machine-assisted authority control mechanisms
- There is pre-coordination with LCC
- The music subject heading system is well developed

Weaknesses of LCSH
- It is a generalist taxonomy that can’t always provide needed granularity
- Terminology is not always current
- It doesn’t allow for post-search coordination (it is pre-coordinated)
- It suffers from LC collection bias

More weaknesses of LCSH
- Training needed
  - Requires some orientation to use effectively
  - Is not always accurately applied by catalogers
- Maintenance
  - It is difficult to maintain when changes occur

Authority control outside the catalog
- Data critical mass tipping point?
  - Homogeneity of data in terms of subject matter
  - Requirements within data community’s users for specificity
  - Size
  - Computing power
- Wikipedia’s “disambiguation”

ZoomInfo
http://www.zoominfo.com/Default.aspx

What if we did open up our authority files to the web?
- National Library of Australia’s People Australia Project: http://www.nla.gov.au/initiatives/peopleaustralia/
- Wikipedia Persondata-Tool: http://www.ifla.org/IV/ifla73/papers/113-Danowski-en.pdf

Is ontology overrated?
- Physicality requires ontologies for searching, but systems with hyperlinks do not
- Browse versus search may eliminate the need for creating lists of authorized headings

Ontological classification
- Works well when the domain to be organized is small, has formal categories, has stable entities, is restricted and has clear edges
- Does not work well when the domain to be organized is large, has no formal categories, is unstable, is unrestricted and has no clear edges

Ontological classification
- Works well when the participants are expert catalogers, authoritative sources of judgement, coordinated users or expert users
- Does not work well when the participants are uncoordinated, amateur, naïve or non-authoritative

Now we talk about the Chardonnays – social tagging and folksonomies

What are tags?
- Keywords or terms associated with or assigned to a piece of information
- They enable keyword-based classification and search of information

Common Web sites that use tags include
- Del.icio.us: social bookmarking site
- Flickr: image tagging
- LibraryThing
- Gmail: webmail
- YouTube

Tags, and therefore social tags and folksonomies, are
- Dynamic categorization systems
- Often created on-the-fly
- Chosen as relevant to the user – not to the creator, cataloger or researcher
- A social activity (more on this later)
- Hopefully one small step toward a more interactive and responsive library system

Social tags are
- Non-hierarchical
- A way to create links between items by the creation of sets of objects
- A means of connecting with others interested in the same things

Way baaack in 2003…
- Del.icio.us includes identity in its social bookmarking
- Flickr includes tags
- Lists of tags became a tool for serendipitous discovery (folksonomies)

Why is tagging so popular?
- It is easy and enjoyable
- It has a low cognitive cost
- It is quick to do
- It provides self and social feedback immediately

People tag things
- To find them again
- To get exposure and traffic
- To voice their opinions
- Incidentally as they perform other tasks
- To take advantage of functionality built on top of a folksonomy
- To play a game or earn points

Putting the social in tagging
- Tags allow for social interaction because when we navigate by tags we are directly connecting with others
- People tag for their own benefit

Don’t confuse tags with keywords or full-text searching
- Keywords are behind the scenes; tags are often visibly aggregated for use and browsing
- Keywords cannot be hyperlinked
- Keywords imply searching; tags imply linking
- Full-text searching is passive; tagging is active
- Tagging is more about connecting items than categorizing them

What is a Folksonomy?
- Folksonomy refers to an “emergent, grassroots taxonomy”
  - An aggregate collection of tags
  - A bottom-up categorical structure development
  - An emergent thesaurus
- A term coined by Thomas Vander Wal

How do folksonomies work?
- The searcher defines the access, but
- The aggregation of the terms has public value
- It’s a typically messy democratic approach

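One way to picture how individually chosen tags aggregate into something with public value is to count, across all users, how many people applied each tag, and to list the items each tag gathers. The sketch below assumes a simple list of (user, item, tag) assignments; the sample data and the function names `aggregate_tags` and `items_for_tag` are hypothetical and do not describe any particular tagging site.

```python
from collections import Counter

# Hypothetical sample data: (user, item, tag) assignments.
assignments = [
    ("ann", "item-1", "wine"),
    ("bob", "item-1", "wine"),
    ("bob", "item-1", "chardonnay"),
    ("cat", "item-2", "wine"),
    ("ann", "item-2", "cataloging"),
]


def aggregate_tags(assignments):
    """Count how many distinct users applied each tag."""
    users_per_tag = {}
    for user, _item, tag in assignments:
        users_per_tag.setdefault(tag, set()).add(user)
    return Counter({tag: len(users) for tag, users in users_per_tag.items()})


def items_for_tag(assignments, tag):
    """Return the items carrying a tag: the set of objects that tag creates."""
    return sorted({item for _user, item, t in assignments if t == tag})


folksonomy = aggregate_tags(assignments)
print(folksonomy.most_common(1))
print(items_for_tag(assignments, "wine"))
```

Each user tags for personal reasons, yet the combined counts show which terms the community converges on, and those are the terms that become most visible.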
What makes folksonomies popular?
- Their dynamic nature works well with dynamic resources
- They’re personal
- They lower barriers to cooperation

Tagging and the consequent folksonomies work best when
- It’s easy to do
- It’s not commercial in nature
- Taggers have ownership
- Taggers are more likely to tag their own stuff than they are your stuff
- It has been shown to work well on the Web

The unexpected development: terminological consensus
- Collective action yields common terms
- Stabilization may be caused by imitation and shared knowledge
- The wisdom of the crowd

Is your tagging influenced by my tagging?
- Of course it is!
- People are beginning to tag in ways that make it easier for others to find like stuff
- Shared meaning consequently evolves for tags
- Most used tags become most visible

Strengths of folksonomies
- Cost-effective way to organize the Internet
- Social benefits
- It’s inclusive
- For many environments, they work well

Issues with meaning
- They do not yield the level of clarity that controlled vocabularies do
- Term ambiguity: words with multiple meanings
- No synonym control

Issues with specificity
- Variable specificity for related terms
- Broadness of terms impacts precision: terms are often imprecise
- Mixed perspectives

Issues with structure
- Singular and plural forms create redundant headings
- No guidelines for the use of compound headings, punctuation, word order
- No scope notes
- No cross-references

Issues with accuracy
- Collective ‘wisdom’ of the tagging community
- How does wrong information impact retrieval?
- Conflicting cultural norms
- Sometimes authority counts

“Spagging” and other problems
- Opening doors to opinion tags
- Tagging wars
- “Spagging”: spam tagging

Tidying up the tags…?
- Lists of tagging norms have been developed
- Are there programmatic solutions?
- Users know they are looking at tags
- By tidying, do we destroy the essence of why this works?
- Do we realistically have the resources?

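As a rough illustration of what a programmatic tidying step might look like, the sketch below folds case, punctuation, and simple English plurals so that variant forms group under one key while the original tags are kept. The plural rule is deliberately naive and is an assumption made for illustration; the function names `normalize_tag` and `group_variants` are hypothetical.

```python
import re
from collections import defaultdict


def normalize_tag(tag):
    """Fold case, punctuation, and simple English plurals (a deliberately naive rule)."""
    key = re.sub(r"[^a-z0-9]+", "", tag.casefold())
    # Naive plural folding: drop one trailing "s", but leave "ss" endings alone.
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def group_variants(tags):
    """Group raw tags under their normalized key without discarding the originals."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return {key: sorted(forms) for key, forms in groups.items()}


raw_tags = ["Wine", "wines", "sci-fi", "SciFi", "glass"]
print(group_variants(raw_tags))
```

A step like this addresses only the structural problems (plural forms, punctuation, capitalization). It does nothing for synonyms or for words with multiple meanings, and it will mis-fold some words, which is part of why the question of whether tidying is worth the resources remains open.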
Recommendations: Don’t assume that one size fits all
- Retain controlled vocabularies in the catalog
- Explore ways to use controlled vocabularies to help organize the Internet by re-purposing controlled vocabularies that already exist
- Invite folksonomies to the party in the catalog to gain their benefits
- Explore ways to combine the two systems

Recommendations: When you invite folksonomies into the catalog, do so strategically and carefully
- Don’t put terms in the same index as controlled vocabularies
- Find ways to associate terms applied across editions of works
- Need for mediation, or at least observation
- The crowd is not necessarily the best arbiter of specific terminology

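The first two recommendations on this slide can be sketched together: keep user tags in an index of their own, and attach each tag to the work rather than to a single edition so that every edition shares it. This is only a sketch under assumed data; the record layout, the identifiers, and the functions `add_tag` and `search_tags` are hypothetical and do not reflect any particular ILS.

```python
from collections import defaultdict

# Hypothetical records: each edition has its own ID but shares a work ID.
editions = {
    "ed-1": {"work": "work-A", "subjects": ["Wine and wine making"]},
    "ed-2": {"work": "work-A", "subjects": ["Wine and wine making"]},
}

subject_index = defaultdict(set)  # controlled vocabulary, assigned per edition
tag_index = defaultdict(set)      # user tags, kept in a separate index, per work

for edition_id, record in editions.items():
    for heading in record["subjects"]:
        subject_index[heading].add(edition_id)


def add_tag(edition_id, tag):
    """Store a user tag at the work level so every edition shares it."""
    tag_index[tag.casefold()].add(editions[edition_id]["work"])


def search_tags(tag):
    """Return all editions of the works that carry the tag."""
    works = tag_index.get(tag.casefold(), set())
    return sorted(e for e, r in editions.items() if r["work"] in works)


add_tag("ed-1", "Winemaking")
print(search_tags("winemaking"))
```

Keeping the two indexes apart means a user tag never masquerades as an authorized heading, while the work-level link lets a tag applied to one edition surface the others.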
Recommendations: Always remember why people tag
- People tag things because they want to find them, not because they want others to find them
- Be aware that this will impact the quality of the terms, and their frequency

Recommendations: Controlled vocabularies could be better utilized than they currently are
- Subject structures are underutilized in the ILS
- Controlled vocabularies that exist are not being exported to the Web
- Well-connected terms foster discovery, so let’s connect them
- Index those cross-references where available

Questions?
Margaret Maurer
mbmaurer@kent.edu