- Slides: 35
Taxonomy Development: An Infrastructure Model
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
§ Introduction
§ Types of Taxonomies
§ The Enterprise Context – Making the Business Case
§ Infrastructure Model of Taxonomy Development
  – Taxonomy in 4 Contexts
    • Content, People, Processes, Technology
§ Infrastructure Solutions – the Elements
§ Applying the Model – Practical Dimension
  – Starting and Resources
§ Conclusion

KAPS Group
§ Knowledge Architecture Professional Services (KAPS)
§ Consulting, strategy recommendations
§ Knowledge architecture audits
§ Partners – Convera, Inxight, FAST, and others
§ Taxonomies: Enterprise, Marketing, Insurance, etc.
  – Taxonomy customization
§ Intellectual infrastructure for organizations
  – Knowledge organization, technology, people and processes
  – Search, content management, portals, collaboration, knowledge management, e-learning, etc.

Two Types of Taxonomies: Browse and Formal
Browse Taxonomy – Yahoo

Two Types of Taxonomies: Formal

Browse Taxonomies: Strengths and Weaknesses
§ Strengths: Browse is better than search
  – Context and discovery
  – Browse by task, type, etc.
§ Weaknesses:
  – Mix of organization
    • Catalogs, alphabetical listings, inventories
    • Subject matter, functional, publisher, document type
  – Vocabulary and nomenclature issues
  – Problems with maintenance, new material
  – Poor granularity and little relationship between parts
    • Web site unit of organization
  – No foundation for standards

Formal Taxonomies: Strengths and Weaknesses
§ Strengths:
  – Fixed Resource – little or no maintenance
  – Communication Platform – share ideas, standards
  – Infrastructure Resource
    • Controlled vocabulary and keywords
    • More depth, finer granularity
§ Weaknesses:
  – Difficult to develop and customize
  – Don’t reflect users’ perspectives
    • Users have to adapt to language

Facets and Dynamic Classification
§ Facets are not categories
  – Entities or concepts belong to a category
  – Entities have facets
§ Facets are metadata – properties or attributes
  – Entities or concepts fit into one category
  – All entities have all facets – defined by set of values
§ Facets are orthogonal – mutually exclusive – dimensions
  – An event is not a person is not a document is not a place.
§ Facets – variety – of units, of structure
  – Date or price – numerical range
  – Location – big to small (partonomy)
  – Winery – alphabetical
  – Hierarchical – taxonomic

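
The distinction between a category and a facet can be illustrated with a small data model. This is a minimal sketch, not anything taken from the slides: the `Wine` class, its field names, and the sample records are all hypothetical. Each entity sits in exactly one category, while every entity carries a value for every facet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    """Hypothetical entity: one category, plus a value for every facet."""
    name: str
    category: str    # the single category the entity belongs to
    price: float     # facet with a numerical range
    location: tuple  # facet organized big to small (partonomy)
    winery: str      # facet listed alphabetically


# Invented sample data, for illustration only.
WINES = [
    Wine("Example Pinot", "Red", 28.0, ("USA", "Oregon"), "Hypothetical Cellars"),
    Wine("Example Chardonnay", "White", 19.0, ("USA", "California"), "Assumed Estate"),
    Wine("Example Malbec", "Red", 15.0, ("Argentina", "Mendoza"), "Bodega Ejemplo"),
]


def facet_values(items, facet):
    """Return the sorted distinct values of one facet across all entities."""
    return sorted({getattr(item, facet) for item in items})
```

Calling `facet_values(WINES, "winery")` lists the wineries alphabetically, while `facet_values(WINES, "location")` sorts the (country, region) pairs from big to small.
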
Faceted Navigation: Strengths and Weaknesses
§ Strengths:
  – More intuitive – easy to guess what is behind each door
    • 20 questions – we know and use
  – Dynamic selection of categories
    • Allow multiple perspectives
  – Trick Users into “using” Advanced Search
    • wine where color = red, price = x-y, etc.
§ Weaknesses:
  – Difficulty of expressing complex relationships
    • Simplicity of internal organization
  – Loss of Browse Context
    • Difficult to grasp scope and relationships
  – Limited Domain Applicability – type and size
    • Entities not concepts, documents, web sites

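
The "wine where color = red, price = x-y" idea is effectively an advanced search assembled from clicks. The following Python sketch shows one way that could work; the records, the `facet_search` and `facet_counts` functions, and their parameters are assumptions made for illustration, not part of any product named in this deck.

```python
# Invented sample records, for illustration only.
WINES = [
    {"name": "Example Pinot", "color": "red", "price": 28.0},
    {"name": "Example Chardonnay", "color": "white", "price": 19.0},
    {"name": "Example Malbec", "color": "red", "price": 15.0},
]


def facet_search(items, color=None, price_range=None):
    """Return the items that match every facet value the user selected."""
    results = []
    for item in items:
        if color is not None and item["color"] != color:
            continue
        if price_range is not None:
            low, high = price_range
            if not (low <= item["price"] <= high):
                continue
        results.append(item)
    return results


def facet_counts(items, facet):
    """Count items per facet value, as typically shown next to each link."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts
```

For example, `facet_search(WINES, color="red", price_range=(10, 20))` narrows the list to the one red wine in that price band. The user never writes a query, but each selected facet value adds a clause to one.
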
Dynamic Classification / Faceted navigation
§ Search and browse better than either alone
  – Categorized search – context
  – Browse as an advanced search
§ Dynamic search and browse is best
  – Can’t predict all the ways people think
    • Advanced cognitive differences
    • Panda, Monkey, Banana
  – Can’t predict all the questions and activities
    • Intersections of what users are looking for and what documents are often about
    • China and Biotech
    • Economics and Regulatory

Business Case for Taxonomies: The Right Context
§ Traditional Metrics
  – Time Savings – 22 minutes per user per day = $1 Mil a Year
  – Apply to your organization – customer service, content creation, knowledge industry
  – Cost of not-finding = re-creating content
§ Research
  – Advantages of Browsing – Marti Hearst, Chen and Dumais
  – Nielsen – “Poor classification costs a 10,000 user organization $10M each year – about $1,000 per employee.”
§ Stories
  – Pain points, success and failure – in your corporate language

Business Case for Taxonomies: IDC White Paper
§ Information Tasks
  – Email – 14.5 hours a week
  – Create documents – 13.3 hours a week
  – Search – 9.5 hours a week
  – Gather information for documents – 8.3 hours a week
  – Find and organize documents – 6.8 hours a week
§ Gartner: “Businesses spend an estimated $750 Billion annually seeking information necessary to do their job. 30-40% of a knowledge worker’s time is spent managing documents.”

Business Case for Taxonomies: IDC White Paper
§ Time Wasted
  – Reformat information - $5.7 million per 1,000 per year (400M)
  – Not finding information - $5.3 million per 1,000 (370M)
  – Recreating content - $4.5 million per 1,000 (315M)
§ Small Percent Gain = large savings
  – 1% - $10 million
  – 5% - $50 million
  – 10% - $100 million

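
The parenthetical totals on this slide appear to scale the per-1,000 figures to an organization of roughly 70,000 workers. That headcount is an inference from the numbers, not something the slide states. A minimal Python sketch of the arithmetic under that assumption:

```python
# Dollars wasted per 1,000 workers per year, as quoted on the slide.
COST_PER_1000 = {
    "reformatting information": 5.7e6,
    "not finding information": 5.3e6,
    "recreating content": 4.5e6,
}

# Assumption: headcount inferred from the slide's totals, not stated there.
WORKERS = 70_000


def annual_waste(cost_per_1000, workers):
    """Scale each per-1,000 cost to an organization of the given size."""
    return {task: cost * workers / 1000 for task, cost in cost_per_1000.items()}


waste = annual_waste(COST_PER_1000, WORKERS)
total = sum(waste.values())

# Savings from recovering a small percentage of the wasted time.
savings = {pct: total * pct / 100 for pct in (1, 5, 10)}
```

Under this assumption the three categories come to about $399M, $371M, and $315M, or roughly $1.09 billion a year in total. A 1% gain is then about $10.9 million, which is consistent with the slide's rounded "$10 million".
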
Business Case for Taxonomies: The Right Context
§ Justification
  – Search Engine - $500K-$2 Mil
  – Content Management - $500K-$2 Mil
  – Portal - $500K-$2 Mil
  – Plus maintenance and employee costs
§ Taxonomy
  – Small comparative cost
  – Needed to get full value from all the above
§ ROI – asking the wrong question
  – What is ROI for having an HR department?
  – What is ROI for organizing your company?

Infrastructure Model of Taxonomy Development
Taxonomy in 4 Basic Contexts
§ Ideas – Content Structure
  – Language and Mind of your organization
  – Applications - exchange meaning, not data
§ People – Company Structure
  – Communities, Users, Central Team
§ Activities – Business processes and procedures
  – Central team - establish standards, facilitate
§ Technology / Things
  – CMS, Search, portals, taxonomy tools
  – Applications – BI, CI, Text Mining

Taxonomy in Context: Structuring Content
§ All kinds of content and Content Structures
  – Structured and unstructured, Internet and desktop
§ Metadata standards – Dublin Core+
  – Keywords - poor performance
  – Need controlled vocabulary, taxonomies, semantic network
§ Other Metadata
  – Document Type
    • Form, policy, how-to, etc.
  – Audience
    • Role, function, expertise, information behaviors
  – Best bets metadata
§ Facets – entities and ideas
  – Wine.com

Taxonomy in Context: Structuring People
§ Individual People
  – Tacit knowledge, information behaviors
  – Advanced personalization – category priority
    • Sales – forms ---- New Account Form
    • Accountant ---- New Accounts ---- Forms
§ Communities
  – Variety of types – map of formal and informal
  – Variety of subject matter – vaccines, research, scuba
  – Variety of communication channels and information behaviors
  – Community-specific vocabularies, need for inter-community communication (Cortical organization model)

Taxonomy in Context: Structuring Processes and Technology
§ Technology: infrastructure and applications
  – Enterprise platforms: from creation to retrieval to application
  – Taxonomy as the computer network
    • Applications – integrated meaning, not just data
§ Creation – content management, innovation, communities of practice (CoPs)
  – When, who, how, and how much structure to add
  – Workflow with meaning, distributed subject matter experts (SMEs) and centralized teams
§ Retrieval – standalone and embedded in applications and business processes
  – Portals, collaboration, text mining, business intelligence, CRM

Taxonomy in Context: The Integrating Infrastructure
§ Starting point: knowledge architecture audit, K-Map
  – Social network analysis, information behaviors
§ People – knowledge architecture team
  – Infrastructure activities – taxonomies, analytics, best bets
  – Facilitation – knowledge transfer, partner with SMEs
§ “Taxonomies” of content, people, and activities
  – Dynamic Dimension – complexity not chaos
  – Analytics based on concepts, information behaviors
§ Taxonomy as part of a foundation, not a project
  – In an Infrastructure Context

Taxonomy in Context: The Integrating Infrastructure
§ Integrated Enterprise requires both an infrastructure team and distributed expertise.
  – Software and SMEs is not the answer - keywords
§ Taxonomies not stand alone
  – Metadata, controlled vocabularies, synonyms, etc.
  – Variety of taxonomies, plus categorization, classification, etc.
    • Important to know the differences, when to use which
§ Multiple Applications
  – Search, browse, content management, portals, BI & CI, etc.
§ Infrastructure as Operating System
  – Word vs. WordPerfect
  – Instead of sharing clipboard, share information and knowledge.

Infrastructure Solutions: The Start and Foundation
Knowledge Architecture Audit
§ Knowledge Map - Understand what you have, what you are, what you want
  – The foundation of the foundation
§ Contextual interviews, content analysis, surveys, focus groups, ethnographic studies
§ Category modeling – “Intertwingledness” - learning new categories influenced by other, related categories
§ Natural level categories mapped to communities, activities
    • Novices prefer higher levels
    • Balance of informativeness and distinctiveness
§ Living, breathing, evolving foundation is the goal

Infrastructure Solutions: Resources
People and Processes: Roles and Functions
§ Knowledge Architect and learning object designers
§ Knowledge engineers and cognitive anthropologists
§ Knowledge facilitators and trainers and librarians
§ Part Time
  – Librarians and information architects
  – Corporate communication editors and writers
§ Partners
  – IT, web developers, applications programmers
  – Business analysts and project managers

Infrastructure Solutions: Resources
People and Processes: Central Team
§ Central Team supported by software and offering services
  – Creating, acquiring, evaluating taxonomies, metadata standards, vocabularies
  – Input into technology decisions and design – content management, portals, search
  – Socializing the benefits of metadata, creating a content culture
  – Evaluating metadata quality, facilitating author metadata
  – Analyzing the results of using metadata, how communities are using
  – Research metadata theory, user centric metadata
  – Design content value structure – more nuanced than good / poor content.

Infrastructure Solutions: Resources
People and Processes: Facilitating Knowledge Transfer
§ Need for Facilitators
  – Amazon hiring humans to refine recommendations
  – Google – humans answering queries
§ Facilitate projects, KM project teams
  – Facilitate knowledge capture in meetings, best practices
§ Answering online questions, facilitating online discussions, networking within a community
§ Design and run KM forums, education and innovation fairs
§ Work with content experts to develop training, incorporate intelligence into applications
§ Support innovation, knowledge creation in communities

Infrastructure Solutions: Resources
People and Processes: Location of Team
§ KM/KA Dept. – Cross Organizational, Interdisciplinary
§ Balance of dedicated and virtual, partners
  – Library, Training, IT, HR, Corporate Communication
§ Balance of central and distributed
§ Industry variation
  – Pharmaceutical – dedicated department, major place in the organization
  – Insurance – Small central group with partners
  – Beans – a librarian and part time functions
§ Which design – knowledge architecture audit

Infrastructure Solutions: Resources
Technology
§ Taxonomy Management
  – Text and Visualization
§ Entity and Fact Extraction
§ Text Mining
§ Search for professionals
  – Different needs, different interfaces
§ Integration Platform technology
  – Enterprise Content Management

Taxonomy Development: Tips and Techniques
Stage One – How to Begin
§ Step One: Strategic Questions – why, what value from the taxonomy, how are you going to use it
  – Variety of taxonomies – important to know the differences, when to use what.
§ Step Two: Get a good taxonomist! (or learn)
  – Library Science + Cognitive Science + Cognitive Anthropology
§ Step Three: Software Shopping
  – Automatic Software – Fun Diversion for a rainy day
    • Uneven hierarchy, strange node names, weird clusters
  – Taxonomy Management, Entity Extraction, Visualization
§ Step Four: Get a good taxonomy!
  – Glossary, Index, Pull from multiple sources
  – Get a good document collection

Infrastructure Solutions: Taxonomy Development
Stage Two: Taxonomy Model
§ Enterprise Taxonomy
  – No single subject matter taxonomy
  – Need an ontology of facets or domains
§ Standards and Customization
  – Balance of corporate communication and departmental specifics
  – At what level are differences represented?
  – Customize pre-defined taxonomy – additional structure, add synonyms and acronyms and vocabulary
§ Enterprise Facet Model:
  – Actors, Events, Functions, Locations, Objects, Information Resources
  – Combine and map to subject domains

Taxonomy Development: Tips and Techniques
Stage Three: Development and/or Customization
§ Combination of top down and bottom up (and Essences)
  – Top: Design an ontology, facet selection
  – Bottom: Vocabulary extraction – documents, search logs, interview authors and users
  – Develop essential examples (Prototypes)
    • Most Intuitive Level – genus (oak, maple, rabbit)
    • Quintessential Chair – all the essential characteristics, no more
  – Work toward the prototype and out and up and down
  – Repeat until dizzy or done
§ Map the taxonomy to communities and activities
  – Category differences
  – Vocabulary differences

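
The bottom-up step, vocabulary extraction from documents and search logs, can start very simply. The sketch below is an illustration only: it counts frequent words and two-word phrases to produce candidate terms for a taxonomist to review. The function name, the stopword list, and the threshold are hypothetical, and a real project would more likely use dedicated entity-extraction or text-mining software.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "on", "is"}


def candidate_terms(texts, min_count=2):
    """Count words and two-word phrases; keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        # Single words, skipping stopwords.
        counts.update(w for w in words if w not in STOPWORDS)
        # Adjacent word pairs where neither word is a stopword.
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]
```

Fed a few search-log queries such as "new account form" and "account policy", the function surfaces "account" and "new account" as candidates. Deciding which candidates become nodes, synonyms, or nothing at all remains a human judgment.
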
Taxonomy Development: Tips and Techniques
Stage Four: Evaluate and Refine
§ Formal Evaluation
  – Quality of corpus – size, homogeneity, representative
  – Breadth of coverage – main ideas, outlier ideas (see next)
  – Structure – balance of depth and width
  – Kill the verbs
  – Evaluate speciation steps – understandable and systematic
    • Person – Unwelcome person – Unpleasant person – Selfish person
  – Avoid binary levels, duplication of contrasts – Primary and secondary education, public and private

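
Some of the structural checks in the formal evaluation could be partly automated. The sketch below rests on several assumptions that are not from the slides: the taxonomy is stored as nested dictionaries, a "binary level" is read as a node with exactly two children, and "duplication of contrasts" is read as the same set of child labels recurring under more than one parent. The sample fragment is hypothetical.

```python
from collections import Counter

# Hypothetical fragment modeled on the slide's education example.
TAXONOMY = {
    "Education": {
        "Primary education": {"Public": {}, "Private": {}},
        "Secondary education": {"Public": {}, "Private": {}},
    }
}


def depth(tree):
    """Number of levels in the tree (an empty dict has depth 0)."""
    return 0 if not tree else 1 + max(depth(child) for child in tree.values())


def binary_nodes(tree, path=()):
    """Return the paths of nodes that have exactly two children."""
    found = []
    for label, children in tree.items():
        here = path + (label,)
        if len(children) == 2:
            found.append(" > ".join(here))
        found.extend(binary_nodes(children, here))
    return found


def child_sets(tree):
    """Yield the set of child labels under every non-leaf node."""
    for children in tree.values():
        if children:
            yield frozenset(children)
            yield from child_sets(children)


def duplicated_contrasts(tree):
    """Return child-label sets that appear under more than one parent."""
    counts = Counter(child_sets(tree))
    return [sorted(labels) for labels, n in counts.items() if n > 1]
```

On this fragment, `binary_nodes` flags all three branching nodes and `duplicated_contrasts` reports the repeated Public/Private split. Such flags are prompts for review, not verdicts: a contrast like public versus private may be better modeled as a facet than repeated under each branch.
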
Taxonomy Development: Tips and Techniques
Stage Four: Evaluate and Refine
§ Practical Evaluation
  – Test in real life application
  – Select representative users and documents
  – Test node labels with Subject Matter Experts
    • Balance of making sense and jargon
  – Test with representative key concepts
  – Test for un-representative strange little concepts that only mean something to a few people but the people and ideas are key and are normally impossible to find

Sources
§ Books
  – Women, Fire, and Dangerous Things
    • What Categories Reveal about the Mind
    • George Lakoff
  – The Geography of Thought
    • Richard E. Nisbett
§ Software
  – Convera RetrievalWare
  – Inxight SmartDiscovery – entity and fact extraction
§ Courses
  – Convera Taxonomy Certification

Conclusion
§ Taxonomy development is not just a project
  – It has no beginning and no end
§ Taxonomy development is not an end in itself
  – It enables the accomplishment of many ends
§ Taxonomy development is not just about search or browse
  – It is about language, cognition, and applied intelligence
§ Strategic Vision (articulated by K Map) is important
  – Even for your under the radar vocabulary project
§ Paying attention to theory is practical
  – So is adapting your language to business speak

Conclusion
§ Taxonomies are part of your intellectual infrastructure
  – Roads, transportation systems not cars or types of cars
§ Taxonomies are part of creating smart organizations
  – Self aware, capable of learning and evolving
§ Think Big, Start Small, Scale Fast
§ If we really are in a knowledge economy
  – We need to pay attention to – Knowledge!

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com