COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2012
28 February 2012, Kaiser: COMS E6125
Today’s Topics:
• What is Web 2.0?
• Information Sharing and Privacy
• Applications Beyond the Web
Tim O’Reilly, September 2005
Netscape vs. Google: The Web As Platform
• Netscape: free web browser as flagship to establish a market for high-priced server products that push content to the “webtop” – but servers also turned out to be commodities
• Google: native web application, never sold or packaged or ported, delivered as a service with no scheduled software releases, massively scalable – core competency is data management
Akamai vs. BitTorrent: Internet Decentralization
• Akamai: treats the network as platform at a deeper level of the stack, transparent caching and content delivery that eases bandwidth congestion – also limited by a business model catering to large providers
• BitTorrent: P2P file fragment downloads, every client is also a server, the service automatically gets better the more people use it – architecture of participation
Harness Collective Intelligence
• Google PageRank uses link structure
• eBay is an enabler of user activity, which requires critical mass
• Amazon uses community activity to produce better search results (e.g., real-time “most popular” computation)
• Wikipedia – radical experiment in trust, profound change in content creation
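To make the "link structure" idea concrete, here is a minimal sketch of PageRank computed by power iteration. It is an illustration under stated assumptions, not Google's production algorithm: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the four-page `toy_web` graph are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the most-linked page "c" ends up with the highest score, and "d", which nobody links to, ends up with the lowest: each link acts as a vote cast by a page's author.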
Harness Collective Intelligence
• Web of connections grows organically
• Viral marketing – if a site or product relies on advertising to get the word out, it isn’t Web 2.0
• Peer-production open source development of much web infrastructure – Linux, Apache, MySQL, Perl, PHP, Python
• Network effects from user contributions are the key to market dominance
Blogosphere
• Blogging vs. personal home pages: replaced the personal diary, daily opinion column, and Usenet News; now being supplanted by Facebook and Twitter
• RSS (Really Simple Syndication) allows subscribing to a page – the incremental (or live) web
• Permalinks build bridges between weblogs and affect PageRank search results
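As a rough illustration of what "subscribing to a page" means in practice, the sketch below polls an RSS 2.0 document and reports only the items it has not seen before, keyed by each item's guid (here a permalink). The feed content, the example.com URLs, and the helper name `new_items` are hypothetical; a real reader would fetch the XML over HTTP rather than use an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example is self-contained.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>Second post</title>
      <link>http://example.com/2012/02/second-post</link>
      <guid isPermaLink="true">http://example.com/2012/02/second-post</guid>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.com/2012/02/first-post</link>
      <guid isPermaLink="true">http://example.com/2012/02/first-post</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return items whose guid is not in seen_guids, and record them as seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when a feed omits the guid element.
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append({
                "title": item.findtext("title"),
                "permalink": item.findtext("link"),
            })
    return fresh


seen = set()
first_poll = new_items(SAMPLE_FEED, seen)   # both posts are new
second_poll = new_items(SAMPLE_FEED, seen)  # nothing has changed since
```

The second poll returns nothing because both permalinks are already recorded, which is the "incremental web" behavior: subscribers learn only about what has changed.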
Perpetual Beta
• Software delivered as a service, not a product
• Upgrades every day vs. every 2-3 years
• Operations and monitoring must become core competencies
• Scripting languages as duct tape
• Innovation in assembly
AJAX Rich User Experiences
• Standards-based presentation using XHTML and CSS
• Dynamic display and interaction using the Document Object Model
• Data interchange and manipulation using XML and XSLT
• Asynchronous data retrieval using XMLHttpRequest
• JavaScript binding everything together
• Without plugins!
Infoware
• Data management as core competency
• Web crawlers vs. specialized databases (“invisible web”)
• Map databases: starting with MapQuest, many services now license the same data from NavTeq (digital street maps) and Digital Globe (satellite images)
• Amazon licensed the ISBN registry from Bowker but added publisher-supplied data and user annotations
• Mashups based on a lightweight programming model create value-added data
➢ Key issue: Who owns the data?
Information Sharing: Web 1.0
• The original purpose of the Web!
• Generally viewed as an information resource, download without upload
• Websites owned by “someone else” may store your information in a database – usually limited to basic identification (name, address, phone number, credit card) and “preferences”
• Personal websites (e.g., hosted by GeoCities) might be universally browsable but visited by few
• Key issue: Who owns the data?
Information Sharing: Web 2.0
• Message boards with user-supplied content
• Portals with user-selected content “portlets”
• Blogs, wikis, news feeds, texting
• Social networking, collaborative filtering
• The Web as Platform, user-supplied applications
➢ Key issue: Who owns the data?
The Right To Privacy
• Secrecy (confidentiality): The extent to which we are known to others
• Anonymity: The extent to which we are the subject of others’ attention
• Solitude: The extent to which others have access to us
Rights to Sue (wrt Privacy)
• Intrusion upon seclusion or solitude, or into private affairs
• Public disclosure of embarrassing private facts
• Inaccurate reporting: Publicity that places a person in a false light in the public eye
• Appropriation of identity: “identity theft”
A New Yorker cartoon from 1993
But in 2012, your browser (and its addons, plugins, etc.) know:
• You’ve searched for local veterinarians and groomers
• You’ve read reviews comparing flea powders
• You’ve ordered “chew sticks” and “squeaky toys”
• You’ve printed coupons for Alpo
• You’ve downloaded 101 Dalmatians and Lassie “on demand” movies
• Your email contains sales notices from petco.com
• Your “My Pictures” folder contains 100s of images of fire hydrants and frisbees
Web Tracking
• Bits: How Do They Track You?
• Data collection events:
  – Pages displayed
  – Search queries entered
  – Videos played
  – Advertising displayed (both same party and third party)
• In December 2007 alone, Yahoo collected 400 billion events, AOL 100 billion, Google 91 billion, Microsoft 51 billion
From study by comScore published in NY Times online 3/9/08
Caveats
• Not all of this data is useful
• Not all of it is retained by the companies with access to it
• Much of it cannot be traced back to individuals
• Several data collection events may be triggered by a single Web page
• Augmented by user-volunteered data (website registration, public profiles, “like” buttons)
Fighting Back?
• Targeted advertising supports “free” services and content (ad serving was the first widely deployed mashup)
• Partially combated by blocking (e.g., TACO) and transparency (e.g., Open Data Partnership)
• But collected information can be used for other purposes…
• Need a general-purpose “No track” button
Privacy Before and After
• Before the Web, you participated in a variety of activities
• These might have involved groups of people, in public or private, possibly even “the press”
• Photos or recordings might have been taken, with or without your knowledge
• You might have borrowed or purchased books or magazines related to your activities
• You might have sent/received letters by snail mail
• What is different now? Does it matter?
Privacy Before and After
• Before the Web, you might have typed your name, address, phone number, birth date, social security number, bank account numbers, credit card numbers, etc. into your PC for personal storage
• It was unlikely anyone outside your household could access your PC
• Now you type at least part of that information into your PC all the time (if you make online purchases and/or sign up for online services)
• And you have no idea who might be reading it, either from your PC (if connected to the Internet) or from the Websites you sent it to
Privacy Before and After
• Your name, phone number, and address were always easily available (phone book, reverse listings)
• So was your birth date, although harder to obtain (birth records, driver’s license)
• And your SSN – lots of forms ask for it
• Your checking account and/or credit card numbers were available through the issuing banks and the merchants where you made purchases
• So what is different now? Does it matter?
Web 2.0 Applications for Scientific Communities
• Scientists collaborating in the same lab on the same project share:
  – Data: specimens, samples, materials, observations, etc.
  – Tools: instruments, software, hardware
  – Knowledge: open discussion, whiteboard
➢ Real-world social networking
• However, there are time and space constraints
• More significantly, this model does not scale well to communities of scientists working on different projects but who could possibly learn from each other’s expertise, experience, etc.
CSCW Approaches
• CSCW (Computer-Supported Collaborative Work) aims to augment same-time/same-place collaboration but, more significantly, different-time/different-place collaborations and communities
• Current generation CSCW systems support data sharing (e.g., PNNL Collaboratories) and/or tool sharing (e.g., UIUC BioCoRE)
• However, these systems do not address knowledge sharing
➢ how/when/where/why to use tools and data
Knowledge Sharing
• Knowledge sharing is partially enabled through labor-intensive static approaches: publications, email lists, wikis, chat, shared display, etc.
• We seek to enable automatic knowledge sharing – without requiring “extra work” on the part of scientists
Social Networking Metaphor
• Some online social networking is a form of CSCW that is potentially enjoyable and profitable but still requires “extra work”, with dynamism limited by explicit user participation
  – Facebook, LinkedIn, Twitter, etc.
• Other social networking automatically records what people do online to aggregate, data mine and disseminate in an enjoyable and profitable fashion, with no “extra work” required – but can be enhanced by very simple user actions (e.g., ratings)
  – Collaborative filtering – “people like you …”
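The "people like you …" idea can be sketched as a very small user-based collaborative filter. This is one simple variant among many, shown under assumptions: similarity is the Jaccard overlap between users' recorded item sets, and the user names, items, and the function names `jaccard` and `recommend` are invented for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of items, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Suggest items that users similar to `target` have and `target` lacks."""
    mine = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        similarity = jaccard(mine, items)
        # Each unseen item is credited with the similarity of the user who has it.
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical activity recorded automatically, with no "extra work" by users.
histories = {
    "ann": {"flea powder", "chew sticks", "squeaky toy"},
    "bob": {"flea powder", "chew sticks", "dog bed"},
    "cy": {"cat litter", "scratching post"},
}
suggestions = recommend("ann", histories)
```

Here "ann" is offered "dog bed" because "bob" overlaps with her on two items, while "cy" shares nothing with her and so contributes no suggestions.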
genSpace Overview
• We combine implicit and explicit social networking concepts in our approach to knowledge sharing
• Prototype implemented as a set of plugins for geWorkbench, a platform for analysis and visualization tools for integrated genomics
• Records, aggregates, data mines and disseminates geWorkbench users’ activities with tools and tool sequences (workflows)
Questions genSpace Can Answer
• What do I do next?
• Which tools work well together?
• Where does this tool fit in a typical workflow?
• Who do I know who also uses this tool?
• How can I get help (from an expert who is online right now)?
genSpace Features
• Collaborative Workflow Composition: past history of analysis tool usage is used to identify commonly occurring sequences/workflows
• Tool Suggestions: suggests analysis tools that may be useful, based on what tools were previously used
• Social Networking: allows users to associate with each other and share knowledge within groups
• Data Suggestions: suggests data sets based upon previous analyses and collaborative filtering (CF)
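One plausible way to turn logged tool usage into "what do I do next?" suggestions is to count which tool most often follows each tool in recorded workflows. The sketch below illustrates that idea only and is not the genSpace implementation; the tool names, the logged workflows, and the functions `build_transitions` and `suggest_next` are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(workflows):
    """Count how often each tool is immediately followed by each other tool."""
    transitions = defaultdict(Counter)
    for workflow in workflows:
        # Pair every tool with the tool that was run right after it.
        for current, following in zip(workflow, workflow[1:]):
            transitions[current][following] += 1
    return transitions


def suggest_next(transitions, tool, top_n=2):
    """Return the tools most frequently run right after `tool`."""
    followers = transitions.get(tool, Counter())
    return [name for name, _count in followers.most_common(top_n)]


# Hypothetical workflows aggregated from many users' activity logs.
logged_workflows = [
    ["normalize", "t-test", "clustering"],
    ["normalize", "t-test", "heatmap"],
    ["normalize", "clustering"],
    ["filter", "normalize", "t-test"],
]
transitions = build_transitions(logged_workflows)
after_normalize = suggest_next(transitions, "normalize")
```

In these logs "normalize" is followed by "t-test" three times and by "clustering" once, so "t-test" is suggested first. The same counts could also surface commonly occurring sequences by chaining the most frequent transitions.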
genSpace Architecture
Privacy/Confidentiality Concerns
• Users can choose anonymous logging or disable it entirely
• Security/privacy of the activity logs is being investigated (data sets are NOT recorded*)
• Issues when users change their collaborative networks and/or opt-out preferences
• Must we provide privacy by default?
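The logging choices on this slide can be pictured as a per-user preference that is checked before any activity record is created. The sketch below is an assumption about how such a check might look, not the actual genSpace code; the enum `LoggingPreference`, the function `make_log_record`, and the record fields are hypothetical.

```python
from enum import Enum


class LoggingPreference(Enum):
    IDENTIFIED = "identified"  # record the tool and who used it
    ANONYMOUS = "anonymous"    # record the tool only
    DISABLED = "disabled"      # record nothing


def make_log_record(username, tool, preference):
    """Build an activity record that honors the user's logging preference."""
    if preference is LoggingPreference.DISABLED:
        return None
    # Only the tool name is logged; data sets are never part of the record.
    record = {"tool": tool}
    if preference is LoggingPreference.IDENTIFIED:
        record["user"] = username
    return record


identified = make_log_record("ann", "t-test", LoggingPreference.IDENTIFIED)
anonymous = make_log_record("ann", "t-test", LoggingPreference.ANONYMOUS)
disabled = make_log_record("ann", "t-test", LoggingPreference.DISABLED)
```

Under this sketch, "privacy by default" would amount to choosing ANONYMOUS or DISABLED as the initial preference, so that identified logging requires an explicit opt-in.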
Research in the Cloud
• geWorkbench and most other analysis tools are “fat” desktop applications
• Why not create a browser-based client?
More open questions for genSpace
• What other Web 2.0 concepts and techniques can help support scientific researchers?
• How can we efficiently address privacy concerns while providing helpful recommendations?
genSpace Summary
• genSpace embodies an approach to knowledge sharing that is based on social networking metaphors
• genSpace is built on the geWorkbench platform for integrated genomics
• Potentially applicable to other kinds of scientists and engineers, including software engineers
Web 2.0 Summary
• It’s here and everywhere, privacy/anonymity are losing ground
• Web-Oriented Architecture (Web Services, RSS, Mashups)
• Rich Internet Applications (AJAX, HTML5, Flash)
• Social Web (Facebook, Google+, LinkedIn, user participation in shopping/renting as well as review sites)
Web 3.0
Next Assignment #1: Presentation Proposal
• Due Tuesday March 6th, 10 am
• Title and a brief 1-2 paragraph description of the planned content
• Presentation slots on the course schedule will be assigned asap after proposals are received (specify any scheduling constraints)
• Each presentation should be about 10 minutes and should consist of approximately 10 slides
• The target audience is the students in this class: do not assume any specialized knowledge beyond the scope of the initial course lectures but also do not duplicate any material covered in lectures (except a one-slide “review” is ok)
Next Assignment #2: Project Proposal
• Due Tuesday March 6th, 10 am
• Three pages, not including figures and references (if any)
• Identify your full team (if any), with “management structure”
• Sketch the project you have in mind, including both the functionality or evaluation you aim to achieve and the technology you plan to use
• You should plan to do some programming and to produce some demoable software