Center for Science of Information
Purdue, October 2014
Bryn Mawr, Howard, MIT, Princeton, Purdue, Stanford, Texas A&M, UC Berkeley, UC San Diego, UIUC
National Science Foundation/Science & Technology Centers Program

Outline
1. Science of Information
2. Center Mission & Center Team
3. Research
   – Structural Information
   – Learnable Information: Knowledge Extraction
   – Information Theoretic Approach to Life Sciences

Shannon Legacy
The Information Revolution started in 1948 with the publication of "A Mathematical Theory of Communication." The digital age began.
Claude Shannon: Shannon information quantifies the extent to which a recipient of data can reduce its statistical uncertainty. “semantic aspects of communication are irrelevant ...”
Objective: reliably reproducing data; fundamental limits for storage and communication.
Applications Enabler/Driver: CD, iPod, DVD, video games, Internet, Facebook, Wi-Fi, mobile, Google, ...
Design Driver: universal data compression, voiceband modems, CDMA, multiantenna, discrete denoising, space-time codes, cryptography, distortion theory approach to Big Data.
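
To make "statistical uncertainty" concrete, here is a minimal sketch of Shannon entropy in bits. The function names `shannon_entropy` and `empirical_entropy` are illustrative choices, not part of the slides, and the example assumes a discrete source with known or empirically estimated symbol probabilities.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution (zero-probability terms are skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(symbols):
    """Entropy in bits of the empirical symbol frequencies of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return shannon_entropy(c / total for c in counts.values())


if __name__ == "__main__":
    # A fair coin carries one bit of uncertainty per toss; a biased coin carries less.
    print(shannon_entropy([0.5, 0.5]))
    print(shannon_entropy([0.9, 0.1]))
    print(empirical_entropy("aabb"))
```

The entropy is the average number of bits per symbol that any lossless code needs for such a source, which is the kind of fundamental limit the slide refers to.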

What is Science of Information?
• Claude Shannon laid the foundation of information theory, demonstrating that problems of data transmission and compression can be precisely modeled, formulated, and analyzed.
• Science of information builds on Shannon’s principles to address key challenges in understanding information that nowadays is not only communicated but also acquired, curated, organized, aggregated, managed, processed, suitably abstracted and represented, analyzed, inferred, valued, secured, and used in various scientific, engineering, and socio-economic processes.

Center’s Goals
• Extend Information Theory to meet new challenges in biology, economics, data sciences, knowledge extraction, …
• Understand new aspects of information (embedded) in structure, time, space, and semantics, as well as dynamic information, limited resources, complexity, representation-invariant information, and cooperation & dependency.
Center’s Theme for the Next Five Years: Framing the Foundation (Data → Information → Knowledge)

Outline
1. Science of Information
2. Center Mission & Center Team
3. Research
   – Structural Information
   – Learnable Information: Knowledge Extraction

Mission and Center Goals
Advance science and technology through a new quantitative understanding of the representation, communication and processing of information in biological, physical, social and engineering systems.
Some Specific Center Goals:
RESEARCH
• define core theoretical principles governing transfer of information,
• develop metrics and methods for information,
• apply to problems in physical and social sciences, and engineering,
• offer a venue for multi-disciplinary long-term collaborations.
EDUCATION AND DIVERSITY
• explore effective ways to educate students,
• train the next generation of researchers,
• broaden participation of underrepresented groups.
KNOWLEDGE TRANSFER
• transfer advances in research to industry & broader community.

STC Team
Bryn Mawr College: D. Kumar
Howard University: C. Liu, L. Burge
MIT: P. Shor (co-PI), M. Sudan
Purdue University (lead): W. Szpankowski (PI)
Princeton University: S. Verdú (co-PI)
Stanford University: A. Goldsmith (co-PI)
Texas A&M: P. R. Kumar
University of California, Berkeley: Bin Yu (co-PI)
University of California, San Diego: S. Subramaniam
UIUC: O. Milenkovic
R. Aguilar, M. Atallah, S. Datta, A. Grama, A. Mathur, J. Neville, D. Ramkrishna, L. Si, V. Rego, A. Qi, M. Ward, D. Xu, C. Liu, L. Burge, S. Aaronson, N. Lynch, R. Rivest, Y. Polyanskiy, W. Bialek, S. Kulkarni, C. Sims, G. Bejerano, T. Cover, A. Ozgur, T. Weissman, V. Anantharam, J. Gallant, T. Courtade, M. Mahoney, D. Tse, T. Coleman, Y. Baryshnikov, M. Raginsky, N. Santhanam.
[Photos: Wojciech Szpankowski, Purdue; Andrea Goldsmith, Stanford; Peter Shor, MIT; Sergio Verdú, Princeton; Bin Yu, U.C. Berkeley]

Center Participant Awards
• Nobel Prize (Economics): Chris Sims.
• National Academies (NAS/NAE): Bialek, Cover, Datta, Lynch, Kumar, Ramkrishna, Rice, Rivest, Shor, Sims, Verdú, Yu.
• Turing Award: Rivest.
• Shannon Award: Cover and Verdú.
• Nevanlinna Prize (outstanding contributions in Mathematical Aspects of Information Sciences): Sudan and Shor.
• Richard W. Hamming Medal: Cover and Verdú.
• Humboldt Research Award: Szpankowski.
• Swartz Prize in Neuroscience: W. Bialek.

Mission and Integrated Research
CSoI MISSION: Advance science and technology through a new quantitative understanding of the representation, communication and processing of information in biological, physical, social and engineering systems.
RESEARCH MISSION: Create a shared intellectual space, integral to the Center’s activities, providing a collaborative research environment that crosses disciplinary and institutional boundaries.
Research Thrusts:
1. Life Sciences
2. Communication
3. Knowledge Management (Data Analysis)
[Photos: S. Subramaniam, A. Grama, David Tse, T. Weissman, S. Kulkarni, J. Neville]

Education and Diversity
Integrate cutting-edge, multidisciplinary research and education efforts across the center to advance the training and diversity of the workforce.
1. Summer Schools: Purdue, 2011 & 2013; Stanford, 2012; UCSD, 2014
2. Center Wide Fellows (Courtade, Ma, Wang, Kamath, Javanmard)
3. Information Frontiers Curriculum & Learning HUB (from data to information to knowledge)
4. Intro to Science of Information (BMC, Howard, Purdue, GWU)
5. Student Workshop & Student Teams, 2012-2014
6. NSF TUES Grant, 2012-2014
7. CSoI Curriculum, Module and Courses, 2013-2014
8. Faculty Training Workshop, 2012-2014
9. Channels Scholars Program
10. Supplement REU and Professional Development
11. Recruitment of US Citizens
12. Collaborations w/ Minority Serving Programs (U. Hawaii, UTEP)
[Photos: D. Kumar, K. Andronicos, M. Ward, B. Ladd]

Outline
1. Science of Information
2. Center Mission & Center Team
3. Research
   – Structural Information
   – Learnable Information: Knowledge Extraction

Challenge: Structural Information
Structure: Measures are needed for quantifying information embodied in structures (e.g., information in material structures, nanostructures, biomolecules, gene regulatory networks, protein networks, social networks, financial transactions). [F. Brooks, JACM, 2003]
• Szpankowski, Choi, Magner: information contained in unlabeled graphs & universal graphical compression.
• Grama & Subramaniam: quantifying the role of noise and incomplete data, identifying conserved structures, finding orthologies in biological network reconstruction.
• Neville: outlining characteristics (e.g., weak dependence) sufficient for network models to be well-defined in the limit.
• Yu & Qi: finding distributions of latent structures in social networks.
• Szpankowski, Baryshnikov, & Duda: structure of Markov fields and optimal compression.
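
As a toy illustration of "information contained in unlabeled graphs", the sketch below compares the entropy of a labeled random graph with the entropy of its unlabeled structure (its isomorphism class). It assumes, purely for illustration, that each edge is present independently with probability p, and it uses brute-force enumeration that is only feasible for very small n. It is not the SZIP algorithm or any method from the cited work, and the names `canonical_form` and `structural_vs_labeled_entropy` are hypothetical.

```python
import math
from itertools import combinations, permutations


def canonical_form(n, edges):
    """Smallest relabeled edge list over all vertex permutations (brute force)."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def structural_vs_labeled_entropy(n, p):
    """Return (labeled entropy, structural entropy, number of structures) in bits.

    Assumption: every possible edge appears independently with probability p.
    """
    pairs = list(combinations(range(n), 2))
    class_prob = {}
    labeled_entropy = 0.0
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        k = len(edges)
        prob = p ** k * (1 - p) ** (len(pairs) - k)
        if prob > 0:
            labeled_entropy -= prob * math.log2(prob)
        # Graphs that differ only by vertex labels share one canonical key.
        key = canonical_form(n, edges)
        class_prob[key] = class_prob.get(key, 0.0) + prob
    structural_entropy = -sum(q * math.log2(q) for q in class_prob.values() if q > 0)
    return labeled_entropy, structural_entropy, len(class_prob)


if __name__ == "__main__":
    for n in (3, 4):
        labeled, structural, classes = structural_vs_labeled_entropy(n, 0.5)
        print(n, classes, round(labeled, 3), round(structural, 3))
```

The gap between the two entropies is the number of bits spent on vertex labels alone, which a compressor that only needs to preserve structure can avoid paying.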

Real Stuff – Biological Networks

Structural Zip (SZIP) Algorithm

Outline
1. Science of Information
2. Center Mission
3. Integrated Research
   – Structural Information
   – Learnable Information: Models for Querying System

Challenge: Learnable Information (Big Data)
Data-driven science focuses on extracting information from data. How much information can actually be extracted from a given data repository? Information Theory of Big Data?
The big data domain exhibits certain properties:
• Large (peta and exa scale)
• Noisy (high rate of false positives and negatives)
• Multiscale (interaction at different levels of abstraction)
• Dynamic (temporal and spatial changes)
• Heterogeneous (high variability over space and time)
• Distributed (collected and stored at distributed locations)
• Elastic (flexibility to data model and clustering capabilities)
• Complex dependencies (structural, long term)
• High dimensional
“Big data has arrived but big insights have not.” Financial Times, T. Harford.
Ad-hoc solutions do not work at scale!

Modern Data Processing
Data is often processed for purposes other than reproduction of the original data (new goal: reliably answer queries rather than reproduce data!).
Recommendation systems make suggestions based on prior information. Recommendations are usually lists indexed by likelihood.
[Diagram labels: Email on server; Prior search history; Distributed Data; Processor]
Databases may be compressed for the purpose of answering queries of the form: “Is there an entry similar to y in the database?”
• Original (large) database is compressed.
• Compressed version can be stored at several locations.
• Queries about the original database can be answered reliably from a compressed version of it.
Courtade, Weissman, IEEE Trans. Information Theory, 2013.
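
The idea of answering similarity queries from a compressed database can be sketched with a toy filter. The example assumes binary records, Hamming distance as the similarity measure, and a very simple "compression" that keeps only a fixed subset of coordinates. It is not the scheme from the cited paper; the class `CompressedDatabase` and its methods are hypothetical names introduced for illustration. Because distance on a subset of coordinates never exceeds the full distance, a "no" answer is always correct, while a "maybe" answer can be a false positive.

```python
def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


class CompressedDatabase:
    """Toy compressed store: keeps only selected coordinates of each record."""

    def __init__(self, records, kept_positions):
        self.kept_positions = list(kept_positions)
        self.signatures = [self._project(r) for r in records]

    def _project(self, record):
        # The short signature is the record restricted to the kept coordinates.
        return tuple(record[i] for i in self.kept_positions)

    def maybe_similar(self, query, threshold):
        """False means no record is within `threshold` of the query (guaranteed).

        True means "maybe": some record is close on the kept coordinates.
        """
        q = self._project(query)
        return any(hamming(q, s) <= threshold for s in self.signatures)


if __name__ == "__main__":
    records = [(0, 0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 0, 0, 0, 0)]
    db = CompressedDatabase(records, kept_positions=[0, 2, 4, 6])
    print(db.maybe_similar((1, 1, 1, 1, 1, 1, 1, 1), threshold=1))  # reliable "no"
    print(db.maybe_similar((0, 0, 0, 0, 0, 0, 0, 1), threshold=1))  # true match
    print(db.maybe_similar((0, 1, 0, 1, 0, 1, 0, 1), threshold=1))  # false positive
```

The design trade-off is between signature length and the false-positive rate: shorter signatures compress more but answer "maybe" more often.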