SUPPORTING RESEARCH COMMUNITIES: COLLABORATIONS IN ACTION
19 April 2011, Internet2 Spring Member Meeting
Heather Flanagan (COmanage), Internet2
Scott Koranda (LIGO), University of Wisconsin-Milwaukee
Nirav Merchant (iPlant), University of Arizona
Remco Poortinga – van Wijnen, SURFnet

Heather Flanagan
• COmanage Project Coordinator
Nirav Merchant
• Director for BioComputing at the University of Arizona; faculty advisor and technology strategist for iPlant
Scott Koranda
• Senior Scientist, University of Wisconsin-Milwaukee; Lead Architect, LIGO Identity Management effort
Remco Poortinga - van Wijnen
• (Project) Manager Middleware Services at SURFnet
2 – 3/12/2021, © 2011 Internet2

Today’s session
• VO trends
• Collaboration Management Platforms
• Activities and innovations in the field
• Q&A

VOs
• Multi-institutional, usually multi-national collaborations
• Frequently centered on unique instruments (e.g. CERN, Sloan), data repositories (e.g. medical records, economic data), etc.
• Examples:
  – hard sciences: LIGO, NEON, OOI, iPlant, GENI
  – social sciences and humanities: Bamboo, CLARIN
• Use standard collaboration tools and domain tools, often in an integrated fashion
  – SSH to manage an instrument that populates a DB that a web browser accesses

VO trends
• More and more collaboration spaces rely on external authentication and varying levels of assurance
  – Research.gov
  – PubMed
  – Educause
  – Yelp
• Opening up the virtual boundaries means increased complexity for identity management

Collaboration Management Platform
• Scalable actions expected (or at least hoped for) in a CMP:
  – Create and delete/archive users, accounts, keys
  – Group management on an individual and CMP-wide scale
  – Permit or deny access to wiki pages, calendars, computing resources, version control systems, domain apps, etc.
  – Domesticated applications to meet the needs of the VO
  – Usage reporting
  – Metering and throttling
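
The CMP actions listed on this slide can be made concrete with a small sketch. The Python below is an illustrative in-memory model only, not COmanage code: the class MiniCMP, its method names, and the sample user, group, and resource names are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MiniCMP:
    """Toy in-memory collaboration management platform (hypothetical)."""
    users: dict = field(default_factory=dict)   # user id -> "active" or "archived"
    groups: dict = field(default_factory=dict)  # group name -> set of user ids
    acl: dict = field(default_factory=dict)     # resource -> set of permitted groups

    def create_user(self, uid):
        self.users[uid] = "active"

    def archive_user(self, uid):
        # Archive instead of deleting so the user's history stays reportable.
        self.users[uid] = "archived"

    def add_to_group(self, uid, group):
        self.groups.setdefault(group, set()).add(uid)

    def permit(self, group, resource):
        self.acl.setdefault(resource, set()).add(group)

    def deny(self, group, resource):
        self.acl.get(resource, set()).discard(group)

    def can_access(self, uid, resource):
        # Only active users are considered; access is granted via group membership.
        if self.users.get(uid) != "active":
            return False
        permitted_groups = self.acl.get(resource, set())
        return any(uid in self.groups.get(g, set()) for g in permitted_groups)


# Hypothetical names, for illustration only.
cmp_demo = MiniCMP()
cmp_demo.create_user("alice")
cmp_demo.add_to_group("alice", "detector-team")
cmp_demo.permit("detector-team", "wiki:commissioning")
before = cmp_demo.can_access("alice", "wiki:commissioning")
cmp_demo.archive_user("alice")
after = cmp_demo.can_access("alice", "wiki:commissioning")
print(before, after)
```

Granting access through groups rather than per user is one way to keep the actions scalable: archiving a user or changing one group permission takes effect for every resource at once.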

Collaboration Platform Trends
Planning for a:
• Single VO
  – Standalone platform: a command-line oriented VO with an equal focus on person identity and tool availability
  – Embedded in a portal: a VO with a more app-focused collaboration
• Multiple VOs in the CMP
  – Standalone platform: a VO that is acting more as a service provider to various groups than one focused on a single collaboration effort, where absolute control over branding is important
  – Embedded in a portal: a VO that is acting as a service provider to a variety of collaborations that cannot share resources fully, but where the apps and services are still the focus of the collaboration

http://www.internet2.edu/comanage/
• Funded by an NSF-SDCI grant and Internet2
• Has gone through several iterations as we explore the needs and possibilities in this space
• Participating VOs include: LIGO, iPlant, ESWN, Bamboo, GENI
• API developed for the platform now in use at LIGO

SURFconext: How to support researchers?
Remco Poortinga – van Wijnen, remco.poortinga@surfnet.nl
• What do we want to achieve?
• Use cases
• Observations
• SURFconext - a collaboration infrastructure
• Status
• Plans
• Challenges

What do we want to achieve?
• Support researchers in their ‘Virtual Collaboration’
• But what is a Virtual Collaboration?
  – Virtual Breeding Environments
  – Virtual Organizations
  – Virtual Communities
  – Virtual Projects
  – Virtual Teams
  – Virtual Laboratories
  – Virtual …

Back to basics
A Virtual Collaboration is a temporary or permanent coalition of geographically dispersed individuals, groups, organisational units or entire organisations that pool resources, capabilities and information in a coordinated way to achieve common objectives, while decisively relying on ICT.
With five main characteristics:
• Lifespan
• Coalition
• Resources
• Coordination
• Objectives

What not to do?
• High level services
  – Project management tools
  – Communication services
  – Shared workspaces
  – Relationship management
  – Brainstorming tools
  – Contracting services

What to do?
• Basic services
  – Life cycle management
  – Delegation and access management
  – Resource management
  – Federation as a service
  – Translation services
  – Digital signatures
Report ‘Characterising Virtual Collaborations and their ICT support’, to be published at: http://www.surfnet.nl/nl/Innovatieprogramma%27s/gigaport3/Pages/Resulaten2011.aspx

A specific (real) use case
From the CLARIN project
• CLARIN: Common Language Resources and Technology Infrastructure
• CLARIN is an (ESFRI) EU Research Infrastructure project
• Additional funding from national governments: national CLARIN projects
• Dutch CLARIN project 2009-2015 awarded 9 M€, with the ambition to become the hub of CLARIN EU
• DE, DK, CZ, F, ES, … and other national CLARIN projects started or starting
• The CLARIN consortium: 36 partners from 26 EU countries and > 180 member organizations
• A CLARIN ERIC will be founded in early 2012 and will co-ordinate the national efforts

A specific (real) use case
• A researcher authenticates at his own organization and creates a “virtual” collection of resources from different repositories.
• He does this on the basis of browsing a catalogue, searching through metadata, or searching in resource content.
• To be granted access to this distributed dataset he signs the appropriate licenses.
• He is then able to use a workflow specification tool and process this virtual collection using LT tools in the form of reliable distributed web services which he is authorized to use.
• (Intermediate) results are stored in a user-specific workspace.
• After evaluation, the resulting data (including metadata) can be added to a repository and the “virtual” collection specification can be stored for future reference using PIDs.

A typical (hypothetical) use case
• Henk (a biologist) from the University of Monnickendam wants to correlate and match data from his experiments with the data of Karl from Stockholm. Together they research the relationship between swine flu and bird flu. The matched data is used as input for further experiments, for which they want to use the unique lab equipment of Ronaldo and his team in Madrid. The equipment they use there can be controlled remotely and stores its data directly in Karl's database. During the research they video conference regularly and work together on a scientific paper. After submitting their paper, they want to make their results and workflow available for peer review.

The changing research landscape
• Institutions (and researchers) flock together in interdisciplinary expertise
• Collaboration across domains is the rule rather than the exception (across institutions, disciplines, internationally)
• Shift from ‘institute/IDP centric’ to ‘person/VO/CO centric’
• People want to be able to use their own favorite tools
• Research is becoming more data intensive
• Collaboration is about groups of people sharing distributed resources (knowledge, systems, services, repositories, instruments)

Distributed resources
• Such as?
  – Generic collaboration tools
  – LMS-es/ELE-s
  – Data repositories
  – Instruments
  – Computing power
• Where?
  – Generic sources/tools are ‘in the cloud’
  – Specialist sources/repositories are at the institutions
  – Within the same ‘collaboration domain’ (e.g. CLARIN)

How (not) to support researchers?
• Each research discipline typically has its own tools
• Enable researchers to use them
• Enough common collaboration tools available
• Collaboration is about groups of people
• Let people (not institutions) create and collaborate in their own groups
• Integration of tools in a collaboration environment
• Do not prescribe a model, but make it easy to do
• Today’s high-end is tomorrow’s commodity
• Provide specific support for ‘high-end’ research to predict ‘common’ needs tomorrow (Bandwidth-on-Demand for example)

SURFconext
• SURFconext is a collaboration infrastructure
• With SURFconext, services from different providers can be used with and next to each other, creating new collaboration possibilities
• The SURFconext platform combines Federated Identity Management, use of groups across services, and technology from social networks

SURFconext
1. FEDERATIVE IDM (SAML)
2. GROUPS
3. 'PORTAL' TECHNOLOGY (OPENSOCIAL)
4. COLLABORATION TOOLS (INTERNAL & EXTERNAL)

SURFconext – guest users
• Necessary for collaboration
• Is/are not part of the SURFfederatie
• International federations
• eduGAIN
• REFEDS
• Other identity providers

SURFconext (architecture diagram)
• Interfaces: reference portal (“Showcase”, Alfresco/Liferay…), native interfaces, institutional portal, internal/external apps
• Providers: SURFnet, institutions, commercial vendors, …
• Tools: Drupal, Mendeley, WebEx, Confluence, Alfresco, Liferay, Etherpad, FileSender, SURFmedia
• Resources: Stud. Adm., ELOs (Sakai)
• Supporting services (middleware): SURFfederatie, SURFteams

SURFconext - status
• Currently in ‘live beta’
• Production ready June 2011
• First institutions connected
  – others in the starting blocks
• A number of tools already connected (Liferay, Etherpad, WebEx)
  – ‘pipeline’ of new services to be connected (Mendeley, storage, …)

SURFconext - plans
• Extend towards an e-Infrastructure, allowing integrated access to
  – Network
  – Storage
  – Compute
  – Instruments
• Making it easier to use for research workflows
• Support for web services on behalf of the user

SURFconext – plans & challenges
• Provisioning/deprovisioning of services
  – Especially useful for ‘non-domesticated’ services
• Proper virtual IDP support
  – For services unable to deal with WAYF
• Use of groups across different collaboration infrastructures
  – SURFconext is not the only one (yet…)
  – Guest use and international
• License models for services/resources used
• Standardization
  – Mostly in context of European GN3 project
• Work together with projects and research groups (CLARIN, LifeWatch)
  – How best to support them? With generic infrastructure?

LIGO
• Where we are starting from
• Current development
  – MyLIGO 2.5
  – LIGO Guest Services
• Specific LIGO use cases and requirements

LIGO Collaboration Management Platform (?) Today

MyLIGO 2.0.x
Problems:
• Conflates identity management and collaboration management
• Naïve database schema cannot represent collaboration realities
• Missing most of CMP functionality
• Brittle code base

MyLIGO 2.5+
New development:
• Django backend with REST API
• JavaScript (AJAX) frontend
• IdM (albert.einstein@LIGO.ORG) is one distinct Django app
• Leverage COmanage Gears for CMP
  – Driven by JavaScript frontend for single look and feel
• LIGO specific functionality built as other Django app
  – Leverage COmanage Gears and IdM REST APIs
  – Authorship eligibility (complex function) one example app

MyLIGO 3+
The grand future:
• LIGO Laboratory is a federation of 4 COs
  – Caltech, MIT, LHO, LLO
• LIGO Scientific Collaboration (LSC)
  – Federates with LIGO Laboratory
  – Most use credentials from home institution (InCommon) or other
    • UK, German, Japanese, …
  – Maintain albert.einstein@LIGO.ORG as “IdP of last resort”
• LSC + LIGO Laboratory federates with InCommon
• COmanage Gears for CMP
  – Enables all management of COs and COUs (more later…)
  – Bind home creds with @LIGO.ORG creds until future is now
    • (full support of ECP, Silver LOA, two-factor auth, …)

LIGO Guest Services
• Science drivers need solution “yesterday”
  – Enable wiki collaboration between LIGO scientists and projects X, Y, Z
• We need an “IdP of last resort” right now
  – Don’t want to conflate with @LIGO.ORG identities
• Solution is LIGO Guest Services
  – albert.einstein@LIGOGUEST.ORG
  – COmanage Gears for CMP
  – Going into “production” ASAP

LIGO use cases for COmanage Gears
• “Invitation” not sufficient
  – Require application, conscription, invitation
• Grouper as definitive source of memberships
  – Learned from our mistakes about multiple authoritative sources
• Authentication and Authorization
  – Full support of SAML2 and Shib for auth
  – Authz from same infrastructure
    • Grouper → LDAP → IdP → attributes → SP (for example)
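
The authorization path on this slide (one authoritative group source feeding a directory, an IdP releasing attributes, and an SP deciding from those attributes) can be sketched in a few lines. This is a simulation with plain Python dictionaries, not Grouper, LDAP, or Shibboleth code. The group names, the second identity, the function names, and the use of an attribute called isMemberOf are assumptions made for illustration.

```python
# Stand-in for the single authoritative membership source: group -> members.
# Group names and jane.doe@LIGO.ORG are hypothetical.
GROUP_REGISTRY = {
    "lsc:members": {"albert.einstein@LIGO.ORG", "jane.doe@LIGO.ORG"},
    "lsc:wiki-editors": {"albert.einstein@LIGO.ORG"},
}


def provision_directory(registry):
    """Invert group -> members into person -> groups, as a directory might hold it."""
    directory = {}
    for group, members in registry.items():
        for person in members:
            directory.setdefault(person, set()).add(group)
    return directory


def release_attributes(directory, principal):
    """IdP step: after authentication, release the principal's memberships."""
    return {
        "principal": principal,
        "isMemberOf": sorted(directory.get(principal, set())),
    }


def sp_authorize(attributes, required_group):
    """SP step: decide using only the attributes it received."""
    return required_group in attributes.get("isMemberOf", [])


directory = provision_directory(GROUP_REGISTRY)
einstein = release_attributes(directory, "albert.einstein@LIGO.ORG")
jane = release_attributes(directory, "jane.doe@LIGO.ORG")
print(sp_authorize(einstein, "lsc:wiki-editors"), sp_authorize(jane, "lsc:wiki-editors"))
```

Because every hop derives from the one registry, a membership change made there is the only change needed, which is the point of avoiding multiple authoritative sources.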

LIGO use cases for COmanage Gears
Need “COUs”
• CO = collaborative organization
• COU = collaborative organizational unit
• Flat structure COs not sufficient to represent our collaboration(s)
  – Scott Koranda is a member of the UW-Milwaukee LIGO group
  – UWM collaborates with other university LIGO groups
  – That group collectively collaborates with LIGO Laboratory
  – LIGO Laboratory is one body but has 4 sites
  – GEO, a UK and German project, collaborates with LIGO Laboratory
  – “LIGO” collaborates with Virgo and LCGT
    • (for time being we actively manage some identities for Virgo)
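
One way to picture why a flat CO is not enough is a small tree in which each CO can contain nested COUs and membership rolls up to the enclosing units. The sketch below is illustrative only and is not the COmanage Gears data model. The class OrgUnit, the placement of the UW-Milwaukee group under an "LSC" node, and the placeholder LHO member are assumptions.

```python
class OrgUnit:
    """A CO (root) or COU (nested node) in a collaboration hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.members = set()
        if parent is not None:
            parent.children.append(self)

    def all_members(self):
        """Members of this unit plus members of every unit nested below it."""
        found = set(self.members)
        for child in self.children:
            found |= child.all_members()
        return found

    def path(self):
        """Colon-separated path from the root down to this unit."""
        if self.parent is None:
            return self.name
        return self.parent.path() + ":" + self.name


# LIGO Laboratory is one body with 4 sites (site names from the MyLIGO 3+ slide).
lab = OrgUnit("LIGO Laboratory")
sites = [OrgUnit(site, parent=lab) for site in ("Caltech", "MIT", "LHO", "LLO")]
sites[2].members.add("hypothetical LHO operator")

# Assumed placement: university groups modelled as COUs of the LSC.
lsc = OrgUnit("LSC")
uwm = OrgUnit("UW-Milwaukee LIGO group", parent=lsc)
uwm.members.add("Scott Koranda")

print(uwm.path())
print("Scott Koranda" in lsc.all_members(), "Scott Koranda" in lab.all_members())
```

Relationships between separate trees (for example, the LSC collaborating with LIGO Laboratory, or GEO with LIGO Laboratory) would need an additional link type that this sketch leaves out.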

Consulting for CO/VO
Important!
• Science VO/COs like LIGO have little IdM and CMP experience
• We are repeating mistakes and are ignorant of things campus IT learned years ago
• It does not scale well, but the “brains on a stick” approach is extremely valuable
Source/Credit: http://www.phdcomics.com/comics/archive.php?comicid=1126 - all rights reserved, Copyright 2009

WWW.IPLANTCOLLABORATIVE.ORG

NSF PSCIC: Goals
• Create a new type of organization, a cyberinfrastructure collaborative for plant science, to enable new conceptual advances through integrative, computational thinking
• Address grand challenge questions in plant science, the driving force and organizing principles for the PSCIC
PSCIC: Plant Science Cyberinfrastructure Collaborative

Paradigm Shift
• Classic paradigm: You produce data, analyze, interpret (end to end)
• Conventional paradigm: Consortium/centers produce data and you consume it
• New paradigm: Consortium/centers have produced data and are creating “cyberinfrastructure” to tackle the “grand challenge”

The iPlant Cyberinfrastructure (layer diagram, top to bottom)
• Users
• iPlant Discovery Environments: Grand Challenge Workflows, iPlant Interfaces, Third Party Tools, iPlant-built Tools, Community Contributed Tools and Data!
• iPlant Middleware: Job Submission, Workflow Management, Service/Data APIs; iRODS, Grid Technologies, Condor, RESTful Services
• Physical Infrastructure: Compute, Storage, Persistent Virtual Machines; TeraGrid, Open Science Grid, UA/ASU/TACC
Build a CI that’s robust, leverages national infrastructure, and can grow through community contribution!

Aligning the services

iPlant: Primary VO participants
• Resource providers: consortiums, infrastructure and services providers
• Researchers: Plant Science community
• Developers: Bioinformaticians, CISE
• Educators: Classroom, workshops

iPlant: VO needs for resource providers
• Ease of user management (SSO across providers, ease of bringing in new users)
• Ease of group management (inherit these across providers)
• Data access across providers/sites (more about that in data)
• Manage quota/access at various levels (meter and throttle)
• Resource and services federation!
• Ability to integrate 3rd party tools
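
The "meter and throttle" need can be illustrated with a per-user token bucket that also counts served requests. This is a generic technique chosen for the example, not a description of how iPlant implements quotas. The class name, parameters, and user name are assumptions, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class MeteredThrottle:
    """Per-user token bucket (throttle) with a served-request counter (meter)."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second  # tokens added per second
        self.burst = burst           # maximum tokens a user can accumulate
        self.buckets = {}            # user -> (tokens, timestamp of last update)
        self.usage = {}              # user -> number of requests served

    def allow(self, user, now):
        # A user's first request starts from a full bucket.
        tokens, last = self.buckets.get(user, (float(self.burst), now))
        # Refill for the elapsed time, never beyond the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[user] = (tokens - 1.0, now)
            self.usage[user] = self.usage.get(user, 0) + 1
            return True
        self.buckets[user] = (tokens, now)
        return False


throttle = MeteredThrottle(rate_per_second=1.0, burst=2)
decisions = [
    throttle.allow("henk", now=0.0),  # served from the initial burst
    throttle.allow("henk", now=0.0),  # served from the initial burst
    throttle.allow("henk", now=0.0),  # bucket empty, throttled
    throttle.allow("henk", now=1.0),  # one token refilled after one second
]
print(decisions, throttle.usage)
```

The usage dictionary is the metering side and could feed usage reporting; the bucket state is the throttling side.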

For Researchers
• Convenience of single ID/sign-on (hopefully linked to home institution)
• Ability to form ad-hoc groups/teams for sharing
• Ability to control access with keys/tokens that can be honored across services and providers (including temporal aspects)
• Can use web, API and command line apps (that honor VO based credentials)
• Keep data in one location, integration with many analysis platforms and providers
• Activity dashboard, messaging and alerts (consolidated from providers)
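
A token that several providers can honor, and that expires on its own, can be sketched with an HMAC-signed payload from the Python standard library. This is a minimal illustration under stated assumptions, not iPlant's token format: the shared key, the claim names (user, scope, exp), and the function names are invented for the example, and a real deployment would need proper key management.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret shared by the issuing service and the providers.
SHARED_KEY = b"demo-key-shared-by-providers"


def issue_token(user, scope, ttl_seconds, now=None):
    """Return a signed token granting `scope` to `user` for `ttl_seconds`."""
    now = time.time() if now is None else now
    claims = {"user": user, "scope": scope, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def verify_token(token, scope, now=None):
    """Return the user name if the token is authentic, in scope, and unexpired."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch
    claims = json.loads(payload)
    if claims["scope"] != scope or now >= claims["exp"]:
        return None  # wrong scope or expired
    return claims["user"]


token = issue_token("henk", "storage:read", ttl_seconds=60, now=1000.0)
print(verify_token(token, "storage:read", now=1030.0))  # within the 60 s window
print(verify_token(token, "storage:read", now=1060.0))  # at expiry
```

Any provider holding the shared key can check the token without calling back to the issuer, and the exp claim covers the temporal aspect mentioned on the slide.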

For Developers
• User management out of the box
• Access to federated resources (storage and compute)
• Consistent API to compute, storage, analysis
• Ability to meter and throttle API based access
• Unified sharing, reporting, dashboards for messaging/alerts
• Ease of integration to resources that need significant permissions (running compute data intensive tasks)
• Become part of a “market place” model

For Educators
• Make all iPlant resources easy to use in classroom settings (Grouper friendly?)
• Easy to work with ad-hoc user groups (workshops, tutorials with ease of provisioning/deprovisioning), especially for institutions that cannot support Grouper (community colleges, K-12)
• Integration with Learning Management Systems (LMS)
• Integration with dashboard for management of self-paced tutorials/learning material (interactive documentation, proficiency assessment)

What do we give back?
• Toolkit for providers integrating with iPlant (we do heavy lifting with InCommon and others)
• Toolkit for developers
• SSO features/capabilities for end users
• Integrated HPC resources with API (including cloud)
• Best practices for community
• A few “Domesticated Applications”
• Infrastructure that promotes better collaboration

How easy is it?

Choosing a configured machine

Access from your desktop!

Questions?