The future of open data portals: the EDP view
























- Slides: 24
The future of open data portals: the EDP view
October 6, 2020
Elena Simperl, King’s College London & Luis-Daniel Ibáñez, University of Southampton
@EU_DataPortal
European Data Portal: technology, resources and support to increase the value of European open government data
Activities: supporting the entire data value chain from publishing to re-use
The future of open data portals: the EDP view
https://bit.ly/3jsu1WK
What do we measure?
• Indicators and metrics for: context, data, use, impact
• Criteria for defining and selecting indicators and metrics
• Sources of data and methods
Can we measure at scale?
• Balance between resources available and depth of insight
• The limits of automation within current architectures
• Linking data reuse to engagement metrics
Who is paying for it?
• Business cases and funding models
• Cost considerations for portals to inform budget planning
What if portals would look completely different?
• Making portals more user-centric
• From data repositories to data communities
Indicators and metrics
Our work was informed by the Common Assessment Framework
Context: legal, organisational, political, social, economic
Data
• Licensing: how open
• Technical: format, APIs, documentation
• What data: core data, sectors represented
• Quality: up to date, complete
Use
• Type of users: researchers, entrepreneurs
• Purpose: reduce spending, ease congestion
• Activities: benchmarking, mapping
Impact
• Environmental: reduced pollution
• Economic: increased jobs, growth
• Political: reduced corruption, better services
• Social: greater equality, participation
© European Data Portal. All rights reserved
Indicators and metrics
We explored four types of methods for defining and assessing indicators and metrics:
• Macroeconomic studies
• Population studies and user surveys
• Showcases and use cases
• Digital traces
Indicators and metrics
We then looked at each method to understand if it produces relevant outcomes
Method assessment criteria:
• Valid: the method is closely correlated with the attribute of interest.
• Reliable: the method gives consistent results over time and between observers.
• Sensitive: the method is sensitive enough to discriminate significant differences in the attribute of interest.
• Efficient: the less time and resource required to use it the better. In some contexts, poor efficiency can lead to poor validity and reliability.
• Transferable: the same method can be used in a variety of different contexts and across cultural and economic variation.
• Comparable: if a method is comparable, not only is the method transferable to a wide variety of contexts but the results can be meaningfully compared. Ideally this would result in a universal standard that transcends cultures and applications.
Table columns: Attribute, Dimensions, Plausible method(s), Valid, Reliable, Efficient, Transferable, Comparable
• (attribute label missing): not covered in this paper
• DATA (sectors covered, core data, data quality): automated methods
• USE (type of user, purpose, activities): population study, user survey
• IMPACT (macroeconomic, microeconomic): population study, user survey
• BUSINESS MODEL (various models): user survey
Ratings, in the order extracted from the table: Good, Good, Very good, Good; Good, Very good, Good; Medium, Good, Very good, Good; Poor, Good, Medium, Good; Low/medium, Good, Medium/good, Good; Very low, Good, Low, Poor; Good, Poor, Very poor, Good; Low, Medium, Good, Medium/good; Medium, Good, Medium, Good; Low to good, Medium, Depends on category, Medium; Low/medium, Medium, Good, Medium; Good, Low to good
Recommendations
Indicators and metrics
• Open data ecosystems should be encouraged to survey their members for open data use and impact.
• Publishers should aim to engage with data users identified in showcases/use cases to develop quantifiable indicators and metrics.
• Publishers and portal managers should share lists of metrics they have identified, in order to encourage larger catalogues of metrics and comparability.
Making portals more user-centric
An indicators and metrics approach with potential for automation
Literature review to develop five-star schemes to assess the 10 dimensions in a portal (Walker & Simperl, 2017)
The ten dimensions
We used these as a starting point to develop the five-star schemes
• Organising for use of the datasets (rather than simply for publication);
• Learning from the techniques utilised by recently emerged commercial data marketplaces; promoting use via the sharing of knowledge, co-opting methods common in the open source software community;
• Investing in discoverability best practices, borrowing from e-commerce;
• Publishing good quality metadata, to enhance reuse;
• Adopting standards to ensure interoperability;
• Co-locating tools, so that a wider range of users and re-users can be engaged with;
• Linking datasets to enhance value;
• Being accessible by offering options for big data and more manual processing. Commercial exploitation may require APIs, while individuals may prefer to download (sample) CSVs;
• Co-locating documentation, so that users do not need to be domain experts in order to understand the data;
• Being measurable, as a way to assess how well they are meeting users’ needs.
Example: Organise for use
• Each dataset is accompanied by a comprehensive descriptive record (going beyond a collection of structured metadata)
• An extract of the data can be previewed (for sense making)
• The portal provides recommendations for related datasets
• The portal enables users to review/rate the datasets
• Keywords from datasets are linked to other published datasets
Example: Promote for use
• The portal is connected with social media to create a social distribution channel for open data.
• The portal provides users with online support for feedback, to request/suggest the publication of new datasets, and when problems arise during use (e.g. contact form, discussion forum, FAQs, helpdesk, search tips, tutorials, demos).
• The portal provides a way for users to keep informed of updates to the data (e.g. news feed).
• Datasets are accompanied by links or resources that provide user guidance and support.
• Examples of reuse (fictitious or real) are provided (e.g. information contributed by other users, last reuse, best reuse, data stories).
Example: Co-locate documentation
• Supporting documentation does not exist.
• Supporting documentation exists, but as a document found separately from the data.
• Supporting documentation is found at the same time as the data (e.g. the link to the document is next to the link to the data in the search).
• Supporting documentation can be immediately accessed from within the dataset but it is not context sensitive (e.g. a link to the documentation or text contained within the dataset).
• Supporting documentation can be immediately accessed from within the dataset and it is context sensitive so that users can immediately access information about a specific item of concern (e.g. a link to a specific point in the documentation or the text contained within the dataset).
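The levels on this slide lend themselves to a simple rule-based check. What follows is a minimal sketch, assuming the five statements correspond, in order, to one to five stars (the slide does not state that mapping explicitly). The function name and the four boolean flags are hypothetical; they stand in for whatever a manual or automated inspection of a dataset page would record.

```python
from enum import IntEnum


class DocLevel(IntEnum):
    """Assumed star levels for the 'Co-locate documentation' dimension."""
    NONE = 1               # no supporting documentation
    SEPARATE = 2           # exists, but found separately from the data
    ALONGSIDE = 3          # found at the same time as the data
    WITHIN_DATASET = 4     # reachable from within the dataset, not context sensitive
    CONTEXT_SENSITIVE = 5  # reachable from within the dataset and context sensitive


def co_locate_documentation_stars(exists, found_with_data,
                                  within_dataset, context_sensitive):
    """Return the highest level whose condition is met (hypothetical helper)."""
    if not exists:
        return DocLevel.NONE
    if within_dataset and context_sensitive:
        return DocLevel.CONTEXT_SENSITIVE
    if within_dataset:
        return DocLevel.WITHIN_DATASET
    if found_with_data:
        return DocLevel.ALONGSIDE
    return DocLevel.SEPARATE


# Example: documentation is linked next to the data in search results only.
level = co_locate_documentation_stars(
    exists=True, found_with_data=True,
    within_dataset=False, context_sensitive=False,
)
print(level.name, int(level))
```

Checking the conditions from the most demanding level downwards keeps the rules mutually exclusive, so each dataset receives exactly one level.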
Using the five-star scheme
We took ten portals at different open data maturity stages. Our aim is to get a sense of how diverse the space is rather than to propose a ranking.
Recommendations (1)
Making portals more user-centric: general
• Portal managers should consider regularly assessing their portals along the ten dimensions as part of their strategic assessment of what they would like to achieve.
• Areas with particularly low diversity of results, such as Be Discoverable and Co-locate Documentation, should be explored further. Are there technical or social barriers preventing the implementation of improved solutions?
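One way to make "low diversity of results" concrete is to measure how much the star ratings for each dimension vary across the assessed portals. The sketch below uses the population standard deviation as the spread measure. That choice, the threshold, the function name, and the portal names and scores are all illustrative assumptions, not results from the EDP assessment.

```python
from statistics import pstdev


def low_diversity_dimensions(scores, threshold=0.5):
    """Return {dimension: spread} for dimensions whose ratings barely vary.

    scores maps portal name -> {dimension name -> star rating (1 to 5)}.
    The threshold is an arbitrary illustrative cut-off.
    """
    dimensions = sorted({dim for ratings in scores.values() for dim in ratings})
    flagged = {}
    for dim in dimensions:
        values = [ratings[dim] for ratings in scores.values() if dim in ratings]
        spread = pstdev(values) if len(values) > 1 else 0.0
        if spread <= threshold:
            flagged[dim] = spread
    return flagged


# Hypothetical ratings for three made-up portals.
example_scores = {
    "portal_a": {"Be discoverable": 2, "Organise for use": 1},
    "portal_b": {"Be discoverable": 2, "Organise for use": 4},
    "portal_c": {"Be discoverable": 2, "Organise for use": 5},
}
flagged = low_diversity_dimensions(example_scores)
print(flagged)
```

A dimension flagged this way is only a prompt for the follow-up question on the slide; a low spread does not by itself say whether the barrier is technical or social.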
Recommendations (2)
Making portals more user-centric: automation
• Recent research at the University of Southampton has looked at ways to automatically assess some of the metrics, based on a subset of CKAN-based portals indexed by the EDP.
• Among other things, the analysis showed that current technical realisations of portals do not lend themselves well to continuous, detailed monitoring of data use. This means less insight into the impact of publishing efforts.
In EDP we have also proposed a methodology that:
• develops bottom-up data use indicators targeted to the user group of the portal, based on their real interactions with the datasets, which are logged via analytics;
• links activity metrics to data and data usage characteristics;
• explores which data qualities lead to more user activities.
We’ve tested this methodology on datasets shared via GitHub, which offers many more user-centric capabilities than repository software such as CKAN and serves as a model for our work on alternative architectures.
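The idea of linking activity metrics to data characteristics can be illustrated with a small sketch. It is not the EDP or Southampton method: it assumes metadata completeness as the data characteristic, page views as the activity metric, and Pearson correlation as the link between them. The field names follow the CKAN package dictionary (`title`, `notes`, `license_id`, `tags`, `resources`), but which fields to check is an assumption, and the datasets and view counts are made up.

```python
from math import sqrt

# CKAN package fields checked for presence; the selection is an assumption.
CHECKED_FIELDS = ("title", "notes", "license_id", "tags", "resources")


def metadata_completeness(package):
    """Share of checked fields that are present and non-empty (0.0 to 1.0)."""
    filled = sum(1 for field in CHECKED_FIELDS if package.get(field))
    return filled / len(CHECKED_FIELDS)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical package records and logged page views (illustration only).
packages = [
    {"title": "Air quality", "notes": "Hourly readings", "license_id": "cc-by",
     "tags": [{"name": "air"}], "resources": [{"url": "https://example.org/a.csv"}]},
    {"title": "Bus stops", "notes": "", "license_id": "cc-by",
     "tags": [], "resources": [{"url": "https://example.org/b.csv"}]},
    {"title": "Budget", "notes": "", "license_id": "",
     "tags": [], "resources": []},
]
views = [120, 45, 10]

quality = [metadata_completeness(p) for p in packages]
r = pearson(quality, views)
print(quality, round(r, 3))
```

A correlation of this kind only indicates which data qualities tend to go together with more activity; it says nothing about cause, and a real analysis would need far more datasets than this toy example.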
Alternative futures pilot (1)
From data-centric to user-centric
• Currently, with one or two notable exceptions, users are not specifically encouraged to engage with data portals in a meaningful way.
• To track use more effectively, it is key to develop portals in the direction of more collaborative environments, where users stay engaged with the portal and other users rather than extract the data and leave.
• Such an environment can be found in other data ecosystems, e.g. data/code sharing platforms like GitHub.
• Increased onsite activity would also mean the effort of finding links and improving data quality would be shared with data users, distributing the maintenance effort among those benefiting from the data.
Alternative futures pilot
Software openly available: https://gitlab.com/european-data-portal/collaborative-space
Users join spaces organised around datasets and share tools, develop services and apps, and derive further datasets
Any questions?
Elena Simperl (@esimperl)
Luis-Daniel Ibáñez