Decoupling open data publishing and open data use cases

This post is also available in German.

The open data scene in Germany has a love affair with open data use cases*. When attempting to explain why the pace of open data adoption among public bodies at all levels of government (national, state and local) remains so slow after 8+ years of open data in Germany (the federal open data portal, Govdata.de, was launched in 2013), open data advocates and government officials alike tend to fixate on use cases, or the lack thereof: we need more examples of how open data is used in practice, and then governments will slowly but surely understand why open data is important and why they should be publishing it!

(* This is probably the case in other countries as well, but I can only speak from experience about the German setting.)

The logic isn’t flawed. For many working in government, open data just sounds like a lot of extra work with an unclear payoff. Not only do I have to do my normal work with this data, but I also have to prepare a dataset to be published on some other website because someone somewhere might use the data for something?

In the face of this confusion and potential resistance, presenting open data use cases is an effective way of explaining the reason behind open data: you, government employee, should publish your data, because then an organization like X can take this data and build a product/application/solution like Y, which clearly generates value for the city/for the state/some other defined group – and the government has an interest in generating this kind of value, doesn’t it?

But over the last few years of my open data work, I’ve increasingly observed problems emerging from this use case-heavy narrative around open data. For one, as mentioned, the open government data movement in Germany is at least 8 years old, and there have been countless collections of use cases gathered in that time – it’s unclear to me what exactly even more lists of use cases would reasonably achieve.

Furthermore: in my work I have found that many government employees have internalized the connection between open data and use cases as one justifying the other: the main reason we should publish open data is to enable these use cases.

And of course, this isn’t a connection government employees made up on their own; enabling open data-based innovations and business models is one of the main arguments typically presented to governments when trying to introduce them to and persuade them of the concept of open data (alongside arguments like increasing citizen trust in government through transparency and easier exchange of data within government).

So what’s problematic about justifying open data through use cases? I personally have observed two main issues:

It leads to government employees thinking that only data for which they can imagine a use case is worth publishing.

It creates false expectations for publishing open data that can in turn dampen enthusiasm for the concept when those expectations aren’t met.

Which is not to say these are the only problems generated by this heavy emphasis on open data use cases, but they are the two problems I have observed most frequently. Here are both points a bit more fleshed out:

1) Would-be open data publishers inevitably adopt a mindset that only data for which they can conceive of a concrete use case is worthy of being published.

I’ve experienced this countless times in conversations with data owners in Berlin: when trying to identify data they hold that could or should be published as open data, they often will push back on or gloss over certain datasets, because “no one would be interested in that” or “you’d have to be a real expert to use that data for anything”. This judgment inevitably stems from their understanding that open data is about building data-based applications. And if they can’t envision an application of a given dataset, they perceive the data as being unworthy or unessential for publication.*

(* I am aware that an “open by default” approach to open data would theoretically render this point moot, since that would remove individual discretion from the decision of whether certain data should be published or not. But the reality is, the vast majority of governments in Germany are very far away from a true implementation of open by default – both technically and culturally – and that’s not going to change any time soon)

This mindset is problematic, because in my experience data owners in government contexts are very poor judges of how their data could be potentially used by others. They are too used to only using their data for their specific workflows within their specific contexts through the application of their specific methodologies. They tend to lack the experience and broader understanding of the value of data to imagine uses of it in other contexts.

Moreover, due to the open data movement’s heavy emphasis on finding and showcasing particularly accessible use cases of open data in the form of smart phone apps, interactive maps or data visualization projects, etc., data owners in government tend to have a very narrow idea of what open data usage looks like in practice. This leads to further (arbitrary) limits being placed on what is perceived as publishable data: if data owners can’t imagine someone making an interactive visualization or flashy web app from a dataset, they may think that means the data is not useful to be published as open data.

But of course, there are many more ways that open data can be used, for example in data analysis projects where diverse data sources are combined to produce a given analytical output (and thus where the role of open data in generating that output is not necessarily obvious to external observers). Such analyses can be of interest to various sectors: civil society, private sector, academia, etc. Or perhaps the data is needed by another government entity (e.g. another department or agency) for their day-to-day work, such as the composition of a report looking at the status quo of a given issue – maybe the effects of climate change on the city or projected impacts of current demographic trends.

In summation: relying on open data applications as the primary argument for why open data should be published can cause data owners to assess “publishability” through a highly subjective use case-driven lens: Where they perceive no obvious use case, they see no pressing need to publish.

2) Data publishers – especially first-time publishers – may develop false expectations for what it will be like when they publish their data: namely, they expect their newly published data will quickly be put to use in some sort of application. If/when that doesn’t happen, it can dampen enthusiasm for open data or even lead to a sense that it was a wasted effort.

When you draw a direct line between open data publishing and open data applications, you create an expectation among open data publishers that their published data will definitely be used in some tangible, verifiable way. It’s a question I’ve frequently received in my open data work with government: If we publish this data, how will it be used? Are you already in contact with these potential users? Can we talk to them ahead of time to better understand what they want to do with the data?

And of course, it’s not bad when data publishers wonder about their potential users. I think it’s fantastic when data publishers have an interest in understanding for whom they’re publishing data, because this interest can be a gateway to future conversations about data quality and how to optimize publishing (I would argue incentives for quality improvement are greater when users are perceived as real people with a sincere interest in working with data rather than amorphous figures with unclear intents). But when the generation of open data applications is used as a key argument for why governments need to publish open data, it creates an expectation that these applications naturally follow from published datasets.

In my opinion, the narrative around use cases is at least partially to blame for this expectation, since a major part of that narrative is the idea that there is an Open Data Community™ always waiting in the wings, ready to whip up a flashy app for the next dataset to be published.

The reality is of course very different: not every dataset will be met with immediate interest. Some datasets may never see much use. Indeed, analyses of Berlin’s open data portal have shown that a high number of datasets are never accessed at all. This isn’t a failure: Data that is openly accessible and properly documented with metadata can never be seen as a failure, in my opinion. The point of open data is not that every dataset finds a use; the point of open data is that as much data as possible is made publicly accessible in a structured, organized way, so that data users – whoever they may be – have the opportunity to find the data they need when they need it. But it could be that that need doesn’t manifest itself until months or years later… or maybe it never appears at all. Or, perhaps the data is only used for personal projects that never see the light of day.
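To make the "never accessed" figure concrete, here is a minimal sketch of how such a share could be computed from per-dataset access counts. The `DatasetStats` structure, the dataset names and the numbers are all made up for illustration; they are not taken from Berlin's portal or from any real analysis. And in keeping with the point above, the resulting number describes usage, not success or failure.

```python
from dataclasses import dataclass


@dataclass
class DatasetStats:
    """Access statistics for one published dataset (hypothetical structure)."""
    name: str
    access_count: int


def share_never_accessed(stats: list[DatasetStats]) -> float:
    """Return the fraction of datasets with zero recorded accesses."""
    if not stats:
        return 0.0
    never = sum(1 for s in stats if s.access_count == 0)
    return never / len(stats)


# Made-up example values, not real portal figures.
example = [
    DatasetStats("street-trees", 120),
    DatasetStats("budget-2021", 0),
    DatasetStats("playgrounds", 15),
    DatasetStats("committee-minutes", 0),
]

print(f"{share_never_accessed(example):.0%} of datasets never accessed")
```

A real analysis would also have to decide what counts as an "access" (page views, downloads, API calls), which this sketch deliberately leaves open.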

When open data publishing is justified primarily through a use case lens but these use cases don’t materialize, it potentially leaves data publishers confused about what the point of it all was. It also increases the likelihood that they won’t prioritize open data publishing in the future: because most governments lack the technical capability to automate data publishing, doing so requires manual effort from government employees. If they perceive use cases to be the main measure of the impact of open data and then see that there were no use cases generated by their published datasets, they may be inclined to invest fewer resources in open data publishing in the future.

In summation: the emphasis on open data use cases creates expectations that open data publishing efforts on the whole can never live up to, leading to disillusionment and disenchantment among data publishers who were fed a different narrative about what would happen after they started publishing open data.

So what do we do about this? Right off the bat, I want to say I’m not against keeping use cases as part of the open data discourse. As previously mentioned, these use cases serve a real purpose in making a potentially abstract concept more concrete for the target group (would-be data publishers). Moreover, regularly highlighting more recent use cases – rather than constantly trotting out the tried-and-true use cases from 3-5 years ago – is important to show that the relevance of open data hasn’t diminished over the years.

But one thing I would argue for is more nuance in what kinds of use cases we highlight. Instead of always presenting an interactive map, or a flashy data viz project, perhaps we can present more stories and narratives describing other kinds of uses of open data, where it’s possible to highlight the impact of linking up open datasets, or otherwise integrating open datasets into broader data analysis projects – i.e., usages that are less obvious or self-explanatory but which still have the potential to generate significant value and/or impact. These kinds of use cases are harder to fit into a classic “open data showcase” where use cases are described and linked, but they have the potential to create just as much social value, and for that reason alone such usages deserve visibility.

And in general: we need to mature past justifying open data through its use cases. It’s a convenient way to get your foot in the door, but for open data to get a sustainable, long-term foothold in governments, we need governments that see the bigger picture. Namely: that the goal is not use cases, but rather, the goal is the systematic (i.e., automated) provision of high-quality, machine-readable data that is accessible to all (in the spirit of the so-called FAIR principles: findable, accessible, interoperable, reusable).

Yes, it is the usage or application of this high-quality data that generates value in the end. But the role of the government is not to presume what data will generate that value, or to pre-identify what that value is. The role of government is to make their data available for everyone to use, period.

Lastly: open data publishing ultimately lives and dies by the technical infrastructure and internal processes that enable it. If it is not possible for governments to automatically publish high-quality data at regular intervals, use cases ultimately matter little – they’ll only represent an idealized possibility, a single isolated example of what value could be generated if data were published in greater quantities, in better quality and/or at more regular intervals.

Tori Boeck, February 15, 2022