Let's apply dumb economics to the use of remote sensing data

Preamble

I am not an economist. I am an engineer and researcher, and like many folks, I have spent plenty of time scouring the internet for remote sensing products as material for papers, posters, presentations, etc.

I don't think it's controversial to say that accessing open geospatial data is not easy, even at times for a highly technical person. It is controversial to say, as Dr. Brianna Pagán did at North51 this year, that commercial companies who have contracts with or are otherwise supported by the US Government should provide at least some of their captured value back to the taxpayer in the form of open data.

As I said, I am not an economist. But in this post, I'm going to attempt to use my dumb understanding of some economic concepts to explore how we can make open access to geospatial data better, including access to that sweet, sweet commercial data.

A case study

Let's start by looking at the consumer side, specifically me in graduate school when I was studying lidar and glaciers. I had a dump truck of terrestrial lidar data of the Helheim Glacier in southeast Greenland, and I needed remote sensing products to build visualizations, compare terminus detection algorithms, and all manner of other use cases. Let's bring this experiment forward to today, and pretend that I'm doing this research now, with the currently existing tooling and data.

I can write software, I know roughly what remote sensing products might be useful to me, and I have a slew of papers that I'm supposed to read to learn more, but let's face it — I'm an engineer, and I just want to build something. So instead of carefully reviewing the literature to see what products people use for what, I turn to the internet (hello chatbots) to help:

A ChatGPT response to a remote sensing question

Hey, that's not too bad! I've got a good sense of the products available — now, how do I get the data? Let's try that first link, the MODIS Data Archive:

MODIS Data Archive

Uhh, ok it says "Data", but that's mostly just text. I scroll down a bit, and find a link that says MODIS cryosphere products. I'm working in the cryosphere, so that seems like a good place to look:

MODIS cryosphere products

🤦

Everyone who works in geospatial knows this story. But what's the economic phenomenon at play here? How did a system, designed (presumably) with good intentions, some thought, and no small amount of money, create something so obviously bad?

Analyzing the problem

I am not an economist, but I am a millennial, which means I'm as likely to turn to Wikipedia for answers as I am ChatGPT. From Economics:

There are a variety of modern definitions of economics; some reflect evolving views of the subject or different views among economists. Scottish philosopher Adam Smith (1776) defined what was then called political economy as "an inquiry into the nature and causes of the wealth of nations"

Okay, that's pretty circular — I'm looking to economics for answers, and it comes back at me with "I'm just asking questions." I keep reading:

Jean-Baptiste Say (1803), distinguishing the subject matter from its public-policy uses, defined it as the science of production, distribution, and consumption of wealth.

That's better — if you substitute "geospatial data" in for "wealth", then you're cooking with gas. Later in the article, I find this gem:

Expositions of economic reasoning often use two-dimensional graphs to illustrate theoretical relationships.

I love graphs. Let's make some. I'd like to explore why the government has made so many websites with lots of text, instead of things that give me data. So here's a graph:

Government websites over time

That might be accurate, but it's not very helpful. Why are there so many websites, and why are they increasing over time? Where does the demand come from?

  • The people who get paid to make the websites (engineers, designers, managers, CEOs, etc) clearly want more websites, and so increase demand ⬆️
  • The people who spend the money to make the websites (uhh, someone in the government? how does the government work?) want more websites because it looks good on an annual review to say that you made a new thing, and so increase demand ️⬆️
  • That leaves us, the taxpayer, whose money is being spent (me! and you! all of us! especially frustrated grad students!) as the only chance for downward demand pressure ⁉️

I'd argue that that constantly-increasing line in the above chart, the "Not Found" page for MODIS cryosphere data, all of it — they're our fault. If there's one thing "the dismal science" believes, it's that people make decisions in their own self-interest, and there's no reason to expect anything different here. The people who get paid to make websites and the people who spend money to make websites want more websites, and that's not "bad" in the moral sense — they're just acting in their own interests. If there are too many government websites, then it's our (the taxpayers') fault for not exerting enough downward pressure on the demand.

To summarize:

Party                                  Effect on websites
People who get paid to make websites   Increase demand
People who pay to make websites        Increase demand
The taxpayer                           Do better

Working the problem

So how do we exert that downward pressure?

First, we could try to convince the people who make websites to make fewer websites. However, our argument boils down to "make less money", which isn't very good.

How about the people who pay for websites? What are their motivating factors? We turn again to the internet and find https://science.nasa.gov/earth/data/, which states:

NASA’s Earth Science Data Systems (ESDS) Program oversees the entire Earth science data life cycle and facilitates unrestricted access to maximize the scientific return from NASA's missions and experiments. [emphasis mine]

Now there's something we can shove into a dumb economics model. Does that broken maze of websites provide unrestricted access? Maybe in a strict sense, but not in the spirit of the mission statement. And it's obvious that we aren't maximizing our scientific return if it's that hard to get data.

This means that it's up to us, the taxpayer, to pressure the people who pay for websites to make sure that they're maximizing the scientific return (their words). How do we do that? The first clue comes from the beginning of our case study in the form of that quite useful (it pains me to say1) ChatGPT response to my original question. I, the rushed and ignorant grad student, was doing search and discovery. A wall of text does not enable search and discovery, so any link in the chain of websites that doesn't enable search and discovery is a non-maximizing link.

Let's pull that out as a dumb economic theory, worth highlighting: Every link in a website chain that does not facilitate search and discovery decreases the value of that chain.2
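To keep things in the same dumb-economics spirit, here's a toy model of that theory: treat the value of a website chain as starting at 1.0, and discount it once for every link that doesn't facilitate search and discovery. The discount factor is made up purely for illustration.

```python
# A toy model of the "website chain" theory: each link either
# facilitates search and discovery (keeps the chain's value) or
# doesn't (discounts it). The 0.5 discount factor is made up.
def chain_value(links, discount=0.5):
    """links: list of booleans, True if that link facilitates discovery."""
    value = 1.0
    for facilitates_discovery in links:
        if not facilitates_discovery:
            value *= discount
    return value

# A four-hop chain where two hops are walls of text:
print(chain_value([True, False, True, False]))  # 0.25
```

The exact numbers don't matter; the point is that value only ever goes down as non-discovery links accumulate.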

What was the most useful part of our search chain in the case study? ChatGPT. How will ChatGPT and its friends help us actually find the data, instead of only the government websites that have links to the data? By making the data themselves searchable and discoverable in a machine-friendly way. That's where specifications like the SpatioTemporal Asset Catalog (STAC) play a role. If geospatial data are described by some sort of crawlable metadata (it doesn't have to be STAC), then they can be indexed by search tools and surfaced at the "search and discovery tool" layer, such as a chatbot.
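To make "crawlable metadata" concrete, here's a minimal sketch of a STAC-flavored record and a spatial search over it. The ID, URL, and bounding box are invented for illustration — this is not a real MODIS product record, just the shape of the idea:

```python
import json

# A minimal, STAC-flavored metadata record for a single image.
# The id, href, and bbox are made up for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "MOD10A1-example-scene",
    "bbox": [-38.3, 66.3, -38.0, 66.5],  # roughly Helheim Glacier
    "properties": {"datetime": "2024-06-01T12:00:00Z", "platform": "terra"},
    "assets": {
        "data": {
            "href": "https://example.com/MOD10A1-example-scene.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

def intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return not (a[2] < b[0] or a[0] > b[2] or a[3] < b[1] or a[1] > b[3])

# A "search and discovery" layer is, at its core, just this:
# filtering machine-readable records by space (and time).
search_area = [-39.0, 66.0, -37.0, 67.0]
matches = [i for i in [item] if intersects(i["bbox"], search_area)]
print(json.dumps([m["id"] for m in matches]))
```

Once records look like this, any tool — a catalog API, a crawler, a chatbot — can answer "what imagery covers this glacier?" without a human clicking through walls of text.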

This is where the government has a competitive advantage over the rest of the market. They know best what data products exist and the nuances of those products, so they are best positioned to both organize those data and index them with metadata. So that's a second major advocacy point: The government should leverage its competitive advantages by focusing on data storage and building searchable metadata indexes.

The commercial data provider

The case of the commercial data providers (Planet, Maxar, Umbra, etc) is obviously different from that of the government. Their mission is to make money, and each of them has chosen one or more variations on a similar theme as their business model: "collect data, and build a blend of government and private customers to sell those data to."

In her keynote at North513, Dr. Brianna Pagán argued that earth observation (EO) data should be considered a "common good" and be made free/accessible to all, regardless of who collects it:

What's a "common good"? Back to Wikipedia!

Common goods (also called common-pool resources) are defined in economics as goods that are rivalrous and non-excludable.

"Rivalrous and non-excludable" sounds like a frat bro trying to get into a party. Anyways, what do those actually mean?

a good is said to be rivalrous or a rival if its consumption by one consumer prevents simultaneous consumption by other consumers

🤔 So is a Planet image rivalrous? If the data are open, no, not really — you could argue that heavy usage of an image (i.e. downloading it a lot) crowds out others by consuming bandwidth and increasing costs for the provider, but the internet is made to have multiple people using the same asset. Ironically, the only way that a Planet image is rivalrous is if it's not free and open — if a three-letter agency can use the image but I can't, then their consumption is preventing my consumption. But I'm not quite sure that's actually rivalrous, so we'll give that one a "probably not."

How about "non-excludable"?

Excludability is the degree to which a good, service or resource can be limited to only paying customers, or conversely, the degree to which a supplier, producer or other managing body (e.g. a government) can prevent consumption of a good.

Ok, that's what we wanted! Planet data are for sure excludable, but probably not rivalrous. That means that Planet data fail both of the core components of the definition of a common good. What else do we have?

                Excludable   Non-excludable
Rivalrous       Private      Common
Non-rivalrous   Club         Public

When Planet keeps its imagery private but available for purchase, that looks like a "Club" good — they can sell the same image to multiple folks without their use impacting each other's. For the sake of this dumb-yet-pedantic economics post, I'm going to tweak Dr. Pagán's argument to consider EO data a "public" good. That makes it a bit simpler — we just need to transition that Planet image from excludable to non-excludable.

Public goods in the internet age

In Nadia Eghbal's excellent book, Working in Public: The Making and Maintenance of Open Source Software, the author discusses how open source software functions a bit like a public good, albeit one that has little government influence. EO data is similar but different. Like open source software, distributing a single piece of EO data is (relatively) inexpensive — the costs to store and serve a single image are negligibly small. However, the amount of EO data is constantly growing, and doing so at a scale that quickly moves bandwidth and storage costs from "doesn't matter" to "hooo boy". This implies that, if EO data is to be a public good, the government may have a role in distributing those data.

It's obvious that it isn't appropriate to think of commercial EO data as a public good all the time — if it were, the government would own all the satellites. There must be some point at which those data should transition from "club" to "public". To explore when that point is, let's look at the value of an EO image to a couple of different user groups:

The value of an image over time

  • Three letter government agencies (CIA, NGA, etc) and their military counterparts place a huge value on the timeliness of data. The most important decisions they need to make are time-critical, and so the value of a given picture drops precipitously after it's created. It doesn't go to zero because there's still retrospective research that needs to be done.
  • For a researcher (e.g. a biologist or an earth scientist) the value (somewhat counter-intuitively) goes up over time, though by no means to the levels that the three letter agencies were at. This is because (generally) research needs multiple coincident data sources, and the longer the data has been around, the more other data sources come online. Also, very little research is "real time" — usually you're studying something that happened months or years ago, not yesterday.
  • The general public (i.e. most taxpayers) don't really have much interest in these pictures other than looking at them.

So if a commercial company is trying to extract the maximum value from a given picture, they're going to want to sell it quickly. And this is in fact what they're doing, so much so that some companies don't see economic value in charging for their archive at all:

This obviously isn't a universal belief, as shown by the fact that none of the archives of the big commercial collectors (including Umbra, Joe's employer) are totally open. Here's some text from Umbra's open data page on AWS:

The Open Data Program (ODP) features over twenty diverse time-series locations that are updated frequently, allowing users to experiment with SAR's capabilities.

It's not the complete archive, but it's better than nothing I guess.

Working the (commercial) problem

So how do we bridge the gap between the commercial company's incentive to capture as much value as possible (by selling their data quickly then mostly forgetting about it) and Dr. Pagán's vision of earth observation data as a common good? As argued above, the government probably has a role to play, since hosting and serving those data is expensive. How do we convince the government to get involved? Let's look at things from a scientist's perspective:

Coincident, discoverable, and accessible

The more coincident, discoverable, and accessible (CDA)4 datasets that are available, the more things you can do — sometimes, a new dataset will spark a new idea and a whole new area of research. The government has a lot of datasets, and the commercial providers have more5, meaning that in order to maximize the scientific return, the government should spend time and resources to make more datasets available from the commercial (and other-governmental) archive.

This doesn't have to be overly complex. With the rise of cloud-native geospatial, we have more formats and tools to store, index, search, and use data without having to stand up and maintain servers:

Cloud-native geospatial
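As a sketch of why this doesn't have to be complex: in the cloud-native model, a "static catalog" is just JSON documents linked to each other by hrefs, sitting in object storage with no server-side application at all. The toy crawler below walks such a structure — the bucket contents and names are invented, and a plain dict stands in for the object store:

```python
# A "static catalog": linked JSON documents on object storage, no
# server required. This dict stands in for a bucket; all ids and
# paths are made up for illustration.
bucket = {
    "catalog.json": {"type": "Catalog", "id": "archive",
                     "links": [{"rel": "child", "href": "2024/collection.json"}]},
    "2024/collection.json": {"type": "Collection", "id": "imagery-2024",
                             "links": [{"rel": "item", "href": "scene-001.json"}]},
    "2024/scene-001.json": {"type": "Feature", "id": "scene-001", "links": []},
}

def crawl(key):
    """Depth-first walk of the linked JSON documents, yielding their ids."""
    doc = bucket[key]
    yield doc["id"]
    base = key.rsplit("/", 1)[0] + "/" if "/" in key else ""
    for link in doc.get("links", []):
        if link["rel"] in ("child", "item"):
            yield from crawl(base + link["href"])

print(list(crawl("catalog.json")))  # ['archive', 'imagery-2024', 'scene-001']
```

Anything that can fetch JSON over HTTP — including a search index or a chatbot — can crawl an archive published this way, which is the whole appeal.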

The sell to commercial data producers might not be too hard — they aren't deriving much (if any) value from their archive, and the goodwill and exposure they receive from making their archive public (with government backing) should outweigh any un-compensated costs. And if they don't want to do it willingly, maybe the government can write "after X amount of time, these data must be made public via Y means" clauses into their agreements with the commercial providers? A person can dream.

Easy enough!

Main points

  • Every link in a website chain that does not facilitate search and discovery decreases the value of that chain, so governments need to ensure that any systems they build have as few low-value links as possible.
  • Governments should leverage their competitive advantage to create machine-readable metadata (indexes) for their data to enable search and discovery by external tools.
  • We (taxpayers) should advocate for government support for hosting and serving cloud-native archives of commercial EO data.

This post is a companion to an identically-named lightning talk I'll be giving at SatCamp 2024.

1

I am a dyed-in-the-wool old-man-who-yells-at-the-AI-industry

2

That's not to say that there isn't a place for long "about" pages, or deep dives into specific questions. But these pages shouldn't be the first thing you see, they should be (wait for it) discoverable.

3

Her keynote was titled "From Competition to Connection: Geospatial Technologies and the Path to Climate Justice"

4

I just googled this term and nothing obvious popped up, so this might be made up? But I like it, I think.

5

So do other governments, e.g. the European Space Agency