Ocean Datatokens: From Money Legos to Data Legos | by Trent McConaghy | Sep, 2020 | Ocean Protocol


Decentralized Finance (DeFi) tools are referred to as “money legos” for their composability. Ocean Protocol datatokens (now in beta) allow these DeFi tools to also manipulate data infrastructure, thereby unlocking “data legos”.

datatokens * money legos = data legos

The rest of this post is organized as follows.

  • First, I draw on history to build up intuition on the concepts of protocols and of repurposing.
  • I then describe datatokens in detail. Datatokens act as an overlay protocol to more easily combine pieces of data infrastructure.
  • Then, I describe how we can repurpose DeFi infrastructure with datatokens to immediately enable data wallets, data exchanges, data provenance, data DAOs, and other tools for a Web3 Data Economy.
  • Next, I describe how data grows the DeFi pie as a new asset class.
  • Finally, I describe how new data assets can optimize returns in DeFi.

A Physical Protocol

“Are U.S. Railroad Gauges [track widths] Based on Roman Chariots?”

This is the question posed by a 2001 Snopes article, titled the same. While this sounds apocryphal, it’s likely true! Train tracks follow a standard width (4 feet, 8.5 inches) so that trains can easily connect to each other. Working backwards:

  • The first railroad builders kept the same width that they’d used in building pre-railroad tramways.
  • The tramway builders used the same width as wagons before that, so that they could leverage wagon tooling.
  • Wagon builders followed a standard width because inconsistent spacing would break wheels in roads with deep ruts.
  • The era of rutted roads goes back to the Romans and their chariots.

So yeah, width of Roman chariots → width of railroad tracks. It sounds a bit crazy. But it helped, at every step of the way: it helped Roman chariots, it helped wagons, it helped trams, it helped trains. Without it, people would have been stuck in ditches with broken wheels.

This highlights the incredible usefulness and staying power of standards. In blockchain land, we use the label “protocols” — agreed-upon formats and order of messages, exchanged among computers.

A Physical *Overlay* Protocol

Even with standardized track width, logistics still held plenty of other challenges. Here’s one of the biggies.

The image shows how ships used to get loaded: one sack at a time. A worker would take a sack from the truck, hoist it on their back, walk up the gangplank, down flights of stairs, and drop the sack in the ship’s hold. This would repeat for each of the thousands of sacks in the ship. It would happen at each step of the journey: train, truck, ship, and so on. It wasn’t always sacks; it could also be barrels, boxes, cages, and more. But it was all back-breaking, tedious, and slow. And it was expensive: shipping costs could easily be 20% or 50% of overall product costs [ref].

This is simply “how things were done” for centuries. People mostly didn’t even conceive that there could be a better way. But along came an idea so good that it seems obvious in retrospect:

[In] April 26, 1956, … American trucking entrepreneur McLean put 58 … containers aboard a refitted tanker ship, the SS Ideal X, and sailed them from Newark, New Jersey to Houston, Texas. … McLean had the idea of using large containers that never opened in transit and that were transferable on an intermodal basis, among trucks, ships, and railroad cars. -Wikipedia

A shipping container is a protocol that provides standards for width, height and depth; a minimum strength; a maximum weight; a standard way to open/close the container; and standard interfaces to trains, trucks, ships, cranes, and other shipping containers. It’s an API for logistics.

The shipping container revolutionized shipping. It made shipping easier, more reliable, and cheaper. It catalyzed global trade.

I see shipping containers not just as any protocol, but as an overlay protocol. They wrap existing infrastructure (trains, tracks, ships, shipping routes, etc.) with a standardized interface that enables the infrastructure blocks to combine more easily. Shipping containers turn the logistics infrastructure into “logistics Lego blocks”.

On “Repurposing”

In the 1940s, cathode-ray tubes (CRTs) were used as displays in airborne, ship-borne, and land-based radars. William Higinbotham of MIT Radiation Laboratory researched CRTs.

[He] worked on the Eagle radar display system, which showed the radar returns of ground targets as seen from a high-flying B-28 airplane. The picture of the target area stood still on the display, in spite of the yaw, pitch, or roll of the aircraft while maneuvering toward the target. -Brookhaven National Laboratory [alt link]

Here’s the sort of CRT that he worked with for those radar applications.

1940s-era CRT display, developed for radar applications [link]

On a whim, Higinbotham thought “it might liven up the place to have a game that people could play.” So he built “Tennis for Two” — the world’s first video game. Hundreds of visitors lined up to play. It was powered by an analog computer and a CRT. The CRT had been developed and tuned for radars, but no matter! Higinbotham didn’t have to design and build the CRT. Rather, he repurposed the CRT to power the world’s first video game.

I love this story, because Higinbotham took something that was meant to be serious and, with a wink and a smile, creatively repurposed it for something fun. (And sparked the video game era, to the future delight of millions.)

“Repurposing is the use of a tool being re-channeled into being another tool, usually for a purpose unintended by the original tool-maker.” -Wikipedia

Repurposing has a long history, from Duchamp appropriating a urinal as “art”, to the MIT Tech Model Railroad Club (and the generations of “hackers” that followed), to the repositioning of aspirin to help lower the risk of heart attacks.

This brings us to: can overlay protocols or repurposing be used to solve modern challenges? Yes, absolutely. I will explain how.

The Data Problem

Data has risen in prominence in the last decade. Alas, because of how things have been configured, data-related pain has also risen. Individuals are getting mined by Facebook and Google, AI researchers are struggling to have enough quality data to compete, enterprises are getting hacked in Equifax-scale events, and nations are struggling to retain digital sovereignty.

Fundamentally, the goal is to achieve data sovereignty (self-governance) at the level of the individual, and in ever-larger groups as well: the family, the company, the city-state, the nation, and the region. In this era, data sovereignty is a prerequisite to overall sovereignty.

Data sovereignty is a prerequisite to overall sovereignty, for individuals and nations and everything in between. [Left image source][Right image source]

Many have described what they believe is needed to address this. Typically, the foundation is a means for data exchange (I agree). On top of that, writers have outlined the need for secure data custody / data management, data marketplaces, data provenance / data audit trails, collective bargaining around data, and more. They’ve underscored the need to retain privacy, while balancing that with the ability to unlock value from private data (no easy feat, but possible). They’ve acknowledged that it’s not just about open-source sharing of data: there needs to be a financial element, a data economy (an open one).

Done well, we not only address the problems given above, but also unlock new opportunities for growth and prosperity in such an open data economy, equalizing the opportunities for all humans in this era of data and AI.

Ocean Protocol 2016 → Now, and Now → Future

In creating the Ocean Protocol project, we took these challenges and opportunities to heart. In 2016, we outlined the needs from a big data and an AI perspective. In 2017, we created an initial design and raised initial capital. In 2018, 2019 and 2020 we built in earnest, shipping a version 1 then a version 2.

We’re proud of the progress we’ve made. However, we’ve also come to understand the sheer amount of software that needs to get built to kickstart a data economy: applications for (a) secure data custody / management, (b) data marketplaces, (c) data provenance / data audit trails, (d) collective bargaining around data (data co-ops, data unions), and more. Each of these apps is at least one software product. That’s a huge amount of development work. This is on top of building the foundational data exchange infrastructure — a big effort on its own.

So we asked ourselves: can we get sneaky? To elaborate, can we revise the Ocean Protocol architecture to unlock existing blockchain infrastructure for apps (a), (b), (c), (d), etc. above? The answer turned out to be yes. It brought the added benefit of simplifying the Ocean codebase while retaining existing functionality. Here’s the recipe:

  1. Turn existing data services into ERC20 datatokens, i.e. data assets. That is, datatokens act as an overlay protocol. Ocean datatokens are the shipping container for data services.
  2. Repurpose DeFi tools for use on those new data assets, for out-of-the-box apps implementing (a)(b)(c)(d) etc. Metamask becomes a data wallet, Balancer a data exchange, and so on. CRTs got repurposed from radar → video games; Ocean repurposes DeFi tools from money economy → data economy.

The next two sections elaborate on each of these in turn, followed by how data can grow the DeFi pie and optimize DeFi returns.

An Overlay Protocol For Data Services

Datatokens are ERC20 Access Tokens

Traditional access tokens exist, such as those issued under OAuth 2.0. If you present the token, you can get access to the service. However, these aren’t the “tokens” we think of in the blockchain space. The “tokens” here are simply a string of characters, and “transfer” is basically copying and pasting that string. This means they can easily be “double-spent”: if one person gets access, they can share that access with innumerable others, even if that access was only meant for them.

How do we address the double-spend problem? This is where blockchain technology comes in. In short, there’s a single shared global database that keeps track of who owns what, and can then easily prevent people from spending the same token twice. Note [1] elaborates on how blockchains do this.

ERC20 was developed as a standard for blockchain token ownership actions. It’s been adopted widely in Ethereum and beyond. Its focus is fungible tokens, where tokens are fully interchangeable.

We can connect the idea of access with the ERC20 token standard. Specifically, consider an ERC20 token where you can access the dataset if you hold 1.0 tokens. To access the dataset, you send 1.0 datatokens to the data provider. You have custody of the data if you have at least 1.0 tokens. To give access to someone else, send them 1.0 datatokens. That’s it! But now, the double-spend problem is solved for “access control”, and by following a standard, there’s a whole ecosystem around it to support that standard.
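To make the access rule concrete, here is a toy Python model of a single shared ledger of datatoken balances. It is a minimal sketch, not Ocean's contract code: the class `DatatokenLedger`, its method names, and the use of `Decimal` amounts are assumptions for illustration. A real ERC20 contract tracks integer base units and also offers `approve`/`transferFrom`, which are omitted here.

```python
from decimal import Decimal

# Per the text: you can access the dataset if you hold at least 1.0 datatokens.
ACCESS_THRESHOLD = Decimal("1.0")


class DatatokenLedger:
    """Toy single shared ledger of datatoken balances (hypothetical, not ERC20 code)."""

    def __init__(self, minter: str, supply: Decimal) -> None:
        # The data provider starts out holding the whole supply.
        self.balances: dict[str, Decimal] = {minter: supply}

    def balance_of(self, owner: str) -> Decimal:
        return self.balances.get(owner, Decimal("0"))

    def transfer(self, sender: str, recipient: str, amount: Decimal) -> bool:
        # Refuse transfers larger than the sender's balance. Because there is one
        # shared ledger, a token that has been sent away cannot be sent again:
        # this is what stops access from being "double-spent".
        if amount <= 0 or self.balance_of(sender) < amount:
            return False
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[recipient] = self.balance_of(recipient) + amount
        return True

    def has_access(self, holder: str) -> bool:
        # Custody of the data = holding at least 1.0 datatokens.
        return self.balance_of(holder) >= ACCESS_THRESHOLD
```

For example, after `ledger.transfer("provider", "alice", Decimal("1.0"))`, Alice has access; once she transfers that 1.0 to Bob, her access is gone, and a second attempt to transfer the same 1.0 to Carol returns `False`. To consume the dataset, Alice would instead transfer her 1.0 back to the provider.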

Datatokens are ERC20 tokens to access data services [2]. Each data service gets its own datatoken.

Datatokens and Rights

Holding a datatoken implies the right to access the data. We can formalize this right: holding the datatoken would typically come with a license to use that data. Specifically: the data would be copyrighted (a form of intellectual property, or IP), as a manifestation of bits on a physical storage device. The license is a contract to use the IP in that specific manifestation. In most jurisdictions, copyright happens automatically on creation of the IP. Alternatively, encrypted data or data behind a firewall can be considered a trade secret.

“Ownership” is a bundle of rights. “Owning” a token means you hold the private key to a token, which gives you the right to transfer that token to others. Andreas Antonopoulos has a saying: “Your keys, your Bitcoin. Not your keys, not your Bitcoin”. That is, to truly own your Bitcoin, you need to have the keys to it. This crosses over to data:

“Your keys, your data. Not your keys, not your data”.

That is, to truly own your data, you need to have the keys to it.

Mental Model

Ocean Protocol datatokens are the interface to connect data assets with DeFi tools. Ocean is an on-ramp for data into ERC20 datatoken data assets on Ethereum, and an off-ramp to consume data assets. In between are any ERC20-based applications. The image below illustrates (repeated from the beginning of this post, for convenience).

Ocean datatokens mental model, repeated from top. [Note: showing a logo does not imply a partnership]

Relation to Oracles

Oracles like Chainlink and Band help get data itself on-chain. Ocean is complementary, providing tools to on-ramp and off-ramp data assets. The data itself does not need to be on-chain, which allows wider opportunity for leveraging data in DeFi.

Datatoken Variants

There are many possible variants of datatokens. At the smart contract level, datatokens don’t differ. Variants emerge in the semantic interpretation by libraries run by the data provider, one level up. Here are some variants:

  • Access could be perpetual (access as many times as you like), time-bound (e.g. access for just one day, or within a specific date range), or one-time (after you access, the token is burned).
  • Data access is always treated as a data service. This could be a service to access a static dataset (e.g. a single file), a dynamic dataset (stream), or a compute service (e.g. “bring compute to the data”).
  • Read vs. write (and other) access. This post focuses on “read” access permissions. But there are variants: Unix-style (read, write, execute; for individual, group, all); database-style (CRUD: create, read, update, delete), or blockchain database-style (CRAB: create, read, append, burn).
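Since the on-chain token is identical across variants, the interpretation lives in the provider's library. The following is a hypothetical sketch of such a provider-side check for the three access durations listed above; the names `AccessKind`, `AccessPolicy`, `Grant`, and `interpret`, and the convention that a one-time grant tells the caller to burn the token, are illustrative assumptions rather than Ocean's actual library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AccessKind(Enum):
    PERPETUAL = "perpetual"    # access as many times as you like
    TIME_BOUND = "time_bound"  # access only within [start, end]
    ONE_TIME = "one_time"      # token is burned after access


@dataclass
class AccessPolicy:
    kind: AccessKind
    start: Optional[datetime] = None  # used only for TIME_BOUND
    end: Optional[datetime] = None    # used only for TIME_BOUND


@dataclass
class Grant:
    allowed: bool
    burn_token: bool  # whether the provider should burn the presented token


def interpret(policy: AccessPolicy, balance: float, now: datetime) -> Grant:
    """Hypothetical provider-side rule; the datatoken contract is the same in all cases."""
    if balance < 1.0:
        return Grant(allowed=False, burn_token=False)
    if policy.kind is AccessKind.PERPETUAL:
        return Grant(allowed=True, burn_token=False)
    if policy.kind is AccessKind.TIME_BOUND:
        after_start = policy.start is None or now >= policy.start
        before_end = policy.end is None or now <= policy.end
        return Grant(allowed=after_start and before_end, burn_token=False)
    # ONE_TIME: allow this access, and signal that the token must be burned.
    return Grant(allowed=True, burn_token=True)
```

For instance, a one-day policy is `AccessPolicy(AccessKind.TIME_BOUND, t0, t0 + timedelta(days=1))`: a holder is admitted twelve hours after `t0` but refused two days later.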

Repurposing DeFi for the Data Economy, Via Datatokens

The DeFi tools space has been exploding, and maturing. Ocean V3 unabashedly “appropriates” DeFi tools: Metamask becomes a data wallet, Balancer becomes a data exchange, and more. Let’s elaborate.

Data Wallets: Data Custody & Data Management

Data custody is the act of holding access to the data, which in Ocean is simply holding datatokens in wallets. Data management also includes sharing access to data, which in Ocean is simply transferring datatokens to others.

With datatokens as ERC20 tokens, we can leverage existing ERC20 wallets. This includes browser wallets (e.g. Metamask), mobile wallets (e.g. Argent, Pillar), hardware wallets (e.g. Trezor, Ledger), multi-sig wallets (e.g. Gnosis Safe), institution-grade wallets (e.g. Riddle & Code), custodial wallets (e.g. Coinbase Custody), and more.

Datatokens transform bank-grade crypto wallets into data wallets. [Image: CC-SA-3.0]

ERC20 wallets may get tuned specifically for datatokens as well, e.g. to visualize datasets, or long-tail token management (e.g. holding 10,000 different datatoken assets).

Existing software could be extended to include data wallets. For example, Brave browser has a built-in crypto wallet that could hold datatokens. There could be browser forks focused on datatokens, with direct connection to user browsing data. Integrated Development Environments (IDEs) for AI like Azure ML Studio could have built-in wallets to hold & transfer datatokens for training data, models as data, and more. Non-graphical AI tools could integrate too, such as scikit-learn or TensorFlow Python libraries using a Web3 wallet (mediated with Ocean’s Python library).

As token custody continues to improve, data custody inherits these improvements.

Data Marketplaces

ERC20 datatokens unlock a huge variety of possible data marketplaces. Here are some variants.

  • AMM DEXes. This could be a Uniswap- or Balancer-like webapp to swap datatokens for DAI, ETH, or OCEAN. It could also have something like pools.balancer.exchange to browse across many datatoken pools.
  • Order-book DEXes. It could use 0x, Binance DEX, Kyber, etc. It could leverage platform-specific features such as 0x’s shared liquidity across marketplaces.
  • Order-book CEXes. Centralized exchanges like Binance or Coinbase could readily create their own datatoken-based marketplaces, and to kickstart usage could sell datasets that they’ve generated internally.
  • Marketplaces in AI tools. This could be an AI-oriented data marketplace app embedded directly in an AI platform or webapp like Azure ML Studio or Anaconda Cloud. It could also be an AI-oriented data marketplace as a Python library call, for usage in any AI flow (since most AI flows are in Python). In fact, this is already live in Ocean’s Python library.
  • “Nocode” Data Marketplace builder. Think Shopify for data marketplaces, where people can deploy their own data marketplaces in just a few clicks.
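For the AMM case, pricing in a Balancer-style pool follows the weighted-product formulas published in the Balancer whitepaper. The sketch below reimplements just two of those formulas (spot price and out-given-in) in plain Python floats, assuming a two-token pool such as OCEAN and a datatoken. It is illustrative only, not Balancer's or Ocean's contract code, which uses fixed-point arithmetic.

```python
def spot_price(balance_in: float, weight_in: float,
               balance_out: float, weight_out: float, fee: float = 0.0) -> float:
    """Price of one unit of the 'out' token, quoted in the 'in' token."""
    return (balance_in / weight_in) / (balance_out / weight_out) / (1.0 - fee)


def out_given_in(balance_in: float, weight_in: float,
                 balance_out: float, weight_out: float,
                 amount_in: float, fee: float = 0.0) -> float:
    """Tokens received for amount_in, keeping the weighted product invariant."""
    effective_in = amount_in * (1.0 - fee)          # the fee stays in the pool
    ratio = balance_in / (balance_in + effective_in)
    return balance_out * (1.0 - ratio ** (weight_in / weight_out))
```

With 1,000 OCEAN and 100 datatokens at equal weights and no fee, the spot price is 10 OCEAN per datatoken, yet spending 100 OCEAN returns about 9.09 datatokens, because the trade itself moves the price (slippage). With equal weights the formula reduces to Uniswap-style constant-product pricing.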

We can expect data marketplaces to come in many shapes and sizes. [Image CC0]

Data Auditability

Data auditability and provenance are another goal in data management. Thanks to datatokens, blockchain explorers like Etherscan now become data audit trail explorers.
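Every ERC20 transfer emits a standard `Transfer(from, to, value)` event, and minting conventionally emits one from the zero address, so a datatoken's audit trail is essentially a replay of those events. The sketch below assumes the events have already been fetched and decoded (for example from an explorer or a node); the `Transfer` tuple, the function names, and the placeholder addresses such as `"0xalice"` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

ZERO = "0x0000000000000000000000000000000000000000"  # counterpart for mints and burns


class Transfer(NamedTuple):
    block: int
    sender: str
    recipient: str
    value: int  # integer base units, as carried by the ERC20 Transfer event


def balances_at(events: list[Transfer], block: int) -> dict[str, int]:
    """Replay Transfer events up to and including `block`: who held the datatoken then?"""
    bal = defaultdict(int)
    for ev in sorted(events, key=lambda e: e.block):
        if ev.block > block:
            break
        if ev.sender != ZERO:
            bal[ev.sender] -= ev.value
        if ev.recipient != ZERO:
            bal[ev.recipient] += ev.value
    return {addr: amt for addr, amt in bal.items() if amt > 0}


def trail_for(events: list[Transfer], address: str) -> list[Transfer]:
    """Chronological provenance of one address's datatoken movements."""
    return [ev for ev in sorted(events, key=lambda e: e.block)
            if address in (ev.sender, ev.recipient)]
```

Replaying to a given block answers "who could access this dataset at that time?", while `trail_for` lists every hand-off involving one party.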

Just as CoinGecko or CoinMarketCap provide services to discover new tokens and track key data like price or exchanges, we anticipate similar services to emerge for datatokens. CoinGecko and CoinMarketCap may even do this themselves, just as they’ve done for DeFi tokens.

Data DAOs: Data Co-ops and more

Decentralized Autonomous Organizations (DAOs) help people coordinate to manage resources. They can be seen as multi-sig wallets, but with significantly more people, and with more flexibility. DAO technology is maturing well. A data DAO would own or manage datatokens on behalf of its members. The DAO could have governance processes on what datatokens to acquire, hold, sell / license, and so on.
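Governance rules differ from one DAO framework to another. As a hypothetical illustration only, here is a share-weighted vote with a quorum on whether a data DAO should acquire a datatoken; the 50% quorum, the simple-majority rule, and all class and method names are assumptions, not the API of any particular DAO tool.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    action: str     # e.g. "acquire", "sell", "license"
    datatoken: str  # hypothetical datatoken symbol
    votes_for: int = 0
    votes_against: int = 0
    voters: set[str] = field(default_factory=set)


class DataDAO:
    def __init__(self, shares: dict[str, int], quorum: float = 0.5) -> None:
        self.shares = shares  # member -> voting weight
        self.quorum = quorum  # fraction of all shares that must vote

    def vote(self, proposal: Proposal, member: str, support: bool) -> None:
        if member not in self.shares or member in proposal.voters:
            raise ValueError("not a member, or already voted")
        proposal.voters.add(member)
        if support:
            proposal.votes_for += self.shares[member]
        else:
            proposal.votes_against += self.shares[member]

    def passes(self, proposal: Proposal) -> bool:
        # Passes if enough shares voted and a simple majority of those voted yes.
        total = sum(self.shares.values())
        turnout = proposal.votes_for + proposal.votes_against
        return turnout >= self.quorum * total and proposal.votes_for > proposal.votes_against
```

In a real data DAO, a passing "acquire" proposal would then trigger a purchase of datatokens from the DAO's shared wallet.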

Here are some applications of data DAOs:

Co-ops and Unions (Collective Bargaining). Starting in the early 1900s, thousands of farmers in rural Canada grouped into the Saskatchewan Wheat Pool (SWP) for clout in negotiating grain prices, marketing grain, and distributing it. Labour unions have done the same for factory workers, teachers, and many other professions. In this paper, the authors suggest that data creators are currently getting a raw deal, and that the solution is to make a labour union for data. A data DAO could be set up for collective bargaining, as a “data co-op” or “data union”. For example, there could be a data co-op with thousands of members for location data, using the FOAM proof-of-location service.

To market and distribute their grain to consumers thousands of miles away, farmers organized into co-operatives like the Saskatchewan Wheat Pool (SWP). The SWP managed a system of grain elevators, trains, ships, and more to manage this. [Image: CC-BY]

Manage a single data asset. There could be a DAO attached to a single data asset. One way is: create a Telegram channel dedicated to that dataset. You can only enter the Telegram channel if you have 1.0 of the corresponding datatokens (inspired by Karma DAO). This can also be for Discord, Slack, or otherwise.

Datatoken pool management. There could be a data DAO to manage a datatoken pool’s weights, transaction fees, and more, leveraging Balancer Configurable Rights Pools (inspired by PieDAO which does this for a pool of DeFi assets).

Index Funds for Data Investments. Using e.g. Melon, construct an investment product for people to buy a basket of data assets (inspired by existing mutual and index funds).

Data: A New Asset Class for DeFi

In the previous section, I described how DeFi tools can be repurposed to help the Data Economy. We can flip this around: the Data Economy can help grow DeFi, because data is a huge industry. The data economy is already €377B for Europe alone, and growing. That’s 30x larger than DeFi assets under management (AUM).

“The economic impact of data is huge. Most economic activity will depend on data within a few years. The value of the European data economy for the 28 Member States is expected to grow from €377 billion in 2018 to €477 billion by 2020 and €1,054 billion by 2025 in a high-growth scenario based on the right conditions being in place.” -European Commission “Building a Data Economy” brochure

Data is a new asset class. It can be securitized and used as collateral. An example is Bowie Bonds, where a fraction of David Bowie’s IP (intellectual property) licensing revenue was paid to bondholders. Data is IP. To use it as a financial asset, one must price it. In Bowie’s case, the value was established from previous years’ licensing revenue. Alternatively, we can establish price by selling data assets in data marketplaces.

With datatokens, we can onboard more data assets into each major DeFi service type:

  • Data assets can be used as collateral in stablecoins and loans, therefore growing total collateral.
  • Data assets bought and sold in DEXes and CEXes contribute to their $ volume and AUM.
  • There can be insurance on data assets. As described above, there can be data DAOs, data baskets, and more.
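To sketch the collateral idea from the first bullet: a lender could value a datatoken position at its marketplace price and lend only a fraction of that value, so that a drop in the data asset's price leaves a safety margin. The function names and the 0.5 collateral factor used below are illustrative assumptions, not parameters of any existing lending protocol.

```python
def borrow_capacity(datatokens: float, price: float, collateral_factor: float) -> float:
    """Max debt backed by `datatokens` units priced at `price` (e.g. a pool's spot price)."""
    if not 0.0 <= collateral_factor < 1.0:
        raise ValueError("collateral factor must be in [0, 1)")
    return datatokens * price * collateral_factor


def is_undercollateralized(debt: float, datatokens: float, price: float,
                           collateral_factor: float) -> bool:
    """True if the position would be eligible for liquidation at this price."""
    return debt > borrow_capacity(datatokens, price, collateral_factor)
```

With 50 datatokens priced at 10 DAI each and a factor of 0.5, up to 250 DAI can be borrowed; if the price falls to 7 DAI, a 200 DAI loan becomes undercollateralized.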

In short, datatokens have great promise to grow the size of DeFi in terms of $ volume and AUM.

Data to Optimize DeFi Returns

We can close the loop with data helping DeFi, and vice versa. Specifically: data can improve decision-making in DeFi to optimize returns. This will catalyze the growth of DeFi further. Here are some examples:

  • Yield farming. Data can improve the automated strategies to maximize annual percentage rate (APR). Think yearn.finance/earn robots, but optimized further.
  • Insurance. More accurate models for lower risk.
  • Loans. Better prediction of default for under-collateralized loans.
  • Arb bots. More data for higher-return arbitrage bots.
  • Stablecoins. Assessment of assets for inclusion in stablecoins.

Data-powered loops. DeFi looping techniques further boost returns. For each of the examples above, we envision loops of buying more data, to get better returns, to buy more data, and so on. To go even further, we could apply this to data assets themselves.


We’ve built datatokens for Ocean V3. The code is in private beta with teams building on Ocean, and is undergoing a security audit. In coming weeks we will open up the GitHub repositories, and release updated documentation.


In this post, I described Ocean Protocol “datatokens”. Datatokens act as an overlay protocol to more easily combine pieces of data infrastructure. We can repurpose DeFi tools to immediately enable data wallets, data exchanges, data provenance, data DAOs, and other tools for a Web3 Data Economy.

I also described how data becomes a new asset class to grow the overall DeFi pie, and how data can help optimize DeFi returns.

Data legos, here we come.



Thanks very much to the following people for reviews: Bruce Pon, Simon de la Rouviere, Sarah Vallon, and Monica Botez.

Thanks to my excellent colleagues at Ocean Protocol for the collaboration in building towards this. Thanks especially to Ahmed Ali for the conversations that helped refine the datatokens concept. Finally, thanks to the broader Ocean community for their ongoing support.


[1] Here, we describe how blockchains prevent double-spending. Let’s illustrate how the Bitcoin system prevents double-spending of Bitcoin tokens (bitcoin). In the Bitcoin system, you “control” an “address”. An “address” is a place where bitcoin can be stored. You “control” the address if you’re able to send bitcoin from that address to other addresses. You’re able to do that if you hold the “private key” to that address. A private key is like a password: a string of text you keep hidden. In sending bitcoin, you’re getting software to create a transaction (a message) that specifies how much bitcoin is being sent, and what address it’s being sent to. You demonstrate that it was you who created the transaction by digitally signing the message with the private key associated with your address. The system records all such transactions in a single shared global ledger that has thousands of copies worldwide.

[2] We could also use ERC721 “non-fungible tokens” (NFTs) [ERC721] for data access control, where you can access the dataset if you hold the token. Each data asset is its own “unique snowflake”. However, datasets typically get shared among more than one person. For this we need fungibility, which is the realm of ERC20. ERC20 has more applications and better interoperability. ERC721 datatokens remain an option for future consideration.