Feedback Control as a New Primitive for DeFi | by Hsien-Tang Kao | Gauntlet | Medium

This year, we’ve seen an explosion of new DeFi protocols providing new mechanisms to enable trading, lending, and other financial activity. While these protocols vary widely in functionality and purpose, a few primitives have emerged as common components across many of them. Constant Function Market Makers (CFMMs) and automated interest rate curves are two of the most popular components, appearing in products such as Uniswap and Compound. As the industry coalesces around these primitives, a question arises: do better options exist? Feedback control systems are one answer that could improve protocol incentives, efficiency, and resiliency.

What is feedback control?

Feedback is a central feature of life. The process of feedback governs how we grow, respond to stress and challenge, and regulate factors such as body temperature, blood pressure, and cholesterol level. The mechanisms operate at every level, from the interaction of proteins in cells to the interaction of organisms in complex ecologies.— M. B. Hoagland and B. Dodson, The Way Life Works, 1995

Control theory has been studied extensively in applied mathematics, electrical engineering, and robotics. It has a broad range of applications across many industries, including aerospace systems, autonomous vehicles, and IoT devices. In the classic textbook “Feedback Systems,” Karl Johan Åström and Richard M. Murray define control as the use of algorithms and feedback in engineered systems.

Figure [1]: Open-loop system

Figure [2]: Closed-loop system

Figures [1] and [2] illustrate the difference between open-loop and closed-loop control systems. In an open-loop system, the controller output is independent of the system output. In contrast, the controller in a closed-loop (feedback) system takes the system output as an additional input. The system dynamics then depend on the controller dynamics, and the controller dynamics depend on the system dynamics, which couples the two. This cyclic dependency makes feedback systems nontrivial to analyze.
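The difference can be made concrete with a toy simulation. The sketch below is illustrative only: the plant model `y[k+1] = y[k] + u[k] + disturbance`, the target of 10, the gain of 0.5, and all function names are assumptions chosen for the example, not taken from any real system.

```python
TARGET = 10.0
STEPS = 20


def open_loop(step: int, output: float) -> float:
    """Open-loop controller: follows a precomputed schedule and ignores the output."""
    return TARGET / STEPS


def closed_loop(step: int, output: float, kp: float = 0.5) -> float:
    """Closed-loop controller: the input depends on the measured output."""
    error = output - TARGET  # observed minus desired
    return -kp * error


def simulate(controller, disturbance: float = 0.0) -> float:
    """Run the hypothetical plant y[k+1] = y[k] + u[k] + disturbance for STEPS steps."""
    output = 0.0
    for step in range(STEPS):
        output = output + controller(step, output) + disturbance
    return output


for name, controller in (("open-loop", open_loop), ("closed-loop", closed_loop)):
    ideal = simulate(controller)
    disturbed = simulate(controller, disturbance=0.2)
    print(f"{name}: no disturbance -> {ideal:.2f}, with disturbance -> {disturbed:.2f}")
```

Without a disturbance, both controllers reach the target. With an unmodeled disturbance, the open-loop schedule drifts far from the target, while the closed-loop controller stays close to it and is left with only a small steady-state error.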

A brief history of feedback control & reinforcement learning

The proportional–integral–derivative (PID) controller is the most commonly used feedback controller. It continuously calculates the control signal from the difference between the desired system state and the observed state. Nicolas Minorsky developed the first theoretical analysis of the PID controller in the early 1920s while working on automatic ship steering for the U.S. Navy.

In the 1950s, commercial digital computers became available, enabling the rapid development of optimal control theory. The primary concern of optimal control is to find a control law that produces an optimal state trajectory, one that minimizes or maximizes a measure of a dynamical system’s behavior. Richard E. Bellman’s “principle of optimality” (expressed in the Bellman equation), dynamic programming, and Markov decision processes were developed in this era to solve the optimal control problem.

In the late 1980s and early 1990s, prior work in optimal control and artificial intelligence led to progress in reinforcement learning. Reinforcement learning generalizes the optimal control problem and solves it through trial-and-error learning or approximation, without complete knowledge of the system. Advances in computation and deep learning over the past twenty years have brought a new wave of successful deep reinforcement learning algorithms. Deep reinforcement learning extends reinforcement learning by using a deep neural network to learn a representation of the state, rather than relying on an explicitly designed state space. DeepMind has used these algorithms to create artificial agents that play Atari games and Go better than humans.

PID Controller

An intuitive way to understand feedback control, and the PID controller in particular, is to start with a proportional controller:

u(t) = K_p e(t)

where K_p is a constant. In a proportional controller, the control input u(t) is proportional to the error e(t) between the observed output and the desired system output.

Here we show how a thermostat uses a feedback mechanism to control the room temperature. Suppose the current temperature is 90 °F and the thermostat is set to 70 °F, so the error is 20 °F. With K_p = 0.1 kilowatts per °F, the thermostat drives the air conditioner at u(t) = 2 kilowatts to cool the room. When the temperature drops to 80 °F, the error decreases to 10 °F, and the air conditioner draws one kilowatt. In this example, the thermostat measures the temperature error and adjusts the control signal, which sets the air conditioner’s power and lowers the temperature. This feedback loop makes the room temperature gradually converge to the desired temperature.
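The thermostat arithmetic can be reproduced in a few lines. This is a minimal sketch: the gain and setpoint come from the example above, while the room model (each kilowatt of cooling lowers the temperature by 1 °F per time step) and the function names are hypothetical and serve only to close the loop.

```python
KP_KW_PER_F = 0.1        # proportional gain from the example above
SETPOINT_F = 70.0        # thermostat setting from the example above
COOLING_F_PER_KW = 1.0   # hypothetical room model: degrees removed per kW per step


def proportional_control(observed: float, setpoint: float, kp: float) -> float:
    """Return the control input u = K_p * e, with e = observed - setpoint."""
    error = observed - setpoint
    return kp * error


def simulate_room(start_temp_f: float, steps: int) -> list[float]:
    """Apply the proportional controller repeatedly to the hypothetical room."""
    temps = [start_temp_f]
    for _ in range(steps):
        power_kw = proportional_control(temps[-1], SETPOINT_F, KP_KW_PER_F)
        temps.append(temps[-1] - COOLING_F_PER_KW * power_kw)
    return temps


for temp_f in (90.0, 80.0):
    power_kw = proportional_control(temp_f, SETPOINT_F, KP_KW_PER_F)
    print(f"{temp_f:.0f} F -> {power_kw:.1f} kW")

print([round(t, 1) for t in simulate_room(90.0, 5)])
```

Under this toy room model, the error shrinks by a constant factor every step, so the temperature approaches the setpoint without ever overshooting it.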

Figure: A block diagram of a PID controller (source: Wikipedia)

The PID controller extends the concept of the proportional controller. In addition to the current error e(t), it uses the cumulative error \int_0^t e(\tau) \, d\tau and the rate of change of the error \frac{de(t)}{dt} to calculate the control input:

u(t) = K_p e(t) + K_i \int_0^t e(\tau) \, d\tau + K_d \frac{de(t)}{dt}

where K_p, K_i, and K_d are constants.
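In discrete time, a PID controller only needs to remember two numbers between updates: the running integral of the error and the previous error. The sketch below is one common discretization (rectangle-rule integration and a backward difference for the derivative). The class and method names are our own, not from any particular library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0               # running approximation of the integral of e
    prev_error: Optional[float] = None  # last error, used for the derivative term

    def update(self, error: float, dt: float) -> float:
        """Return u = K_p*e + K_i*integral(e) + K_d*de/dt for one time step."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PIDController(kp=1.0, ki=0.5, kd=0.25)
print(pid.update(error=2.0, dt=1.0))
print(pid.update(error=4.0, dt=1.0))
```

Setting `ki` and `kd` to zero recovers the proportional controller from the thermostat example.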

Feedback Control & DeFi

Feedback control is a simple yet powerful idea that is used extensively in real-world systems. Beyond these existing applications, feedback control is also an essential building block of DeFi applications. Imagine that a protocol has a high-level objective. The protocol measures how far the current state is from that objective and uses a feedback mechanism to update its parameters, incentivizing market participants to drive the system toward the desired state. For example, suppose a stablecoin protocol wants to maintain a one-dollar peg. The protocol continuously adjusts the interest rate according to the stablecoin price. When the stablecoin price is above one dollar, the protocol cuts the interest rate, which incentivizes participants to issue more stablecoins. When the price is below one dollar, the protocol raises the interest rate, which incentivizes participants to pay back their debt. With an algorithmically adjusted interest rate, supply and demand can reach equilibrium with the stablecoin price around one dollar.
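The stablecoin example can be sketched as a single update rule. This is a hypothetical illustration, not the mechanism of any specific protocol: the function name, the gain of 0.5, and the rate bounds are all assumptions.

```python
def next_interest_rate(
    rate: float,
    price: float,
    target: float = 1.0,
    kp: float = 0.5,        # hypothetical gain: rate change per dollar of price error
    min_rate: float = 0.0,  # hypothetical lower bound on the rate
    max_rate: float = 1.0,  # hypothetical upper bound on the rate
) -> float:
    """Move the rate against the price error, then clamp it to the allowed range."""
    error = price - target         # positive when the stablecoin trades above the peg
    new_rate = rate - kp * error   # above peg: cut the rate; below peg: raise it
    return min(max(new_rate, min_rate), max_rate)


print(next_interest_rate(rate=0.05, price=1.02))  # price above peg, rate is cut
print(next_interest_rate(rate=0.05, price=0.98))  # price below peg, rate is raised
```

Because each call adjusts the previous rate, the rate keeps moving for as long as the price stays away from the peg, and it stops changing once the price returns to one dollar.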

Many DeFi applications are already implicitly or explicitly using this pattern in the protocol design. Here we will use Ampleforth’s rebase, RAI’s reflex index, EIP-1559’s fee market proposal, and THORChain’s incentive pendulum to illustrate the use of feedback controllers in different mechanisms. We will also show how feedback control can enable on-chain derivative pricing.

Volatility Dampened Asset

Ampleforth and RAI pioneered the concepts of uncorrelated and low-volatility crypto assets. At first glance, those protocols seem to have distinct underlying mechanisms. AMPL dynamically adjusts the supply to solve the supply inelasticity problem, whereas RAI uses a dynamic redemption rate mechanism to minimize the reflex index volatility. However, both protocols are essentially feedback control systems aiming to create a volatility dampened asset. The major difference between these protocols is that they use different control inputs. We will use the feedback control framework to show the similarities and differences between these two protocols.

Ampleforth Rebase

AMPL is a digital asset whose supply adjusts dynamically based on its market price. When AMPL’s price is greater than $1, the supply expands; when it is below $1, the supply contracts. This supply adjustment mechanism, called rebase, incentivizes rational AMPL traders to step in and push the AMPL price toward the $1 target.

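To formulate the rebase mechanism, we first define the error as the difference between the observed value y(t) and the target value r(t):

e(t) = y(t) - r(t)

Assuming the target value is $1 and the observed value is the current price p(t), the error term is:

e(t) = p(t) - 1

When the absolute price deviation |e(t)| is greater than the deviation threshold d_t, AMPL’s supply adjustment, expressed as a fraction of the current total supply S(t), is:

\frac{\Delta S(t)}{S(t)} = \frac{p(t) - 1}{\text{rebase lag}}

From the above equation, we can formulate rebase as a proportional controller, where

K_p = \frac{1}{\text{rebase lag}}

With the control rule:

u(t) = \frac{\Delta S(t)}{S(t)} = K_p e(t)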

As this example shows, the rebase lag is the key parameter determining the system behavior. Choosing a proper rebase lag is effectively the same as tuning the proportional gain of the controller. The impact of the proportional gain on system characteristics has been studied extensively in control theory: a high proportional gain (or a low rebase lag) reduces the steady-state error and shortens the rise time, but increases the overshoot and makes the system more oscillatory.
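To see the rebase lag acting as an inverse gain, the rebase rule can be sketched as a small function. This is a simplified illustration in floating point, not Ampleforth’s contract code: the function name and the default lag and threshold values are assumptions chosen for the example.

```python
def rebase_supply_delta(
    supply: float,
    price: float,
    target: float = 1.0,
    rebase_lag: float = 10.0,           # illustrative value; K_p = 1 / rebase_lag
    deviation_threshold: float = 0.05,  # illustrative value for d_t
) -> float:
    """Return the supply change for one rebase under a proportional rule."""
    deviation = (price - target) / target
    if abs(deviation) < deviation_threshold:
        return 0.0  # inside the dead band: no supply change
    return supply * deviation / rebase_lag


for lag in (5.0, 10.0, 30.0):
    delta = rebase_supply_delta(1_000_000.0, price=1.2, rebase_lag=lag)
    print(f"rebase_lag={lag:>4}: supply change = {delta:,.0f}")
```

For the same price deviation, a smaller rebase lag produces a larger supply change per rebase, which is exactly what a higher proportional gain does.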

Figure: Response to a step reference with different values of K_p (source: Control Tutorials for MATLAB & Simulink)

RAI Reflex Index

A reflex index is an asset with lower volatility than its collateral. The system uses a MakerDAO-like Collateralized Debt Position (CDP) for asset issuance. When the reflex index’s market price deviates from the redemption price, the protocol adjusts the redemption rate (the rate of change of the redemption price), incentivizing CDP holders to generate more debt or repay outstanding debt.

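RAI’s reflex index is the first protocol to explicitly reference the PID controller in its design. The error term in the reflex index is the difference between the market price p_m(t) and the redemption price p_r(t):

e(t) = p_m(t) - p_r(t)

The redemption rate is the control input, and is modified by a proportional controller:

u(t) = \text{redemption rate}(t)

and

u(t) = K_p e(t)

With this sign convention K_p is negative: when the market price is above the redemption price, the redemption rate turns negative and the redemption price falls.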

Given the above two examples, Ampleforth and RAI both have a feedback control mechanism. These protocols target the system at a specific reference price but use different economic mechanisms to influence the token supply. Ampleforth directly alters the system’s total supply to incentivize participants for “supply discovery” or “market cap discovery,” pushing the AMPL price toward $1. RAI changes the redemption price, incentivizing participants to rebalance the total outstanding debt and reduce price volatility.
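The RAI side of this comparison can be sketched in the same style as a generic proportional controller. This is a simplified illustration, not Reflexer’s implementation. The negative gain (so that a market price above the redemption price yields a negative redemption rate), the simple per-step compounding, and all names and values are assumptions.

```python
KP = -0.1  # hypothetical gain; negative so the rate opposes the price error


def redemption_rate(market_price: float, redemption_price: float, kp: float = KP) -> float:
    """Proportional rule: rate = K_p * (market price - redemption price)."""
    error = market_price - redemption_price
    return kp * error


def next_redemption_price(redemption_price: float, rate: float, dt: float = 1.0) -> float:
    """The redemption rate is the rate of change of the redemption price."""
    return redemption_price * (1.0 + rate * dt)


rate = redemption_rate(market_price=3.10, redemption_price=3.00)
print(f"redemption rate: {rate:+.4f}")
print(f"next redemption price: {next_redemption_price(3.00, rate):.4f}")
```

Where the Ampleforth rule changes the supply directly, this rule changes the price at which debt is valued and leaves the supply adjustment to CDP holders.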

EIP-1559: ETH Fee Market Change Proposal

The current Ethereum fee market uses a simple first-price auction to price transaction fees. This auction mechanism is suboptimal and creates considerable overhead for bidders, because each bidder needs to bid based on the expected bids of competitors. EIP-1559 addresses the problem with an adaptive fee mechanism designed so that the total fees collected can outweigh the network’s social cost.

The proposed transaction fee comprises a dynamically adjusted base fee and an extra tip to the miner. Block usage is the primary factor determining the base fee: when block usage is higher than the target usage, the base fee increases, and vice versa. The algorithmically adjusted fee seeks a game-theoretic equilibrium and establishes a lower bound on fees. The proposal is probably the most significant change in ETH 1.0, and it will dramatically change the user experience and the monetary policy.

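Not surprisingly, EIP-1559 can be formulated as a feedback control problem. Let b(t) be the base fee, g(t) the gas used in block t, and g^* the target gas usage. The base fee adjustment algorithm is:

b(t+1) = b(t) \left(1 + \frac{1}{8} \cdot \frac{g(t) - g^*}{g^*}\right)

The error term in the algorithm is:

e(t) = g(t) - g^*

The base fee adjustment algorithm is also a proportional controller, where

K_p = \frac{1}{8 g^*}

With the control input:

u(t) = K_p e(t)

and

b(t+1) = b(t) (1 + u(t))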
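The base fee rule, which raises or lowers the fee by at most one eighth per block in proportion to how far gas usage is from the target, can be sketched with integer arithmetic. This is a minimal sketch based on our reading of the EIP-1559 specification. The function name is our own, and details such as the minimum increase of 1 should be checked against the specification.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # caps the per-block change at 1/8 (12.5%)


def next_base_fee(base_fee: int, gas_used: int, gas_target: int) -> int:
    """Return the next block's base fee given the parent block's gas usage."""
    if gas_used == gas_target:
        return base_fee
    gas_delta = abs(gas_used - gas_target)
    fee_delta = base_fee * gas_delta // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR
    if gas_used > gas_target:
        # Above target: raise the fee (by at least 1, an assumption from the spec).
        return base_fee + max(fee_delta, 1)
    # Below target: lower the fee.
    return base_fee - fee_delta


base_fee = 100_000
for gas_used in (20_000_000, 10_000_000, 0):
    print(gas_used, next_base_fee(base_fee, gas_used, gas_target=10_000_000))
```

A completely full block (twice the target) raises the base fee by 12.5%, and an empty block lowers it by 12.5%.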

THORChain Incentive Pendulum

THORChain is a decentralized network that facilitates cross-chain asset swaps. To guarantee its safety, the protocol requires the system’s total bonded capital to be greater than the pooled capital. In THORChain, a 2:1 ratio of bonded to pooled capital is considered the optimal system state. The incentive pendulum is designed to keep the system in a balanced state; it reallocates the total inflation reward and trading fees among participants so that the system gradually converges toward the optimal state. In particular, the proportion of system income distributed to liquidity providers is:

\frac{b - s}{b + s}

where b and s denote the total bonded and total pooled capital, respectively, and the rest is given to bonders. In the optimal state, the incentive pendulum distributes 33% of the system income to liquidity providers and 67% to bonders. If the system only has bonded capital, the incentive pendulum distributes 100% of the system income to liquidity providers.
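The two cases above (33% at a 2:1 ratio and 100% with no pooled capital) are consistent with a liquidity provider share of (b - s) / (b + s). The sketch below checks the arithmetic. The function name is our own, and clamping the share at zero when pooled capital exceeds bonded capital is an assumption.

```python
def lp_income_share(bonded: float, pooled: float) -> float:
    """Fraction of system income paid to liquidity providers: (b - s) / (b + s)."""
    total = bonded + pooled
    if total <= 0:
        raise ValueError("total capital must be positive")
    share = (bonded - pooled) / total
    return max(share, 0.0)  # assumption: never pay a negative share


for bonded, pooled in ((2.0, 1.0), (1.0, 0.0), (1.0, 1.0)):
    lp = lp_income_share(bonded, pooled)
    print(f"bonded={bonded}, pooled={pooled}: LPs {lp:.0%}, bonders {1 - lp:.0%}")
```

As pooled capital grows relative to bonded capital, the liquidity provider share shrinks and more income flows to bonders, which pulls the system back toward the 2:1 ratio.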

THORChain’s incentive pendulum uses a deterministic formula to calculate the system income distribution. Although it does not use the PID controller formulation, the incentive pendulum and the PID controller share a very similar concept:

  • The mechanism attempts to minimize the error over time, i.e., making the system state converge to an optimal state.
  • The control signal is a function of error, where the error is the difference between the measured bonded-to-pooled capital and the optimal bonded-to-pooled capital.

On-chain Derivative Pricing

One of the biggest surprises of 2020 was that decentralized exchanges for spot assets could support volumes on the same order of magnitude as centralized spot venues. However, the most actively traded crypto-denominated product, the perpetual future, has yet to make a decentralized splash. There have been a number of early attempts at decentralized futures products, such as FutureSwap and McDEX, but so far, none of these protocols have lived up to their promise. One of the main reasons is that futures trading tends to be much more latency-sensitive than spot trading: the oracle price updates that mark the product need to be extremely quick to avoid front-running and back-running. Moreover, liquidity tends to be added to and removed from derivatives trading venues at higher velocity, as lower margin requirements allow users to take large bets with less collateral.

However, a number of novel mechanisms can replicate derivative payoffs without requiring high liquidity velocity. These methods involve automated market makers, similar to Uniswap, that have dynamic curves. One fundamental piece of work in this direction is a theorem by Alex Evans showing that if a Balancer pool adjusts its weights according to a modified PID controller (below), then you can replicate any unlevered payoff.

In the above equation, the weights w* of the Balancer pool (thought of as an n-dimensional vector for an n-asset pool) obey a control equation as a function of the expected payoff g. Generating arbitrary derivative payoffs is then a matter of adding leverage: if one can borrow against Balancer pool shares that pay off g(x,t) and create new pool shares with the borrowed funds, one can lever the exposure to a constant multiple of g. On-chain lending platforms, like Aave and Compound, are well suited for performing such an operation. How does this relate to perpetual futures trading?

We can view a perpetual futures product as a function that maps an index price p(t) to a positive or negative payoff. Constant function market makers (CFMMs), like Balancer, allow p(t) to be represented by a quantity vector, and the weights of the pool control the mapping from quantity to price. Therefore, we can view an alternative construction of a perpetual product (in finance parlance, a replicating portfolio) as a CFMM whose shape adjusts to preserve a payoff. While the weight update can still be front-run or back-run, doing so is much harder than manipulating a price. To adjust the payoff g, an attacker needs to manipulate the quantities held in the market maker (x in the above equation). Unlike a price, which is a single scalar, the collateral quantity x is a pair of spot assets locked by many LPs. As shown in Appendix D of our paper on Uniswap, the difficulty of this manipulation grows linearly with the total value locked.
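To illustrate how pool weights control the mapping from quantities to price, the sketch below uses the fee-free spot price of a two-asset weighted pool, (B_quote / w_quote) / (B_base / w_base). We assume this is the relevant formula for a Balancer-style geometric mean pool. The function name and the balances are hypothetical.

```python
def spot_price(
    balance_quote: float,
    weight_quote: float,
    balance_base: float,
    weight_base: float,
) -> float:
    """Price of one unit of the base asset in units of the quote asset, ignoring fees."""
    return (balance_quote / weight_quote) / (balance_base / weight_base)


# Same quantities in the pool, different weights, different implied price.
balances = {"balance_quote": 200_000.0, "balance_base": 100.0}
print(spot_price(weight_quote=0.5, weight_base=0.5, **balances))
print(spot_price(weight_quote=0.2, weight_base=0.8, **balances))
```

With the quantities held fixed, shifting weight toward the base asset raises its implied price. A controller that updates the weights therefore reshapes the price curve without moving any collateral.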

This example illustrates that a number of derivative products, when coupled with dynamically adjusting market makers, can exist on-chain when the proper proportional controller is used. While research into designing such controllers is nascent, the popularity of these designs in CFMMs by Yield, Opyn, and others shows that control theory makes on-chain derivatives possible.

Ethereum has limited computing and storage capacity

In the history of feedback control and reinforcement learning, algorithmic advancement is arguably the main contributor to success. However, people often overlook the fact that paradigm shifts in computing and storage also enabled these technological breakthroughs. Dynamic programming, a method for solving optimal control problems, would not have been practical without the commercial computers that arrived in the 1950s. DeepMind could not have efficiently trained deep reinforcement learning models to play Atari without GPU clusters and massive storage space.

Ethereum has limited computing and storage capacity. Most current DeFi protocols work around these constraints by using simple feedback algorithms that do not need massive storage to keep track of historical state changes. Thus, PID controllers and other algorithms with constant space and time complexity (run time and space requirements that do not grow with the input size) fit well in this resource-bounded compute environment.

The natural next step for leveraging control theory on-chain is formulating the DeFi protocol feedback mechanism as an optimal control problem. The reason is twofold: there has been a great deal of theoretical work on optimal control, and it does not depend on massive computing power. Another potential route for bringing parameters optimized by more complex algorithms on-chain is the protocol’s governance process. Neutral third parties can process blockchain data and external data sources off-chain, run sophisticated algorithms, and submit the optimized parameters for governance votes to improve protocol efficiency.

Final Thoughts

  • The proportional controller is the most common form of controller in industry. It uses the current error as the input and solves most problems reasonably well. To further improve an existing feedback system, a protocol can consider adding “past errors” (the integral term) and “anticipated future error” (the derivative term) as inputs to the controller.
  • Bonding curves and interest rate curves are mechanisms that incentivize specific user behavior. Parameterizing these curves is non-trivial since the design space is broad. For example, curves with different shapes may achieve very similar outcomes, and it is difficult to claim that one curve is strictly better than another. The bonding-curve-based approach also suffers from the curse of dimensionality: parameterizing a surface with three or more dimensions is a challenging task. Protocol development teams can consider using a feedback control approach to simplify the design and parameterization. Instead of designing the whole curve that describes the relationship across a range of parameter values, developers only need to focus on tuning the “rate of change” of the parameter values.
  • Given the high stakes that smart contracts usually involve and the dynamic nature of feedback systems, designing a feedback-control-enabled smart contract is challenging. Simulation has been used widely in industry for parameter tuning. Gauntlet helps protocol designers stress test their protocols by simulating a wide range of protocol parameters and market conditions. Enabling a safe and efficient DeFi ecosystem has always been our top priority.