Part 1: A look at why generalized state channels are going to help build the decentralized future and how to build them. Part 2 can be found here.
Please note this is a thorough but not 100% complete or definitively secure methodology for building out generalized state channels. This code is not meant for immediate use. Also, libraries and tools used in this tutorial may have been updated since this post was published, so please be mindful.
The May/June issue of Foreign Affairs magazine, Is Democracy Dying?, struck a chord. In one of the essays, The End of the Democratic Century by Yascha Mounk and Roberto Stefan Foa, Larry Hagman, who played the infamous J.R. Ewing on the hit show Dallas, is quoted saying, “We were directly or indirectly responsible for the fall of the [Soviet] empire…”.
The essay later goes on to say that Larry claimed it was not idealism but rather “good old fashioned greed” that “got them to question their authority.” In other words, people in the Soviet Union were not so much concerned with or convinced by the ideology of democracy as drawn to the advertised standard of living that came along with it in the United States and other democratic nations.
Another prominent concept to consider is that avarice is the most difficult vice to recognize within yourself. We can each easily point out the greed that exists within our neighbors’ worldview, but who comes to the world stage to admit their own greed? Many more would confess to theft, cheating, lying and worse than would begin to think of greed as their own struggle, because who cannot point to an offender greater than themselves? Perhaps only a lucky few. Moreover, there is no universal measuring stick for this: who is to say that my spending is extravagant in one area when I have rationalized my spending, savings, or giving in another area to compensate? The fundamental challenge here is that until you have given away everything, there is always more to give. That being said, if avarice is notably challenging to diagnose, let alone treat, should significant stock be taken in an aspiration for its absence?
Before digressing entirely, suffice it to say that the realization of these ideas, that (1) the driving force behind ideology adoption is, in many cases, greed and that (2) greed is the most difficult vice to detect and thus rectify, has led me to question the underlying assumption that the world is aching for centralized authorities to relinquish their control of our data and money. What mindset would we be in should the price of Ether or Bitcoin crash to 50% of its current value? 10%? 5%? The money in “decentralization” through the rise of cryptocurrency value has arguably brought about its popularity, but is this kind of attention sustainable in light of that instability? It would be remiss not to mention stable coin projects, but the cost of their stability is currently higher than that of traditional fiat, such as the US dollar. Stable coin projects cannot just be an option. They have to be better, faster, and cheaper than traditional options, and some have achieved components of that triad, but not all.
And if you stopped reading here, you might surmise that just about every light could be extinguished for a decentralized world…but then you would miss out on all the reasons why we should all be more hopeful than at any time in the past.
Within the Ethereum community, ongoing scaling efforts are working to realize the decentralized dream by addressing all three elements of the triad mentioned above: better, faster, and cheaper. If decentralization can be achieved through a significantly better quality platform, a faster transaction time, and a cheaper cost of use, then all we really have to worry about is the madness of crowds (and a few well armed dictators), because when decentralization is achieved through significantly better means than existing institutions, ideally, we should reach a better quality of life than prior to decentralization. Consequently, the greed that was vilified earlier in this article will serve as a propulsion toward the acceptance of decentralization rather than a hindrance. As the old proverb goes, how do you know bad luck?
Behind all the reasoning is the excitement around state channels. As a reformed skeptic, I have come to appreciate the value and immense impact that can be achieved by building out state channels. Relatively new to Ethereum as of July 2017, I at first brushed state channels off as merely a solution for “micropayments” and “payment channels.” There are many other scaling solutions in the works, such as plasma, side chains, and sharding, and there is a great resource summarizing most of these here. However, knowing that teams were actually building these state channels out and putting them into production with the current capabilities of smart contracts on Ethereum, I wanted to learn this approach in depth first. There are several available resources, from Spankchain to Machinomy to Connext to Raiden to FunFair; it was hours of reading articles, listening to talks, and perusing code that finally made me understand that state channels have far more implications than the perceived simplicity of payment channels.
It is important that the Ethereum community continues to build more and build for the future; this article attempts to explain the build-out of generalized state channels from a wider developer perspective, so that others can refine and use them. Like the last in-depth look at ERC-721s, this article assumes that the reader is acquainted with the concept of state channels and with writing smart contracts in Solidity. A general introduction to a simple payment channel can be found here and here.
For state channels, Jeff Coleman’s 2015 post is a compelling introduction in which he asserts that the basic components of a state channel are:
1. Part of the blockchain state is locked via multisignature or some sort of smart contract, so that a specific set of participants must completely agree with each other to update it.
2. Participants update the state amongst themselves by constructing and signing transactions that could be submitted to the blockchain, but instead are merely held onto for now. Each new update “trumps” previous updates.
3. Finally, participants submit the state back to the blockchain, which closes the state channel and unlocks the state again (usually in a different configuration than it started with).
Notice the emphasis on state in the above. You may have already heard it several times, but differentiating this notion of state from solely a balance allows profoundly more powerful applications to be constructed through the use of state channels. From the Ethereum Whitepaper, recall that state is significantly further reaching than a balance:
In Ethereum, the state is made up of objects called “accounts”, with each account having a 20-byte address and state transitions being direct transfers of value and information between accounts. An Ethereum account contains four fields:
- The nonce, a counter used to make sure each transaction can only be processed once
- The account’s current ether balance
- The account’s contract code, if present
- The account’s storage (empty by default)
Notice, storage. That is a big deal because you can save a significant amount of information in storage. Objections may arise because storage is notoriously expensive on Ethereum, but that is the point of state channels: storing the state off-chain and only using the blockchain for transactions that are required for a “final” reconciliation of a balance or storage.
For example, imagine constructing a game using state channels, like pong or tic-tac-toe. There may be an exchange of value at some point, but ultimately, those games boil down to updating a state between two parties.
Now onwards to generalized state channels, where things really start to get interesting. Let’s say you are trying to decentralize the sharing economy, like Uber or Airbnb, and you want your users to be able to pay in Ether or a mix of available ERC20 tokens. Would you really want to have a host of different contracts that would initialize a new state channel for every different use case? Imagine you could pay in 20 different tokens: that’s 20 different sets of contracts to deploy a state channel. That is expensive, and a nightmare to maintain on the front end. This is part of the reason why the concept of generalized state channels has arisen, in which a framework is maintained in a single set of contracts that can deploy as many state channels as desired. When a state channel is opened, funds are held in escrow in this system, while the homogenized structure accommodates the functionality required by all possible state channels within its realm of awareness.
Anyway, let’s dive into it. To summarize the major components of generalized state channels, there are four major processes that must be handled:
- Opening a Generalized State Channel
- Sub-Channels & Off-Chain Interaction Between State Channel Parties
- Closing a Generalized State Channel
- Handling Disputes
To begin, let’s think about agreements for a minute. Ultimately, what should happen to facilitate the most efficient and least risky agreement is:
- Individual A and Individual B need to trust each other in an agreement.
- Individual A and Individual B need to have conditions in place should something go awry.
Trust begets efficiency, because reliance on third parties and fall-backs is eliminated or lessened. The shortest distance between two parties is a straight line, right? However, trust is risky, and if someone is dishonest and things go wrong, trust becomes extremely risky. Should one party take advantage of another, with no rules or fall-backs in place, then the other is left fending for themselves. More importantly, irrespective of consensus on rules and regulations, if there is nothing or no one to enforce these ideas, then the codification of rules becomes a list of suggestions, rather than a set of rules and consequences. Accordingly, there need to be conditions to validate faulty transactions and the ability to enforce them.
In the example above, there is nothing that necessitates that a smart contract exists, at least in the traditional sense, because there is a far simpler way to signal agreement to conditions with another party: digital signatures. Think about even analog contracts. The validity of a contract is not determined by its existence but by the agreement of two parties to it, most commonly through personal signatures.
How Are Messages or Information “Signed” Off-Chain?
If you are already familiar with this process, you can skip ahead to the next section: How Do Two Parties Agree to a State?
Data is signed off-chain with web3.eth.sign(address, dataToSign, [, callback]). To begin, the address argument is the address with which we would like to sign the data. This address must be unlocked for the message to be signed, which may prompt you to ask: why must the account be unlocked if we are not sending anything to the blockchain? The signing of this data requires the use of an account’s private key. Were this not required, anyone could sign another party’s data using solely a public key, and these digital signatures would be rendered useless. Now, dataToSign is slightly more involved. It is the keccak256() hash of all the data inputs that the user would like to sign as well as the prefix “\x19Ethereum Signed Message:\n32”, in which 32 is the number of bytes in the hash.
Why do we sign the hash of the data? Recall that the main point of signing data is so that a signature can be verified. This verification is performed using the ecrecover() function, in which the address of the signer is derived from the hash of the signed data as well as the values v, r, and s:
ecrecover(bytes32 hash, uint8 v, bytes32 r, bytes32 s) returns (address)
What are v, r, and s? You may have also seen these outputs using eth.getTransactionByHash(), as signatures are required of normal on-chain transactions as well. In this case, after using web3.eth.sign() to sign the data, the returned value is a hex string of the signature. This signature is the result of using the Elliptic Curve Digital Signature Algorithm (ECDSA), where after the “0x” hex prefix:
r = signature[0:64]
s = signature[64:128]
v = signature[128:130]
Note the following instruction included in the Web3 documentation: “…if you are using ecrecover, v will be either "00" or "01". As a result, in order to use this value, you will have to parse it to an integer and then add 27. This will result in either a 27 or a 28.”
This means after signing data using web3.eth.sign(), we can use the results in a smart contract to verify that a certain address has signed the data using ecrecover(). In this way, we can validate whether an address has signed a piece of data or not on-chain as part of transaction execution requirements. OpenZeppelin provides a great library for this activity via ECRecovery.sol.
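The slicing of r, s, and v and the adjustment of v described above can be sketched in JavaScript. This is a minimal illustration, assuming the signature is a 65-byte hex string of the kind web3.eth.sign() returns; the helper name splitSignature and the sample signature bytes are hypothetical and are not part of web3.

```javascript
// Split a 65-byte hex signature into the r, s, v values that ecrecover() expects.
// `splitSignature` is a hypothetical helper name, not a web3 function.
function splitSignature(signature) {
  // Drop the "0x" prefix if present.
  const hex = signature.startsWith('0x') ? signature.slice(2) : signature;
  if (hex.length !== 130) {
    throw new Error('expected 65 bytes (130 hex characters), got ' + hex.length);
  }
  const r = '0x' + hex.slice(0, 64);
  const s = '0x' + hex.slice(64, 128);
  let v = parseInt(hex.slice(128, 130), 16);
  // Some clients return 0 or 1 here; ecrecover() expects 27 or 28.
  if (v < 27) {
    v += 27;
  }
  return { r, s, v };
}

// Made-up signature bytes, for illustration only.
const sampleSignature = '0x' + 'ab'.repeat(32) + 'cd'.repeat(32) + '01';
const parts = splitSignature(sampleSignature);
console.log(parts.v); // 28
```

The resulting r, s, and v can then be passed, together with the signed hash, to a contract function that calls ecrecover().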
Getting back to opening a state channel, what is posited is that if two addresses (or parties) sign the same set of data and if we can verify the signature, then it can be assumed the two parties agreed to the same set of data. Now, let’s think about the type of data that can be signed — anything that can be hashed. This means two parties can agree on a lot — even a state.
How Do Two Parties Agree to a State?
A major theme throughout state channels is the building of an off-chain state that can be recompiled on-chain, if and when needed. Although that sounds intimidating, it is actually fairly simple. Conversion to bytes allows us to put multiple types of data together in condensed byte form that can be extracted, manipulated, and put back together. Let’s say we would like to agree to the state of two addresses’ Ether balances. What we really need to agree on is: address A, address B, balance A, and balance B. We can build a ‘state’ with a function that converts each of these values to bytes and concatenates them in a fixed order.
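As a rough off-chain illustration of building such a state, here is a minimal JavaScript sketch. It assumes each value is left-padded to a 32-byte word and the words are concatenated in the order address A, address B, balance A, balance B. The body of toBytes32, the function name buildState, and the sample addresses are assumptions for illustration, not code from any of the projects mentioned.

```javascript
// Left-pad a value to one 32-byte word (64 hex characters).
// The name mirrors the helper mentioned in the text; this body is an assumption.
function toBytes32(value) {
  let hex;
  if (typeof value === 'string' && value.startsWith('0x')) {
    hex = value.slice(2).toLowerCase(); // already hex, e.g. an address
  } else {
    const n = BigInt(value); // number or bigint, e.g. a balance
    if (n < 0n) {
      throw new Error('negative values are not supported');
    }
    hex = n.toString(16);
  }
  if (hex.length > 64) {
    throw new Error('value does not fit in 32 bytes');
  }
  return hex.padStart(64, '0');
}

// Concatenate the four agreed values, in a fixed order, into one byte string.
function buildState(addressA, addressB, balanceA, balanceB) {
  return '0x' + [addressA, addressB, balanceA, balanceB].map(toBytes32).join('');
}

const state = buildState(
  '0x' + '11'.repeat(20), // hypothetical address A
  '0x' + '22'.repeat(20), // hypothetical address B
  5n,                     // balance A, in wei
  7n                      // balance B, in wei
);
console.log((state.length - 2) / 2); // 128 bytes: four 32-byte words
```

Because every field occupies exactly one word, both parties can rebuild the same byte string independently and sign its hash.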
In such a function, each variable input is first converted into a bytes32 form via another function, toBytes32. You can have a look at any of the mentioned open source projects for state channels that use similar functions to build a state.
Now, recalling what we know about digital signatures, this state can be hashed to 32 bytes, and then signed by any address in agreement to it. Later, on-chain, this state can be decomposed for verification or validation using assembly.
There is a fairly straightforward pattern here if all you want to do is decompose a state: use assembly to load each 32-byte variable from memory (mload) in the order in which the state was built, and return the variables for which you are looking. Plus, note how we can use this as a pure function in Solidity. A pure function neither reads nor modifies blockchain state, so calling it externally, outside of a transaction, costs no gas. As an aside, there is a good video here with a more in-depth explanation as to how assembly works in Solidity. I don’t mean to brush off the complications of introducing assembly: you do have to be careful not to overwrite memory incorrectly, and you can accidentally introduce security vulnerabilities. You should always get an audit and learn as much as you can about what you’re doing. All I am saying is that you don’t have to be an expert in assembly to work with state channels if you learn the patterns well and take precautions.
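The same word-by-word decomposition can be mirrored off-chain. The JavaScript sketch below is only an analogue of the mload pattern, not the Solidity assembly itself. It assumes the four-word layout of address A, address B, balance A, balance B; the names wordAt and decodeState and the sample values are hypothetical.

```javascript
// Read the i-th 32-byte word of a hex-encoded state (an off-chain analogue of mload).
function wordAt(stateHex, index) {
  const hex = stateHex.startsWith('0x') ? stateHex.slice(2) : stateHex;
  const word = hex.slice(index * 64, (index + 1) * 64);
  if (word.length !== 64) {
    throw new Error('state is too short for word ' + index);
  }
  return word;
}

// Decompose a state laid out as: addressA, addressB, balanceA, balanceB.
function decodeState(stateHex) {
  return {
    addressA: '0x' + wordAt(stateHex, 0).slice(24), // last 20 bytes of the word
    addressB: '0x' + wordAt(stateHex, 1).slice(24),
    balanceA: BigInt('0x' + wordAt(stateHex, 2)),
    balanceB: BigInt('0x' + wordAt(stateHex, 3)),
  };
}

// A hand-built example state with hypothetical addresses and balances.
const exampleState = '0x' +
  '0'.repeat(24) + 'aa'.repeat(20) +
  '0'.repeat(24) + 'bb'.repeat(20) +
  '0'.repeat(62) + '0a' +
  '0'.repeat(62) + '14';
const decoded = decodeState(exampleState);
console.log(decoded.balanceA, decoded.balanceB); // 10n 20n
```

The order of the reads is the whole contract between encoder and decoder: if the two sides disagree on the layout, the decoded values are silently wrong.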
So you may be wondering at this point, why not just hash all the original information together as is? For example, in the above, why not just use:
var hashToSign = keccak256(addressA, addressB, balanceA, balanceB)
Then, we wouldn’t have to deal with converting our variables to bytes. This is true, sure. However, remember that we are talking about building generalized state channels, and in order to be generalized, we need to be able to handle a variety of input combinations. We will see how this comes together later.
How Do Two Parties Agree To A Set Of Rules For An Agreement?
So we have talked about how to agree on a state off-chain, but what happens if a party is dishonest? How do we ensure that two parties agree on a set of rules, or a contract, should one party not comply? Certainly, what I am alluding to sounds like a case for a smart contract to exist. We just learned how a state could be built off-chain and recompiled on-chain, so an apt assumption is that we could have a contract to evaluate the states signed off-chain to determine if any malicious or fraudulent behavior occurred.
However, if we are in an environment in which our agreements are, for the most part, executed without issues, then most of these dispute handling contracts would not be used. Imagine paying rent on all of those unused contracts. In light of efficiency, you could postulate that these dispute handling contracts would be waste, or non-value-add, if they are not used. So, do we really need to have a dispute handling contract every time we have an agreement? Really, we only need a dispute handling contract when there is a dispute, and we do not need the dispute handling contract when we do not have a dispute. Although that is quite a nuanced conditional, there is a term for what we are referencing here: counterfactual conditionals.
David Hume summarized it best when he said:
“… we may define a cause to be an object, followed by another, and where all objects, similar to the first, are followed by objects similar to the second. Or in other words, where, if the first object had not been, the second never had existed …"
To drive this efficiency, we use a method called “counterfactual instantiation” in which both parties agree to create a contract to handle disputes should either party decide to open an instance of dispute resolution.
Let’s think about how this could work. We can agree to a state, which in our case we put together as a long string of bytes. Recall that contracts are compiled to bytecode to be read by the EVM when they are deployed. So, as just another string of bytes, can we agree to a contract’s bytecode? Yes.
Then, using assembly, through the create() opcode, we can deploy the contract from its bytecode only if and when we need it. What’s more, the addition of constructor arguments to the bytecode is surprisingly simple, as constructor inputs only need to be appended to the contract bytecode in the order in which they appear in the constructor arguments.
Below is a look at how a Solidity function would be implemented to use bytecode to deploy a smart contract.
To see how that works with constructor inputs, consider a test with a type of simple storage contract. We deploy the factory that contains the deployCode() function; then, using Truffle, we grab the bytecode from the artifacts that we imported into the test. From there, we append the constructor argument, in this case a number, to the end of the bytecode. Then, we use the deployCode() function to deploy the contract. Next, we grab the address of the new contract through an event, and check that the particular constructor input was correctly used. You can have a look at the full contract scheme and deployment tests here.
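The step of appending a constructor argument to the bytecode can be sketched on its own in JavaScript. This is a minimal illustration, assuming a single uint256 argument, which ABI-encodes to one left-padded 32-byte word; dynamic types such as strings or arrays need a fuller encoder. The helper names and the truncated placeholder bytecode are hypothetical.

```javascript
// ABI-encode a single uint256 constructor argument as one 32-byte word.
function encodeUint256(value) {
  const n = BigInt(value);
  if (n < 0n || n >= 2n ** 256n) {
    throw new Error('value out of uint256 range');
  }
  return n.toString(16).padStart(64, '0');
}

// Creation bytecode followed by the encoded constructor argument.
function withConstructorArg(bytecode, value) {
  return bytecode + encodeUint256(value);
}

// Truncated placeholder, not a real contract's bytecode.
const bytecode = '0x6080604052';
const deployData = withConstructorArg(bytecode, 42);
console.log(deployData.slice(-4)); // 002a
```

The resulting hex string is what would be handed to a factory function such as deployCode() in place of the bare bytecode.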
How Do We Know the Address Of Our Counter-Factually Instantiated Contract?
One thing we have not mentioned yet: if we counter-factually instantiate the dispute contract, how will we know the address of the contract to reference and send transactions to, if it has not been deployed yet? Technically, contract addresses are deterministic if you know the address from which the contract is deployed and that address’s nonce. However, that is not applicable to our situation because, as noted above, we will be using a type of factory contract to deploy our bytecode, which could be deployed via a transaction from any party. Furthermore, there is no way to know what the nonce of an account might be at the time of needed deployment. It makes sense that in opening a state channel we would include a reference to what could be our counter-factually instantiated contract. Also, recall that when a state channel is opened, funds are held in escrow until the channel is closed for settlement. To consolidate related functionality, it would follow that this escrow contract also includes functionality for opening and closing a channel.
We have reasoned our way through much of what needs to happen to open a generalized state channel. Currently, our list of all that must be used to deploy a generalized state channel stands as:
1. A contract that holds funds in escrow while the state channel is opened. This contract would also allow parties to open the state channel, close the channel, and initiate a dispute if required. This contract would also include a reference to the “potentially-deployed” dispute handling contract.
2. A counterfactual dispute handling contract. (Remember, it is counterfactual, so this contract will not be deployed yet; however, the contract would need to be already built in order to compile it and get the bytecode.)
3. A contract factory to deploy the counterfactual dispute handling contract if required. As this contract is only a factory, it only needs to be deployed once to be used by any party that may want to deploy their respective counterfactual contract.
From this point on, I will take a deep dive into Spankchain’s proof of concept here. Between their article with resources and their video, I found their set of contracts a bit easier to follow, and let’s be honest, they are leading the way here with channels deployed on mainnet.
So, let’s take a look at how we would deploy #1 above in reference to Spankchain’s MultiSig.sol contract. In the constructor, we take in the bytes32 _metachannel, which is in reference to the counterfactual MetaChannel contract, or the dispute handler contract as it has been described. From this point forward, the counterfactual “dispute handling” contract will be referred to as the MetaChannel contract. Also, we include the address _registry, which is in reference to a contract factory that also functions as a type of registry for mapping counterfactual addresses to actual contract addresses, if deployed.
The _metachannel variable is in reference to the agreed upon MetaChannel contract bytecode, constructor inputs, and addresses to verify the signatures of the parties that have signed the _metachannel state and are opening a state channel. We will have the following compiled into a long bytes variable and hashed:
_metachannel = keccak256(bytes_composition_of:
    addressA,
    addressB,
    contractCodeLength,
    contractByteCode,
    constructorArguments  // the registry address, the address of Party A, and the address of Party B to join the channel
);
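To make the composition above more concrete, here is a minimal JavaScript sketch that assembles and then decomposes such a byte string. The exact byte layout is an assumption made for illustration (three 32-byte words for address A, address B, and the code length, followed by the raw bytecode and the constructor arguments) and may differ from the layout Spankchain’s contracts use. All function names, addresses, and the placeholder bytecode are hypothetical, and the final keccak256 hashing step is left out because it needs an external library.

```javascript
// Strip "0x" and left-pad to one 32-byte word (64 hex characters).
function pad32(hex) {
  return hex.replace(/^0x/, '').toLowerCase().padStart(64, '0');
}

// Assumed layout: addressA | addressB | codeLength | bytecode | constructorArgs
function buildMetachannelState(addressA, addressB, bytecode, constructorArgs) {
  const code = bytecode.replace(/^0x/, '');
  const args = constructorArgs.replace(/^0x/, '');
  if (code.length % 2 !== 0) {
    throw new Error('bytecode must contain whole bytes');
  }
  const codeLength = (code.length / 2).toString(16); // length in bytes
  return '0x' + pad32(addressA) + pad32(addressB) + pad32(codeLength) + code + args;
}

// Reverse of buildMetachannelState, using the code length to find the boundary.
function decodeMetachannelState(stateHex) {
  const hex = stateHex.replace(/^0x/, '');
  const codeLength = parseInt(hex.slice(128, 192), 16);
  const codeEnd = 192 + codeLength * 2;
  return {
    addressA: '0x' + hex.slice(24, 64),
    addressB: '0x' + hex.slice(88, 128),
    bytecode: '0x' + hex.slice(192, codeEnd),
    constructorArgs: '0x' + hex.slice(codeEnd),
  };
}

// Hypothetical addresses and a truncated placeholder bytecode.
const registry = '0x' + '33'.repeat(20);
const partyA = '0x' + '11'.repeat(20);
const partyB = '0x' + '22'.repeat(20);
const ctorArgs = '0x' + pad32(registry) + pad32(partyA) + pad32(partyB);
const metaState = buildMetachannelState(partyA, partyB, '0x6080604052', ctorArgs);
const decodedMeta = decodeMetachannelState(metaState);
console.log(decodedMeta.bytecode); // 0x6080604052
```

Storing the code length in front of the bytecode is what lets a decoder separate the bytecode from the constructor arguments that follow it.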
Looking into our _registry contract, this would be a previously deployed contract that acts as a contract factory to deploy our MetaChannel contract, if needed. It would employ a function such as deployCTF() that could verify that the MetaChannel creation had been signed by both parties. The _state is the unhashed state of the _metachannel, which was included as a constructor parameter in deploying the MultiSig contract.
You may ask why we do not use the state as a parameter for the MultiSig contract rather than the hash of the state. There is no advantage in doing so, as we know that the state is going to be composed of: addressA, addressB, contractCodeLength, contractByteCode, and constructorArguments. The constructorArguments in this case are those of the MetaChannel constructor: the registry address, party A’s address, and party B’s address. The state of the MetaChannel needed for the registry to deploy the counterfactual contract can be easily saved and rebuilt off-chain, and the existence of its 32-byte hash on-chain is far more easily verified than its complete state, which would require a comparison of each of its individual elements. Plus, notice that we save the bytes32 of _metachannel in the MultiSig contract, and it is also used in the Registry to map the deployed address of the MetaChannel, if it is created.
To go through the deployment of a MetaChannel contract through the registry, deployCTF() takes in the _state of the MetaChannel, built off-chain, as well as the components of the signed data, which would also be taken from an off-chain location. First, we use internal functions such as _decodeContractCode(_state) to draw out the relevant information from the _state; they can be examined in more detail from here. Next, now that we have the addresses of Party A and Party B, we can verify that they signed the state to create the MetaChannel contract using the internal function _getSig. Following that, using assembly, the MetaChannel contract is created via the create() opcode. Lastly, to save the newly deployed MetaChannel contract as a reference, its address is mapped to the bytes32 of the hashed state, which is also saved as metachannel in the MultiSig contract.
As an aside, let’s think for a minute, who would deploy this MultiSig contract? Party A or Party B? Neither Party A nor Party B is required to deploy the MultiSig contract as these addresses are only identified in the _metachannel state. It is pretty interesting to think that a complete third party, Party C, may open up a MultiSig contract between two parties. For that matter, think about who could deploy the MetaChannel as well…more on this later.
At this point, we have only deployed the respective framework. We have not actually opened a state channel yet. Recall that this functionality is contained within our MultiSig contract, so let’s finally have a look at how to open a generalized state channel.
MultiSig will contain an openAgreement() function. It takes in another _state and an address _ext as well as the signature verification requirements. This state is entirely different from the MetaChannel state that we previously saw, as that was the state to initiate a MetaChannel contract, if needed. The new _state that we will build to open the agreement or channel is composed of the following:
bytes32 _state = keccak256(bytes_composition_of:
    bool isClose,
    uint256 sequence,
    address partyA,
    address partyB,
    bytes32 counterfactualMetachannel,
    bytes32 subChannelRootHash,
    uint256 balancePartyA,
    uint256 balancePartyB
)
The state above lays out:
- A boolean isClose that determines whether the channel is closed or not.
- A value sequence that increments by 1 as the state transitions.
- Naturally, Party A and Party B addresses and balances are included.
- The bytes32 of the MetaChannel hashed state that we used in the constructor.
- Lastly, a new variable we see is subChannelRootHash; this will be discussed in the next section.
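The role of sequence can be illustrated with a small JavaScript sketch of how one party might decide whether a newly proposed state replaces the one it currently holds. The rule that the two balances must sum to the same total is an assumption that fits a simple Ether payment channel, not something the state layout itself enforces; the function name and sample values are hypothetical, and signature checks are left out.

```javascript
// Decide whether `proposed` may replace `current` as the latest agreed state.
// Assumed rules: same parties, strictly higher sequence, same total balance.
function isValidUpdate(current, proposed) {
  const sameParties =
    current.partyA === proposed.partyA && current.partyB === proposed.partyB;
  const higherSequence = proposed.sequence > current.sequence;
  const sameTotal =
    current.balanceA + current.balanceB === proposed.balanceA + proposed.balanceB;
  return sameParties && higherSequence && sameTotal;
}

// Hypothetical parties and balances (in wei).
const alice = '0x' + 'aa'.repeat(20);
const bob = '0x' + 'bb'.repeat(20);
const opening = {
  isClose: false, sequence: 0, partyA: alice, partyB: bob, balanceA: 10n, balanceB: 10n,
};
const payment = { ...opening, sequence: 1, balanceA: 7n, balanceB: 13n };

console.log(isValidUpdate(opening, payment)); // true
console.log(isValidUpdate(payment, opening)); // false: a stale state cannot replace a newer one
```

A check of this kind is what makes each new update “trump” the previous ones when states are later compared on-chain.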
Next, we can see that an address _ext, or an “extension”, is used. This _ext is actually a library (example) that will be used to interpret the _state and open the channel through the use of delegatecall(). Here is where you can begin to see a generalization of methods taking place. We can use a library for Ether payments, a library for ERC20 payments, ERC721s, or likely many other use cases that are yet to be seen. For now, we will continue with the Ether payment case.
Going through the openAgreement() function, first we ensure that the extension is available for use in the array of extensions that the MultiSig contract holds; that is, a party may not use just any library but only those pre-approved and trusted. As an aside, in Spankchain’s current PoC, extensions are added in the openAgreement() function dynamically. In the future, one possible option is to include the extension addresses as an argument for the MultiSig constructor, so that extensions are set from the onset. However, it would be nice to be able to add more extensions dynamically because this one MultiSig contract could potentially be redesigned to open several channels in the future as well.
Anyway, then, we ensure that the channel is not yet open or pending; pending indicates that one party has already joined. Then, we validate that the matching metachannel state has been sent to the contract and used. Next, we ensure that the msg.sender has signed the respective _state, through the use of an internal signature check. Finally, isPending is set to true now that one party has joined the channel.
Later, we finally open the channel through the use of delegatecall() to the respective _ext library.
In opening a channel, the balance is validated in the extension library against the state, as is Party A’s address. In this specific case, Party A must be the individual to open the state. Later, we store the msg.sender address in partyA for the MultiSig contract, along with the keccak256 hash of the _state.
One more step to go: Party B joins the channel through the use of the joinAgreement() function. Again, we assert the use of a proper extension, verify that the channel has not been opened yet, and check that Party B is using the same _state as signed by Party A. Then isOpen is set to true, so that Party A may not attempt to re-open another channel with the same state. The signature of Party B is verified for the _state, and using delegatecall() with the extension library, Party B joins the channel.
In the extension library, again, the balance of what Party B has sent to the MultiSig contract is verified against what is recorded in the state.
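The open and join sequence can be modeled off-chain to make the order of the checks easier to follow. The JavaScript class below is a simplified sketch under stated assumptions, not Spankchain’s contract: signature verification and the extension lookup are omitted, sender and value stand in for msg.sender and msg.value, the state hash is passed in as an opaque string, and the class name MultiSigModel is hypothetical. Requiring isPending before a join is also an assumption.

```javascript
// Simplified off-chain model of the open/join flow; signature checks are omitted.
class MultiSigModel {
  constructor(metachannelHash) {
    this.metachannel = metachannelHash;
    this.isPending = false;
    this.isOpen = false;
    this.partyA = null;
    this.partyB = null;
    this.stateHash = null;
    this.escrow = 0n;
  }

  // Party A opens the channel and deposits balanceA.
  openAgreement(state, stateHash, sender, value) {
    if (this.isOpen || this.isPending) throw new Error('channel already open or pending');
    if (state.metachannel !== this.metachannel) throw new Error('metachannel mismatch');
    if (state.partyA !== sender) throw new Error('only party A may open');
    if (state.balanceA !== value) throw new Error('deposit does not match state');
    this.isPending = true;
    this.partyA = sender;
    this.stateHash = stateHash;
    this.escrow += value;
  }

  // Party B joins with the same state and deposits balanceB.
  joinAgreement(state, stateHash, sender, value) {
    if (!this.isPending || this.isOpen) throw new Error('channel is not awaiting a second party');
    if (stateHash !== this.stateHash) throw new Error('state differs from the one party A opened with');
    if (state.partyB !== sender) throw new Error('only party B may join');
    if (state.balanceB !== value) throw new Error('deposit does not match state');
    this.isOpen = true;
    this.partyB = sender;
    this.escrow += value;
  }
}

// Hypothetical values throughout.
const initialState = {
  metachannel: '0xmeta', partyA: '0xA', partyB: '0xB', balanceA: 1n, balanceB: 2n,
};
const channel = new MultiSigModel('0xmeta');
channel.openAgreement(initialState, '0xhash', '0xA', 1n);
channel.joinAgreement(initialState, '0xhash', '0xB', 2n);
console.log(channel.isOpen, channel.escrow); // true 3n
```

In the contract itself, the balance and party checks would live in the extension library reached through delegatecall(), so that the same MultiSig logic can serve Ether, ERC20, or other kinds of channels.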
At long last, we have finally opened a generalized state channel. Keep in mind that we are flowing through the back-end of this process. The front-end experience will be key in the design of state channels. In the lengthy process detailed above, assuming a third party deploys the MultiSig contract, Party A and Party B, the ultimate end users, needed to:
1. Sign the MetaChannel State
2. Sign the Initial State of the Channel
3. Send a Transaction to Open the Channel with a Payment
Steps 1 and 2 are off-chain actions that, although they require the use of a private key, are virtually instantaneous. Step 3 is the only on-chain transaction required of our users thus far. Furthermore, if we think about opening state channels in conjunction with other tools developing in the Ethereum ecosystem, it will drive efficiency in entire value chains that previously existed in silos. For example, there is the impending arrival of (more) legally recognized smart contracts (e.g., OpenLaw) and the incorporation of better identity management solutions (e.g., uPort) or data sovereignty management systems (e.g., Linnia).
After 5,000 words, I have decided to split this reading into two parts. Part 2 can be found here with further detail as to:
- Off-Chain Interaction Between State Channel Parties & Sub-Channels
- Closing a State Channel
- Handling Disputes
- Why: More thoughts on driving the ecosystem forward through state channels and other scaling solutions.