Account-centric versus Smartcontract-centric Data Models and Implications for Sharding
By: David Beberman, CTO
This paper is an overview of SagaScale, the Account Negotiation Algorithms posted at https://code.prasaga.com/sagachain/saganode/accountnegotiation.
Introduction
SagaChain uses a data model that differs significantly from that of all existing smart contract blockchains. We term SagaChain's model "account-centric" and will refer to smart contract blockchains as "smartcontract-centric". The objective of sharding for SagaChain is both to scale the capacity in transactions per second and to hold the average per-transaction latency to a constant value.
For SagaChain, a transaction is defined as a transaction submitted by a user only; transactions per second measure the completed execution of those submitted transactions. SagaChain does have a form of cross-shard communication, which is the topic of this paper, but it is independent of the measurement of transactions.
Before going further, it should be noted that the length and complexity of any given transaction can vary. This can affect the instantaneous per-transaction latency and capacity. SagaChain will add shards to accommodate an increase in average latency, forcing the average latency back toward the target.
Independence of Data Objects
There appear to be three fundamental data and execution sharding techniques, including SagaChain's. The first two are smartcontract-centric:
· Assigning a smart contract to a shard, with all code and data local to the shard
· Distributing segments of the data for a smart contract across multiple shards, with the smart contract code on each shard
SagaChain’s account-centric model views the blockchain as managing the consensus on an object state database. The database is a key/value database. Each key is the ID of an object. Each object is an array of bytes. The structure of the bytes consists of a globally defined header, of which the main element is a type field. The type field is just another object ID. That is, a key to another key/value pair in the database.
This is a well-known first-class object model (also known as a metaclass model). Instead of tying a single instance of code and a single instance of data together, like a smart contract does, an object contains the reference to a class object that contains the code to read/write the data in the object. This model has been implemented as SagaPython but can accommodate other languages.
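The following is a minimal sketch, in plain Python rather than SagaPython, of the object state database structure just described: a key/value store in which every value is a byte array whose header carries a type field, and that type field is itself just another object ID in the same store. The class and method names, the 16-byte ID size, and the header layout are illustrative assumptions, not the actual SagaChain format.

```python
import uuid

ID_SIZE = 16  # assumed header layout: the first 16 bytes hold the type object's ID

class ObjectStateDB:
    """Key/value store: object ID -> header (type field) + payload bytes."""
    def __init__(self):
        self._store: dict[bytes, bytes] = {}

    def put(self, type_id: bytes, payload: bytes) -> bytes:
        obj_id = uuid.uuid4().bytes               # new globally unique object ID
        self._store[obj_id] = type_id + payload   # header (type field) followed by data
        return obj_id

    def type_of(self, obj_id: bytes) -> bytes:
        return self._store[obj_id][:ID_SIZE]      # the type field is another object ID

    def payload_of(self, obj_id: bytes) -> bytes:
        return self._store[obj_id][ID_SIZE:]
```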
The details of adding classing to SagaChain’s object state database (multiple inheritance, inspection, class tree definition, etc.) are not covered in this document. Instead, for the purposes of the account-centric discussion, only a “Class Account” object is considered, which contains the code for account objects as its data in the object state database, along with any number of account object instances of the Class Account type, also in the object state database. Accounts are therefore represented as account objects with individual data, while the code to execute transactions with any account is contained in a single object: the Class Account object in the object state database. Specifically, an account object does not contain any code, only the data for that specific account.
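To make the Class Account relationship concrete, here is a minimal sketch using plain Python objects in place of the object state database: one class object holds the transfer code once, while each account object holds only its own data and a type reference to that class. The names (ClassAccount, transfer, the balance field) are illustrative assumptions.

```python
class ClassAccount:
    """Stands in for the single "Class Account" object: code only, no per-account data."""
    @staticmethod
    def transfer(sender: dict, receiver: dict, amount: int) -> None:
        # The code lives here once; it operates on whichever account objects it is given.
        if sender["balance"] < amount:
            raise ValueError("insufficient balance")
        sender["balance"] -= amount
        receiver["balance"] += amount

# Account objects: data only, each referencing the class object via a type field.
alice = {"type": ClassAccount, "balance": 100}
bob   = {"type": ClassAccount, "balance": 25}

alice["type"].transfer(alice, bob, 40)   # code is resolved through the account's type field
print(alice["balance"], bob["balance"])  # 60 65
```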
In comparison, a smart contract consists of a single unique set of code and a single unique set of data. Accounts are logically represented by entries in an array or arrays keyed by the account IDs. The accounts themselves do not hold their own data; the smart contract's storage does.
Why is this difference important for scaling with sharding?
Sharding is a form of distributed parallel computing. Distributed computing in general requires that the data that each shard operates on is independent of all other data on other shards. The associated code can then execute in parallel on the independent data without conflicts. If not, then some sort of dynamic distributed locking scheme for synchronization is needed. A data model that maximizes independent data increases the opportunity for distributed parallel computing.
The account-centric model of SagaChain lends itself to independent data for distributed parallel computing. Consider the following simple example:
· There are four accounts — A, B, C, D
· There is one transaction between A and B
· There is one transaction between C and D
Each account contains all of its own data. Consider having two shards:
· Account objects for A and B are on shard 1
· Account objects for C and D are on shard 2
The transactions can be executed completely in parallel with no concern for conflict.
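A minimal sketch of this example follows: because each account object carries its own state, shard 1 (accounts A and B) and shard 2 (accounts C and D) can execute their transactions in parallel with no shared data and no locks. The data layout and thread-based execution are illustrative assumptions, not the SagaChain implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Each shard holds only the account objects it owns; there is no shared state between them.
shard1 = {"A": {"balance": 100}, "B": {"balance": 0}}
shard2 = {"C": {"balance": 50},  "D": {"balance": 0}}

def execute_transfer(accounts: dict, sender: str, receiver: str, amount: int) -> None:
    # Touches only account objects owned by the calling shard; no cross-shard data involved.
    accounts[sender]["balance"] -= amount
    accounts[receiver]["balance"] += amount

with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(execute_transfer, shard1, "A", "B", 30)   # runs on shard 1
    pool.submit(execute_transfer, shard2, "C", "D", 10)   # runs on shard 2

print(shard1, shard2)
```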
Compare this with the smart contract approach:
· Four accounts — W, X, Y, Z
· All of the accounts are individual entries in an array table in a single smart contract
· There is one transaction between W and X
· There is one transaction between Y and Z
· Transactions are sent to the smart contract, not the individual accounts
If the smart contract is on a single shard, then the transactions are serialized. In that case there is no concern of synchronization, but there also isn’t any parallel execution either. If the smartcontract data is on two shards, then code that accesses the data must have synchronization “critical section” code of some kind. Implementing distributed locking requires the overhead of cross-shard communication. Further, if a smartcontract calls another smartcontract there can be more instances of cross-shard communication.
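For contrast, a minimal sketch of the smartcontract-centric case above: all four accounts live in one table inside one contract, so concurrent execution of the W-X and Y-Z transfers must synchronize on that shared table. A single in-process lock stands in here for the distributed locking that splitting the data across shards would require; the names are illustrative.

```python
from threading import Lock

balances = {"W": 100, "X": 0, "Y": 50, "Z": 0}   # one array-like table inside one contract
table_lock = Lock()

def contract_transfer(sender: str, receiver: str, amount: int) -> None:
    with table_lock:                  # critical section around the shared contract data
        balances[sender] -= amount
        balances[receiver] += amount
```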
Scaling for SagaChain based on per-account independent data reduces the distributed parallel execution problem from distributed critical section code to enforcing the following rules:
· An account object may only execute on one shard at a time.
· The data for the account object must be present on the shard.
Distributing accounts among shards based on the transactions, instead of by the individual smart contracts, creates the opportunity for a much higher level of distributed parallel computing. This is a necessary but not sufficient requirement for enabling shard scaling.
An additional piece is needed: a mechanism, a scheduling algorithm, that distributes the accounts over the shards in a manner that maximizes the opportunity for parallel independent execution of transactions.
Account Distribution Scheduling
The critical piece that optimizes the distribution of accounts over shards, and with it the opportunity for parallel distributed transaction execution, is called “account scheduling”. An account scheduler entity assigns accounts to shards such that each account is always either exclusively assigned to a single shard or unassigned. SagaChain terms this “account ownership” for a shard. Only an account owned by a shard may participate in a transaction. As the accounts contain their own state, independence of data between accounts is maintained, allowing for parallel distributed execution over the shards.
In general, the difficulty of account scheduling grows exponentially with an increasing number of shards and an increasing number of accounts. Although SagaChain targets support for thousands of shards, consider just 100 shards. If each shard can own 1,000 accounts, a much larger number of accounts is available, and the transaction queue contains transactions using various combinations of those accounts, then determining an optimal distribution of accounts across the shards, one that maximizes transactions executed per block per shard and minimizes average transaction latency, is an NP-complete problem (NP: nondeterministic polynomial time).
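As a rough back-of-the-envelope illustration (not from the paper) of why exhaustive search is out of the question, the sketch below counts only the ways to pick which 1,000 accounts each of 100 shards owns, disjointly, from an assumed pool of 200,000 accounts, before even considering transaction ordering. The pool size is an arbitrary assumption for illustration.

```python
import math

NUM_SHARDS = 100
ACCOUNTS_PER_SHARD = 1_000
TOTAL_ACCOUNTS = 200_000          # assumed pool size, for illustration only

log10_assignments = 0.0
remaining = TOTAL_ACCOUNTS
for _ in range(NUM_SHARDS):
    # lgamma gives log(n!) without ever building the enormous integers themselves
    log10_assignments += (math.lgamma(remaining + 1)
                          - math.lgamma(ACCOUNTS_PER_SHARD + 1)
                          - math.lgamma(remaining - ACCOUNTS_PER_SHARD + 1)) / math.log(10)
    remaining -= ACCOUNTS_PER_SHARD

print(f"~10^{log10_assignments:.0f} possible ownership assignments")
```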
What is needed is an algorithm that can assign accounts to shards in a reasonably optimal manner, knowing that an optimal assignment solution is not possible.
Fundamental Scheduling Algorithm
The following describes the basic algorithm for assignment scheduling of SagaChain accounts. To highlight the main concept, a Monte Carlo method, it does not include provisions for transaction starvation and various other requirements.
The account scheduling algorithm's data consists of the following (a minimal sketch of these structures follows the list):
· A transaction queue of submitted transactions by users
· A list of current ownership of accounts by each shard
· A list of requested new account ownerships by each shard to execute a subset of the queued transactions
· A list of the account assignments output from the account scheduling algorithm
· The designated current account scheduling node
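Here is a minimal sketch of these data items as Python dataclasses. The class and field names are assumptions for illustration, not the types from the repository.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    tx_id: str
    accounts: set[str] = field(default_factory=set)   # accounts known or requested for this transaction

@dataclass
class SchedulerState:
    transaction_queue: list[Transaction] = field(default_factory=list)     # transactions submitted by users
    current_ownership: dict[str, set[str]] = field(default_factory=dict)   # shard ID -> accounts it currently owns
    ownership_requests: dict[str, set[str]] = field(default_factory=dict)  # shard ID -> accounts it requests to own
    assignments: dict[str, set[str]] = field(default_factory=dict)         # scheduler output: shard ID -> assigned accounts
    scheduling_node: str = ""                                              # designated current account scheduling node
```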
The algorithm consists roughly of the following steps (a simplified code sketch follows the list):
· Each shard examines the transaction queue and determines any additional accounts it needs to own to execute the transactions. Shards are free to request any accounts based on local selection algorithms. These requests are sent as lists to the current account scheduling node
· The current account scheduling node receives the lists of account requests from the shards
· The following steps are executed until a stopping criterion is reached
o A starting shard is selected at random and its requested accounts are listed in an assignment table
o Two additional shards are selected at random
o Each of these shards is compared against the starting shard for the number of conflicting accounts in its request list
o The shard with the fewest conflicts is selected
o If a shard has no non-conflicting accounts in its request list, it is skipped
o The selection continues until all shards have been visited and either selected or skipped
o At completion, a heuristic is applied to generate a value reflecting the success of the current assignment
o The value is compared against the global current maximum best value, which is updated if the new value is better
o The steps are repeated until N completions occur without a change in the maximum best value
· After the stopping criterion is reached, the resulting account assignments are distributed to the shards to enable execution of the transactions.
· Each shard executes the transactions given the account assignments
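The sketch below is a simplified, self-contained rendering of the Monte Carlo loop described above; it is not the SagaScale code from the repository. The shard-pairing is reduced to visiting two random candidates at a time in order of fewest conflicts, the heuristic is simply the number of accounts assigned without conflict, and the stopping criterion is N passes without improvement. All names are assumptions.

```python
import random

def schedule(requests: dict[str, set[str]], n_stable: int = 50) -> dict[str, set[str]]:
    """requests: shard ID -> accounts the shard asks to own for the next block."""
    if not requests:
        return {}
    best_assignment: dict[str, set[str]] = {}
    best_score = -1
    stable_passes = 0

    while stable_passes < n_stable:
        unvisited = list(requests)
        random.shuffle(unvisited)
        start = unvisited.pop()                      # random starting shard
        assigned = {start: set(requests[start])}     # its requests seed the assignment table
        taken = set(requests[start])

        while unvisited:
            # Pick up to two random candidate shards and visit the one whose request
            # list conflicts least with accounts already taken, then the other.
            candidates = [unvisited.pop() for _ in range(min(2, len(unvisited)))]
            candidates.sort(key=lambda s: len(requests[s] & taken))
            for shard in candidates:
                grant = requests[shard] - taken
                if grant:                            # a shard with nothing grantable is skipped
                    assigned[shard] = grant
                    taken |= grant

        score = len(taken)                           # stand-in for the heuristic value
        if score > best_score:                       # compare and update the global maximum best value
            best_score, best_assignment = score, assigned
            stable_passes = 0
        else:
            stable_passes += 1                       # stop after n_stable passes without improvement

    return best_assignment

# Example: three shards competing for overlapping account requests.
print(schedule({"shard1": {"A", "B"}, "shard2": {"B", "C"}, "shard3": {"D"}}))
```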
Avoiding Per-Shard Exhaustive Transaction Execution
The above algorithm assumes that the accounts involved in a transaction are known before the transaction is executed. In practice, the execution of a transaction may touch multiple accounts, depending on the state of any and all of the accounts involved. If every shard had to execute every transaction to determine its account ownership requests, sharding would devolve back to serialized transaction execution. The solution is to have each shard perform “trial executions” over a subset of the queued transactions to determine the account requirements.
There is at least one account that is always known for a transaction: the user’s account. A transaction must be submitted by a user, signed with the private key corresponding to the public key in their account. This piece of information enables prestaging by each shard. Each transaction may be sorted into the following categories:
· Account is owned by the current shard
· Account is owned by another shard
· Account is unowned
Once sorted, a shard first executes the transactions for which it owns the submitting account. These may then be sorted into the following:
· All accounts in the transaction are owned by the current shard
· All accounts except the user’s submitting account are unowned and thus available
· Some number of accounts are owned by other shards and would need to be requested
There is a limited number of transaction slots per block. If all of the transaction slots are consumed by currently owned accounts, the shard does not need to request new accounts and does not need to perform any further trial executions.
The accounts that are unowned may be requested by the shard to fulfill the associated transactions, and the accounts that are owned by other shards may be requested to be pulled from those shards. Because the shard may not receive all of the accounts requested, it may, as an overcommit, request accounts that would result in more transactions than will fit in the next block.
The shard does not execute the transactions with accounts owned by another shard to reduce redundant trial executions. The owning shard will execute those transactions.
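The following is a minimal sketch of this prestaging step. The function name, the fields on the transaction objects, and the slot handling are assumptions; it only illustrates sorting queued transactions by the ownership of the one account that is always known, trial-executing the transactions the shard could plausibly run, and building the account request list.

```python
def prestage(tx_queue, owned: set[str], owned_elsewhere: set[str], slots: int):
    """tx_queue: iterable of objects with .user_account and .trial_execute() -> set of accounts."""
    runnable, requests = [], set()
    for tx in tx_queue:
        if len(runnable) >= slots:
            break                                    # slots are full: no further trial executions needed
        if tx.user_account in owned_elsewhere:
            continue                                 # the owning shard will trial-execute this one
        touched = tx.trial_execute()                 # discovers the full account set for this transaction
        if touched <= owned:
            runnable.append(tx)                      # runs with accounts the shard already owns
        else:
            requests |= touched - owned              # unowned/foreign accounts to request (overcommit allowed)
    return runnable, requests
```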
Additional Account Hints
When a user knows in advance the accounts that will be touched in a transaction, that information can be included in the transaction script header as a hint to the shard. In those cases, the shard does not need to execute the transaction to determine the accounts needed, and proceeds directly to making the account requests to the account scheduler.
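A minimal sketch of such a hint follows. The field names and transaction layout are illustrative assumptions, not the actual SagaChain transaction format.

```python
transaction = {
    "from": "submitter_account_id",
    "script": "...",                                    # the transaction script itself
    "account_hints": ["account_id_1", "account_id_2"],  # accounts the script is expected to touch
}

def accounts_needed(tx: dict, trial_execute) -> set:
    """trial_execute: callable used only when no hint is present."""
    hints = tx.get("account_hints")
    if hints:
        return set(hints) | {tx["from"]}                # hint present: skip the trial execution
    return trial_execute(tx)                            # otherwise fall back to a trial execution
```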
Hidden Optimization — Common Account Relationships
Although the above scheduling algorithm will function reasonably well, there is a hidden optimization that reduces account ownership conflict between shards. Since people have circles of contacts, their accounts will reflect circles of other accounts they commonly come in contact with. Accounts that tend to interact with each other frequently will migrate to a single shard as a result of the scheduling algorithm. Essentially, sets of accounts can be thought of as having associations with each other; they are related. Such relations do not need to last for an extended period. The objective for the scheduling algorithm is to increase the probability that accounts that participate in a transaction are already assigned to the same shard, eliminating the need for any additional data movement.
Conclusion
The objective for scaling with sharding is to increase the opportunity for parallel distributed execution. At the heart of this is a requirement to place the data and code optimally, reducing or eliminating synchronization between shards during execution. SagaChain’s approach is to eliminate synchronization between shards by using an account-centric model. Accounts contain their own data and reference code using the first-class object model. The account scheduling algorithm seeks to maximize the opportunity for parallel distributed execution among the shards by assigning accounts to shards, i.e. account ownership. The above provided a rough description of SagaChain’s scheduling algorithm, based on the Monte Carlo method with heuristics for performing a random walk among the account requests made by each shard. The Monte Carlo method is used in many NP-complete situations; the account scheduling algorithm adopts it for an otherwise intractable scheduling problem.
Finally, SagaChain’s scheduling algorithm does not eliminate the need for data to be transferred across shards. What it does is separate the instances of data transfer from the instances of transaction execution, which not only allows for parallel distributed execution but also for simultaneous transfer of data for transactions queued for execution in subsequent blocks.