Byzantine Fault Tolerance (BFT) and its significance in Blockchain world

To establish blockchain technology, the complicated puzzles that had once perplexed the world of technology were tackled. One of these hurdles was the problem of countering the Byzantine fault tolerance (BFT) problem. To understand it further, let’s see what the problem is.

Any distributed ledger technology, or blockchain technology for that matter, is based upon the basic assumption that the system in consideration has faults and should survive even when the nodes deviate from their normal behaviour. The deviation can be non-malicious, like crashing of a node or no response from a node, which can happen if the node has not received the message due to faults in communication protocol or the node being offline.

So, in terms of fault tolerance, any distributed ledger technology can be a:

Simple fault tolerant system or crash fault tolerant (CFT): This type of mechanism prevents the system from failing when the node(s) crashes or goes offline. It does not take care of the nodes behaving maliciously or arbitrarily. A simple fault tolerant system is not very useful in an uncontrolled environment.

Byzantine fault tolerant system: In a decentralized distributed system, like blockchain, the participants often communicate with each other in an uncontrolled, open, and permission-less system. Their action may vary based on their individual interests and can be malicious. A BFT complaint system ensures that it will continue to operate where the node(s) can fail or be malicious.

The Byzantine Problem

The Byzantine problem is defined in a paper published by Leslie Lamport, Robert Shotak, and Marshall Pease in 1982. The paper was named The Byzantine Generals Problem. It stated an allegory for the problems of achieving consensus in a decentralized system.

The allegory goes like this: “Before battle, the Byzantine generals, commanding different army battalions, try to decide whether they should attack or retreat. This has to achieve by passing the messages via messengers between them and coming to an agreement. However, there is a problem. There is a possibility that some of the generals and/or some messengers may be traitors to the cause. These traitorous generals and messengers may alter the message and pass the malicious response sabotaging the plans of the loyal generals. So, the loyal generals need to find a way to reach a consensus having the above information in hand.

For blockchain technology/distributed ledger technology, the above allegory can be translated to the environment where the generals can be replaced by nodes, and messengers by the connections or messaging protocols.

With time, there were attempts to provide a solution on this distributed network/blockchain-centric problem. At present, we have many solutions for distributed networks which provide the partial answer to this issue if not full. In blockchain, various consensus mechanisms (protocols to reach an agreement in a distributed system) are designed which inherently deals with the Byzantine issue.

In blockchain, various consensus mechanisms are designed which inherently deals with the Byzantine issue.

Before discussing the solutions, let’s understand how blockchain technology or distributed ledger technology systems reach a consensus and the selfsame requirements in any distributed technology system.

Consensus in a Distributed System

In any distributed network, starting with the same initial value state, the nodes should agree on a particular output value in a state of transition. This is termed as reaching a consensus. And this should be achieved where the nodes can deviate and act either non-maliciously (crashing or offline) or maliciously (Byzantine).

The process achieves consensus if:

The nodes decide on some output value
This deterministic approach confirms termination, or
The majority of nodes agree on the same output value
This confirms agreement.

Now, having understood the basics of consensus, let’s see how major blockchain consensus mechanisms are countering it. The various consensus mechanisms deployed in existing blockchain solutions inherently deal with the BFT problem, although at varied probabilities of success.

Practical Byzantine fault tolerance (PBFT), Proof of Work (PoW), and Proof of Stake (PoS) are some of the currently used major consensus algorithms that achieve BFT and are practically used in today’s blockchain systems. Let’s look briefly into PBFT and PoW in the lens of BFT.

PBFT (Practical Byzantine Fault Tolerance)

A PBFT paper published in 1999 by Miguel Castro and Barbara Liskov describes a practical algorithm for state machine replication that tolerates Byzantine faults. The algorithm offers both liveness (client finally receiving correct replies to their requests) and safety, provided:

At most ‘(n-1)/3’ nodes are faulty out of ‘n’ nodes
Delay ‘t’ does not grow faster than indeﬁnitely.

Here, delay is the time between the moment when a message is sent for the ﬁrst time and the moment when it is received by its destination (assuming the sender keeps retransmitting the message until it is received).

Explaining 3f+1

If ‘f’ nodes are byzantine failures, then the system needs to deal with two types of problems. First is not sending the message at all and second is maliciously sending a different message. So the system needs to function well after ‘(n-f)’ nodes. However, it is possible that the ‘f’ replicas that did not respond are not faulty and, therefore, the ‘f’ replicas that responded might be faulty. So, if we want the non-faulty nodes to outnumber the faulty ones; we need at least ‘(n-f)-f > f’. Therefore, ‘n > 3f+1’ is optimal.

PBFT is currently used in Hyperledger fabric along with the Kafka ordering system. Kafka provides crash fault tolerance and finality. But it is important to note that while Kafka is crash fault tolerant, it is not Byzantine fault tolerant, which prevents the system from reaching agreement in the case of malicious or faulty nodes.

Proof of Work (PoW)

Bitcoin has BFT built into its protocol. PoW solves the Byzantine Generals Problem as it achieves a majority agreement without any central authority, in spite of the presence of unknown/potentially untrustworthy parties and despite the network not being instantaneous.

Let’s see how Bitcoin works in context of the Byzantine Generals problem

All the generals have agreed to a protocol which states that a complex problem needs to be solved in order to append the attack message. The cryptographic puzzle is hard for the proposing general but easy for other generals to review. All the generals will already have the essential structural configuration which the solution should have.

Now, on the battlefield, suppose a general fabricates a plan and wants to send the message to others to attack on a particular day and time along with the plan. The steps will be as follows:

He will append a nonce to the original plan.
He will then hash the plan appended with the nonce and see the result. He will then keep adding the nonce and checking if he got the desired hash. In other words, the number of nonce or solution to the complex puzzle.
He will then send the messengers to other relevant generals. Even if the messengers get caught, any modification to their message will lead to entirely different hash, which the other generals can easily identify.
All others finally verify the plan and act accordingly.

Byzantine fault tolerance is 50% assuming zero network latency. It is around 46% (Ethereum) and 49.5% (Bitcoin) fault tolerant under actually observed conditions, but it goes down to 33% if network latency is equal to the block time and reduces to zero as network latency approaches infinity.

The solution could fail only if the malicious party captures 50% of the network power.

Conclusion

With bitcoin, Satoshi Nakomoto provided a way forward that has allowed the protagonists of the distributed system to continue breaking new ground in consensus protocol research. While PoW, PBFT, and PoS are definitely the most popular consensus mechanisms, many other consensus mechanisms like Delegated Proof of Stake (DPOS), Proof of Elapsed Time (PoET), and Directed Acyclic Graphs (DAG), among others, are emerging which caters the issue of BFT albeit with varied levels of success. We are yet to have a perfect consensus mechanism, and while it’s unlikely that we would have any, it will be interesting to look into newer consensus algorithms in the future.

PBFT, PoW, and PoS are some of the currently used major consensus algorithms that achieve BFT.

About Blockchain Services Practice at HCLTech

HCLTech Blockchain Services is a solution-driven practice built on HCLTech's core values, supported by a network of industry experts with extensive domain experience. The CoTrust Platform^TM is a key asset, offering solutions and accelerators to help enterprises advance their blockchain journey.

HCLTech provides blockchain advisory, consulting and cloud-based services using proven frameworks to design and implement blockchain programs, streamlining operations in industries like financial services, shipping and logistics. CoTrust^TM reduces time, cost and risk by offering foundational capabilities and reusable business use cases such as provenance, track and trace, payments, decentralized KYC, tokenization and clinical trial management.

HCLTech is the first to launch blockchain solutions on cloud marketplaces. The CoTrust Platform^TM is available as a SaaS for various industries and a PaaS version on AWS and GCP for quick, scalable blockchain implementation.