name Freecodecamp article
type link
action https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/

One server in Japan and other in Australia have a connection with server in Japan. And India sends the information to and fro to both but they can’t connect with each other and thus India is the central network.

It is basically a group of machine acting to be a single machine. Example, databases are big like trillions of GB and thus needs multiple machine located at multiple locations to run it. Now user insert a records in DB, there are multiple copies of same DB, so when one fails other one serves. All DB must contain same info.

Why distribute a system?

Systems are always distributed by necessity. They allow you to scale horizontally. (means addition of new machines to your infra is easy) #Horizontal_vs_Vertical_Scaling vertical scaling means providing more power to the same machine. So we get horizontal scaling fromdistributed_systems but we also getfault_tolerance andlow_latency . Fault tolerance because information spread across is more reliable than info on a single machine. Low latency as request is routed to nearest datacenter.

Technically achieve this

Assumption:

  • Reading of DB is more than writing.

Read performance is increased byPrimary_Replica replication strategy. 2 new DB servers (both read only) → sync up with main one

New info or edit → Main server (Primary_Replica → sends info to replica servers 3x read queries can be done now (wow 3x scaling)

ACIDAtomicity,ConsistencyIsolationdurability

We lostConsistency in above approach towards creating a distributed system with horizontal scaling in that.

Now it is possible that we insert a record and then query for it immediately and it is not available because it was not updated in the replica server. You have to live with it.

How to scale write traffic?

Multi_Primary replication strategy says have multiplePrimary_Nodes which support read and write operation, this gets complex as there can be edit conflicts.Sharding comes to the rescue here. Server → Split to smaller servers calledShards AllShards hold different kind of records and theShards must have a rule as to what data goes into whichShard example: Names A-L go toShard one. Names from M-R go toShard two.

If a singleShard continously gets more request it is called hot spot and must be split up.

Pitfall: Joins become difficult or unusable.

decentralized_vs_distributed Decentralized is still distributed in the technical sense, but the whole decentralized systems is not owned by one actor. No one company can own a decentralized system, otherwise it wouldn’t be decentralized anymore.

We are now going to go through a couple of distributed system categories and list their largest publicly-known production usage.

Distributed Data Stores

Distributed Data Stores are most widely used and recognized as Distributed Databases. Most distributed databases are NoSQL non-relational databases, limited to key-value semantics. They provide incredible performance and scalability at the cost of consistency or availability.

Known Scale — Apple is known to use 75,000 Apache Cassandra nodes storing over 10 petabytes of data, back in 2015

We cannot go into discussions of distributed data stores without first introducing the CAP Theorem.

  • Consistency — What you read and write sequentially is what is expected (remember the gotcha with the database replication a few paragraphs ago?)
  • Availability — the whole system does not die — every non-failing node always returns a response.
  • Partition Tolerant — The system continues to function and uphold its consistency/availability guarantees in spite of network partitions

Here you get to choose between consistency and availability and most applications value availability more. Availability means whole system will not die together, forever.