stributed data systems. CHAPTER 1 Reliable, Scalable, and Maintainable Applications The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that wa

Chapter 1

CHAPTER 2 Data Models and Query Languages The limits of my language mean t

Chapter 2

CHAPTER 3 Storage and Retrieval Wer Ordnung hält, ist nur zu fau

Chapter 3

: Storage and Retrieval CHAPTER 4 Encoding and Evolution Everything changes and nothing s

Chapter 4

ril 9, 2014. PART II Distributed Data For a successful technology, rea

Second Part

tems (HotOS), May 2015. CHAPTER 5 Replication The major difference between a t

Chapter 5

ch faults in Chapter 8. CHAPTER 6 Partitioning Clearly, we must break away from

Chapter 6

1900000036 CHAPTER 7 Transactions Some authors have claimed that g

Chapter 7

e Faults” on page 304). CHAPTER 8 The Trouble with Distributed Systems Hey I just met you The networ

Chapter 8

ith Distributed Systems CHAPTER 9 Consistency and Consensus Is it better to be alive and wro

Chapter 9

2-11294-2_6 PART III Derived Data In Parts I and II of this book,

Third Part

cations in the future. CHAPTER 10 Batch Processing A system cannot be successful

Chapter 10

ndertake their own experiments.— Donald Knuth In the first two parts of this b

A DSA book by this author is on my wishlist.

r 10: Batch Processing CHAPTER 11 Stream Processing A complex system that works i

Chapter 11

1, 2016. CHAPTER 12 The Future of Data Systems If a thing be ordained to anothe

Chapter 12

e, when discussing storage engines in Chapter 3, we saw log-structured storage, B-trees, and column-oriented storage. When discussing replicati

Various data structures used in databases.

r 12: The Future of Data Systems Glossary Please note that the definit

ounded. transaction Index A aborts (transactions), 222, 224

Detailed content list

, 366 About the Author Martin Kleppmann is a researcher in distributed

Author of this book

PART I Foundations of Data Systems The first four chapters go

First Part

in some cloud platforms such as Amazon Web Services (AWS) it is fairly common for virtual machine instances to become unavailable without warning [7], as the platforms are designed to prioritize flexibility and elasticity over single-machine reliability. Hence there is a move toward

How do flexibility and elasticity provide a benefit here if they compromise reliability?

aults [5]. Examples include:
• A software bug that causes every instance of an application server to crash when given a particular bad input. For example, consider the leap second on June 30, 2012, that caused many applications to hang simultaneously due to a bug in the Linux kernel [9].
• A runaway process that use

What are some other famous bugs related to time?

hing” and discourage “the wrong thing.” However, if the interfaces are too restrictive people will work around them, negating their benefit, so this is a tricky balance to get right. • Decouple the places where peo

If the rules are too strict, like some SonarQube rules right now, and people don’t agree with them, they will find ways to escape or trick them, as we do now.

Can this be used to create requirements for a language like TQL in Galore?

arise when data is distributed. Replication Versus Partitioning There are two common ways data i

Q. Can partitioning help in any way with hardware failure, like replication does?

cation in Chapter 5. Partitioning Splitting a big database into smaller subsets called partitions so that different partitions can be assigned to different nodes (also known as sharding). We discuss partitioning in Cha

Sharding is another name for partitioning.
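
A rough sketch of the idea in the excerpt above: split keys into partitions, then assign partitions to nodes. Hashing the key and the round-robin assignment are my own assumptions for illustration (the excerpt does not say how keys are split), and the node names and partition count are made up.

```python
import hashlib

NUM_PARTITIONS = 8                       # hypothetical partition count
NODES = ["node-a", "node-b", "node-c"]   # hypothetical node names


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_partitions(num_partitions: int, nodes: list) -> dict:
    """Assign each partition to a node, round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


PARTITION_TO_NODE = assign_partitions(NUM_PARTITIONS, NODES)


def node_for(key: str) -> str:
    """Find the node that holds the partition a key belongs to."""
    return PARTITION_TO_NODE[partition_for(key)]
```

The same key always lands in the same partition, so a lookup only needs to contact one node.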

e of its parts have failed (and thus increase availability) • To scale out the number

Replication increases availability

Availability

sed (master–slave) replication. This mode of replication is a built-in feature of many relational databases, such as PostgreSQL (since version 9.0), MySQL, Oracle Data Guard [2], and SQL Server’s AlwaysOn Availability Groups [3]. It is also used in some nonrelational databases, including MongoDB, RethinkDB, and Espresso [4]. Finally, leader-base

The leader-follower approach is used in all of these widely used databases.

gh availability. Fortunately, setting up a follower can usually be done without downtime. Conceptually, the proc

tion in the leader’s replication log. That position has various names: for example, PostgreSQL calls it the log sequence number, and MySQL calls it the binlog coordinates. 4. When the follower has pr

These positions mark how far the leader’s replication log had progressed when the snapshot was taken, so the follower can retrieve the rest of the data written after that point.
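
A toy sketch of how I understand the follower setup: the snapshot is tied to a position in the leader's replication log, and the follower then asks only for changes after that position. The `Leader` and `Follower` classes and the list-index "position" are my own simplification, not how PostgreSQL or MySQL implement it.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    log: list = field(default_factory=list)   # ordered (key, value) writes
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        """Apply a write and append it to the replication log."""
        self.log.append((key, value))
        self.data[key] = value

    def snapshot(self):
        """Return a copy of the data and the log position it corresponds to."""
        return dict(self.data), len(self.log)

    def changes_since(self, position):
        """Return all log entries written after the given position."""
        return self.log[position:]


@dataclass
class Follower:
    data: dict = field(default_factory=dict)
    position: int = 0

    def restore(self, snapshot, position):
        """Load a snapshot and remember the log position it was taken at."""
        self.data = dict(snapshot)
        self.position = position

    def catch_up(self, leader):
        """Apply every change the leader logged after our known position."""
        for key, value in leader.changes_since(self.position):
            self.data[key] = value
            self.position += 1


leader = Leader()
leader.write("a", "1")
leader.write("b", "2")
snap, pos = leader.snapshot()   # snapshot taken at log position 2
leader.write("a", "3")          # writes that happen after the snapshot
leader.write("c", "4")

follower = Follower()
follower.restore(snap, pos)
follower.catch_up(leader)
```

After `catch_up`, the follower holds the same data as the leader without the leader ever having to stop accepting writes.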

me mostly a marketing term. (Systems that do not meet the ACID criteria are sometimes called BASE, which stands for Basical

o it can safely be retried. The ability to abort a transaction on error and have all writes from that transaction discarded is the defining feature of ACID atomicity. Perhaps abortability
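
A small sketch of abortability using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` function are made up for illustration. The `with conn:` block commits if the body succeeds and rolls back if an exception escapes, so the first `UPDATE` is discarded when the transfer fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; all writes are discarded on error."""
    with conn:  # commit on success, roll back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Abort: the debit above must not survive.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Because a failed transfer leaves no partial writes behind, the caller can retry it safely.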

y different categories of tools. Although a database and a message queue have some superficial similarity—both store data for some time—they have very different access patterns, which means different performance characteristics, and thus very different implementat

g application code. For example, if you have an application-managed caching layer (using Memcached or similar), or a full-text search server (such as Elasticsearch or Solr) separate from your main database, it is normally the application code’s responsibility to keep those caches and indexes in sync with the main database. Figure 1-1 gives a glimpse of wh
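
A minimal sketch of what "application code keeps the cache in sync" could look like. Plain dicts stand in for the main database and for Memcached, and invalidate-on-write is just one possible strategy that I am assuming here; the excerpt does not prescribe one.

```python
class UserStore:
    """Application-managed cache in front of a main database (both are dicts here)."""

    def __init__(self):
        self.database = {}   # stand-in for the main database
        self.cache = {}      # stand-in for Memcached or similar
        self.db_reads = 0    # counts how often we had to hit the database

    def read(self, user_id):
        """Serve from the cache if possible, otherwise load from the database."""
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        value = self.database.get(user_id)
        if value is not None:
            self.cache[user_id] = value
        return value

    def write(self, user_id, value):
        """Write to the database, then invalidate the now-stale cache entry."""
        self.database[user_id] = value
        self.cache.pop(user_id, None)


store = UserStore()
store.write("u1", "Ada")
first = store.read("u1")     # cache miss, reads the database
second = store.read("u1")    # cache hit
store.write("u1", "Grace")   # invalidates the cached "Ada"
third = store.read("u1")     # cache miss again, sees the new value
```

If `write` forgot the invalidation step, `third` would still return the old value, which is exactly the sync responsibility the excerpt describes.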

of tricky questions arise. How do you ensure that the data remains correct and complete, even when things go wrong internally? How do you provide consistently good performance to clients, even when parts of your system are degraded? How do you scale to handle an increase in load? What does a good API for the service look like? There are many factors that may

xploring ways of thinking about reliability, scalability, and maintainability. Then, in the following chapte

Three pillars.

y unauthorized access and abuse. If all those things together mean “working correctly,” then we can understand reliability as meaning, roughly, “continuing to work correctly, even when things go wrong.” The things that can go wrong are

that budget item approved. So it only makes sense to talk about tolerating certain types of faults. Note that a fault is not the sa

when you have a lot of machines. Hard disks are reported as having a mean time to failure (MTTF) of about 10 to 50 years [5, 6]. Thus, on a storage cluster with 10,000 disks, we should expect on average one disk to die per day. Our first response is usuall
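
Checking the "one disk per day" arithmetic for myself. This assumes disks fail independently at a constant rate and a 365-day year, which is my own simplification; the function name is made up.

```python
def expected_failures_per_day(num_disks: int, mttf_years: float) -> float:
    """Average number of disk failures per day for a given MTTF."""
    mttf_days = mttf_years * 365
    return num_disks / mttf_days


# The excerpt's range: MTTF of about 10 to 50 years, 10,000 disks.
low = expected_failures_per_day(10_000, 50)    # optimistic end of the range
high = expected_failures_per_day(10_000, 10)   # pessimistic end of the range
```

With these numbers `low` comes out a bit above 0.5 and `high` a bit above 2.7 failures per day, so "about one disk per day" is the right order of magnitude.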