Configuring Failure Domains
A failure domain is a set of services that can fail in a correlated manner. Because a correlated failure can affect all of these services at once, it is critical to spread data replicas across different failure domains to keep data available. Examples of failure domains include:
- A single disk (the smallest possible failure domain). For this reason, Parallels Cloud Storage never places more than one data replica per disk or chunk server (CS).
- A single host running multiple CS services. When such a host fails (e.g., due to a power outage or network disconnect), all CS services on it become unavailable at once. For this reason, Parallels Cloud Storage is configured by default to make sure that a single host never stores more than one chunk replica (see Defining Failure Domains below).
- A single rack in a larger-scale multi-rack cluster. Such setups introduce additional points of failure, like per-rack switches or per-rack power units. In this case, it is important to configure Parallels Cloud Storage to store data replicas across such failure domains, so that a massive correlated failure of a single domain does not make data unavailable.
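The placement rule behind these examples (at most one replica per failure domain at the chosen level) can be illustrated with a minimal Python sketch. This is not the actual Parallels Cloud Storage placement algorithm. The function name `place_replicas`, the `servers` topology, and the `host`/`rack` keys are all hypothetical and exist only for illustration.

```python
def place_replicas(chunk_servers, replicas, level="host"):
    """Pick `replicas` chunk servers so that no two share a failure domain.

    chunk_servers: list of dicts such as {"cs": 1, "host": "h1", "rack": "r1"}
    level: the failure domain to separate replicas by ("host" or "rack")
    Hypothetical helper for illustration only.
    """
    chosen = []
    used_domains = set()
    for server in chunk_servers:
        domain = server[level]
        if domain in used_domains:
            # This domain already holds a replica; skip it.
            continue
        chosen.append(server["cs"])
        used_domains.add(domain)
        if len(chosen) == replicas:
            return chosen
    raise ValueError(
        f"only {len(chosen)} distinct '{level}' domains available, "
        f"need {replicas}"
    )


# Hypothetical topology: two CS services share host h1, three hosts share rack r1.
servers = [
    {"cs": 1, "host": "h1", "rack": "r1"},
    {"cs": 2, "host": "h1", "rack": "r1"},
    {"cs": 3, "host": "h2", "rack": "r1"},
    {"cs": 4, "host": "h3", "rack": "r2"},
    {"cs": 5, "host": "h4", "rack": "r3"},
]

print(place_replicas(servers, 3, level="host"))  # [1, 3, 4]
print(place_replicas(servers, 3, level="rack"))  # [1, 4, 5]
```

With host-level separation, the loss of host h1 removes at most one replica. With rack-level separation, the same holds for the loss of an entire rack, at the cost of needing at least as many racks as replicas.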