Recommendations on Failure Domains
-
To give the Parallels Cloud Storage allocator and rebalancing mechanisms enough flexibility, always configure at least 5 failure domains (hosts, racks, etc.) in a production setup. Reserve enough free disk space in each failure domain so that if one domain fails, its data can be recovered to the healthy ones.
-
When you create MDS services, you must take the topology and failure domains into account manually. That is, in multi-rack setups, create the metadata servers in different racks (5 MDSes in total).
-
At least 3 replicas are recommended for multi-rack setups.
-
Huge failure domains are more sensitive to total disk space imbalance. For example, if a domain has 5 racks with 10 TB, 20 TB, 30 TB, 100 TB, and 100 TB of total disk space, it will not be possible to allocate (10+20+30+100+100)/3 = 86.7 TB (rounded) of data in 3 replicas. Instead, only 60 TB will be allocatable: each replica must be placed in a different rack, so the low-capacity racks will be exhausted sooner, and no 3 domains will remain available for data allocation while the largest racks (the 100 TB ones) still have free space.
-
If a huge domain fails and goes offline, Parallels Cloud Storage will not perform data recovery by default, because replicating a huge amount of data may take longer than repairing the domain. This behavior is managed by the global parameter mds.wd.max_offline_cs_hosts (configured with pstorage-config), which controls how many failed hosts are still considered a normal disaster worth recovering from in automatic mode.
-
Failure domains should be similar in terms of I/O performance to avoid imbalance. For example, avoid setups in which failure-domain is set to rack, all racks but one have 10 Nodes each, and one rack has only 1 Node. Parallels Cloud Storage will have to repeatedly save a replica to this single Node, reducing overall performance.
-
Depending on the global parameter mds.alloc.strict_failure_domain (configured with pstorage-config), the domain policy can be strict (default) or advisory. Changing this parameter is strongly discouraged unless you are absolutely sure of what you are doing.
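
The disk space example above (racks of 10, 20, 30, 100, and 100 TB with 3 replicas) can be checked with a short calculation. The following is only a minimal sketch, not the actual Parallels Cloud Storage allocator: the function name max_allocatable is hypothetical, and it assumes that data can be split into arbitrarily small chunks and that every replica of a chunk must land in a different failure domain. Under these assumptions it returns an upper bound on the amount of user data that fits.

```python
def max_allocatable(capacities, replicas):
    """Upper bound on user data that fits when each chunk needs `replicas`
    copies in distinct failure domains (hypothetical helper, idealized model)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    caps = sorted(capacities, reverse=True)
    if len(caps) < replicas:
        return 0.0  # not enough domains to hold all replicas of any chunk
    remaining_replicas = replicas
    for i, cap in enumerate(caps):
        # Even split of the remaining replicas over the remaining domains.
        share = sum(caps[i:]) / remaining_replicas
        if cap <= share:
            return share
        # This domain is too large to be filled: it can hold at most one
        # replica of every chunk, so set it aside together with one replica.
        remaining_replicas -= 1
    return 0.0  # not reached for non-negative capacities


racks_tb = [10, 20, 30, 100, 100]
print(round(sum(racks_tb) / 3, 1))    # naive estimate: 86.7
print(max_allocatable(racks_tb, 3))   # 60.0
```

In this model each of the two 100 TB racks can hold only one replica of every chunk, so the third replica must fit into the 10 + 20 + 30 = 60 TB offered by the small racks, which matches the 60 TB figure given above.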
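
The I/O imbalance caused by a rack with a single Node can also be illustrated numerically. This is a rough sketch under loudly stated assumptions, not a description of how Parallels Cloud Storage actually places replicas: it assumes that every chunk picks its replica racks uniformly at random and then one Node uniformly within each chosen rack. The function name per_node_replica_share and the layout of 5 racks are hypothetical.

```python
from fractions import Fraction


def per_node_replica_share(nodes_per_rack, replicas):
    """Expected fraction of all chunks stored on one Node of each rack.

    ASSUMPTION (illustrative only): each chunk picks `replicas` distinct
    racks uniformly at random, then one Node uniformly inside each rack.
    """
    racks = len(nodes_per_rack)
    if not 0 < replicas <= racks:
        raise ValueError("replicas must be between 1 and the number of racks")
    # Probability that a given rack holds a replica of a given chunk.
    rack_share = Fraction(replicas, racks)
    return [rack_share / nodes for nodes in nodes_per_rack]


# Hypothetical layout: four racks with 10 Nodes each and one rack with 1 Node.
shares = per_node_replica_share([10, 10, 10, 10, 1], replicas=3)
print(shares[0])               # 3/50 of all chunks per Node in a full rack
print(shares[-1])              # 3/5 of all chunks on the single Node
print(shares[-1] / shares[0])  # 10
```

Under this simplified model the lone Node receives ten times as many replicas as any Node in a 10-Node rack, which is why it becomes the bottleneck.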