Out of Disk Space
When very little free disk space remains in a Parallels Cloud Storage cluster, it is critically important to increase the amount of free space as soon as possible by adding more chunk servers or removing some data. Once 95% of the cluster disk space is occupied, new data chunks can no longer be allocated, and allocation requests are blocked until the cluster can satisfy them. As a result, user I/O is blocked as well, effectively freezing Containers and virtual machines.
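You can estimate how far a cluster has to move to get out of the blocked state with simple arithmetic. The sketch below is a hypothetical helper and is not part of the pstorage tools. space_needed takes three arguments: the total cluster space in whole GiB, the occupied space in whole GiB, and a target usage percentage. Both space figures are assumed to be raw capacity with all replicas included. The function prints how much data to remove, or how much capacity to add with new chunk servers, to bring usage down to that target.

```shell
#!/bin/sh
# Hypothetical helper (not part of pstorage): how much data to remove, or how
# much raw capacity to add, so that cluster usage drops to at most TARGET percent.
# All sizes are whole GiB of raw space (replicas included); integer math only.
space_needed() {
    total=$1    # total cluster capacity, GiB
    used=$2     # occupied space, GiB
    target=$3   # target usage, percent (e.g. 90)

    # Option 1, remove data: need used <= total * target / 100.
    allowed=$(( total * target / 100 ))
    to_free=$(( used - allowed ))
    if [ "$to_free" -lt 0 ]; then to_free=0; fi

    # Option 2, add capacity: need used <= (total + add) * target / 100,
    # i.e. total + add >= used * 100 / target (rounded up).
    needed_total=$(( (used * 100 + target - 1) / target ))
    to_add=$(( needed_total - total ))
    if [ "$to_add" -lt 0 ]; then to_add=0; fi

    echo "free=${to_free} add=${to_add}"
}
```

Here is an example. A 1000 GiB cluster has 960 GiB occupied, and the target is 90%, which leaves the 10% headroom recommended in this section. For this case, space_needed 1000 960 90 prints free=60 add=67. Pick a target below 95%, because allocation is already blocked once 95% is reached.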
Note:
It is highly recommended to keep at least 10% of disk space free for recovery in case of host machine failures. You should also monitor usage history, for example, using the pstorage top or pstorage get-event commands (for more information, see Monitoring Parallels Cloud Storage Clusters).
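A monitoring script can turn the two thresholds described so far into a simple status check. The first threshold is the recommended headroom of at least 10% free space. The second is the blocking point at 95% occupancy. The sketch below is hypothetical and is not a pstorage command. space_status expects the total and occupied space in whole GiB, which you would obtain yourself, for example by reading them off pstorage top. The 95% threshold assumes the default allocation reserve described under Solutions.

```shell
#!/bin/sh
# Hypothetical check based on the two thresholds described in this section:
#   - keep at least 10% of disk space free (recommended headroom);
#   - at 95% occupancy new chunk allocation is blocked (default reserve).
# Usage: space_status TOTAL_GIB USED_GIB
space_status() {
    total=$1
    used=$2
    # Compare by multiplying instead of dividing to avoid rounding errors.
    if [ $(( used * 100 )) -ge $(( total * 95 )) ]; then
        echo "blocked"    # new chunks can no longer be allocated
    elif [ $(( used * 100 )) -gt $(( total * 90 )) ]; then
        echo "warning"    # less than 10% free: no headroom for recovery
    else
        echo "ok"
    fi
}
```

For example, space_status 1000 920 prints warning, and space_status 1000 950 prints blocked.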
Symptoms
- Stuck I/O or an unresponsive mount point, dmesg messages about stuck I/O, and frozen Containers and virtual machines.
- pstorage top and pstorage get-event show error messages like "Failed to allocate X replicas at tier Y since only Z chunk servers are available for allocation".
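In a long event list, the allocation failures quoted above can be picked out with sed. The filter below is hypothetical. It relies only on the message wording shown in this section and makes no assumption about the rest of each event line, such as timestamps or severity fields.

```shell
#!/bin/sh
# Hypothetical filter: reads event text on stdin and, for each message of the form
#   Failed to allocate X replicas at tier Y since only Z chunk servers are available for allocation
# prints the requested replica count, the tier and the available chunk server count.
# Lines that do not contain this message are ignored.
alloc_failures() {
    sed -n 's/.*Failed to allocate \([0-9][0-9]*\) replicas at tier \([^ ]*\) since only \([0-9][0-9]*\) chunk servers are available for allocation.*/replicas=\1 tier=\2 available_cs=\3/p'
}
```

A typical call is pstorage -c pcs1 get-event | alloc_failures. In this call, pcs1 is an example cluster name, and the call assumes get-event accepts the same -c option as set-config. An output line such as replicas=3 tier=0 available_cs=2 means that 3 replicas were requested but only 2 chunk servers could accept new data.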
Solutions
- Remove any unnecessary data to free disk space.
Note:
A side effect that may be surprising at first: once the kernel I/O queues are filled with blocked I/O, a mount point on the client machine may stop responding altogether and be unable to service even simple requests such as file listing. In this case, you can create an additional mount point to list, access, and remove the unneeded data.
- Add more chunk servers on unused disks (see Setting Up Chunk Servers).
If the solutions above are not possible, you can use one of the following temporary workarounds:
- Lower the replication factor for some of the least critical user data (see Configuring Replication Parameters). Remember to revert the changes afterwards.
- Reduce the allocation reserve. For example, for the cluster pcs1:
# pstorage -c pcs1 set-config mds.alloc.fill_margin=2
where mds.alloc.fill_margin is the percentage of disk space reserved for chunk server (CS) operation needs (the default value is 5). Remember to revert the change afterwards.
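You can estimate how much room each workaround buys with simple arithmetic. The hypothetical helpers below make two assumptions. First, every replica is assumed to be a full copy of the data, so lowering the replication factor from OLD to NEW releases DATA * (OLD - NEW) of raw space. Second, mds.alloc.fill_margin is assumed to be a percentage of the total raw cluster capacity.

```shell
#!/bin/sh
# Hypothetical back-of-the-envelope helpers for the two temporary workarounds.
# All sizes are whole GiB; results are rounded down.

# Raw space released by lowering the replication factor of DATA_GIB of
# user data from OLD_REPLICAS to NEW_REPLICAS (each replica is a full copy).
# Usage: replica_savings DATA_GIB OLD_REPLICAS NEW_REPLICAS
replica_savings() {
    data_gib=$1; old_replicas=$2; new_replicas=$3
    echo $(( data_gib * (old_replicas - new_replicas) ))
}

# Space made allocatable again by lowering mds.alloc.fill_margin from
# OLD_PCT to NEW_PCT on a cluster with TOTAL_GIB of raw capacity.
# Usage: margin_savings TOTAL_GIB OLD_PCT NEW_PCT
margin_savings() {
    total_gib=$1; old_pct=$2; new_pct=$3
    echo $(( total_gib * (old_pct - new_pct) / 100 ))
}
```

Two examples follow. replica_savings 200 3 2 prints 200, because lowering 200 GiB of data from 3 to 2 replicas releases 200 GiB of raw space. margin_savings 10000 5 2 prints 300. Keep these figures in mind when reverting. Restoring the replication factor consumes the same amount of space again. The default reserve is restored with pstorage -c pcs1 set-config mds.alloc.fill_margin=5.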