Failed Chunk Servers

If a chunk server (CS) in your Parallels Cloud Storage cluster fails, identify the cause of the failure and choose the appropriate way to resolve it.

Do the following:

  1. Run the pstorage top command. For example:

    # pstorage -c pcs1 top

  2. Press i to cycle to the FLAGS column in the chunk server section and find the flags corresponding to the failed CS.
  3. Look up the displayed flags in the table below to identify the cause of the failure and how to resolve it.

| Flag | Issue | What to do |
|------|-------|------------|
| H | An I/O error. The disk on which the chunk server runs is broken. | Check the disk for errors. If the disk is broken, replace it and recreate the CS as described in Replacing Disks used as Chunk Servers. Otherwise, contact technical support. |
| h | A chunk checksum mismatch. Either the chunk is corrupt or the disk where the chunk is stored is broken. | Check the disk for errors. If the disk is broken, replace it and recreate the CS as described in Replacing Disks used as Chunk Servers. Otherwise, contact technical support. |
| S | The CS journal stored on a journalling SSD is not accessible. Either the journal is corrupt or the journalling SSD is broken. | Check the journalling SSD for errors. If the SSD is broken, replace it as described in Failed Write Journalling SSDs. |
| s | The chunks' checksums stored on a caching SSD are not accessible. Either the checksums are corrupt or the caching SSD is broken. | Check the caching SSD for errors. If the SSD is broken, replace it as described in Failed Data Caching SSDs. |
| R | The path to the chunk repository is invalid on CS start. The disk on which the chunk server runs is not attached or mounted. | Make sure the disk is attached and correctly mounted. Make sure the disk's entry in /etc/fstab is correct. |
| T | An I/O request timeout. The disk may be inaccessible for some reason but is not necessarily broken. | Make sure the disk is attached and check the dmesg output for I/O request timeout messages to find out why the disk might be inaccessible. |
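
The lookup in the table above can also be kept at hand as a small shell helper. The following is only a sketch: the function names cs_flag_hint and cs_flags_hints are hypothetical and are not part of pstorage, and the helper assumes that the FLAGS column shows each flag as a single character, so a string such as HT is treated as two separate flags. Flags that the table does not list are reported as not covered rather than interpreted.

```shell
# Hypothetical helper (not part of pstorage): maps one FLAGS character,
# as read from the chunk server section of "pstorage top", to the issue
# and suggested action from the table above.
cs_flag_hint() {
    case "$1" in
        H) echo "H: I/O error. Check the disk; if broken, replace it and recreate the CS." ;;
        h) echo "h: Chunk checksum mismatch. Check the disk; if broken, replace it and recreate the CS." ;;
        S) echo "S: CS journal not accessible. Check the journalling SSD; replace it if broken." ;;
        s) echo "s: Chunk checksums not accessible. Check the caching SSD; replace it if broken." ;;
        R) echo "R: Invalid chunk repository path. Check that the disk is attached and mounted; verify /etc/fstab." ;;
        T) echo "T: I/O request timeout. Check that the disk is attached; inspect dmesg output." ;;
        *) echo "$1: not covered by the table."
           return 1 ;;
    esac
}

# Prints one hint per character of a flags string.
# Assumption: every character in the string is a separate flag.
cs_flags_hints() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        flag=${flags%"$rest"}    # the first character only
        cs_flag_hint "$flag"
        flags=$rest
    done
}
```

For example, after reading the flags HT for a failed CS, running cs_flags_hints HT prints the hint for H followed by the hint for T.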