Storage Engine

Concept

Redcurrant allows fine-grained control over data placement. Either at store time or later during its lifecycle, data can be located precisely wherever the user or administrator decides. The storage engine enforces storage policies, which are built on three main topics:

  • Storage class defines storage type/capabilities in the form of a label and a URL for its location
  • Data security algorithm defines the software protection applied to the content
  • Data treatment defines what should be applied to a chunk before it is written to disk

Storage class

A storage class is a (storage_label, storage_url) pair configured for each storage service (aka RAWX); it describes the storage device used.
The storage_label is a plain string that describes the storage device used. It could be SATA, SSD, HIGH_END_NETAPP or anything else.
The storage_url is a URL in dot '.' notation that describes the physical location of the storage device. The length of this string must be the same across a namespace, and it must take physical information into account, such as datacenter location, room name, rack number, server number, disk number … For example: datacenter-paris.room6.rack42.server10.disk1
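
As an illustration only, the pair described above can be modelled as a small record. The class name StorageClass and the helper check_same_length are hypothetical and not part of Redcurrant; the sketch also assumes that "length" means the number of dot-separated items.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical model of a (storage_label, storage_url) pair."""
    storage_label: str  # e.g. "SATA", "SSD", "HIGH_END_NETAPP"
    storage_url: str    # e.g. "datacenter-paris.room6.rack42.server10.disk1"

    def items(self) -> List[str]:
        # Split the dot notation into its location items.
        return self.storage_url.split(".")


def check_same_length(classes: List[StorageClass]) -> bool:
    # Assumption: "same length" means the same number of items per URL.
    return len({len(c.items()) for c in classes}) <= 1


rawx1 = StorageClass("SATA", "datacenter-paris.room6.rack42.server10.disk1")
rawx2 = StorageClass("SSD", "datacenter-paris.room6.rack42.server11.disk3")
print(rawx1.items())
print(check_same_length([rawx1, rawx2]))
```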

Data security

Data security defines precisely what kind of software data protection is applied to contents. The administrator can rely on high-end storage devices with hardware RAID and clustering (like NetApp, Hitachi, Fujitsu, EMC …) or on basic disks in standard servers. In the latter case, Redcurrant can apply software RAID. The data security algorithms are:

  • NONE: no software protection; expect data loss unless high-end storage is used
  • DUPLICATION: software RAID 1, with a configurable number of copies
  • RAIN: software RAID 6

For DUPLICATION, two parameters are tunable:

  • Number of replicas
  • Minimum distance between replicas. Distance is computed from the RAWX storage_url and allows the administrator to take the physical design into account, such as power distribution

For example, with a minimum distance of 1 and 5 copies:
DUPONEFIVE=DUP:distance=1|nb_copy=5
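
To make the structure of such a definition explicit, here is a minimal parsing sketch. It assumes the layout seen in the single example above (name, then '=', then algorithm, then ':' and '|'-separated key=value pairs with integer values); the function parse_data_security is hypothetical and is not a Redcurrant API.

```python
def parse_data_security(line: str) -> dict:
    """Split 'NAME=ALGO:key=value|key=value' into its parts (assumed layout)."""
    name, _, spec = line.partition("=")       # split at the first '=' only
    algorithm, _, raw_params = spec.partition(":")
    params = {}
    for pair in raw_params.split("|"):
        if pair:                              # skip empty segments
            key, _, value = pair.partition("=")
            params[key] = int(value)          # assumption: values are integers
    return {"name": name, "algorithm": algorithm, "params": params}


print(parse_data_security("DUPONEFIVE=DUP:distance=1|nb_copy=5"))
```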

For RAIN, three parameters are tunable:

  • Number of data blocks
  • Number of parity blocks: this will define the maximum loss allowed
  • Minimum distance between blocks
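
The trade-off behind the first two parameters can be worked out with simple arithmetic. The sketch below assumes an erasure code in which any loss of up to the number of parity blocks is recoverable, which matches the "maximum loss allowed" statement above; the function rain_layout and the sample values are hypothetical.

```python
def rain_layout(data_blocks: int, parity_blocks: int) -> dict:
    """Derive basic figures from the two RAIN block counts."""
    total = data_blocks + parity_blocks
    return {
        "total_blocks": total,                    # blocks written per stripe
        "max_lost_blocks": parity_blocks,         # losses tolerated (assumed)
        "storage_overhead": total / data_blocks,  # raw bytes per useful byte
    }


# Hypothetical values: 4 data blocks protected by 2 parity blocks.
print(rain_layout(4, 2))
```

For comparison, DUPLICATION with n copies stores n raw bytes per useful byte and tolerates the loss of n - 1 copies.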

Distance

Distance between storage services (RAWX) is computed from their storage_url. Each item of the URL has a weight that is a power of 2, as shown in this sample composed of 4 items:
dc.room.rack.server = 8.4.2.1
Starting from the end, the weights are 2^0, 2^1, 2^2, 2^3 …
The distance is obtained by string comparison between two URLs: it is the weight of the last item of the longest prefix that both URLs share:

distance(dc1.room1.rack1.server1,dc1.room1.rack2.server1) = 4

dc1.room1 is the longest common prefix. Its last item, room1, has a weight of 4.
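
The computation can be sketched as follows. This is not the Redcurrant implementation, only a minimal reading of the rule illustrated by the example above: the result is the weight of the last item on which the two URLs still agree, reading from the left. The function names are hypothetical, and the behaviour when the URLs share no item at all is an assumption, since the text does not cover it.

```python
from typing import List


def item_weights(n_items: int) -> List[int]:
    # Weights are powers of 2 starting from the end: 4 items give [8, 4, 2, 1].
    return [2 ** (n_items - 1 - i) for i in range(n_items)]


def distance(url_a: str, url_b: str) -> int:
    a, b = url_a.split("."), url_b.split(".")
    if len(a) != len(b):
        raise ValueError("storage_url lengths must match within a namespace")
    # Count the leading items that are identical in both URLs.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    if common == 0:
        # Assumption: with no shared item, use the next power of 2.
        return 2 ** len(a)
    # Weight of the last item of the common prefix.
    return item_weights(len(a))[common - 1]


print(distance("dc1.room1.rack1.server1", "dc1.room1.rack2.server1"))  # 4
```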

Refer to the following picture

Data treatment

Data treatment is the final processing applied to a chunk before it is written. It matters for the low-level aspects of storage. For example, one of the available data treatments is compression.
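
As a rough illustration of a compression treatment, the sketch below compresses a chunk before it would be written and restores it on read. It uses Python's zlib purely as a stand-in: the text does not say which compression algorithm Redcurrant applies, and the function names treat_chunk and restore_chunk are hypothetical.

```python
import zlib


def treat_chunk(chunk: bytes) -> bytes:
    # Applied just before the chunk is written to disk.
    return zlib.compress(chunk)


def restore_chunk(stored: bytes) -> bytes:
    # Applied when the chunk is read back.
    return zlib.decompress(stored)


chunk = b"redcurrant " * 1000
stored = treat_chunk(chunk)
print(len(chunk), len(stored))
```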


User Tools