The Four Core Competencies of iRODS
Data Virtualization
Data stored in iRODS is typically accessed through an iRODS client. iRODS clients present files as Data Objects organized into Collections. For the most part, there is little difference between Data Objects and files, and between Collections and subdirectories. However, there are a couple of important distinctions:
• Collections make no reference to the physical storage path. It is possible for two Data Objects in a Collection to be stored in different physical locations.
• A Data Object may refer to multiple Replicas. Replicas are exact copies of a file, each stored in a different physical location.
Data Objects and Collections are stored in Storage Resources in an iRODS Zone.
Each Storage Resource has a name (the Resource’s logical representation) and a hostname and path (the physical representation of the Resource, where files are kept). The hostname is the network name of the device that serves the data, and the path is the local file system path or object storage bucket that holds the data.
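The separation between logical names and physical locations can be illustrated with a small in-memory model. This is a conceptual sketch only, not the iRODS client API: the class names, the zone name, the hostnames, and the paths are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StorageResource:
    name: str      # logical representation of the resource
    hostname: str  # network name of the device that serves the data
    path: str      # local file system path (or bucket) that holds the data


@dataclass(frozen=True)
class Replica:
    resource: StorageResource
    physical_path: str  # where this copy actually lives


@dataclass
class DataObject:
    logical_path: str  # Collection path plus name; no physical details
    replicas: List[Replica] = field(default_factory=list)

    def locations(self) -> List[str]:
        """Return one 'hostname:path' string per replica."""
        return [f"{r.resource.hostname}:{r.physical_path}" for r in self.replicas]


# Hypothetical resources on two different hosts.
disk = StorageResource("fastDisk", "node1.example.org", "/var/lib/vault")
archive = StorageResource("archive", "node2.example.org", "/mnt/tape/vault")

# One logical Data Object backed by two replicas.
obj = DataObject("/exampleZone/home/alice/run1.dat")
obj.replicas.append(Replica(disk, "/var/lib/vault/home/alice/run1.dat"))
obj.replicas.append(Replica(archive, "/mnt/tape/vault/home/alice/run1.dat"))

print(obj.logical_path)
print(obj.locations())
```

A client in this model only ever handles `logical_path`; the hostnames and physical paths stay behind the Data Object.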
Data Discovery
Information about data, called metadata, is extremely useful for Data Discovery: locating relevant data within large data sets. Data Object metadata includes rich, user-defined metadata in addition to traditional system metadata, such as filename, file size, and creation date. This rich metadata allows data to be identified by characteristics such as author names, keywords, case ID, and content type.
Rich metadata can include whatever descriptors you choose to apply to your data. Rich metadata can also be applied to Collections, Users, Resources, and other iRODS Zones. The entire iRODS catalog for a Zone is contained in a relational database. Currently, that database must be hosted in a PostgreSQL, MySQL, or Oracle database management system.
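As a rough illustration of metadata-driven discovery, the sketch below filters a tiny in-memory catalog by user-defined metadata. It assumes metadata can be modeled as simple attribute/value pairs; the `CatalogEntry` class, the `find_by_metadata` helper, and the sample entries are hypothetical, and a real Zone would answer such queries from its relational catalog rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    logical_path: str
    size_bytes: int  # example of system metadata
    user_metadata: Dict[str, str] = field(default_factory=dict)  # rich metadata


def find_by_metadata(catalog: List[CatalogEntry], **criteria: str) -> List[str]:
    """Return logical paths whose user metadata matches every criterion."""
    return [
        entry.logical_path
        for entry in catalog
        if all(entry.user_metadata.get(k) == v for k, v in criteria.items())
    ]


# Hypothetical catalog contents.
catalog = [
    CatalogEntry("/exampleZone/home/alice/scan01.tif", 2048,
                 {"author": "Smith", "case_id": "C-17", "content_type": "image/tiff"}),
    CatalogEntry("/exampleZone/home/alice/notes.txt", 512,
                 {"author": "Smith", "content_type": "text/plain"}),
    CatalogEntry("/exampleZone/home/bob/scan02.tif", 4096,
                 {"author": "Jones", "case_id": "C-17", "content_type": "image/tiff"}),
]

by_case = find_by_metadata(catalog, case_id="C-17")
by_author_and_type = find_by_metadata(catalog, author="Smith", content_type="image/tiff")
print(by_case)
print(by_author_and_type)
```

Note that neither query mentions a filename or directory; the data is located purely by its descriptive attributes.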
Workflow Automation
Each iRODS Server runs a Rule Engine that is an event-triggered background process. The Rule Engine is programmed using iRODS Rules, which specify what actions should be triggered when iRODS initiates a particular system activity.
iRODS event triggers are called Policy Enforcement Points (PEPs). Consider, for example, a rule to transfer ownership of data objects to the project manager when a user is deleted; the trigger — or PEP — is the deletion of the user. Similarly, rules could be written to extract metadata or pre-process data whenever a file is uploaded to an iRODS Resource.
Chaining rules and PEPs allows you to create powerful, customized workflows that save time and prevent human error. Complex multi-step scientific processes can be tightly managed and automated by keeping thorough records of ongoing status and other lab information, and by alerting humans only when necessary. Organizational data management policy can be captured in an automated, auditable fashion using iRODS rules.
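The event-trigger pattern behind rules and PEPs can be sketched as a simple callback registry. This is not the iRODS rule language or its actual PEP names: the `RuleEngine` class, the `user_deleted` event name, and the `owners` table are hypothetical stand-ins used to show how a rule attaches to a trigger, using the ownership-transfer example from the text.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class RuleEngine:
    """Toy engine: maps an event name (a stand-in for a PEP) to rules."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def rule(self, pep: str) -> Callable:
        """Decorator that registers a function to run when `pep` fires."""
        def register(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._rules[pep].append(fn)
            return fn
        return register

    def fire(self, pep: str, **context: Any) -> List[Any]:
        """Run every rule registered for `pep`, in registration order."""
        return [fn(**context) for fn in self._rules[pep]]


engine = RuleEngine()

# Hypothetical ownership table: logical path -> owner.
owners = {
    "/exampleZone/home/bob/a.dat": "bob",
    "/exampleZone/home/carol/b.dat": "carol",
}


@engine.rule("user_deleted")
def transfer_ownership(username: str, new_owner: str = "project_manager") -> List[str]:
    """Reassign the deleted user's data objects to the project manager."""
    moved = [path for path, owner in owners.items() if owner == username]
    for path in moved:
        owners[path] = new_owner
    return moved


# Simulate the system activity that triggers the rule.
results = engine.fire("user_deleted", username="bob")
print(results)
print(owners)
```

Because `fire` runs every rule registered for an event, several rules can be chained on the same trigger, which is the basis for the multi-step workflows described above.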
Secure Collaboration
Even in fields where data may not be published, it is usually necessary to share data sets between multiple workgroups. However, as data sets grow beyond several gigabytes, it becomes difficult or even impossible to move the data between locations. iRODS provides Secure Collaboration through three technologies: Tickets, Permissions, and Federation.
• iRODS Tickets provide controlled public access to Data Objects and Collections. The owner of a Data Object or Collection can create a Ticket and share it with non-iRODS users to grant them read or write access. Tickets can be revoked, and they can be set to automatically expire upon a specified date and time or a specified number of reads or writes.
• iRODS Permissions are analogous to UNIX file system permissions. The owner of a Data Object or Collection can assign read or write access for any number of defined iRODS Users and Groups. Group membership is defined by the administrator(s) of a Zone.
• iRODS Federation extends data sharing and publication beyond a single Zone. In a Federated deployment, once the administrators of two iRODS Zones share a set of keys, the owner of a Data Object or Collection can assign read and write permissions to users from outside Zones. When reading or writing data, the transfer mechanism is analogous to that for a single Zone. Unless the file is very small, iRODS servers broker a connection between the server containing the data and the client requesting it. As a result, Federation enables high-performance access to data stored in any other iRODS Zone.
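The Ticket behavior described in the list above (revocation, expiry at a date and time, and a limit on the number of uses) can be modeled in a few lines. This is a simplified sketch under stated assumptions, not how iRODS implements Tickets: the `Ticket` class, its field names, and the `redeem` method are hypothetical, and a single use counter stands in for separate read and write limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Ticket:
    target: str                            # Data Object or Collection path
    access: str = "read"                   # "read" or "write"
    expires_at: Optional[datetime] = None  # None means no time limit
    uses_left: Optional[int] = None        # None means unlimited uses
    revoked: bool = False

    def redeem(self, now: datetime) -> bool:
        """Return True and consume one use if the ticket is still valid."""
        if self.revoked:
            return False
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.uses_left is not None:
            if self.uses_left <= 0:
                return False
            self.uses_left -= 1
        return True


now = datetime(2024, 1, 1, tzinfo=timezone.utc)

# A ticket valid for one week and at most two uses.
ticket = Ticket("/exampleZone/home/alice/shared",
                expires_at=now + timedelta(days=7), uses_left=2)
use_results = [ticket.redeem(now) for _ in range(3)]

# A ticket presented after its expiry time is rejected.
expired = Ticket("/exampleZone/home/alice/shared", expires_at=now)
expired_result = expired.redeem(now + timedelta(days=1))

# A revoked ticket is rejected regardless of its other limits.
revoked = Ticket("/exampleZone/home/alice/shared", revoked=True)
revoked_result = revoked.redeem(now)

print(use_results, expired_result, revoked_result)
```

Checking revocation first and the use counter last means an invalid attempt never consumes one of the remaining uses.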