Kinds of Non-Relational Storage
Most IT professionals are familiar with the RDBMS. Relational databases store data in tables that are defined in advance. The definition specifies the columns that comprise the table, their data types, and so forth. The design is referred to as a data model or schema.
Relational storage is immensely useful, but it is not the only game in town. There are several alternative types of data storage including:
- Key-value stores store data in associative arrays comprised of sort keys and associated data values. These flexible data structures can be distributed across nodes of commodity hardware, and are manipulated using distributed processing algorithms (map-reduce). Hadoop functions as a key-value store.
- Document stores track collections of documents that have self-defining structure, often represented in XML or JSON formats. A document store may be built on top of a key-value store. MongoDB is a document store.
- Graph databases store the connections between things as explicit data structures (similar to pointers). These are stored separately from the things they connect. This contrasts with the RDBMS approach, where relationship information is stored within the things being associated (keys within tables). Neo4j is a graph database.
In the age of big data, organizations are tapping into these forms of storage for a variety of reasons. Here are just a few:
- No model: not requiring a predefined schema enables storing new data that has yet to be explored and modeled.
- Low cost: the cost of data management can be significantly lower than RDBMS storage.
- Better fit: some business use cases are a natural fit to alternative paradigms.
Uses for Non-Relational Storage
There are six major use cases for non-relational data in modern data architectures:
- Capture Non-relational storage facilitates intake of raw data. New data sets are captured in non-relational data stores, where they become “raw material” for various uses. A non-relational data store such as Hadoop can be used to capture data without having to model it first. This is often referred to as a data lake. I prefer to call it a landing zone.
- Explore Non-relational storage facilitates exploration and discovery. Exploration is the search for value in new data sets. Exploration applies analytic methods to captured data, often combining it with existing enterprise data. The goal is to find value in the data, and to identify things that will be worth tracking on a regular basis.
- Archive Non-relational storage serves as an archive. Data for which immediate value has not been identified is moved to an archive. From here it can be fetched for future use. Archiving data helps ensure the data lake does not become the fabled “data swamp.” An alternative to archiving unused data is simply to purge it.
- Deploy Non-relational storage supports production analytics. When value is found in data, it is transitioned to a production environment and processes are automated to keep it up to date. Deployments range from simple reports to complex analytic models.
- Augment Non-relational storage serves as a staging area for the data warehouse. In many cases, the insights gained from exploration prove valuable enough to track on a regular basis. Augmentation is the process of adding elements to a relational data warehouse that come from non-relational sources or analytic processes. A credit score, for example, might be incorporated into a customer dimension.
- Extend: Non-relational storage expands what can be maintained in the data warehouse. Sometimes there is lasting value in non-relational data, but is not appropriate to migrate it to relational storage. In such cases, the non-relational data is moved to a non-relational extension of the data warehouse. Applications can link relational and non-relational data. For example, non-relational XML documents may made available for “drill down” from a dimensional cube.
Learn More
Join me for my course Data Modeling in the Age of Big Data, offered exclusively through TDWI. At the time of this writing, it is offered next at TDWI Chicago on May 11, 2017. You can also bring this course to your site through TDWI Onsite Education. For more information contact me.