Kubernetes Database – How to make the right database selection?
It can be perplexing to use the CNCF landscape guide to choose a Kubernetes database. More than 50 products are listed, and it is hard to tell from that table alone which database suits a specific Kubernetes deployment.
This article is a small effort to ease that challenge by showing what is what and which Kubernetes database fits your current need.
We will tackle the problem with a divide-and-conquer approach.
The first table below lists some of the major players from the 50+ entries on the Kubernetes landscape page. Use the “Primary Use / Purpose” column to find your current need. Once you have identified it, go directly to the matching section of this article to learn more about each database in that category.
Name | Website | Primary Use / Purpose | Works With | Open Source |
---|---|---|---|---|
Apache CarbonData | carbondata.apache.org | Analytics | Big Data: Spark SQL | Yes |
Apache Druid | druid.apache.org | Analytics | JDBC | Yes |
Snowflake | snowflake.com | Analytics | SQL | No |
Presto | prestodb.io | Analytics (Query Engine) | SQL | Yes |
BigchainDB | bigchaindb.com | Blockchain | MongoDB queries | Yes |
Cockroach Labs | cockroachlabs.com | Distributed Processing (CloudNative) | SQL | Yes |
Apache Ignite | ignite.apache.org | In-Memory | SQL and Key-Value Store | Yes |
Hazelcast IMDG | hazelcast.com | In-Memory | Multiple Languages | Yes |
Redis | redis.io | In-Memory | Redis Commands | Yes |
VoltDB | voltdb.com | In-Memory | SQL | Yes |
Apache Hadoop | hadoop.apache.org | Massive Parallel Processing | MapReduce | Yes |
Crate.IO | crate.io | Massive Parallel Processing | SQL | Yes |
ArangoDB | arangodb.com | Multi Model | AQL (DML) | Yes |
Crux | opencrux.com | Multi Model | Kafka | Yes |
Dgraph | dgraph.io | Multi Model | GraphQL | Yes |
FoundationDB | foundationdb.org | Multi Model | Multiple Languages | Yes |
InterSystems IRIS Data Platform | intersystems.com | Multi Model | SQL | No |
OrientDB | orientdb.org | Multi Model | SQL | Yes |
Apache Cassandra | cassandra.apache.org | NoSQL | CQL | Yes |
Infinispan | infinispan.org | NoSQL | Java | Yes |
MongoDB | mongodb.com | NoSQL | JSON | Yes |
Couchbase | couchbase.com | NoSQL | N1QL | Yes |
IBM DB2 | ibm.com/db2 | RDBMS | SQL | No |
MariaDB | mariadb.org | RDBMS | SQL | Yes |
Microsoft SQL Server | microsoft.com | RDBMS | SQL | No |
MySQL | mysql.com | RDBMS | SQL | Yes |
Oracle | oracle.com | RDBMS | SQL | No |
PostgreSQL | postgresql.org | RDBMS | SQL | Yes |
KubeDB | kubedb.com | Framework | Kubernetes CRDs | Yes |
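The lookup described above (pick a purpose, then shortlist the matching products) can be sketched in a few lines of Python. This is only an illustration: `CATALOG` holds a handful of rows copied from the table, and the `by_purpose` helper and its `open_source_only` flag are hypothetical names introduced for this example.

```python
from collections import defaultdict

# A few rows copied from the table above: (name, primary use, open source).
CATALOG = [
    ("Apache Druid", "Analytics", True),
    ("Snowflake", "Analytics", False),
    ("Redis", "In-Memory", True),
    ("Apache Cassandra", "NoSQL", True),
    ("PostgreSQL", "RDBMS", True),
    ("Oracle", "RDBMS", False),
]

def by_purpose(catalog, open_source_only=False):
    """Group database names by their primary use (hypothetical helper)."""
    groups = defaultdict(list)
    for name, purpose, is_open_source in catalog:
        if open_source_only and not is_open_source:
            continue
        groups[purpose].append(name)
    return dict(groups)

shortlist = by_purpose(CATALOG, open_source_only=True)
print(shortlist["RDBMS"])
```

Extending `CATALOG` with the remaining rows would give a quick way to narrow the list before reading the detailed sections.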
The sections below give more detail and a basic introduction to each database listed above, grouped by category (the “Primary Use / Purpose” column).
Analytics Databases
Is an analytics database a good choice for your Kubernetes database on AWS?
Analytics databases, also known as On-Line Analytical Processing (OLAP) systems, store and manage large volumes of structured data. They are optimized for fast queries and provide complex aggregate functions. They are widely used for analytics, BI, and reporting.
Skim through the essential features of each analytics database to see if it fits your Kubernetes database needs.
Name | Salient Features |
---|---|
Apache CarbonData | Fully indexed, columnar, Hadoop-native data store for processing petabytes of data; multi-level indexing, compression, and encoding techniques targeted at improving the performance of analytical queries; multi-level indexing also reduces I/O scans and CPU processing; can write to S3, OBS, HDFS, and Alluxio; integrates with the big data ecosystem (Spark and Presto) |
Apache Druid | Distributed system for OLAP queries on streaming and time-series data; commonly used for very high data volumes by companies such as Netflix, Airbnb, and Alibaba; very fast on huge data volumes |
Presto | High-performance, distributed SQL query engine for big data; queries data where it lives, including multiple sources within the same query: Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, MongoDB, and Teradata; targeted at analysts who expect response times ranging from sub-second to minutes; Facebook uses Presto for interactive queries against several internal data stores, including its 300PB data warehouse |
Snowflake | Enterprise analytics database; designed to run on public clouds such as AWS and Azure; offered as a subscription service; data is stored in one place, and you pay for storage and pay for compute only when you run queries; users can use ANSI SQL for all database operations |
Blockchain Databases
Blockchain databases are designed to keep an immutable record of all transactions. When you need to store data that no one can change afterwards, pick a blockchain database.
A blockchain used as a database can contain any information. However, blockchains are not good at storing vast amounts of data because of network limitations, cost, and similar constraints. In the case of the open-source cryptocurrency Bitcoin, only information such as ownership, a timestamp, and other small details is recorded in the ledger.
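To make the idea of an immutable record more concrete, here is a minimal sketch of a hash chain, the building block that makes a ledger tamper-evident. It is an illustration under simplifying assumptions, not how any particular blockchain database works: there is no network, consensus, or decentralization, and the function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Append a record that is linked to the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True

ledger = []
append_block(ledger, {"owner": "alice", "asset": "car-1"})
append_block(ledger, {"owner": "bob", "asset": "car-1"})
print(is_valid(ledger))                 # chain is intact
ledger[0]["data"]["owner"] = "mallory"  # tamper with history
print(is_valid(ledger))                 # tampering is detected
```

Because each block's hash covers the previous block's hash, changing an old record invalidates everything after it. That is why only small records, such as ownership and a timestamp, are usually kept on the chain itself.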
Name | Salient Features |
---|---|
BigchainDB | Backed by MongoDB, so stored data can be queried with MongoDB's own query tools; decentralized control via a federation of nodes; data storage is immutable; design your own private network with custom assets, transactions, permissions, and transparency; transaction-level permissioning |
In-Memory Databases
In-memory databases keep their working data in RAM rather than on disk, which is what makes them fast. They can still persist data on disk by storing each operation in a log or by taking snapshots. In-memory databases are ideal for applications that require microsecond response times and can see large traffic spikes at any time, such as gaming leaderboards, session stores, and real-time analytics.
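The log-based persistence mentioned above can be sketched as follows. This is a toy illustration, not how any product listed here implements durability: `LoggedStore`, its methods, and the `store.log` file name are hypothetical, and a real system would also flush to disk reliably, compact the log, and take snapshots.

```python
import json
import os
import tempfile

class LoggedStore:
    """Toy key-value store: reads come from a dict, writes are also appended to a log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged operation.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                op = json.loads(line)
                self.data[op["key"]] = op["value"]

    def set(self, key, value):
        # Log first, then update memory, so a restart can recover the write.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.data[key] = value

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.data.get(key, default)

log_file = os.path.join(tempfile.mkdtemp(), "store.log")
store = LoggedStore(log_file)
store.set("player:1", 4200)
store.set("player:1", 4350)

restarted = LoggedStore(log_file)  # simulates a process restart
print(restarted.get("player:1"))
```

Reads are served entirely from memory, while the append-only log is touched only on writes and at startup. That is the trade-off that keeps lookups fast without losing data on restart.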
Name | Salient Features |
---|---|
Apache Ignite | Distributed in-memory data store that delivers in-memory speed and unlimited read and write scalability to applications; SQL and key-value store that supports structured, semi-structured, and unstructured data; data is stored as key-value pairs; durable, consistent, and highly available; the Ignite cache keeps a subset of records in memory and loads more data into memory when required |
Hazelcast IMDG | Provides central, predictable scaling of applications through in-memory access to frequently used data across an elastically scalable data grid; offers a wide range of massively scalable data structures for use with your Python applications; lets the largest data sets run efficiently in an in-memory cluster across your most popular data APIs |
Redis | In-memory data structure store, used as a database, cache, and message broker; provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, geospatial indexes, and streams; data is stored and retrieved by key, which makes it fast; mostly used for key-value lookups rather than SQL-style joins |
VoltDB | In-memory database available as a community edition and under an enterprise license; lets you access data using SQL statements that run as Java stored procedures; optimized for a specific application by partitioning the database tables, and the stored procedures that access those tables, across multiple “sites” or partitions |
Massive Parallel Processing (MPP)
Analytical Massively Parallel Processing (MPP) databases are optimized for analytical workloads: aggregating and processing large datasets. MPP databases tend to be columnar: rather than storing each row in a table as an object (a feature of transactional databases), they generally store each column as an object.
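The difference between the two layouts is easy to see in a small sketch. The order data and field names below are made up for illustration, and plain Python lists stand in for the storage formats a real columnar engine would use.

```python
# The same three orders, stored two ways (illustrative data).
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 30.0},
]

# Columnar layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every whole record is visited just to read one field.
total_from_rows = sum(row["amount"] for row in rows)

# Columnar layout: only the "amount" column is touched.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same, but the columnar version reads only the values it needs. On wide tables with billions of rows, that is where analytical queries save most of their I/O.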
This architecture allows complex analytical queries to be processed much more quickly and efficiently. These analytic databases distribute their datasets across many machines, or nodes, to process large volumes of data (hence the name). These nodes all contain their own storage and compute capabilities, enabling each to execute a portion of the query.
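The way each node executes a portion of the query can be sketched as a scatter-gather aggregation. This is a simplified model under stated assumptions: threads stand in for separate machines, records are spread round-robin (real systems typically partition by hash or range), and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(records, node_count):
    """Spread records across nodes round-robin (a stand-in for real partitioning)."""
    shards = [[] for _ in range(node_count)]
    for i, record in enumerate(records):
        shards[i % node_count].append(record)
    return shards

def local_sum_by_region(shard):
    """The part of the query each node runs against only its own data."""
    totals = Counter()
    for region, amount in shard:
        totals[region] += amount
    return totals

def distributed_sum_by_region(records, node_count=3):
    shards = partition(records, node_count)
    # Scatter: every "node" aggregates its shard in parallel.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum_by_region, shards))
    # Gather: the coordinator merges the partial results into the final answer.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)

sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
print(distributed_sum_by_region(sales))
```

The result does not depend on how many nodes share the work, which is what lets an MPP database add machines to handle more data.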
To explore the Kubernetes databases in this category further, see the table below.
Name | Salient Features |
---|---|
Apache Hadoop | Provides distributed storage and distributed computing (Massive Parallel Processing, MPP); efficiently stores and processes large datasets ranging in size from gigabytes to petabytes; achieves fault tolerance by replicating blocks across the cluster; the file system is HDFS and the processing model is MapReduce; a Hadoop cluster includes a master node and multiple worker nodes |
Crate.IO | Database purpose-built for machine data, with an architecture designed for machine data use cases; uses SQL to process, aggregate, and join data; the distributed SQL query engine features columnar field caches and a more modern query planner, which make joins and aggregates fast; automatic replication of data across the cluster eases recovery when a disaster hits; real-time data ingestion, i.e. you can read massive amounts of data while ingesting at large scale; time-series analysis is made fast and easy with automatic table partitions |
Multi Model
Most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated.
In contrast, a multi-model database is designed to support multiple data models against a single, integrated backend. Document, graph, relational, and key-value models are examples of data models that may be supported by a multi-model database.
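As a rough illustration of several data models sharing one backend, the toy class below exposes the same stored records through key-value, document, and graph style access. `MultiModelStore` and its methods are hypothetical and do not reflect the API of any product in the table that follows.

```python
class MultiModelStore:
    """Toy store: one set of records, exposed through three data models."""

    def __init__(self):
        self._records = {}  # key -> document (a dict)
        self._edges = []    # (from_key, label, to_key)

    # Key-value model: store and fetch by key.
    def put(self, key, document):
        self._records[key] = document

    def get(self, key):
        return self._records.get(key)

    # Document model: match records on field values.
    def find(self, **criteria):
        return [key for key, doc in self._records.items()
                if all(doc.get(field) == value for field, value in criteria.items())]

    # Graph model: labelled edges between stored records.
    def link(self, from_key, label, to_key):
        self._edges.append((from_key, label, to_key))

    def neighbors(self, key, label):
        return [to for frm, lbl, to in self._edges if frm == key and lbl == label]

db = MultiModelStore()
db.put("user:1", {"name": "Ada", "city": "London"})
db.put("user:2", {"name": "Lin", "city": "London"})
db.link("user:1", "follows", "user:2")

print(db.get("user:1")["name"])           # key-value access
print(db.find(city="London"))             # document query
print(db.neighbors("user:1", "follows"))  # graph traversal
```

All three access styles read the same underlying records, so an application does not need to run and synchronize a separate database for each model.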
Name | Salient Features |
---|---|
ArangoDB | Unlike many NoSQL databases, ArangoDB is a native multi-model database (key/value pairs, graphs, or documents); use it when your application design will grow over time and you want to stay flexible; multiple teams can create the objects they need, e.g. graph, key-value, or document; reduces the complexity of your application's technology stack; a native multi-model database lets you have polyglot data without the complexity |
Crux | Bitemporal, document-centric, schemaless, and designed to work with Kafka as an “unbundled” database; bitemporality is broadly useful for event-based architectures and is a critical requirement for systems in any industry with strong auditing regulation; supports a Datalog query interface for traversing graph relationships across your documents; can run in distributed and non-distributed modes |
Dgraph | Open source, fast, distributed graph database written entirely in Go; horizontally scalable transactional graph database with fast arbitrary-depth joins using a GraphQL-like query language; common uses of graph databases include master data management and recommendation engines; fast data retrieval for connected data |
FoundationDB | Distributed architecture that scales out gracefully and handles faults while acting like a single ACID database; delivers strong performance on commodity hardware, allowing you to support very heavy loads at low cost; stores each piece of data on multiple machines according to a configurable replication factor |
InterSystems IRIS Data Platform | All data in an InterSystems IRIS database is stored in efficient, tree-based sparse multidimensional arrays; highly available, scalable, and resilient |
OrientDB | Combines the power of graphs and the flexibility of documents in one scalable, high-performance operational database; open source, with an enterprise version also available; written in Java, with a very small server distribution (2 MB); fast, storing up to 120,000 records per second |
NoSQL
A NoSQL (originally referring to “non-SQL” or “non-relational”) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
Such databases have existed since the late 1960s, but the name “NoSQL” was only coined in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.
The four databases listed below can help you choose your Kubernetes database.
Name | Salient Features |
---|---|
Apache Cassandra | Distributed, wide-column NoSQL database management system; can replicate data to multiple sites for stronger disaster recovery and business continuity; supports very heavy-load applications at companies such as Facebook and Netflix |
Couchbase | Cloud NoSQL database; JSON store that uses the N1QL language to retrieve data; specialized in low-latency data management for large-scale interactive web, mobile, and IoT applications; simple, uniform, and powerful application development APIs across multiple programming languages; Couchbase documents are JSON, a self-describing format capable of representing rich structures and relationships |
Infinispan | Distributed cache and key-value NoSQL data store developed by Red Hat; access your data through multiple protocols and data formats; keeps data available to meet demanding workloads; clustered processing speeds up processing of data in real time |
MongoDB | Scalable, flexible NoSQL document database platform; described by its vendor as the leading global cloud database service for modern applications; provides developers with a number of useful out-of-the-box capabilities, whether you run privately on site or in the public cloud |
RDBMS
Most of us are very familiar with Relational Database Management Systems (RDBMSs), as they have been around for decades. RDBMSs handle storage and retrieval of data for many companies and businesses.
Below are some of the key players in this area to consider when selecting a Kubernetes database.
Name | Salient Features |
---|---|
IBM DB2 | Transparently compresses data to decrease disk space and storage infrastructure requirements; greatly reduces the cost and risk of moving legacy applications to Db2, so you can use your existing skills and assets for quicker, easier migrations; lets you take advantage of in-memory columnar technology as well as parallel vector processing, data skipping, and data compression |
MariaDB | Speed is one of its most prominent features; remarkably scalable, able to handle tens of thousands of tables and billions of rows of data; also manages small amounts of data quickly and smoothly, making it convenient for small businesses or personal projects |
Microsoft SQL Server | Gain insights from all your data by querying across your entire data estate (SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB, MySQL, PostgreSQL, MongoDB, Oracle, Teradata, HDFS, and others) without moving or replicating the data; build a shared data lake by combining structured and unstructured data in SQL Server and accessing the data using either T-SQL or Spark; use SQL Server with Windows and Linux containers, and deploy and manage your deployments using Kubernetes |
MySQL | The most popular open source SQL database management system, developed, distributed, and supported by Oracle Corporation; designed to be fully multithreaded using kernel threads, so it can easily use multiple CPUs if they are available; executes very fast joins using an optimized nested-loop join; implements in-memory hash tables, which are used as temporary tables |
Oracle | One of the most widely used databases in the world; trusted by almost every major organization in the world; connects from major operating systems, including Linux, via ODBC; includes performance optimizations for commonly used features such as LOBs, PL/SQL, and index-organized tables |
PostgreSQL | Powerful, open source object-relational database system with over 30 years of active development; runs on all major operating systems and has been ACID-compliant since 2001; helps developers build applications and administrators protect data integrity and build fault-tolerant environments, and helps you manage your data no matter how big or small the dataset |
Framework
At this point we list only one item in this category. These systems help you write code that makes it easy to deploy a Kubernetes database.
Name | Salient Features |
---|---|
KubeDB | KubeDB is a framework for writing operators for any database that supports the following operational requirements: create a database declaratively using a CRD; take one-off or periodic backups to various cloud stores, e.g. S3, GCS, etc.; restore from backup or clone any database. KubeDB currently includes support for the following datastores: Postgres, Elasticsearch, MySQL, MongoDB, Redis, and Memcached |
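To show what creating a database declaratively with a CRD looks like in general terms, the sketch below builds a Kubernetes-style custom resource as a Python dictionary. Only the top-level shape (`apiVersion`, `kind`, `metadata`, `spec`) follows the standard Kubernetes object layout. The API group `example.com/v1`, the kind `ExamplePostgres`, and every field under `spec` are hypothetical and are not KubeDB's actual schema; consult the KubeDB documentation for the real resource definitions.

```python
import json

def database_manifest(name, engine_kind, version, replicas, storage):
    """Build a Kubernetes-style custom resource describing the desired database."""
    return {
        "apiVersion": "example.com/v1",  # hypothetical API group, not KubeDB's
        "kind": engine_kind,             # hypothetical kind registered by a CRD
        "metadata": {"name": name, "namespace": "demo"},
        "spec": {                        # hypothetical spec fields
            "version": version,
            "replicas": replicas,
            "storage": storage,
        },
    }

manifest = database_manifest("orders-db", "ExamplePostgres", "13", 3, "10Gi")
print(json.dumps(manifest, indent=2))
```

The point of the declarative style is that the manifest states the desired end state (three replicas, 10Gi of storage) rather than the steps to get there. An operator watching that resource type is then responsible for making the cluster match it.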
Conclusion
As you can see, choosing a Kubernetes database for a system requires a lot of due diligence and research. Once you have completed the research, implementation of a specific technology is well documented by the respective vendor.
The High Plains Computing team is always ready to help with any Kubernetes-related product review, Kubernetes database selection, architecture design, or design review for your project needs.
We have a team of seasoned CKA admins with many years of cloud native technology experience. You can take a look at our wide range of Kubernetes services.
Committed to delivering the best
Thousands of AWS and CNCF-certified Kubernetes solution partners have unique expertise and focus areas. Our focus is on best practices in security, automation, and excellence in Cloud operations.
Please reach out to us if you have any questions.