Pivotal GemFire XD

Deploying SQL Applications at Extreme Scale

Much of the data in the world can be described as unstructured or semi-structured, but most of the applications in the world are SQL-based. When your application data fits well into a relational model, there are many advantages to leveraging the reporting and analytics ecosystem that has grown up around SQL standards.

These days, many such applications have become strategic in nature and need to scale to handle thousands of simultaneous operations. These range from online transactional services that now must support mobile access to high-velocity streaming use cases involving Internet of Things device networks. At scale, these kinds of applications generate a tremendous volume of historical information that, while not needed by the operational application, is worth keeping for future analysis.

Add in requirements for absolute consistency, low-latency response times, and mission-critical availability, and you rapidly exceed the scale capabilities of traditional relational databases. This is where distributed in-memory databases such as Pivotal GemFire XD shine: serving many concurrent transactions consistently enough for financial use cases, yet fast enough for real-time analytics-based automation. With the ability to persist and archive information in Pivotal HD, you can also address future advanced analytics requirements, enabling a virtuous circle of continuous improvement and innovation.

PIVOTAL GemFire XD Features

SCALE-OUT SQL ON HADOOP FOR TRANSACTIONAL APPLICATIONS

Pivotal™ GemFire® XD is a distributed in-memory database that is designed to provide:

  • Scale-out performance
  • Consistent database operations across globally distributed applications
  • High availability, resilience, and global scale
  • Standards-based developer features and interfaces
  • Easy administration of distributed nodes

The features of Pivotal GemFire XD that help customers achieve these capabilities include:

Scale-out performance

  • In-memory storage: all operational data available in-memory to avoid disk I/O penalty
  • High-memory nodes: supports systems with memory capacity larger than JVM heap size limits
  • Elastic, linear scalability: easily scale up or down capacity to meet changes in demand
  • Optimized data distribution & processing: configure data distribution across grid to optimize speed of data access & processing

Consistent database operations for Hadoop clusters and across globally distributed applications

  • Flexible persistence: store data in performance-optimized disk persistence, or within Pivotal HD
  • Configurable consistency: choose a consistency model supporting distributed OLTP applications to balance performance and data availability
  • SQL query support: run SQL queries over data on distributed nodes, optimized with indexes on key values
  • Advanced analytics access: analyze archived on-disk data and in-memory data with Pivotal HAWQ via PXF

High availability, resilience, and global scale

  • Node failover: application and data access ensured in event of network split or node failure.
  • Resilient self-healing: fast node startup on reconnect; self-healing of clusters automates restoration after node failure.
  • Cluster to cluster WAN connectivity: enabling global scale of data access and multi-site capability.

Standards-based Developer Features and Interfaces

  • APIs and Standards Support: develop in any programming language that supports JDBC, Spring Data JDBC, ADO.NET, or ODBC.
  • Data type support: ANSI SQL-92 data types, table definitions, foreign key relationships, and JSON documents
  • Powerful application functions: data-aware stored procedures, SQL-compliant queries and DML statements, publish & subscribe event framework with reliable asynchronous queues for delivering events.
  • Use familiar tools: Hibernate, NHibernate, Roo, SQuirreL, IntelliJ, other JDBC-compliant tools

Easy administration of distributed nodes

  • Automated tuning: automatic distribution of data to optimize usage of system resources on nodes for best cluster performance
  • Simplified Cluster Configuration: configure all nodes in cluster from single fault-tolerant service
  • Cluster monitoring & data query: dashboard showing cluster & node status; view and query data in nodes
  • Performance statistics analysis: offline tool for viewing historical logs and statistics to diagnose bottlenecks
  • Command line tools: easy automation and scripting of administrative tasks via command line interface

PIVOTAL GemFire XD Technology

What Is GemFire XD?

Pivotal GemFire XD is a distributed in-memory SQL database for high-scale custom applications. GemFire XD provides low-latency data access to applications at massive scale, with many concurrent transactions involving terabytes of operational data. Designed to maintain consistency of concurrent operations across its distributed data nodes, Pivotal GemFire XD can support ACID transactions for massively scaled applications such as data stream analysis and processing, financial payments, and ticket sales, with proven customer deployments of more than 10 million user transactions a day. With optional persistence and archival in HDFS, GemFire XD can store an extremely large, consistent database in Hadoop nodes, which can be accessed for analysis by Pivotal HAWQ. Through support of standards such as JDBC, GemFire XD works with common development frameworks and reporting tools for relational data.

Scale-out Performance

In-Memory Storage

GemFire XD stores all required data in RAM across distributed nodes to provide the fastest access to data while eliminating the performance penalty of reading from disk.

High-Memory Nodes

GemFire XD allocates in-memory storage off heap to take advantage of hardware systems with memory capacity larger than JVM heap size limits, and to provide faster performance by avoiding the Java garbage collection cycle governing memory deallocation.

Elastic, Linear Scalability

GemFire XD provides linear scalability that allows you to predictably increase capacity, both in operations per second and in data storage, simply by adding nodes to a cluster. Data distribution and system resource usage are automatically adjusted as nodes are added or removed, making it easy to scale up or down quickly to meet expected or unexpected spikes in demand.

Optimized Data Distribution Across Nodes

GemFire XD automatically distributes data across nodes to optimize latency and usage of system resources. You can also configure partitioning and replication of data to further improve application response time. GemFire XD directs processing operations to the specific nodes where the data resides in order to reduce latency and network traffic, according to the cluster configuration you set up for data distribution and replication between nodes.
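
As a rough illustration of routing operations to the node that holds the data, the sketch below hashes a partitioning key to pick an owning node and sends both writes and reads for that key only to that node. It is a conceptual toy, not GemFire XD's actual placement algorithm: the `ToyCluster` class, the `owner_for` helper, the node names, and the simple modulo-hash scheme are all assumptions made for the example.

```python
import hashlib


def owner_for(key, node_names):
    """Pick the node that owns `key` by hashing the partitioning key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return node_names[bucket % len(node_names)]


class ToyCluster:
    """Hypothetical in-memory cluster: one dict of rows per node."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.stores = {name: {} for name in self.node_names}

    def put(self, key, row):
        # The write goes only to the node that owns the key.
        self.stores[owner_for(key, self.node_names)][key] = row

    def get(self, key):
        # The read is routed to the same node; no other node is contacted.
        return self.stores[owner_for(key, self.node_names)].get(key)


cluster = ToyCluster(["node-a", "node-b", "node-c"])
for customer_id in range(100):
    cluster.put(customer_id, {"customer_id": customer_id})
```

Because the owner is computed from the key alone, a lookup such as `cluster.get(42)` touches a single node. A plain modulo hash like this one would move most keys when the node count changes, so a real system would need a more stable placement scheme.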

CONSISTENT DATABASE OPERATIONS FOR HADOOP CLUSTERS AND ACROSS GLOBALLY DISTRIBUTED APPLICATIONS

Flexible Persistence

To ensure durability of data in the event of node failure, GemFire XD writes to disk a log of all creates, updates, and deletes of data managed by a node. When the node comes back online, this log is read to reconstruct the last consistent state of the in-memory database on that node. When persisted or archived in Hadoop, this data can be used in analytics processing with tools such as Pivotal HAWQ, and can support even larger data volumes. Using the event framework, you can modify persistence behavior for purposes such as archiving historical data.
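
The log-and-replay idea can be sketched in a few lines. The example below is only a conceptual model under stated assumptions: the JSON record layout and the `append_op` and `replay` helpers are invented for illustration, and the log is kept as an in-memory list of lines rather than a file on disk. It does not reflect GemFire XD's on-disk format.

```python
import json


def append_op(log_lines, op, key, value=None):
    """Append one create/update/delete record to the operation log."""
    log_lines.append(json.dumps({"op": op, "key": key, "value": value}))


def replay(log_lines):
    """Rebuild the in-memory state by applying the log from start to end."""
    state = {}
    for line in log_lines:
        record = json.loads(line)
        if record["op"] in ("create", "update"):
            state[record["key"]] = record["value"]
        elif record["op"] == "delete":
            state.pop(record["key"], None)
    return state


log = []
append_op(log, "create", "order-1", {"qty": 1})
append_op(log, "update", "order-1", {"qty": 3})
append_op(log, "create", "order-2", {"qty": 5})
append_op(log, "delete", "order-2")
recovered = replay(log)
```

Replaying the four records leaves only `order-1` with its updated quantity, which is the state the node held before it went down.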

Configurable Consistency

GemFire XD is capable of providing ACID consistency across distributed nodes to support high-capacity transactional applications. You can also configure consistency models for higher performance, such as allowing the entire grid to cache and operate on data, or turn consistency off when your use case calls for speed rather than consistency.

SQL Query Support

Pivotal GemFire XD supports the ANSI SQL-92 standard for authoring queries. Queries are sent to the appropriate nodes that serve relevant partitions of data. Query results are then merged and sent back to the client application. Developers can define indexes on key values to improve performance, and can define key values that control distribution of data across nodes. When functions that operate on partitions of data are invoked, processing is routed to the nodes responsible for serving the targeted partitions.
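
The send-then-merge flow described here is commonly called scatter-gather. The sketch below shows the general pattern only, not GemFire XD's query engine: each partition filters and sorts its own rows, and the partial results are merged into one ordered answer. The `scatter_gather` function, the sample rows, and the use of plain Python lists as stand-ins for nodes are assumptions for the example.

```python
import heapq


def scatter_gather(partitions, predicate, sort_key):
    # Scatter: each partition filters and sorts its own rows locally.
    partials = [
        sorted((row for row in rows if predicate(row)), key=sort_key)
        for rows in partitions
    ]
    # Gather: merge the already-sorted partial results into one ordered list.
    return list(heapq.merge(*partials, key=sort_key))


partitions = [
    [{"id": 1, "amount": 250}, {"id": 2, "amount": 40}],
    [{"id": 3, "amount": 120}],
    [{"id": 4, "amount": 90}, {"id": 5, "amount": 300}],
]
large_orders = scatter_gather(
    partitions,
    predicate=lambda row: row["amount"] > 100,
    sort_key=lambda row: row["amount"],
)
```

This corresponds roughly to a query that selects orders with an amount above 100, ordered by amount, where each node only ever scans the rows it stores.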

Advanced Analytics Access

Data persisted in Pivotal HD by GemFire XD can be accessed for advanced analytic processing by Pivotal HAWQ by way of the Pivotal Extension Framework (PXF). This includes archived data as well as the latest state of active in-memory data.

HIGH AVAILABILITY, RESILIENCE, AND GLOBAL SCALE

Node Failover

GemFire XD provides continuous uptime with built-in high availability and disaster recovery. Multiple failure detection models detect and react to failures quickly, ensuring that the cluster is always available, and that the data set is always complete.

Resilient Self-Healing

GemFire XD has self-healing automation that allows a node to quickly rejoin a cluster once it becomes operational again, with fast startup, reconnect, and incremental updates of changed data, all handled without administrator intervention.
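
One way to picture incremental updates on rejoin is a sequence-numbered change log: the returning node remembers the last change it applied and fetches only newer ones. The sketch below is a hypothetical model, not GemFire XD's recovery protocol. The `catch_up` function, the `(sequence, key, value)` tuple layout, the use of `None` to mark a delete, and the assumption that the log is ordered by sequence number are all illustrative.

```python
def catch_up(local_rows, last_applied_seq, change_log):
    """Apply only changes newer than `last_applied_seq`.

    Returns the new high-water mark. Assumes `change_log` is ordered by
    sequence number.
    """
    for seq, key, value in change_log:
        if seq <= last_applied_seq:
            continue  # already applied before the node went offline
        if value is None:
            local_rows.pop(key, None)  # a None value marks a delete
        else:
            local_rows[key] = value
        last_applied_seq = seq
    return last_applied_seq


change_log = [
    (1, "seat-12A", "held"),
    (2, "seat-12B", "held"),
    (3, "seat-12A", "sold"),
    (4, "seat-12B", None),
]
rejoining_node = {"seat-12A": "held", "seat-12B": "held"}  # state as of seq 2
new_seq = catch_up(rejoining_node, 2, change_log)
```

Only changes 3 and 4 are applied, so the node converges on the current state without reloading data it already holds.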

Cluster-to-Cluster WAN Connectivity

GemFire XD allows multiple clusters to be connected via WAN gateways. This allows application data access to span across the globe, and allows companies to meet local data requirements, such as country-specific privacy regulations. WAN connected clusters also enable multi-site failover capability, ensuring ongoing availability and built-in disaster recovery in the case of catastrophic failure.

Figure 1. Example topologies of Pivotal GemFire XD deployments supporting different service level requirements of data-driven applications.

STANDARDS-BASED DEVELOPER FEATURES AND INTERFACES

APIs and Standards Support

Pivotal GemFire XD will manage data for applications in any programming language that supports JDBC, ADO.NET, or ODBC. For Java developers, GemFire XD provides support for Spring Data JDBC.

Data Type Support

GemFire XD supports structured data in relational data models with declared tables and foreign key relationships. Data types supported include those defined in the ANSI SQL-92 standard. GemFire XD also supports JSON documents and custom Java types.

Powerful Application Features

GemFire XD provides advanced application features to developers who want to leverage its distributed database capabilities. As with many database platforms, developers can embed and generate queries using SQL. GemFire XD also provides a sophisticated event handling mechanism with durable asynchronous queues suitable for mission-critical application requirements.

Use Familiar Tools

GemFire XD, through support of JDBC and ANSI SQL, allows usage of familiar integrated development environments, app-development frameworks, and business intelligence and visualization tools.

EASY ADMINISTRATION OF DISTRIBUTED NODES

Automated Tuning

GemFire XD is built to automate administrative tasks as much as possible. This includes automated tuning of system resources between nodes in a cluster by intelligently managing the placement of data while reducing network round trips. Data is distributed and replicated according to the cluster configuration, and requests for access are routed intelligently using the most direct path available. This data placement and resource allocation is adjusted automatically if nodes are added to or removed from the cluster.

Simplified Cluster Configuration

Node configuration is handled centrally with automatic redundancy for high availability. New nodes can get their configuration from the centralized configuration manager upon startup to quickly join a cluster with no additional system administration tasks.

Comprehensive Monitoring & Administration Tools

GemFire XD provides a comprehensive set of online and offline tools for monitoring and administering clusters. The online dashboard allows drill-down into cluster and node status, and querying of stored data. The offline analytics tool allows diagnosis of system bottlenecks through analysis of historical logs and statistics. A command-line tool lets administrators act on clusters and nodes, for example by starting and stopping them or changing configuration settings.

Flexible Deployment Options

GemFire XD runs in Java Virtual Machines in 32-bit and 64-bit mode on Linux and Windows operating systems. GemFire XD grids can be set up with active/active multi-site bi-directional WAN replication to enable disaster recovery, business continuity, and geographical proximity for the lowest possible latency worldwide.
