New Data Loading Technology from Greenplum Offers Breakthrough Speeds For Large-Scale Data Warehousing

March 16, 2009 by Webmaster News

London, UK. – Greenplum, a leading provider of database software for the next generation of data warehousing and analytics, today announced new technology designed to accelerate data loading for companies dealing with exponential data growth. Greenplum’s new “MPP Scatter/Gather Streaming” (SG Streaming) technology eliminates the bottlenecks associated with other approaches to data loading, enabling lightning-fast flow of data into the Greenplum Database for large-scale analytics and data warehousing. Greenplum customers are achieving production loading speeds of over four terabytes per hour with negligible impact on concurrent database operations.
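For a rough sense of scale, the four-terabytes-per-hour figure can be converted into a sustained per-second rate. The sketch below assumes decimal terabytes (10^12 bytes), and the 40-node cluster size is purely hypothetical, since the announcement does not state how many nodes the quoted customers run. Under those assumptions the aggregate rate comes to roughly 1.1 GB per second.

```python
# Back-of-the-envelope conversion of the quoted loading rate.
# Assumption: 1 TB = 10**12 bytes (decimal); the release does not say which unit is meant.
TB = 10**12
SECONDS_PER_HOUR = 3600


def bytes_per_second(tb_per_hour: float) -> float:
    """Aggregate sustained rate in bytes per second."""
    return tb_per_hour * TB / SECONDS_PER_HOUR


def per_node_mb_per_second(tb_per_hour: float, num_nodes: int) -> float:
    """Average rate each node must absorb if the load is spread evenly."""
    return bytes_per_second(tb_per_hour) / num_nodes / 10**6


if __name__ == "__main__":
    aggregate = bytes_per_second(4)
    print(f"aggregate: {aggregate / 10**9:.2f} GB/s")
    # 40 nodes is a made-up cluster size used only for illustration.
    print(f"per node (40 nodes): {per_node_mb_per_second(4, 40):.1f} MB/s")
```

Spreading the load evenly is what keeps the per-node figure modest, which is the point of a design with no single choke point.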

Greenplum utilises a “parallel-everywhere” approach to loading in which data flows from one or more source systems to every node of the database without any sequential choke points. This differs from the traditional “bulk loading” technologies used by most mainstream database and MPP appliance vendors, which push data from a single source, often over a single channel or a small number of parallel channels, resulting in fundamental bottlenecks and ever-increasing load times. Greenplum’s approach also avoids the need for a “loader” tier of servers, as required by some other MPP database vendors, which can add significant complexity and cost while effectively bottlenecking the bandwidth and parallelism of communication into the database.

“The loading capabilities of this database are remarkable,” said Brian Dolan, Director of Research Analytics at Fox Interactive Media. “We’re loading at rates of four terabytes an hour, consistently.”

“We need our data warehouse to store a near real-time view of our data,” said Muhammad Buldansyah, Deputy President Director, Bakrie Telecom. “Greenplum Database allows us to directly integrate operational data from many sources including those external to the database but with the ease of use of tables inside the database.”

Greenplum’s SG Streaming technology ensures parallelism by “scattering” data from all source systems across hundreds or thousands of parallel streams that flow simultaneously to all nodes of the Greenplum Database. Performance scales with the number of Greenplum Database nodes, and the technology supports both large batch and continuous near-real-time loading patterns with negligible impact on concurrent database operations. Data can be transformed and processed in-flight, utilising all nodes of the database in parallel, for extremely high-performance ELT (extract-load-transform) and ETLT (extract-transform-load-transform) loading pipelines. Final “gathering” and storage of data to disk takes place on all nodes simultaneously, with data automatically partitioned across nodes and optionally compressed. This technology is exposed to the DBA via a flexible and programmable “external table” interface and a traditional command-line loading interface.
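The scatter/gather pattern described above can be illustrated with a toy model. This is only a sketch of the general idea and not Greenplum’s implementation: the node count, the CRC32-based routing, the in-flight transform, and every function name (`node_for`, `scatter`, `gather`, `load`) are assumptions made for illustration, and in-memory lists stand in for network streams and disks.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Route a row to a node using a stable hash of its distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def scatter(rows, num_nodes: int = NUM_NODES):
    """Split one source's rows into one stream per node, transforming in flight."""
    streams = [[] for _ in range(num_nodes)]
    for key, value in rows:
        # Illustrative in-flight transform: trim whitespace and upper-case the value.
        streams[node_for(key, num_nodes)].append((key, value.strip().upper()))
    return streams


def gather(node_id: int, scattered):
    """Collect the streams addressed to one node from every source."""
    stored = []
    for streams in scattered:
        stored.extend(streams[node_id])
    return stored


def load(sources, num_nodes: int = NUM_NODES):
    """Scatter every source in parallel, then gather on every node in parallel."""
    with ThreadPoolExecutor() as pool:
        scattered = list(pool.map(lambda rows: scatter(rows, num_nodes), sources))
        return list(pool.map(lambda n: gather(n, scattered), range(num_nodes)))


if __name__ == "__main__":
    sources = [
        [("cust-1", " alpha "), ("cust-2", "beta")],
        [("cust-3", "gamma"), ("cust-1", "delta")],
    ]
    for node_id, rows in enumerate(load(sources)):
        print(node_id, rows)
```

Every source talks to every node and no step funnels all rows through a single process, which is the property the paragraph contrasts with single-channel bulk loading. Rows sharing a key always land on the same node, a simple stand-in for automatic partitioning.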

“Greenplum’s impressive loading speeds make it the perfect complement to GoldenGate’s Real-time CDC solution,” said Anthony Brooks-Williams, senior director of business development with GoldenGate Software. “Now customers can capture changes from multiple domains without the data warehouse becoming a bottleneck.”

“Companies facing growing data volumes need to be able to maintain or shrink their loading windows so they can feed rapid streams of real-time data into the database and query them with low latency,” said Luke Lonergan, CTO, Greenplum. “Greenplum’s SG Streaming was designed to meet the needs of these companies. Given the data loading speeds our customers are reporting using this new technology in real-world, production environments, we believe that Greenplum now offers the fastest data loading technology in the industry.”

SG Streaming technology is available immediately with Greenplum Database. It is included at no extra charge to Greenplum customers.

About Greenplum
Greenplum is a database software company that is reinventing how companies gain insight and competitive advantage from their data. The company’s flagship product, Greenplum Database, is built to support the next generation of data warehousing and large-scale analytics processing. Supporting SQL and MapReduce parallel processing, Greenplum Database offers industry-leading performance at a low cost for companies managing terabytes to petabytes of data. Greenplum Database is used by major global organizations including NASDAQ, NYSE Euronext, Reliance Communications, Skype and Fox Interactive Media/MySpace. Greenplum partners with Sun Microsystems to power the Sun Data Warehouse Appliance. For more information visit www.greenplum.com.
