
Pentaho Big Data Capabilities


Modern, Fully Integrated Big Data Analytics Platform

Pentaho provides a complete set of data preparation, data discovery and predictive analytics capabilities, with deep, native support for all leading big data sources, traditional relational sources and many other business-critical data stores.

Data preparation and modeling

For IT and developers, Pentaho provides a complete visual design environment that simplifies and accelerates data preparation and modeling. Its rich visual interfaces support loading, extracting, integrating and transforming data within big data stores.

  • Load and extract – Visual input steps provide an easy way to load data into and extract data from any kind of big data store including Hadoop, NoSQL databases and analytic databases. This includes native interfaces to HDFS, MapReduce and Hive for Hadoop, and native parallel bulk loader utilities for many of the leading analytic databases.

  • Integrating – Visual steps provide a fast and easy way to merge data from multiple sources, both big data and traditional. A rich library of transformation logic, data consistency and performance optimizations is available, along with the ability to cache lookup data in memory.

  • Transforming – Pentaho provides an extensive library of data transformation capabilities, including calculations, string substitution, field splitting, value mapping and more. These visual transforms can meet even the most complex data processing requirements.

Job orchestration

Pentaho provides an intuitive visual user interface for orchestrating data processing and data integration jobs across all big data stores (Hadoop, NoSQL, analytic databases) as well as traditional relational databases and other data stores. In addition, Pentaho’s job orchestration and workflow capabilities interoperate with solution-specific tools, so organizations can leverage their existing investment in those tools.

Instant and interactive reporting and dashboards

Pentaho can connect directly to any big data store to provide instant, frictionless reporting, without having to extract data and load it elsewhere, use proprietary data sampling techniques or rely on the lowest-common-denominator connection methods (e.g. Hive for Hadoop) used by other tools. The Pentaho Report Designer provides a rich, intuitive graphical interface for designing even the most complex and sophisticated reports.

Interactive visualization and exploration of big data

Pentaho lets users interactively analyze and visualize large volumes of data at a glance and explore that data to find valuable patterns and anomalies. Visualization types include geo-maps, heat grids and scatter/bubble charts. Interactive capabilities include drill-down into supporting reports and dashboards, as well as extreme-scale in-memory data caching for speed-of-thought analysis of large data volumes.


Instaview, Pentaho’s big data analytics application, dramatically reduces the time data analysts need to discover, visualize and explore large volumes of diverse data. With Instaview, data scientists and data analysts can move from data to analytics in three simple steps. Instaview automatically turns raw and unstructured data into self-service, analytics-ready data sets, and it simplifies, groups, sorts and aggregates large volumes of unruly data without requiring help from IT or developers.



Copyright © 2005 - 2014 Pentaho Corporation. All Rights Reserved