Pentaho Big Data Community
Immediate Access to All Things Pentaho Big Data
The Pentaho Big Data Community provides a collection of documentation, how-tos, best practices, use cases and forums for companies considering using Pentaho as the go-to technology to power their big data strategy.
Visit the Big Data Community
Single source for big data tools and resources
The community, designed for developers, database administrators, analysts and data scientists, provides the tools you need to operationalize your big data.
Download Pentaho Kettle for big data
Our open source community project provides an easy way to operationalize your big data. Big data capabilities include the ability to input, output, manipulate and report on data using the following Hadoop and NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems and MongoDB.
Connect with the Pentaho Community and technical developer network
Access the forums and the community IRC channel (##pentaho on irc.freenode.net) to share best practices, use cases, workarounds and insights you encounter while working with Pentaho.
Access the how-tos to get started
Access our collection of technical guides, articles and videos when integrating Pentaho with:
Hadoop
How to:
- Configure Pentaho for Cloudera and Other Hadoop Versions
- Loading Data into a Hadoop Cluster, HDFS, Hive, HBase
- Transforming Data within a Hadoop Cluster using Pentaho MapReduce, Hive, and Pig
- Using Pentaho MapReduce to Parse Weblog Data, Generate an Aggregate Dataset
- Transforming Data within Hive and with Pig
- Extracting Data from the Hadoop Cluster
- Extracting Data from HDFS to Load an RDBMS
- Extracting Data from Hive to Load an RDBMS
- Extracting Data from HBase to Load an RDBMS
- Reporting on Data in Hadoop: Reporting on HDFS File Data, Reporting on HBase Data, Reporting on Hive Data
- Unit Test Pentaho MapReduce Transformation
- Advanced Pentaho MapReduce
- Using Compression with Pentaho MapReduce
- Using a Custom Partitioner in Pentaho MapReduce
- Using a Custom Input or Output Format in Pentaho MapReduce
MapR
How to:
- Configure Pentaho for MapR
- Loading Data into a MapR Cluster, CLDB, MapR Hive, MapR HBase
- Transforming Data within a MapR Cluster
- Using Pentaho MapReduce to Parse Weblog Data in MapR
- Using Pentaho MapReduce to Generate an Aggregate Dataset in MapR
- Transforming Data within Hive in MapR
- Transforming Data with Pig in MapR
- Extracting Data from the MapR Cluster
- Extracting Data from CLDB to Load an RDBMS
- Extracting Data from Hive to Load an RDBMS in MapR
- Extracting Data from HBase to Load an RDBMS in MapR
- Reporting on Data in the MapR Cluster: CLDB File Data, HBase Data in MapR, Hive Data in MapR
Cassandra
How to:
MongoDB
How to:
