SQOOP
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop imports data from external structured datastores into HDFS or related systems such as Hive and HBase. Sqoop can also be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses. Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, PostgreSQL, and HSQLDB.
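As a sketch of the export direction, a run looks like the following; the JDBC URL, credentials, table name, and HDFS directory are illustrative placeholders, not values from this post:

# export the files under an HDFS directory into an existing
# relational table (all names below are hypothetical)
sqoop export \
  --connect jdbc:mysql://dbhost.example.com/warehouse \
  --username etl_user \
  --password-file /user/etl_user/.db-password \
  --table daily_sales \
  --export-dir /user/etl_user/daily_sales \
  --input-fields-terminated-by ','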
What Sqoop Does
Designed to efficiently transfer bulk data between Apache Hadoop and structured datastores such as relational databases, Apache Sqoop:
- Allows data imports from external datastores and enterprise data warehouses into Hadoop
- Parallelizes data transfer for fast performance and optimal system utilization (see the sketch after this list)
- Copies data quickly from external systems into Hadoop
- Makes data analysis more efficient
- Mitigates excessive load on external systems
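As a minimal sketch of a parallel import (the connection string, table, split column, and target directory are assumed for illustration):

# import the orders table using four parallel map tasks,
# splitting the work on the order_id primary key (hypothetical names)
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username analyst \
  --password-file /user/analyst/.db-password \
  --table orders \
  --split-by order_id \
  --target-dir /user/analyst/orders \
  -m 4

The -m flag controls the number of parallel map tasks, which is how Sqoop parallelizes the transfer.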
How Sqoop Works
Sqoop provides a pluggable connector mechanism for optimal connectivity to external systems. The Sqoop extension API provides a convenient framework for building new connectors, which can be dropped into Sqoop installations to provide connectivity to various systems. Sqoop itself comes bundled with connectors for popular database and data warehousing systems.
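When no bundled connector matches a system, Sqoop can fall back to its generic JDBC connector by naming a driver class explicitly with --driver; the driver class and JDBC URL below are hypothetical placeholders:

# use the generic JDBC connector with an explicitly named driver class
sqoop import \
  --driver com.example.jdbc.ExampleDriver \
  --connect jdbc:example://dbhost.example.com/inventory \
  --username analyst \
  --password-file /user/analyst/.db-password \
  --table parts \
  --target-dir /user/analyst/parts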
For complete details on Sqoop and its commands, refer to:
- Sqoop User Guide
- Hortonworks Sqoop example
FLUME
Apache™ Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and it is robust and fault tolerant, with tunable reliability mechanisms for failover and recovery.
What Flume Does
Flume lets Hadoop users make the most of valuable log data. Specifically, Flume allows users to:
- Stream data from multiple sources into Hadoop for analysis
- Collect high-volume Web logs in real time
- Insulate themselves from transient spikes when the rate of incoming data exceeds the rate at which data can be written to the destination
- Guarantee data delivery
- Scale horizontally to handle additional data volume
How Flume Works
Flume’s high-level architecture is focused on delivering a streamlined codebase that is easy to use and easy to extend. The project team has designed Flume with the following components (a minimal configuration wiring them together follows this list):
- Event – a singular unit of data that is transported by Flume (typically a single log entry)
- Source – the entity through which data enters Flume. Sources either actively poll for data or passively wait for data to be delivered to them. A variety of sources allow data to be collected, such as log4j logs and syslogs.
- Sink – the entity that delivers the data to the destination. A variety of sinks allow data to be streamed to a range of destinations. One example is the HDFS sink that writes events to HDFS.
- Channel – the conduit between the Source and the Sink. Sources ingest events into the channel and the sinks drain the channel.
- Agent – any physical Java virtual machine running Flume. It is a collection of sources, sinks and channels.
- Client – produces and transmits the Event to the Source operating within the Agent.
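The sketch below wires these components into a single agent named a1: a netcat source feeding a memory channel that an HDFS sink drains. The agent name, port, and HDFS path are illustrative assumptions:

# write a minimal agent configuration, then start the agent
cat > example-agent.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# source: accept newline-separated events on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# sink: drain the channel into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
EOF

# start the agent named a1 with this configuration
flume-ng agent --conf conf --conf-file example-agent.conf --name a1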
Reliability & Scaling
Flume is designed to be highly reliable, so that no data is lost during normal operation. Flume also supports dynamic reconfiguration without the need for a restart, which reduces downtime for Flume agents. Flume is architected to be fully distributed with no central coordination point. Each agent runs independently of the others, with no inherent single point of failure, and Flume features built-in support for load balancing and failover. Flume’s fully decentralized architecture also plays a key role in its ability to scale: since each agent runs independently, Flume can be scaled horizontally with ease.
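Failover (and, analogously, load balancing) is configured through sink groups. The fragment below is a sketch that extends the earlier example-agent.conf and assumes a second sink k2 has been defined; it prefers sink k1 and fails over to k2 when k1 fails:

# append a failover sink group: the higher-priority sink wins while healthy
cat >> example-agent.conf <<'EOF'
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000
EOF

Setting processor.type = load_balance instead distributes events across the sinks rather than preferring one.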
For more details on Flume, refer to:
- Flume User Guide
- Hortonworks example on Flume
- Hortonworks Example 2