Sqoop Tutorial
March 09, 2018
Sqoop is a tool that uses the MapReduce framework to transfer data between relational databases and HDFS in parallel: it imports tables from a database into HDFS, and exports HDFS data back into database tables. That's all Sqoop does; it is a simple but effective tool. Let's create a MySQL database and import it into HDFS using Sqoop.
Sqoop Example — MySQL, HDFS, and Hive
The first thing we need to do is create a database in MySQL.
mysql -u root -p
The password is cloudera.
create database people;
use people;
create table friends(id int, name varchar(20), nickname varchar(20));
mysql> desc friends;
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| nickname | varchar(20) | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
Let’s insert a few values into this table:
mysql> INSERT INTO friends VALUES (1, "Donald Duck", "D. Duck");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO friends VALUES (2, "Goofy", "Goofy");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO friends VALUES (3, "Bruce Wayne", "Batman");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO friends VALUES(4, "Clark Kent", "Superman");
Query OK, 1 row affected (0.01 sec)
mysql> SELECT * FROM friends;
+------+-------------+----------+
| id | name | nickname |
+------+-------------+----------+
| 1 | Donald Duck | D. Duck |
| 2 | Goofy | Goofy |
| 3 | Bruce Wayne | Batman |
| 4 | Clark Kent | Superman |
+------+-------------+----------+
4 rows in set (0.00 sec)
Sqoop Example — Import MySQL Table into Hive
Next, we will import this into Hive using Sqoop.
sqoop import \
  --connect jdbc:mysql://quickstart:3306/people \
  --username root -P \
  --table friends \
  --columns id,name,nickname \
  --split-by id \
  --target-dir /sqoop/people/ \
  --fields-terminated-by "," \
  --hive-import \
  --create-hive-table \
  --hive-table moviedata.friends
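A note on --split-by: Sqoop uses that column to divide the rows among its parallel map tasks. It queries MIN(id) and MAX(id), then carves the range into roughly equal slices, one per mapper (four by default). A rough local sketch of the idea, assuming ids 1 through 4 and 2 mappers (Sqoop's actual boundary arithmetic differs slightly):

```shell
# Approximate how Sqoop splits an integer key range across mappers.
min=1; max=4; mappers=2
size=$(( (max - min + 1 + mappers - 1) / mappers ))  # ceiling division
echo "mapper 1 reads: id >= $min AND id < $((min + size))"
echo "mapper 2 reads: id >= $((min + size)) AND id <= $max"
```

This is why --split-by should point at a column with evenly spread values; a skewed column leaves some mappers doing most of the work.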
Let’s jump into Hive and query the table to see it.
beeline -u jdbc:hive2://quickstart:10000/default -n admin -d org.apache.hive.jdbc.HiveDriver
use moviedata;
select * from friends;
+-------------+---------------+-------------------+--+
| friends.id | friends.name | friends.nickname |
+-------------+---------------+-------------------+--+
| 1 | Donald Duck | D. Duck |
| 2 | Goofy | Goofy |
| 3 | Bruce Wayne | Batman |
| 4 | Clark Kent | Superman |
+-------------+---------------+-------------------+--+
4 rows selected (0.201 seconds)
Well done. You have imported a MySQL table into Hive.
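Under the hood, --fields-terminated-by "," made Sqoop write each row as one comma-separated line in plain text files under the import directory. A local sketch of what such a part file would contain (the filename is illustrative; the contents are reconstructed from the friends table above):

```shell
# One row per line, columns joined by the delimiter we chose.
cat > part-m-00000 <<'EOF'
1,Donald Duck,D. Duck
2,Goofy,Goofy
3,Bruce Wayne,Batman
4,Clark Kent,Superman
EOF
# The delimiter makes the file easy to slice with ordinary tools:
cut -d, -f3 part-m-00000
```

Hive reads these delimited files directly, which is why the delimiter you pick at import time matters.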
Sqoop Example — Export Hive Table to MySQL
Now let’s export a Hive table into MySQL. Sqoop export requires the target table to already exist, so we will first create a ratings table inside the people database, then populate it from our ratings Hive table. Let’s jump into it.
mysql -u root -p
#password cloudera
use people;
create table ratings(id int, movieid int, rating INT, tstamp varchar(20));
Query OK, 0 rows affected (0.01 sec)
sqoop export \
  --connect jdbc:mysql://quickstart:3306/people \
  --username root --password cloudera \
  --table ratings \
  --export-dir /moviedata/ratings
The sqoop export command reads the data out of HDFS and writes it into an already defined table inside a relational database.
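Conceptually, each delimited line in the export directory becomes one row inserted into the target table (in reality Sqoop batches these inserts over JDBC across parallel map tasks). A rough local illustration of the mapping, using one line of our ratings data and comma fields, Sqoop's default export delimiter:

```shell
# A single HDFS line, fields separated by the default comma delimiter.
echo '3129,1136,1,969365204' > ratings-line
# Turn it into the equivalent SQL insert (\047 is a single quote in awk).
awk -F, '{printf "INSERT INTO ratings VALUES (%s, %s, %s, \047%s\047);\n", $1, $2, $3, $4}' ratings-line
```

If your HDFS files use a different delimiter, tell Sqoop with --input-fields-terminated-by, or the parse (and the export) will fail.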
Now let’s jump back into MySQL and check out the table we just populated.
mysql -u root -p
#password cloudera
use people;
mysql> SELECT * FROM ratings;
+------+---------+--------+-----------+
| id   | movieid | rating | tstamp    |
+------+---------+--------+-----------+
| 3129 |    1136 |      1 | 969365204 |
| 3129 |     466 |      3 | 969281574 |
|  149 |    2002 |      2 | 977590434 |
| 3129 |    3826 |      3 | 969278845 |
|  149 |    1200 |      5 | 977588505 |
| 3129 |    3688 |      4 | 969291462 |
|  149 |    1202 |      3 | 977589370 |
| 3129 |    3689 |      3 | 969291537 |
|  149 |    3744 |      4 | 996939065 |
| 3129 |    2889 |      3 | 969281005 |
|  149 |    2006 |      4 | 978190099 |
| 3129 |     471 |      3 | 969280818 |
|  149 |    1204 |      5 | 977588131 |
| 3129 |     477 |      3 | 969279612 |
|  149 |    2871 |      4 | 977591435 |
| 3129 |    3690 |      2 | 969291537 |
|  149 |    1206 |      5 | 977588555 |
... (output truncated)
DON’T FORGET TO DESTROY YOUR DIGITAL OCEAN MACHINE!
Congratulations, you have successfully imported a MySQL table into Hive and exported a Hive table back into MySQL using Sqoop.
Conclusion
Sqoop is a great way to import and export relational data in and out of Hadoop.