Monday, 31 March 2014

Hive UDF Example

A UDF (User Defined Function) is an important feature of Hive, and creating one is simple. In this tutorial we will learn how to create a UDF and how to use it with Hive. There are two ways to create a UDF:

1. using org.apache.hadoop.hive.ql.exec.UDF
2. using org.apache.hadoop.hive.ql.udf.generic.GenericUDF

       If the input and output of your custom function are basic types, e.g. Text, FloatWritable, DoubleWritable, IntWritable etc., then use org.apache.hadoop.hive.ql.exec.UDF.

       If your input and output can be data structures such as map, set or list, then use org.apache.hadoop.hive.ql.udf.generic.GenericUDF.

We will discuss the first type of UDF here. I will write one more post to discuss the second approach.

First of all, let's assume I want to create a Hive function called toUpper which converts a string to uppercase. Follow these steps to achieve it.

1. Download and install Eclipse.

2. Hive should be installed. If it is not installed, please install it first.

3. Start Eclipse and create a Java project.

4. Right-click on the project, then click on Build Path. It should open a window with several tabs. Click on the "Libraries" tab.

5. Click Add External JARs. It will open a new window. Go to the Hive installation directory, then select and add all the jar files in this folder. Click Add External JARs again, go to the <hive installation>/lib directory, and select and add all the jar files in this folder.


6. Now, to create a UDF, first create a class which extends org.apache.hadoop.hive.ql.exec.UDF. Say the class name is ToUpper. Add the following code to the class.

package org.learn.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

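// Declared public so that Hive, which loads the class by name, can access it.
public class ToUpper extends UDF {

  // Hive calls evaluate() for each input value; a NULL input yields a NULL result.
  public Text evaluate(Text input) {
    if (input == null) return null;
    return new Text(input.toString().toUpperCase());
  }
}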


7. Now export this project as a jar file and name it hive-to-upper-udf.jar.

8. Copy this jar file to the <hive-installation-dir>/lib/ directory.

9. Now go to the Hive shell and type the following commands.

ADD JAR /home/hduser/hive/lib/hive-to-upper-udf.jar;

CREATE TEMPORARY FUNCTION toUpper as 'org.learn.hive.ToUpper';


select toUpper(name) from user_table limit 1000;

The third command depends on the table and column names in your database, so change it accordingly. The path of the jar file may also be different on your machine; please change it accordingly.
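The CREATE TEMPORARY FUNCTION command refers to the UDF by its fully qualified class name, and Hive then looks for a method named evaluate on that class. The plain-Java sketch below only illustrates that idea and is not Hive's actual code. FunctionLookupSketch, SampleToUpper and callByName are hypothetical names invented for this illustration. String stands in for Hadoop's Text so that the sketch runs without any Hive or Hadoop jars, and Locale.ROOT is used (unlike the UDF above) so the result does not depend on the JVM's default locale.

```java
import java.lang.reflect.Method;
import java.util.Locale;

class FunctionLookupSketch {

    // Hypothetical stand-in for the ToUpper UDF; String replaces Text here.
    public static class SampleToUpper {
        public String evaluate(String input) {
            if (input == null) return null;          // NULL in, NULL out
            return input.toUpperCase(Locale.ROOT);
        }
    }

    // Load a class by its fully qualified name, create an instance through
    // its no-argument constructor, and call evaluate(String) by reflection.
    static Object callByName(String className, String arg) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            Method evaluate = cls.getMethod("evaluate", String.class);
            return evaluate.invoke(instance, arg);
        } catch (ReflectiveOperationException e) {
            // Wrong class name, missing evaluate method, or inaccessible class.
            throw new IllegalStateException("Could not call evaluate on " + className, e);
        }
    }

    public static void main(String[] args) {
        String name = SampleToUpper.class.getName();
        System.out.println(callByName(name, "alice"));
        System.out.println(callByName(name, null));
    }
}
```

If the class name passed in is misspelled, or the class has no public evaluate method taking the given argument type, the lookup fails. This is a useful thing to check first when a newly registered function cannot be found or called from the Hive shell.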
