Goal: enable LZO-compressed files to be used for Hive tables.
Environment: Hadoop cluster managed with Cloudera Manager version 5.
Prerequisites:
- Install and activate the parcel that contains the LZO library as shown here
- Configure the LZO compression codecs as shown here
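Before going further, it can be worth checking from the Hive CLI or Beeline that the codec configuration is visible to Hive. This is a minimal check. It assumes the configuration step registered the codecs under the standard io.compression.codecs property, which is how the usual hadoop-lzo setup does it:

```sql
-- Prints the current value of the codec list.
-- Assumption: with a standard hadoop-lzo setup, the value should include
-- com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec.
SET io.compression.codecs;
```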
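Problem: even with the prerequisites in place, Hive may fail with this error:
cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat
This happens because the JAR containing that class is not on Hive's classpath.
To fix this, point Hive's auxiliary JARs setting at the LZO parcel's library directory: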
- Go to your Cloudera Manager UI home page
- Click Hive
- Click Configuration > View and Edit
- Under Service-Wide > Advanced, look for Hive Auxiliary JARs Directory
- Set the value to /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib
- Restart the Hive service, along with any dependent services that Cloudera Manager marks as having stale configuration
As a reminder, to create a table backed by LZO-compressed files in HDFS, do something like this:
CREATE EXTERNAL TABLE `my_lzo_table`(`something` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/hdfs/path/to/your/lzo/files';
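After the restart, a quick way to confirm the fix is to check that Hive reports the expected input format and can read a few rows. This is a minimal sketch that reuses the example table name my_lzo_table from the statement above:

```sql
-- Shows table metadata. The InputFormat line should read
-- com.hadoop.mapred.DeprecatedLzoTextInputFormat.
DESCRIBE FORMATTED my_lzo_table;

-- Reads a handful of rows through the LZO input format.
SELECT * FROM my_lzo_table LIMIT 10;
```

If either statement fails with the "cannot find class" error, the auxiliary JARs setting has most likely not taken effect. Check the directory value and make sure the Hive service was actually restarted.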