Hive表生成函数explode讲解 | 您所在的位置:网站首页 › hiveudtf › Hive表生成函数explode讲解 |
Hive中的表分析函数接受零个或多个输入,然后产生多列或多行输出。 1.explode函数 explode函数以array类型数据输入,然后对数组中的数据进行迭代,返回多行结果,一行一个数组元素值 ARRAY函数是将一列输入转换成一个数组输出。 hive (jimdb)> SELECT ARRAY(1,2,3) FROM dual;OK_c0[1,2,3]Time taken: 0.448 seconds, Fetched: 1 row(s) SELECT explode(array(1,2,3)) AS element; hive (jimdb)> SELECT explode(array(1,2,3)) AS element;OKelement123Time taken: 0.327 seconds, Fetched: 3 row(s) 创建一张测试表单: CREATE TABLE udtf_test(name STRING,subordinates ARRAY)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\t'COLLECTION ITEMS TERMINATED BY ','; hive (jimdb)> select * from udtf_test;OKudtf_test.name udtf_test.subordinatesjim5 ["james","datacloase"]jim4 ["james","datacloase"]jim3 ["james","datacloase"]jim2 ["james","datacloase"]jim ["james","datacloase"]Time taken: 0.348 seconds, Fetched: 5 row(s) 我执行下面这条语句,希望将字段subordinates拆分开,新生成一列,但是语句报错: select name,explode(subordinates) from udtf_test;
hive (jimdb)> select name,explode(subordinates) from udtf_test;FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions Hive的表生成函数只是生成了一种数据的展示方式,而无法在表中产生一个其他的列。 因此这块需要使用LATERAL VIEW功能来进行处理。LATERAL VIEW将explode生成的结果当做一个视图来处理。
SELECT name, sub FROM udtf_testLATERAL VIEW explode(subordinates) subView AS sub; 在这里LATERAL VIEW 是将 explode结果转换成一个视图subView,在视图中的单列列名定义为sub,然后在查询的时候引用这个列名就能够查到。
hive (jimdb)> SELECT name, sub > FROM udtf_test> LATERAL VIEW explode(subordinates) subView AS sub;OKname subjim5 jamesjim5 datacloasejim4 jamesjim4 datacloasejim3 jamesjim3 datacloasejim2 jamesjim2 datacloasejim jamesjim datacloaseTime taken: 0.399 seconds, Fetched: 10 row(s) 创建一张测试表: drop table test1;create table test1(name string,phonenumber string)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\t';
--需求是过滤掉该表中电话号码中0-9的某个数字在电话号码中出现6次及以上的号码,然后将正常的号码返回。 hive (jimdb)> select * from test1;OKtest1.name test1.phonenumber'jim he' '18191512076''xiaosong' '18392988059''jingxianghua' '18118818818''donghualing' '17191919999'
执行语句如下: SELECT c.name,c.phonenumberFROM (SELECT dd.name,dd.phonenumber,MAX(dd.cn) FROM (SELECT d.name,d.phonenumber,d.m, COUNT(*) cnFROM (SELECT name,phonenumber,m FROM test1 LATERAL VIEW explode(split(phonenumber,'')) n AS m) d GROUP BY d.name,d.phonenumber,d.m) ddGROUP BY dd.name,dd.phonenumber HAVING MAX(dd.cn) SELECT c.name,c.phonenumber> FROM > (SELECT dd.name,dd.phonenumber,MAX(dd.cn) > FROM (SELECT d.name,d.phonenumber,d.m, COUNT(*) cn> FROM (SELECT name,phonenumber,m FROM test1 LATERAL VIEW explode(split(phonenumber,'')) n AS m) d > GROUP BY d.name,d.phonenumber,d.m) dd> GROUP BY dd.name,dd.phonenumber HAVING MAX(dd.cn) |
CopyRight 2018-2019 实验室设备网 版权所有 |