Hive Permission Configuration [A life without falling into a few pits is incomplete]


2024-07-16 19:04 | Source: compiled from the web | Views: 265

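I spent the last two days wrestling with Hive permission problems, so here are my notes.

I won't go through basic Hive setup; configure it yourself, there are plenty of guides online.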

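1. Background

The requirement: the hdfs and hive users must each be able to work with the databases they created themselves, and permissions must not get mixed up. The constraints are listed below. [This is basically a bare setup with almost nothing configured, and it was still one pitfall after another.]

1. Hive has no authorization configured; it uses the default, NONE.

2. Hadoop uses its default and simplest authentication mechanism, Simple.

3. Directory permissions must not be set to 777.

4. Permission checking must be enabled in hdfs-site.xml:

    hdfs-site.xml: dfs.permissions = true

(On Hadoop 2 and later the current property name is dfs.permissions.enabled; dfs.permissions is its deprecated alias.)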

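5. hive-site.xml: set hive.server2.enable.doAs

    hive-site.xml: hive.server2.enable.doAs = true

With hive.server2.enable.doAs set to false, every YARN job submitted through HiveServer2 runs as the hive user. With true, it runs as the user who actually connected.

In other words, with hive.server2.enable.doAs set to false, every user who connects through HiveServer2 ends up acting as the hive user.

So if I create a database in Hive as the hdfs user and then access it through HiveServer2, I get an error,

because whatever user name you supply, the data owned by hdfs is accessed with hive's permissions.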
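To make the doAs behavior concrete, here is a minimal Python sketch of the rule described above. It is only a model: effective_user and its server_user parameter are names made up for illustration, not part of Hive, and it assumes HiveServer2 itself runs as hive, as in this setup.

```python
def effective_user(connected_user: str, do_as: bool, server_user: str = "hive") -> str:
    """Model of hive.server2.enable.doAs.

    connected_user: the name the client logged in with (e.g. beeline -n hdfs).
    do_as:          the value of hive.server2.enable.doAs.
    server_user:    assumed OS user running HiveServer2 (hypothetical default).
    """
    # doAs=true  -> work is attributed to the connecting user.
    # doAs=false -> everything runs as the HiveServer2 service user.
    return connected_user if do_as else server_user


for do_as in (True, False):
    print(f"doAs={do_as}: jobs run as {effective_user('hdfs', do_as)}")
```

With doAs disabled, a client that logs in as hdfs still touches hdfs-owned files as hive, which is why the permission check fails.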

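2. Command-line permission configuration

[This is really the most basic configuration.]

Configure proxy users in core-site.xml.

List here every user that needs to connect through Hive to create databases or operate on data.

I only use two users as an example: hive and hdfs.

    httpfs.proxyuser.hive.hosts = *
    httpfs.proxyuser.hive.groups = *
    httpfs.proxyuser.hdfs.hosts = *
    httpfs.proxyuser.hdfs.groups = *

With this in place, you can run the command-line client as hive or as hdfs, and each user operates with its own permissions. The user name defaults to the currently logged-in OS user.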

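3. Reading metadata over Thrift and operating on the data directly [Presto]

I use Presto as the example here, because that is what I connect to Hive with.

Here is the hive.properties file from Presto's etc/catalog directory.

As you can see, it is configured to go through Thrift.

But which user does it act as? I couldn't see anywhere to configure that. For example, what if I want the hdfs and hive users to each work with their own databases? How are their permissions told apart?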

    [root@master catalog]# pwd
    /opt/presto/etc/catalog
    [root@master catalog]# ll
    total 8
    -rw-rw-r--. 1 presto hadoop 172 Jun 18 13:07 hive.properties
    -rw-rw-r--. 1 presto hadoop 124 Apr 18 17:33 mysql.properties
    [root@master catalog]# more hive.properties
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive-metaserver:9083
    hive.config.resources=/opt/hadoop/etc/hadoop/core-site.xml,/opt/hadoop/etc/hadoop/hdfs-site.xml

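Straight to the answer.

When Kerberos is not used with HDFS, Presto accesses HDFS as the OS user that runs the Presto process.

For example, if Presto runs as root, it accesses HDFS with root's permissions.

You can override this user name by setting the HADOOP_USER_NAME system property in the Presto JVM config, replacing hdfs_user with the appropriate user name:

    -DHADOOP_USER_NAME=hdfs_user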

    [root@master etc]# pwd
    /opt/presto/etc
    [root@master etc]# ll
    total 16
    drwxr-xr-x. 2 presto hadoop  53 Jun 18 13:07 catalog
    -rw-r--r--. 1 presto hadoop 177 Jun 18 13:34 config.properties
    -rw-r--r--. 1 presto hadoop 194 Jun 18 13:34 jvm.config
    -rw-rw-r--. 1 presto presto  25 Apr 18 17:33 log.properties
    -rw-r--r--. 1 presto hadoop  85 Jun 18 13:34 node.properties
    [root@master etc]# more jvm.config
    -server
    -Xmx2G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -DHADOOP_USER_NAME=hdfs
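A rough Python model of how the user name gets picked under simple (non-Kerberos) authentication may help here. As far as I understand Hadoop's UserGroupInformation, it looks at the HADOOP_USER_NAME environment variable first, then the JVM system property of the same name, and only then falls back to the OS user. resolve_hadoop_user and its parameters are hypothetical names for this sketch, not a Hadoop or Presto API.

```python
import getpass
import os


def resolve_hadoop_user(env=None, system_properties=None, os_user=None):
    """Simplified model of user resolution without Kerberos.

    env:               environment variables (defaults to os.environ).
    system_properties: stand-in for JVM -D properties, e.g. from jvm.config.
    os_user:           OS user running the process (defaults to the current user).
    """
    env = os.environ if env is None else env
    system_properties = system_properties or {}

    # Assumed lookup order: environment variable, then system property.
    override = env.get("HADOOP_USER_NAME") or system_properties.get("HADOOP_USER_NAME")
    if override:
        return override

    # No override configured: fall back to the process owner.
    return os_user if os_user is not None else getpass.getuser()


# Presto started as root, with -DHADOOP_USER_NAME=hdfs in jvm.config.
print(resolve_hadoop_user(env={}, system_properties={"HADOOP_USER_NAME": "hdfs"}, os_user="root"))
# Same process without the override.
print(resolve_hadoop_user(env={}, system_properties={}, os_user="root"))
```

Note that this is one user for the whole Presto process, so it does not separate the permissions of individual Presto logins.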

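Log in to verify:

    cd ${PRESTO_HOME}/bin
    ./presto --server 192.168.100.100:8989 --catalog hive --schema default

Of course, this does not necessarily mean you are operating as the account you logged in with. Whatever account you log in as, the operations may all be carried out with one user's permissions (hive, for instance).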

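4. Permissions for beeline, HiveServer2, and Hive on Spark

You may ask why this gets its own section. Because I hit a bizarre problem that cost me a whole day. Let me first lay out the error.

Connecting as the hdfs user and querying a database that hdfs had created from the command line failed with an error instead of returning data, but the same query worked as the hive user.

Conditions:

Hive runs its queries on Spark (Hive on Spark), and the statement must involve an aggregation, for example: select count(*) from table

Connecting with beeline: beeline -u 'jdbc:hive2://localhost:10000/default' -n hdfs

Java code connecting to HiveServer2 over JDBC: jdbc:hive2://localhost:10000/default, logging in as the hdfs user.

Error message:

hiveserver2.log: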

FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.

Failed to create Spark client for Spark session 3688eac6-e3ff-4c45-b0c1-ef09c34909ef

    2020-06-18 19:49:29,031 INFO [23b1bf4c-9333-4095-b6f5-0f362ef59609 HiveServer2-Handler-Pool: Thread-75] reducesink.VectorReduceSinkEmptyKeyOperator: VectorReduceSinkEmptyKeyOperator constructor vectorReduceSinkInfo org.apache.hadoop.hive.ql.plan.VectorReduceSinkInfo@582d2544
    Query ID = hive_20200618194928_671bb64c-92e7-4930-8b09-b27bab7a64a0
    Total jobs = 1
    Launching Job 1 out of 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=
    Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 3688eac6-e3ff-4c45-b0c1-ef09c34909ef)'
    FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 3688eac6-e3ff-4c45-b0c1-ef09c34909ef
    OK

The only error message is: Failed to create Spark client for Spark session

Why? Hive runs on Spark, and Spark runs in YARN mode, so I went through all the logs and finally found this:

hadoop-yarn-resourcemanager-xxxx-103.log

2020-06-18 20:49:58,590 INFO org.apache.hadoop.ipc.Server: Connection from 192.168.xxx.103:55442 for protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is unauthorized for user hdfs (auth:PROXY) via hive (auth:SIMPLE)

Note this part: is unauthorized for user hdfs (auth:PROXY) via hive (auth:SIMPLE)
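The message reads as: hive, authenticated with SIMPLE, is the real caller and is trying to act on behalf of hdfs (the PROXY user). When grepping ResourceManager logs for this kind of failure, a small Python sketch like the following can pull both names out of the line. The regular expression is my own guess at the layout based on the single log line above, and parse_unauthorized is a hypothetical helper.

```python
import re

# Matches "... is unauthorized for user X (auth:A) via Y (auth:B)".
# The "via ..." part is optional, since a non-proxied user has no real user.
PATTERN = re.compile(
    r"is unauthorized for user (?P<effective>\S+) \(auth:(?P<effective_auth>\w+)\)"
    r"(?: via (?P<real>\S+) \(auth:(?P<real_auth>\w+)\))?"
)


def parse_unauthorized(line):
    """Return the effective/real user info from a log line, or None if absent."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


line = (
    "2020-06-18 20:49:58,590 INFO org.apache.hadoop.ipc.Server: Connection from "
    "192.168.xxx.103:55442 for protocol "
    "org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is unauthorized "
    "for user hdfs (auth:PROXY) via hive (auth:SIMPLE)"
)
info = parse_unauthorized(line)
print(f"real user {info['real']} was not allowed to impersonate {info['effective']}")
```

The real user is the one whose proxy-user entries need to be present on the server that rejected the call.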

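I went and checked the Hadoop 3.1.3 source code.

Clearly a user authorization check failed. But why?

Many posts online say this happens because the proxy users are not configured in core-site.xml. But I had "definitely already added" them:

    httpfs.proxyuser.hive.hosts = *
    httpfs.proxyuser.hive.groups = *
    httpfs.proxyuser.hdfs.hosts = *
    httpfs.proxyuser.hdfs.groups = *

Then came all sorts of searching and experiments. Nothing worked...

Out of options, I turned off permission checking via dfs.permissions in hdfs-site.xml...

Still no good...

Finally I added the same configuration to httpfs-site.xml:

    httpfs.proxyuser.hive.hosts = *
    httpfs.proxyuser.hive.groups = *
    httpfs.proxyuser.hdfs.hosts = *
    httpfs.proxyuser.hdfs.groups = *

and happily assumed it would work now.

With Hadoop's file permission check (dfs.permissions in hdfs-site.xml) in place, it failed again!

By now it was day two and the problem was still unsolved. I was about to lose it...

I even started to suspect a version incompatibility.

With nothing else left, I resorted to remote debugging [I won't go into detail; suffice it to say it was extremely tedious].

The permission check lives here: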

    // For the curious, see Hadoop's authorization source: org.apache.hadoop.ipc.Server#authorizeConnection
    /**
     * Authorize proxy users to access this server
     * @throws RpcServerException - user is not allowed to proxy
     */
    private void authorizeConnection() throws RpcServerException {
      try {
        // If auth method is TOKEN, the token was obtained by the
        // real user for the effective user, therefore not required to
        // authorize real user. doAs is allowed only for simple or kerberos
        // authentication
        if (user != null && user.getRealUser() != null
            && (authMethod != AuthMethod.TOKEN)) {
          // this branch is taken by default
          ProxyUsers.authorize(user, this.getHostAddress());
        }
        authorize(user, protocolName, getHostInetAddress());
        if (LOG.isDebugEnabled()) {
          LOG.debug("Successfully authorized " + connectionContext);
        }
        rpcMetrics.incrAuthorizationSuccesses();
      } catch (AuthorizationException ae) {
        LOG.info("Connection from " + this
            + " for protocol " + connectionContext.getProtocol()
            + " is unauthorized for user " + user);
        rpcMetrics.incrAuthorizationFailures();
        throw new FatalRpcServerException(
            RpcErrorCodeProto.FATAL_UNAUTHORIZED, ae);
      }
    }
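To see what ProxyUsers.authorize is deciding, here is a heavily simplified Python model of the proxy-user check. Assumptions: it uses keys of the form hadoop.proxyuser.[user].hosts and hadoop.proxyuser.[user].groups, which is the prefix Hadoop's core proxy-user check reads from core-site.xml (the httpfs.proxyuser.* prefix belongs to the HttpFS service); it ignores the .users key and the hostname and CIDR matching that real Hadoop supports; and may_impersonate is a made-up name. The point is only that a real user with no entries loaded is refused, which is the situation the error message describes.

```python
def _allowed(setting, candidates):
    """True if a comma-separated setting is '*' or lists one of the candidates."""
    if setting is None:
        return False  # nothing configured means nothing is allowed
    values = {item.strip() for item in setting.split(",") if item.strip()}
    return "*" in values or bool(values & set(candidates))


def may_impersonate(conf, real_user, effective_user_groups, client_host,
                    prefix="hadoop.proxyuser"):
    """Simplified model: may real_user act for someone in effective_user_groups?

    conf is a plain dict standing in for the configuration loaded by the server.
    """
    groups_ok = _allowed(conf.get(f"{prefix}.{real_user}.groups"), effective_user_groups)
    hosts_ok = _allowed(conf.get(f"{prefix}.{real_user}.hosts"), [client_host])
    return groups_ok and hosts_ok


configured = {
    "hadoop.proxyuser.hive.hosts": "*",
    "hadoop.proxyuser.hive.groups": "*",
}
# The host and group values below are illustrative only.
print(may_impersonate(configured, "hive", ["hdfs"], "192.168.100.103"))
print(may_impersonate({}, "hive", ["hdfs"], "192.168.100.103"))
```

What matters is the configuration loaded by the server doing the check, not the copy that sits on the client.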

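While debugging, I found that when the configuration was loaded, the proxy-user entries did not include hive at all???

Thinking it over: the authorization failure occurs when the job is submitted to YARN, and the ResourceManager is responsible for authorization and resource scheduling.

So I looked directly at the proxy configuration on the ResourceManager host,

in its core-site.xml.

A herd of mythical beasts stampedes by..........

Another herd stampedes by..........

And another herd stampedes by..........

And yet another herd stampedes by..........

And again another herd stampedes by..........

And again and again another herd stampedes by..........

And again and again and again another herd stampedes by..........

I took a walk, made a cup of coffee, came back, uncommented the proxy-user entries in the ResourceManager's core-site.xml, and restarted. Everything worked.....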
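Since the root cause was a commented-out block, one quick sanity check is to parse the core-site.xml that the ResourceManager actually loads and list the proxy-user properties that are really active. A minimal Python sketch, assuming the standard hadoop.proxyuser. prefix and using a made-up sample document instead of a real file; active_proxyuser_properties is a hypothetical helper. ElementTree's default parser drops XML comments, so commented-out properties simply do not show up.

```python
import xml.etree.ElementTree as ET


def active_proxyuser_properties(xml_text, prefix="hadoop.proxyuser."):
    """Return {name: value} for proxy-user properties that are not commented out."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Made-up sample: the hive entry is commented out, the hdfs entry is active.
SAMPLE = """<configuration>
  <!--
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  -->
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
</configuration>"""

active = active_proxyuser_properties(SAMPLE)
print(active)
print("hive configured:", any(".hive." in name for name in active))
```

On a real node you would pass in the contents of the ResourceManager's own core-site.xml (the path depends on your installation) and check that the hive entries appear.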
