Phoenix Commands and Syntax

2023-10-02 22:05 | Source: compiled from the web

Official documentation: http://phoenix.apache.org/language/index.html#select

Basic commands:

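Log in to the appropriate machine and start the Phoenix client:

    sqlline.py lyy1,lyy2,lyy3,lyy4:2181

This opens the Phoenix shell, where ordinary SQL statements can be run. Useful shell commands:

    !table                lists the tables
    !describe tablename   shows the columns of a table
    !history              shows the SQL executed so far
    !dbinfo               shows database information
    !index tb             shows the indexes on tb
    !help                 lists the other commands

Importing data:

Run the following from the Phoenix directory:

    hadoop jar /home/phoenix-4.12/phoenix-4.6.0-HBase-1.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -t POPULATION -i /datas/us_population.csv

-t is the table name and -i is the input file. The input file must be on HDFS.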

1. Inserting data

Phoenix has no INSERT statement; UPSERT is used instead. UPSERT has two forms: upsert into and upsert select.

upsert into: similar to insert into, intended for inserting individual rows of external data.

    upsert into tb values('ak','hhh',222);
    upsert into tb(stat,city,num) values('ak','hhh',222);

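upsert select: similar to insert select in Hive, intended for bulk-inserting data from another table.

    upsert into tb1 (state,city,population) select state,city,population from tb2 where population < 40000;
    upsert into tb1 select state,city,population from tb2 where population > 40000;
    upsert into tb1 select * from tb2 where population > 40000;

Note: unlike in a traditional database, inserting in Phoenix never produces duplicate rows. Phoenix is built on HBase, so every table must have a primary key. A later write with the same primary key overwrites the earlier one; only the timestamp differs.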
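The overwrite behaviour described in the note can be sketched with a small in-memory model. This is only an illustration, not Phoenix code: the `upsert` helper, the (state, city) primary key, and the integer timestamps are assumptions made for the example.

```python
from typing import Dict, Tuple

# Toy model of a table: primary key -> (non-key column values, write timestamp).
Row = Tuple[Tuple, int]

def upsert(table: Dict[Tuple, Row], key: Tuple, values: Tuple, ts: int) -> None:
    """Insert the row, or overwrite the existing row with the same primary key."""
    table[key] = (values, ts)

tb: Dict[Tuple, Row] = {}
upsert(tb, ("ak", "hhh"), (222,), ts=1)
upsert(tb, ("ak", "hhh"), (333,), ts=2)       # same key: overwrites, no duplicate row
upsert(tb, ("ak", "juneau"), (40711,), ts=3)  # new key: adds a row

print(len(tb))             # 2
print(tb[("ak", "hhh")])   # ((333,), 2)
```

The second write leaves a single row for ('ak', 'hhh') carrying the newer value and timestamp, which is also why the same statement serves for updates.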

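2. Deleting data

    delete from tb;
    delete from tb where city = 'kenai';
    drop table tb;
    delete from system.catalog where table_name = 'int_s6a';
    drop table if exists tb;
    drop table my_schema.tb;
    drop table my_schema.tb cascade;

delete from tb removes every row in the table; Phoenix does not support truncate table tb. drop table deletes the table itself, and the cascade form also drops all views based on that table. The statement against system.catalog deletes a table's metadata rows directly.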

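3. Updating data

Because of how HBase rowkeys work, writing a row with an existing rowkey simply overwrites its contents, which amounts to an update. Updates in Phoenix are therefore also done with upsert into and upsert select:

    upsert into us_population (state,city,population) values('ak','juneau',40711);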

4. Querying data

union all, group by, order by, and limit are all supported:

    select * from test limit 1000;
    select * from test limit 1000 offset 100;
    select full_name from sales_person where ranking >= 5.0 union all select reviewer_name from customer_review where score >= 8.0;

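5. Phoenix has no concept of a database; by default all tables live in the same namespace. Multiple namespaces are supported, however: when the following property is set to true, a table created with a schema is mapped to a namespace.

    phoenix.schema.isNamespaceMappingEnabled true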

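6. Creating tables

A. SALT_BUCKETS (salting)

Salting can significantly improve read and write performance by pre-splitting data across multiple regions. In HBase terms, a system-generated byte is placed in the first byte position of the rowkey's byte array. That byte is computed by hashing the rowkey bytes generated from the primary key. After salting, the data is spread over different regions, which helps Phoenix read and write concurrently.

SALT_BUCKETS accepts values from 1 to 256:

    create table test (host varchar not null primary key, description varchar) salt_buckets=16;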

    upsert into test (host,description) values ('192.168.0.1','s1');
    upsert into test (host,description) values ('192.168.0.2','s2');
    upsert into test (host,description) values ('192.168.0.3','s3');

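A salted table automatically prepends one byte to every rowkey, so a run of consecutive rowkeys is spread over different regions when it is actually stored. Reads and writes over that key range are then no longer concentrated on a single region. In short, when data is written to a salted table created with Phoenix, one byte derived from a hash of the original rowkey is added in front of it (so consecutive rowkeys usually receive different bytes), and consecutive rowkeys end up evenly distributed across multiple regions.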
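A minimal sketch of the salting idea follows. The hash below is a simple stand-in chosen for illustration and is not claimed to be the hash Phoenix uses internally; `salt_byte` and `salted_key` are hypothetical helper names.

```python
def salt_byte(rowkey: bytes, buckets: int) -> int:
    """Map a rowkey to a bucket number in [0, buckets) using an illustrative hash."""
    h = 0
    for b in rowkey:
        h = (31 * h + b) & 0x7FFFFFFF   # keep the running value non-negative
    return h % buckets

def salted_key(rowkey: bytes, buckets: int) -> bytes:
    """Prepend the one-byte salt to the original rowkey."""
    return bytes([salt_byte(rowkey, buckets)]) + rowkey

# Consecutive rowkeys, as in the 192.168.0.x example above.
hosts = [f"192.168.0.{i}".encode() for i in range(1, 9)]
for host in hosts:
    key = salted_key(host, 16)
    print(key[0], host.decode())   # salt byte, then the original rowkey
```

Because the salt is a pure function of the rowkey, a point lookup can recompute it, while a range scan over consecutive keys has to visit every bucket.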

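B. Pre-split

Salting pre-splits the table automatically, but it gives no control over how the table is split. To control that, specify the exact split points when creating the Phoenix table, for example:

    create table test (host varchar not null primary key, description varchar) split on ('cs','eu','na');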
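To see what the three split points imply, here is a small sketch that maps a rowkey to one of the four resulting regions. It assumes the usual HBase convention that a region holds keys from its start key (inclusive) up to the next split point (exclusive); `region_index` is a hypothetical helper, not a Phoenix or HBase API.

```python
from bisect import bisect_right

# Split points from the example: regions are (-inf,'cs'), ['cs','eu'), ['eu','na'), ['na',+inf).
SPLIT_POINTS = [b"cs", b"eu", b"na"]

def region_index(rowkey: bytes, split_points=SPLIT_POINTS) -> int:
    """Return the 0-based index of the region whose key range contains rowkey."""
    return bisect_right(split_points, rowkey)

for key in [b"apple", b"cs", b"fr", b"us"]:
    print(key.decode(), "-> region", region_index(key))
```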

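C. Multiple column families

The data of each column family is kept in its own files, so grouping related columns into separate column families in Phoenix can improve query performance. Creating a table with two column families:

    create table test ( mykey varchar not null primary key, a.col1 varchar, a.col2 varchar, b.col3 varchar );
    upsert into test values ('key1','a1','b1','c1');
    upsert into test values ('key2','a2','b2','c2');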

D. Compression

    create table test (host varchar not null primary key, description varchar) compression='snappy';

7. Creating and dropping views

    create view "my_hbase_table" ( k varchar primary key, "v" unsigned_long) default_column_family='a';
    create view my_view ( new_col smallint ) as select * from my_table where k = 100;
    create view my_view_on_view as select * from my_view where new_col > 70;
    create view v1 as select * from test where description in ('s1','s2','s3');

    drop view my_view;
    drop view if exists my_schema.my_view;
    drop view if exists my_schema.my_view cascade;

8. Creating secondary indexes

Secondary indexes can be built on both mutable data and immutable data (data that is never updated after insertion).

    create index my_idx on sales.opportunity(last_updated_date desc);
    create index my_idx on log.event(created_date desc) include (name, payload) salt_buckets=10;
    create index if not exists my_comp_idx on server_metrics ( gc_time desc, created_date desc ) data_block_encoding='none',versions=?,max_filesize=2000000 split on (?, ?, ?);
    create index my_idx on sales.opportunity(upper(contact_name));
    create index test_index on test (host) include (description);

Dropping indexes:

    drop index my_idx on sales.opportunity;
    drop index if exists my_idx on server_metrics;
    drop index if exists xdgl_acct_fee_index on xdgl_acct_fee;

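Tables are mutable by default. An immutable table has to be created explicitly, and can later be switched back to mutable:

    create table hao2 (k varchar primary key, v varchar) immutable_rows=true;
    alter table HAO2 set IMMUTABLE_ROWS = false;

Rebuilding an index empties the index table and then repopulates it:

    alter index index1 on tb rebuild;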

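Global indexing suits read-heavy, write-light workloads and queries with few filter conditions.

    create index my_index on items(price);

Ways to make a query use the index:

1. Forcing the index with a hint

    select /*+ index(items my_index) */ * from items where price=0.8824734;
    drop index my_name on usertable;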

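2. Covered indexes: use include to list the columns that the query needs to return.

    create index index1_c on hao1 (age) include(name);

The name column is now stored in the index table. The query select name from hao1 where age=2 is answered from the index alone and is the fastest case. For select * from hao1 where age=2 the other columns are not in the index table, so a full table scan is performed.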
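The covered-index rule comes down to a subset check: the index can answer a query by itself only when every column the query touches is either indexed or included. The sketch below models that check; `index_covers` and the column list assumed for hao1 are hypothetical, and the real Phoenix optimizer takes more factors into account.

```python
def index_covers(query_columns, indexed_columns, included_columns):
    """Return True when every column the query touches is stored in the index."""
    return set(query_columns) <= (set(indexed_columns) | set(included_columns))

# index1_c on hao1 (age) include(name)
indexed, included = ["age"], ["name"]

# Assumed full column list of hao1, only for this example.
hao1_columns = ["id", "age", "name", "address"]

print(index_covers(["name", "age"], indexed, included))   # select name ... where age=2
print(index_covers(hao1_columns, indexed, included))      # select * ... where age=2
```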

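Local indexing suits write-heavy, read-light workloads. The index is used even when the query references columns that are not in the index, and the index data is stored on the same machine as the actual data.

    create local index index3_l_name on hao1 (name);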

Asynchronous index creation: the index table is created empty, and a separate command-line tool is run to populate it.

    create index index1_c on hao1 (age) include(name) async;

    hbase org.apache.phoenix.mapreduce.index.IndexTool --schema my_schema --data-table my_table --index-table async_idx --output-path async_idx_hfiles

9. Mapping to an existing HBase table

First create the HBase table, then create the Phoenix table. The Phoenix table name must match the HBase table name.

In the HBase shell:

    create 'stu','cf1','cf2'
    put 'stu','key1','cf1:name','luozhao'
    put 'stu','key1','cf1:sex','man'
    put 'stu','key1','cf2:age','24'
    put 'stu','key1','cf2:adress','cqupt'

In Phoenix:

    create table "stu" ( id VARCHAR NOT NULL PRIMARY KEY, "cf1"."name" VARCHAR, "cf1"."sex" VARCHAR, "cf2"."age" VARCHAR, "cf2"."adress" VARCHAR );
    upsert into "stu"(id,"cf1"."name","cf1"."sex","cf2"."age","cf2"."adress") values('key6','zkk','man','111','Beijing');

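    select * from "stu";

The two tables turn out to hold the same data. The table name in the query must be enclosed in double quotes so that it is not converted to upper case. Note in particular that the table name, column family names, and column names all need double quotes: HBase is case-sensitive, and without the quotes Phoenix converts lower-case letters to upper case when it creates the table.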
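The quoting rule can be summarised in a few lines. This is a simplified model of the behaviour described above, with a hypothetical helper name, and it ignores details such as escaped quotes inside identifiers.

```python
def normalize_identifier(identifier: str) -> str:
    """Quoted identifiers keep their case; unquoted ones are upper-cased."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]
    return identifier.upper()

print(normalize_identifier('stu'))     # STU  (would not match the HBase table 'stu')
print(normalize_identifier('"stu"'))   # stu  (matches the HBase table)
print(normalize_identifier('"cf1"'))   # cf1
```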

10. Adding the Phoenix dependency to the Spark runtime environment

Add the following to spark-env.sh:

    # Add the Phoenix dependency
    for file in $(find /opt/hbase-1.2.4/lib | grep phoenix)
    do
        SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$file"
    done
    export SPARK_DIST_CLASSPATH

With this in place, the Phoenix jars are added to the classpath every time a Spark job starts.

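In summary: the official site explains all of this clearly and in detail. This article covers only Phoenix syntax on its own. Setting Phoenix up together with Hive and HBase raises many further issues, and the syntax may differ.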
