鸿 网 互 联 www.68idc.cn

Greenplum6 数据库数据库学习_基本语法

来源:互联网 作者:佚名 时间:2022-07-19 11:10
Greenplum6 数据库分布 1. hash分布2. 随机分布3. 复制分布 基本语法介绍 1. 获取语法 [gpadmin@mdw logs]$ psqlpsql (9.4.24)Type "help" for help.postgres=# \hAvailable help: ABORT ALTER TEXT SEARCH TEMPLATE CREATE RESOURC

Greenplum6 数据库分布

1. hash分布 2. 随机分布 3. 复制分布

基本语法介绍

1. 获取语法

[gpadmin@mdw logs]$ psql psql (9.4.24) Type "help" for help. postgres=# \h Available help: ABORT ALTER TEXT SEARCH TEMPLATE CREATE RESOURCE GROUP DROP FUNCTION LOAD ALTER AGGREGATE ALTER TRIGGER CREATE RESOURCE QUEUE DROP GROUP LOCK ALTER COLLATION ALTER TYPE CREATE ROLE DROP INDEX MOVE ALTER CONVERSION ALTER USER CREATE RULE DROP LANGUAGE NOTIFY ALTER DATABASE ALTER USER MAPPING CREATE SCHEMA DROP MATERIALIZED VIEW PREPARE ALTER DEFAULT PRIVILEGES ALTER VIEW CREATE SEQUENCE DROP OPERATOR PREPARE TRANSACTION ALTER DOMAIN ANALYZE CREATE SERVER DROP OPERATOR CLASS REASSIGN OWNED ALTER EVENT TRIGGER BEGIN CREATE TABLE DROP OPERATOR FAMILY REFRESH MATERIALIZED VIEW ALTER EXTENSION CHECKPOINT CREATE TABLE AS DROP OWNED REINDEX ALTER EXTERNAL TABLE CLOSE CREATE TABLESPACE DROP PROTOCOL RELEASE SAVEPOINT ALTER FOREIGN DATA WRAPPER CLUSTER CREATE TEXT SEARCH CONFIGURATION DROP RESOURCE GROUP RESET ALTER FOREIGN TABLE COMMENT CREATE TEXT SEARCH DICTIONARY DROP RESOURCE QUEUE REVOKE ALTER FUNCTION COMMIT CREATE TEXT SEARCH PARSER DROP ROLE ROLLBACK ALTER GROUP COMMIT PREPARED CREATE TEXT SEARCH TEMPLATE DROP RULE ROLLBACK PREPARED ALTER INDEX COPY CREATE TRIGGER DROP SCHEMA ROLLBACK TO SAVEPOINT ALTER LANGUAGE CREATE AGGREGATE CREATE TYPE DROP SEQUENCE SAVEPOINT ALTER LARGE OBJECT CREATE CAST CREATE USER DROP SERVER SECURITY LABEL ALTER MATERIALIZED VIEW CREATE COLLATION CREATE USER MAPPING DROP TABLE SELECT ALTER OPERATOR CREATE CONVERSION CREATE VIEW DROP TABLESPACE SELECT INTO ALTER OPERATOR CLASS CREATE DATABASE DEALLOCATE DROP TEXT SEARCH CONFIGURATION SET ALTER OPERATOR FAMILY CREATE DOMAIN DECLARE DROP TEXT SEARCH DICTIONARY SET CONSTRAINTS ALTER PROTOCOL CREATE EVENT TRIGGER DELETE DROP TEXT SEARCH PARSER SET ROLE ALTER RESOURCE GROUP CREATE EXTENSION DISCARD DROP TEXT SEARCH TEMPLATE SET SESSION AUTHORIZATION ALTER RESOURCE QUEUE CREATE EXTERNAL TABLE DO DROP TRIGGER SET TRANSACTION ALTER ROLE CREATE FOREIGN DATA WRAPPER DROP AGGREGATE DROP TYPE SHOW ALTER RULE CREATE FOREIGN TABLE DROP CAST DROP USER START TRANSACTION ALTER SCHEMA CREATE FUNCTION DROP COLLATION DROP USER MAPPING TABLE ALTER SEQUENCE CREATE GROUP DROP CONVERSION DROP VIEW TRUNCATE ALTER SERVER CREATE INDEX DROP DATABASE END UNLISTEN ALTER SYSTEM CREATE LANGUAGE DROP DOMAIN EXECUTE UPDATE ALTER TABLE CREATE MATERIALIZED VIEW DROP EVENT TRIGGER EXPLAIN VACUUM ALTER TABLESPACE CREATE OPERATOR DROP EXTENSION FETCH VALUES ALTER TEXT SEARCH CONFIGURATION CREATE OPERATOR CLASS DROP EXTERNAL TABLE GRANT WITH ALTER TEXT SEARCH DICTIONARY CREATE OPERATOR FAMILY DROP FOREIGN DATA WRAPPER INSERT ALTER TEXT SEARCH PARSER CREATE PROTOCOL DROP FOREIGN TABLE LISTEN postgres=#

2. 创建数据库

[gpadmin@mdw logs]$ createdb testDB -E utf-8 [gpadmin@mdw logs]$ psql -h 10.10.10.101 -p 5432 -d testDB -U gpadmin psql (9.4.24) Type "help" for help. testDB=# \q [gpadmin@mdw logs]$ export PGDATABASE=testDB [gpadmin@mdw logs]$ psql psql (9.4.24) Type "help" for help. testDB=#

3. 建表语句

- GreenPlum中创建表需要指定表的分布键。 - 如果表需要用某个字段分区,可以通过partition by 将表建成分区表。 - 可以使用like操作创建与like的表一样结构的表,功能类似create table t1 as select * from t2 limit 0。 - 可以使用inherits实现表的继承,具体实现参考postgresql文档。 --语法查询 \h create table --创建表 create table test001(id int,name varchar(128)); --id 为分布键 create table test002(id int,name varchar(128)) distributed by (id); --同上 testDB=# create table test001(id int,name varchar(128)); NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table. HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. CREATE TABLE testDB=# create table test002(id int,name varchar(128)) distributed by (id); CREATE TABLE testDB=# create table test003(id int,name varchar(128)) distributed by (id,name) --多个分布键 testDB=# create table test003(id int,name varchar(128)) distributed by (id,name); CREATE TABLE create table test004(id int,name varchar(128)) distributed randomly; --随机分布键 testDB=# create table test004(id int,name varchar(128)) distributed randomly; CREATE TABLE create table test005(id int primary key ,name varchar(128)); create table test006(id int unique ,name varchar(128)); testDB=# create table test005(id int primary key ,name varchar(128)); CREATE TABLE testDB=# create table test006(id int unique ,name varchar(128)); CREATE TABLE testDB=# 采用随机分布策略的表默认将主键,或者唯一键作为分布键,因为每个Segment都是一个单一的数据库,单个数据库可以确保唯一性,多个数据库节点就无法保证全局的跨库唯一性,故只能按照唯一键分布,同一个值的数据都在一个节点上,以此来保证唯一性。 --如果指定的分布键与主键盘不一样,那么分布键会被更改为主键。在greenplum6 中这句话貌似显示不兼容,如下: create table test007(id int unique,name varchar(128)) distributed by (id,name); testDB=# create table test007(id int unique,name varchar(128)) distributed by (id,name); ERROR: UNIQUE constraint and DISTRIBUTED BY definitions are incompatible HINT: When there is both a UNIQUE constraint and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be a subset of the UNIQUE constraint. testDB=# 按照它的要求进行修改,我们distribute by 修改成id testDB=# create table test007(id int unique,name varchar(128)) distributed by (id); CREATE TABLE testDB=# ---创建一模一样的 表 create table test_like (like test001); --使用like 创建表的时候,只是表结构会跟原表一摸一样,表的特殊属性并不会一样,例如压缩,只增(appendonly)等属性,如果不指定分布键,默认分布键与原表一致。 testDB=# create table test_like (like test001); NOTICE: table doesn't have 'DISTRIBUTED BY' clause, defaulting to distribution columns from LIKE table CREATE TABLE testDB=# create table as 和select into 有一样的功能。都可以根据直接执行select 的结果创建一个新的表。创建一个表的时候,如果默认不指定分布键,那么Greenplum根据执行select 得到的结果集来选择,不用再次重分布数据的字段作为表的分布键。 testDB=# create table test2 as select * from test002; NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry. SELECT 4 testDB=# create table test1 as select * from test001; NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry. SELECT 4 手工加入distributed by,指定分布键,这样就会根据指定分布键再建表。 testDB=# create table test02 as select * from test002 distributed by(id); SELECT 4 select into 不能指定分布键,只能使用默认的分布键 testDB=# select * into test01 from test001; NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry. SELECT 4

4. 插入语句

插入语句 执行insert语句注意分布键不要为空,否则分布键默认变成null',数据都被保存到一个节点上会导致分布不均 insert into test001 values (100,'tom'),(101,'lily'),(102,'jack'),(103,'linda'); insert into test002 values (200,'tom'),(101,'lily'),(202,'jack'),(103,'linda');

5. 更新语句

不能批量对分布键执行update,因为分布键执行update需要将数据重分布. testDB=# update test002 set id=203 where id=202; UPDATE 1 testDB=#

6. 删除语句delete--truncate

在Greenplum 3.x 的版本中,如果delete 操作涉及子查询,子查询的结果还涉及到数据重分布,这样的删除语句会报错,Greenplum 4.x以上,支持该操作。 testDB=# delete from test001 where name in (select name from test002); DELETE 4 testDB=# 如果对整张表执行delete会比较慢,建议使用truncate. truncate执行truncate直接删除表的物理文件,然后创建新的数据文件。如果有sql正在操作这张表,那么truncate会被锁住,直到表上面的所有锁会被释放。

7. 查询语句

testDB=# select * from test001 x,test002 y where x.id=y.id; id | name | id | name -----+-------+-----+------- 103 | linda | 103 | linda 101 | lily | 101 | lily (2 rows)

8. 执行计划

testDB=# explain select * from test001 x,test002 y where x.id=y.id; QUERY PLAN ------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..862.00 rows=5 width=18) -> Hash Join (cost=0.00..862.00 rows=1 width=18) Hash Cond: (test001.id = test002.id) -> Seq Scan on test001 (cost=0.00..431.00 rows=1 width=9) -> Hash (cost=431.00..431.00 rows=1 width=9) -> Seq Scan on test002 (cost=0.00..431.00 rows=1 width=9) Optimizer: Pivotal Optimizer (GPORCA) (7 rows) testDB=# explain select * from test001 x,test002 y where x.id=y.id; QUERY PLAN ------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..862.00 rows=5 width=18) -> Hash Join (cost=0.00..862.00 rows=1 width=18) Hash Cond: (test001.id = test002.id) -> Seq Scan on test001 (cost=0.00..431.00 rows=1 width=9) -> Hash (cost=431.00..431.00 rows=1 width=9) -> Seq Scan on test002 (cost=0.00..431.00 rows=1 width=9) Optimizer: Pivotal Optimizer (GPORCA) (7 rows) 第一步: 数据库先顺序扫描test002表,扫描大概有431单位的消耗,有1行数据,平均长度为9个字节。 其中1是一个估算值,是一个segment的数据量,如果数据分布均匀,大概是总数据量除以segment的个数。 由于这个gp集群有6个segment节点,因此可以推算test002大概有6条数据。 第二步: 扫描test002表,并计算hash值,将其保存在内存中。 第三步: 顺序扫描test001表 第四步: 在扫描test001表的过程中,与test002表进行hash后的结果关联(hash join),关联条件是(test001.id = test002.id) 第五步:将数据汇总到master上,master 将数据结果进行汇总并展现。

常用函数

字符串函数

testDB=# select 'green'||'plum' as dbname; dbname ----------- greenplum (1 row) testDB=# select split_part(col,'|',1) ,split_part(col,'|',2) from (values('hello|world!'),('greenplum|database')) t(col); --values 特有语法,可以堪称一张表,有两行数据,表明为t,字段名为col split_part | split_part ------------+------------ hello | world! greenplum | database (2 rows) values('hello|world!'),('greenplum|database'); testDB=# values('hello|world!'),('greenplum|database'); column1 -------------------- hello|world! greenplum|database (2 rows) testDB=# select substr('hello world!',2,3); select position('world' in 'hello world'); testDB=# select substr('hello world!',2,3); substr -------- ell (1 row) testDB=# select position('world' in 'hello world'); position ---------- 7 (1 row)

时间函数

[root@mdw ~]# su - gpadmin Last login: Wed Apr 14 14:45:20 CST 2021 on pts/0 ^[[A^[[A[gpadmin@export PGDATABASE=testDB [gpadmin@mdw ~]$ psql -h 10.10.10.101 -p 5432 -d testDB -U gpadmin psql (9.4.24) Type "help" for help. testDB=# select '2021-03-23 1:0:0'::timestamp + interval '10 days 2 hours 10 seconds'; ?column? --------------------- 2021-04-02 03:00:10 (1 row) testDB=# select now(),current_date,current_time,current_timestamp; now | date | timetz | now -------------------------------+------------+--------------------+------------------------------- 2021-04-14 15:48:52.166167+08 | 2021-04-14 | 15:48:52.166167+08 | 2021-04-14 15:48:52.166167+08 (1 row) 当月第一天 testDB=# select date_trunc('months',now())::date; date_trunc ------------ 2021-04-01 (1 row) 获取当前时间距离2021-01-3 10:10:10 过了多少秒 testDB=# select extract(EPOCH from now() -'2021-01-3 10:10:10'); date_part --------------- 8746923.35616 (1 row) 时间域有重叠返回t,不重叠f testDB=# select (DATE '2020-02-16',DATE '2020-12-21') OVERLAPS (DATE '2020-10-30',DATE '2021-1-15'); overlaps ---------- t (1 row) testDB=# select (DATE '2020-02-16',INTERVAL '100 days') OVERLAPS (DATE '2020-10-30',DATE '2021-1-15'); overlaps ---------- f (1 row) testDB=#

其他常用函数

* 序列号生成函数 testDB=# select * from generate_series(10,20); generate_series ----------------- 10 11 12 13 14 15 16 17 18 19 20 (11 rows) testDB=# create table test_gen as select generate_series(1,10000) as id ,'hello'::text as name distributed by (id); SELECT 10000 testDB=# select sum(num) from generate_series(1,2000,2) num; sum --------- 1000000 (1 row) * 字符串列转行函数 testDB=# create table test_string (id integer,str varchar(50)) distributed by (id); CREATE TABLE testDB=# insert into test_string values(1,'hello'),(1,'world'),(2,'greenplum'),(2,'database'),(2,'system'); INSERT 0 5 testDB=# select * from test_string; id | str ----+----------- 2 | greenplum 2 | database 2 | system 1 | hello 1 | world (5 rows) testDB=# select id,string_agg(str,'|' order by str) from test_string group by id; id | string_agg ----+--------------------------- 2 | database|greenplum|system 1 | hello|world (2 rows) testDB=# select id,string_agg(str,'|') from test_string group by id; id | string_agg ----+--------------------------- 2 | greenplum|database|system 1 | hello|world (2 rows) * 字符串行转列 testDB=# create table test_string2 as select id,string_agg(str,'|') as str from test_string group by id distributed by (id); SELECT 2 testDB=# select * from test_string2 ; id | str ----+--------------------------- 2 | greenplum|database|system 1 | hello|world (2 rows) testDB=# select id,regexp_split_to_table(str,E'\\|') str from test_string2; id | str ----+----------- 2 | greenplum 2 | database 2 | system 1 | hello 1 | world (5 rows) * hash函数 testDB=# select md5('helloworld'); md5 ---------------------------------- fc5e038d38a57032085441e7fe7010b0 (1 row) testDB=# select hashbpchar('helloworld'); hashbpchar ------------ 1836618988 (1 row) testDB=# ## 参考 ``` https://www.cnblogs.com/kingle-study/p/10550785.html 《GreenPlum企业应用实战》 ```
上一篇:jenkins发布nodejs项目(修改)
下一篇:没有了
网友评论
<