Getting Started with Presto: Configuring Your First HTTP Connector

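1. Connectors

Presto can query many types of data sources through connectors. This post uses data served by an HTTP server as an example to briefly show how to connect such a data source to Presto.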

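2. Setting up the HTTP data source

2.1 Schema file of the HTTP data source

On the HTTP server, provide a file that describes the data source. The file is in JSON format: each top-level key is the name of a schema (a schema is roughly the equivalent of a database), and each schema maps to a list of tables. Every table must give its column names and types, plus the location of its data as HTTP URLs. Here is an example, in which the schema is named schema and contains a single table, table1: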

{
  "schema": [{
    "name": "table1",
    "columns": [
      {
        "name": "key1",
        "type": "bigint"
      },
      {
        "name": "key2",
        "type": "varchar"
      }
    ],
    "sources": [
      "http://localhost:9080/data.csv"
    ]
  }]
}
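To make the structure of this file explicit, here is a minimal Python sketch that loads such a metadata document and checks that it has the shape described above (schema name mapped to a list of tables, each with name, columns and sources). The function load_metadata is hypothetical: it is not part of Presto, and the real connector does its own parsing in Java.

```python
import json


def load_metadata(text):
    """Parse a metadata document of the form {schema_name: [table, ...]}.

    Returns {schema_name: {table_name: {"columns": [(name, type), ...],
                                        "sources": [url, ...]}}}.
    Raises ValueError if the top-level shape is wrong or a table or
    column lacks one of its required keys.
    """
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("top level must be an object keyed by schema name")
    result = {}
    for schema_name, tables in raw.items():
        if not isinstance(tables, list):
            raise ValueError(f"schema {schema_name!r} must map to a list of tables")
        schema = {}
        for table in tables:
            for key in ("name", "columns", "sources"):
                if key not in table:
                    raise ValueError(f"a table in {schema_name!r} lacks {key!r}")
            columns = []
            for col in table["columns"]:
                if "name" not in col or "type" not in col:
                    raise ValueError(f"column in {table['name']!r} needs name and type")
                columns.append((col["name"], col["type"]))
            schema[table["name"]] = {"columns": columns, "sources": list(table["sources"])}
        result[schema_name] = schema
    return result


example = """
{
  "schema": [{
    "name": "table1",
    "columns": [
      {"name": "key1", "type": "bigint"},
      {"name": "key2", "type": "varchar"}
    ],
    "sources": ["http://localhost:9080/data.csv"]
  }]
}
"""

meta = load_metadata(example)
print(meta["schema"]["table1"]["columns"])  # [('key1', 'bigint'), ('key2', 'varchar')]
```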

2.2 Providing the data

The data served over HTTP is in CSV format. For example, data.csv, referenced in the schema above, contains:

10,b
1,d
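The post does not say how the two files are served. One way to reproduce the setup locally is Python's standard http.server module, listening on port 9080 to match the URLs above. This is a test fixture sketch, not part of Presto; the helper names (write_files, make_server, start_in_background, fetch_text) are hypothetical, and the file contents are the ones from 2.1 and 2.2.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

SCHEMA_JSON = """{
  "schema": [{
    "name": "table1",
    "columns": [
      {"name": "key1", "type": "bigint"},
      {"name": "key2", "type": "varchar"}
    ],
    "sources": ["http://localhost:9080/data.csv"]
  }]
}
"""

DATA_CSV = "10,b\n1,d\n"


def write_files(directory=None):
    """Write schema.json and data.csv into directory (a new temp dir if None); return the path."""
    root = pathlib.Path(directory) if directory else pathlib.Path(tempfile.mkdtemp(prefix="presto-http-"))
    root.mkdir(parents=True, exist_ok=True)
    # write_bytes avoids newline translation on Windows
    (root / "schema.json").write_bytes(SCHEMA_JSON.encode("utf-8"))
    (root / "data.csv").write_bytes(DATA_CSV.encode("utf-8"))
    return root


def make_server(directory, port=9080):
    """Return an HTTP server that serves the files in directory on localhost:port."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


def start_in_background(server):
    """Run serve_forever in a daemon thread so the caller is not blocked."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def fetch_text(url):
    """Download url and decode it as UTF-8 (handy for checking the server)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


# Usage (blocks until interrupted):
#   make_server(write_files("http-data")).serve_forever()
```

Blocking in serve_forever is fine for a terminal session that Presto talks to; start_in_background is only there so that a script can start the server and then check it with fetch_text.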

2.3 Configuring Presto

Next, configure Presto so that it knows about the HTTP data source: create the file etc/catalog/http.properties and set metadata-uri to the URL of the schema file:

connector.name=example-http
metadata-uri=http://localhost:9080/schema.json
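Presto names each catalog after its properties file, so etc/catalog/http.properties defines the catalog http used in the queries below. The Python sketch below mimics that mapping by reading every *.properties file in a catalog directory. Both helpers are hypothetical, and parse_properties handles only simple key=value lines, not the full Java properties syntax.

```python
import pathlib


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"unsupported line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def read_catalogs(catalog_dir):
    """Map catalog name (file name without .properties) to its properties."""
    catalogs = {}
    for path in sorted(pathlib.Path(catalog_dir).glob("*.properties")):
        catalogs[path.stem] = parse_properties(path.read_text(encoding="utf-8"))
    return catalogs


example = """connector.name=example-http
metadata-uri=http://localhost:9080/schema.json
"""
print(parse_properties(example))
# {'connector.name': 'example-http', 'metadata-uri': 'http://localhost:9080/schema.json'}
```

With the file above in etc/catalog, read_catalogs("etc/catalog") would return a dict whose key "http" holds these two properties.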

2.4 Checking the results with queries

2.4.1 List the schemas in the http catalog

presto> show schemas from http;
Schema
--------------------
information_schema
schema
(2 rows)

Query 20180510_030439_00002_58j4x, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 34B] [15 rows/s, 263B/s]

2.4.2 List the tables in the schema named schema

presto> show tables  from http.schema;
Table
--------
table1
(1 row)

Query 20180510_030453_00003_58j4x, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 22B] [4 rows/s, 108B/s]

2.4.3 Show the table structure

presto> describe  http.schema.table1;
 Column |  Type   | Extra | Comment
--------+---------+-------+---------
 key1   | bigint  |       |
 key2   | varchar |       |
(2 rows)

Query 20180510_030507_00004_58j4x, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 123B] [9 rows/s, 603B/s]
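The three metadata queries above can be read off schema.json: the schema names are the top-level keys (information_schema is added by Presto itself), the tables are the name entries in each list, and describe returns the declared columns. The Python sketch below models only that correspondence; show_schemas, show_tables and describe are hypothetical helpers, not Presto's implementation.

```python
import json

METADATA = json.loads("""
{"schema": [{"name": "table1",
             "columns": [{"name": "key1", "type": "bigint"},
                         {"name": "key2", "type": "varchar"}],
             "sources": ["http://localhost:9080/data.csv"]}]}
""")


def show_schemas(metadata):
    # information_schema comes from Presto, not from the JSON file
    return sorted(["information_schema", *metadata])


def show_tables(metadata, schema):
    # table names are the "name" entries of the schema's list
    return [table["name"] for table in metadata[schema]]


def describe(metadata, schema, table_name):
    # (column, type) pairs in declaration order
    for table in metadata[schema]:
        if table["name"] == table_name:
            return [(c["name"], c["type"]) for c in table["columns"]]
    raise KeyError(f"{schema}.{table_name} not found")


print(show_schemas(METADATA))                  # ['information_schema', 'schema']
print(show_tables(METADATA, "schema"))         # ['table1']
print(describe(METADATA, "schema", "table1"))  # [('key1', 'bigint'), ('key2', 'varchar')]
```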

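2.4.4 Query the table data

The first attempt below failed with For input string: "a", which is the message of a Java NumberFormatException: at that point data.csv apparently held the non-numeric value a in key1, a column declared as bigint. After the file was changed to the contents shown in 2.2, the same query returned both rows.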

presto> select * from http.schema.table1;

Query 20180510_031258_00005_58j4x, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20180510_031258_00005_58j4x failed: For input string: "a"

presto> select * from http.schema.table1;
 key1 | key2
------+------
   10 | b
    1 | d
(2 rows)

Query 20180510_031315_00006_58j4x, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [2 rows, 0B] [41 rows/s, 0B/s]
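Assuming the connector splits each line of data.csv on commas and converts every field to the type declared for its column, the Python sketch below models that step to show why a non-numeric value such as a in the bigint column key1 makes the whole query fail, as in the first attempt above. It is an illustration under that assumption, not the connector's code: the Java connector reports the failure as a NumberFormatException, whereas Python's int() raises ValueError. The converter table and the bad input line are hypothetical.

```python
COLUMNS = [("key1", "bigint"), ("key2", "varchar")]

# Hypothetical converters for the two types used in this example.
CONVERTERS = {"bigint": int, "varchar": str}


def parse_rows(text, columns):
    """Split each non-empty line on commas and convert fields by column type."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split(",")
        if len(fields) != len(columns):
            raise ValueError(f"line {line_no}: expected {len(columns)} fields, got {len(fields)}")
        row = []
        for (name, type_name), field in zip(columns, fields):
            try:
                row.append(CONVERTERS[type_name](field))
            except ValueError:
                # int("a") fails here, like Long.parseLong("a") in Java
                raise ValueError(
                    f"line {line_no}: cannot read {field!r} as {type_name} for column {name}"
                ) from None
        rows.append(tuple(row))
    return rows


print(parse_rows("10,b\n1,d\n", COLUMNS))  # [(10, 'b'), (1, 'd')]

try:
    parse_rows("a,b\n", COLUMNS)
except ValueError as err:
    print(err)  # line 1: cannot read 'a' as bigint for column key1
```

A single bad field aborts the whole read, which matches the transcript: the failed query returned no rows at all rather than skipping the bad line.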