Loading data into Amazon Redshift with multiple spaces as the delimiter

Time: 2022-09-30 23:10:46

I am trying to load 73 local files into Redshift. The data does not use a common delimiter such as a comma or tab; instead, the delimiter is 13 spaces. Is there a way to treat these spaces as a delimiter?


I am using the same example from the AWS documentation. The actual data looks like the following:


1          ToyotaPark          Bridgeview          IL
2          ColumbusCrewStadium          Columbus          OH
3          RFKStadium          Washington          DC
4          CommunityAmericaBallpark          KansasCity          KS
5          GilletteStadium          Foxborough          MA
6          NewYorkGiantsStadium          EastRutherford          NJ
7          BMOField          Toronto          ON
8          TheHomeDepotCenter          Carson          CA
9          Dick'sSportingGoodsPark          CommerceCity          CO
10          PizzaHutPark          Frisco          TX

Sample code:

create table venue_new(
    venueid smallint not null,
    venuename varchar(100) not null,
    venuecity varchar(30),
    venuestate char(2),
    venueseats integer not null default '1000');

copy venue_new(venueid, venuename, venuecity, venuestate) 
from 's3://mybucket/data/venue_noseats.txt' 
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '          ';

The actual data has about 80 columns of different widths. The good thing is that there are no spaces within any data element. Instead of specifying a fixed width for each column, is there an easier way to delimit the data by 13 spaces?


1 solution

#1

The copy command only allows single-character delimiters, so you cannot import this data directly into your target table. Instead, you will need to create a staging table:


create table stage_venue (venue_record varchar(200));
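
Since the real data has about 80 columns, a 200-character staging column may be too small; sizing it wider is safer. A hypothetical sizing (my own suggestion, not part of the original answer; Redshift's varchar tops out at 65535):

create table stage_venue (venue_record varchar(65535));  -- assumes no input line exceeds 65535 bytes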

Run your copy command (assuming your data does not contain the pipe character, |, which is COPY's default delimiter):


copy stage_venue from 's3://mybucket/data/venue_noseats.txt' credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
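
A quick sanity check (my own addition, not part of the original answer) confirms each input line landed as a single string:

select count(*) from stage_venue;             -- expect one row per input line
select venue_record from stage_venue limit 5; -- eyeball the raw records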

Then use the split_part function to populate your target table (note that I counted only 10 spaces, not 13, in your sample):


insert into venue_new (venueid, venuename, venuecity, venuestate)
select
    split_part(venue_record, '          ', 1),
    split_part(venue_record, '          ', 2),
    split_part(venue_record, '          ', 3),
    split_part(venue_record, '          ', 4)
from stage_venue;
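
If the number of spaces between columns is not constant across all 80 columns, an alternative (my own sketch, not part of the original answer, assuming no field contains a space or a pipe) is to first collapse every run of spaces into a single pipe and split on that, with explicit casts for non-varchar targets:

insert into venue_new (venueid, venuename, venuecity, venuestate)
select
    split_part(cleaned, '|', 1)::smallint,
    split_part(cleaned, '|', 2),
    split_part(cleaned, '|', 3),
    split_part(cleaned, '|', 4)
from (
    -- collapse every run of one or more spaces into a single pipe
    select regexp_replace(venue_record, ' +', '|') as cleaned
    from stage_venue
) t;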
