Escaping special characters in JSON key names so the data can be loaded into BigQuery or Hive

Date: 2021-09-10 13:48:26

I have a JSON file in which one of the keys contains a special character, "-". JavaScript does not allow it inside identifiers, and neither does BigQuery.

{"timestamp":"2016-06-01T00:10:55.307Z","ip":"71.223.x.x","user-id":"5755w33e95f626jyh3d31"}

When loading the data into BigQuery (from the UI), I don't see how to reference 'user-id'.

I tried quoting it, escaping the quotes, and placing it in square brackets; nothing worked. This thread suggests it's not allowed. What can I do about it? How the JSON is generated is out of my control.

The same thing happens in Hive.

1 solution

#1


Unfortunately you cannot load these JSON objects directly into BigQuery (via "bq load" or the web UI's "Create Table" flow), since '-' is not a valid character in field names for a BQ table. In other words, there's no way to create a BQ table whose schema matches this JSON data.

An alternative is to load your JSON data into BQ as an uninterpreted JSON string (i.e., a BQ table with one field of string type) and then run a query to pull out relevant fields to populate a BQ table.

Your input data can probably be loaded without modification by picking an obscure character to be the field delimiter and quote character--something that doesn't exist anywhere in your input data. I'd recommend picking something from the bottom half of this chart:

https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout
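One way to pick such a character is to scan the input first. The following is only a rough sketch in Python, not part of the original answer: the function name `unused_latin1_chars` and the choice of the range 0xA0 to 0xFF (the lower rows of the chart) are my own assumptions, and it works on decoded text lines rather than raw bytes.

```python
def unused_latin1_chars(lines, low=0xA0, high=0xFF):
    """Return the characters in [low, high] that occur in none of the lines.

    Hypothetical helper: any character it returns is a candidate for the
    field delimiter or quote character, because it is absent from the data.
    """
    seen = set()
    for line in lines:
        seen.update(line)  # collect every character used in the input
    return [chr(cp) for cp in range(low, high + 1) if chr(cp) not in seen]


sample = [
    '{"timestamp":"2016-06-01T00:10:55.307Z","ip":"71.223.x.x",'
    '"user-id":"5755w33e95f626jyh3d31"}',
]
candidates = unused_latin1_chars(sample)
print(candidates[:5])
```

For a real file you would pass the open file object (or a generator over its lines) instead of `sample`, then use one of the returned characters as the delimiter when loading.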

Once it's ingested into BQ as a single string column, you can use JSON_EXTRACT_SCALAR to pull out fields as needed. For example:

SELECT
  JSON_EXTRACT_SCALAR(json, '$.timestamp') timestamp,
  JSON_EXTRACT_SCALAR(json, '$.ip') ip,
  JSON_EXTRACT_SCALAR(json, '$.user-id') user_id
FROM
  table
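
To sanity-check what the query should return for a given row, the same extraction can be mimicked locally. This is a minimal Python sketch, not BigQuery code: the `FIELDS` mapping and the `extract_fields` helper are hypothetical names introduced here, and a missing key yields `None` in place of SQL NULL.

```python
import json

# Hypothetical mapping from the original JSON key to a column-safe name.
FIELDS = {"timestamp": "timestamp", "ip": "ip", "user-id": "user_id"}


def extract_fields(raw_line):
    """Parse one raw JSON line and return only the mapped fields."""
    record = json.loads(raw_line)
    # Keys that are absent from the record map to None.
    return {column: record.get(key) for key, column in FIELDS.items()}


row = extract_fields(
    '{"timestamp":"2016-06-01T00:10:55.307Z","ip":"71.223.x.x",'
    '"user-id":"5755w33e95f626jyh3d31"}'
)
print(row["user_id"])
```

The point of the mapping is the same as the column alias in the query: the hyphenated key is only ever used as a lookup string, while the output name contains an underscore.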
