在SAS宏中解析JSON对象 - 第2部分 - 使用OUTPUT函数处理嵌套值

时间:2022-09-30 21:30:48

This question is related to: prior question link

此问题与:先前的问题链接有关

I have a JSON file that looks like:

我有一个看起来像这样的JSON文件:

    [
      {
        "rxnorm_id": "999999999",
        "drug_name": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "plans": [
          {
            "plan_id_type": "xxxxxxxxxxxxx",
            "plan_id": "999999999999999",
            "drug_tier": "xxxxxxxxxxxxxxx",
            "prior_authorization": false,
            "step_therapy": false,
            "quantity_limit": false
           },

I am able to import every line into SAS that has 'rxnorm_id and drug_name using this code:

我可以使用以下代码将每一行导入到具有'rxnorm_id和drug_name的SAS中:

    >
    filename data url 'http://stg-oh-medicaid.molinahealthcare.com/JSON/Drugs_Molina_Healthcare.json';
    data formularies;
    infile data lrecl = 32000 truncover scanover;
    input   @'"rxnorm_id": "' rxnorm_id $255.
    @'"drug_name": "' drug_name $255.
    @'"plan_id_type": "' plan_id_type $255. 
    @'"plan_id": "' plan_id $255.
    @'"drug_tier": "' drug_tier $255.
    @'"prior_authorization": ' prior_authorization $255.
    @'"step_therapy": ' step_therapy $255.
    @'"quantity_limit": ' quantity_limit $255.;
    rxnorm_id = scan(rxnorm_id,1,'",');
    drug_name = scan(drug_name,1,'",');
    plan_id_type = scan(plan_id_type,1,'",');
    plan_id = scan(plan_id,1,'",');
    drug_tier = scan(drug_tier,1,'",');
    prior_authorization = scan(prior_authorization,1,'",');
    step_therapy = scan(step_therapy,1,'",');
    quantity_limit = scan(quantity_limit,1,'",');
    run;

But, I want to pick up all of the values in the 'plans' nest that are in between the rxnorm and drug name values. Someone suggested using the OUTPUT option in SAS to see the missing rows. Anyone got a good fix to my code to do this?

但是,我想要获取rxnorm和药物名称值之间的'计划'嵌套中的所有值。有人建议使用SAS中的OUTPUT选项查看缺失的行。任何人都有一个很好的修复我的代码来做到这一点?

Thanks

1 个解决方案

#1


1  

As of 9.4, the best way to parse JSON in SAS is using PROC GROOVY. That is what I recommend. You can also do it with DS2. If you are adventurous, and on 9.4m3, you can also use PROC LUA. That is what I would try, since it allows you to manipulate SAS datasets easily.

从9.4开始,在SAS中解析JSON的最佳方法是使用PROC GROOVY。这就是我的建议。您也可以使用DS2。如果您喜欢冒险,并且在9.4m3,您也可以使用PROC LUA。这就是我想要的,因为它允许您轻松操作SAS数据集。

That being said, if you can rely on the simple structure of your example, then you can select only the lines that have fields and output them in the format you wanted using regular expressions in data step:

话虽这么说,如果您可以依赖示例的简单结构,那么您只能选择包含字段的行,并在数据步骤中使用正则表达式以您希望的格式输出它们:

data want;
    infile 'c:/tmp/json_snippet.txt';
    length field $20 data $100;
    keep field data;
    retain re;

    input;
    if _n_ = 1 then do;
        re = prxparse('/"(.*?)": "?(true|false|.*?(?="))/');
    end;

    if prxmatch(re,_infile_); /* grep only matching lines */

    call prxposn(re,1,start,len);
    field = substr(_infile_,start,len);
    call prxposn(re,2,start,len);
    data  = substr(_infile_,start,len);
run;

Caveat emptor: A wise person said that when you solve a problem using regular expressions, now you have two problems :). Among the things that can go wrong:

警告:一个聪明的人说,当你用正则表达式解决问题时,现在你有两个问题:)。可能出错的事情包括:

  • line breaks
  • using ' instead of " for string delimiters
  • 使用'而不是'来表示字符串分隔符

  • lengths
  • mixed types

#1


1  

As of 9.4, the best way to parse JSON in SAS is using PROC GROOVY. That is what I recommend. You can also do it with DS2. If you are adventurous, and on 9.4m3, you can also use PROC LUA. That is what I would try, since it allows you to manipulate SAS datasets easily.

从9.4开始,在SAS中解析JSON的最佳方法是使用PROC GROOVY。这就是我的建议。您也可以使用DS2。如果您喜欢冒险,并且在9.4m3,您也可以使用PROC LUA。这就是我想要的,因为它允许您轻松操作SAS数据集。

That being said, if you can rely on the simple structure of your example, then you can select only the lines that have fields and output them in the format you wanted using regular expressions in data step:

话虽这么说,如果您可以依赖示例的简单结构,那么您只能选择包含字段的行,并在数据步骤中使用正则表达式以您希望的格式输出它们:

data want;
    infile 'c:/tmp/json_snippet.txt';
    length field $20 data $100;
    keep field data;
    retain re;

    input;
    if _n_ = 1 then do;
        re = prxparse('/"(.*?)": "?(true|false|.*?(?="))/');
    end;

    if prxmatch(re,_infile_); /* grep only matching lines */

    call prxposn(re,1,start,len);
    field = substr(_infile_,start,len);
    call prxposn(re,2,start,len);
    data  = substr(_infile_,start,len);
run;

Caveat emptor: A wise person said that when you solve a problem using regular expressions, now you have two problems :). Among the things that can go wrong:

警告:一个聪明的人说,当你用正则表达式解决问题时,现在你有两个问题:)。可能出错的事情包括:

  • line breaks
  • using ' instead of " for string delimiters
  • 使用'而不是'来表示字符串分隔符

  • lengths
  • mixed types