How to parse an Excel file from an external link (Amazon S3) using SheetJS

Date: 2022-04-27 16:11:49

I am trying to parse an Excel file that I already have the URL for. I keep getting different errors when trying to fetch the file so that it can be read. Here is my current code:

  const input_file = doc.input_file;
  const extension = input_file.split('.').pop();



  let XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest;
  let oReq = new XMLHttpRequest();
  oReq.open("GET", input_file, true);
  oReq.responseType = "arraybuffer";

  oReq.onload = function(e) {
    let arraybuffer = oReq.responseText;
    /* convert data to binary string */
    let data = new Uint8Array(arraybuffer);
    let arr = new Array();
    for(let i = 0; i != data.length; ++i) arr[i] = String.fromCharCode(data[i]);
    let bstr = arr.join("");

    /* Call XLSX */
    let workbook = XLSX.read(bstr, {type:"binary"});

    /* DO SOMETHING WITH workbook HERE */
    let firstSheet = workbook.SheetNames[0];
    let parsed = XLSX.utils.sheet_to_csv(firstSheet);
    console.log(parsed);
  }

  oReq.send();

The current error I am getting is "Error: Unsupported file NaN", raised when I try to read the file at: let workbook = XLSX.read(bstr, {type:"binary"});

I'm unsure of the easiest way to read that external link. Any ideas? If it helps, I am using Meteor.

4 Answers

#1


2  

This is a tried-and-true answer.

There are two problems with your code:

  1. For binary files, it should be let arraybuffer = oReq.response;, not let arraybuffer = oReq.responseText;

  2. You should enable Cross-Origin Resource Sharing (CORS) on your Amazon S3 bucket. Just follow the official tutorial here.
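
To illustrate the first point, here is a minimal sketch of why reading the text property goes wrong. The helper name toBinaryString is hypothetical (it is not part of SheetJS or XMLHttpRequest). When a string is passed to the Uint8Array constructor it is coerced to a length rather than copied, so the byte array ends up empty. An empty string has no first character code, which would be consistent with the NaN in the error message, although that part is an assumption about SheetJS internals.

```javascript
// Hypothetical helper: turn the body of an "arraybuffer" request into the
// binary string that XLSX.read(bstr, {type: "binary"}) expects.
function toBinaryString(body) {
  if (!(body instanceof ArrayBuffer)) {
    // oReq.responseText is a string (or unavailable) for binary requests.
    throw new TypeError("expected an ArrayBuffer; use oReq.response, not oReq.responseText");
  }
  const bytes = new Uint8Array(body);
  let bstr = "";
  for (let i = 0; i < bytes.length; ++i) bstr += String.fromCharCode(bytes[i]);
  return bstr;
}

// A string given to the Uint8Array constructor is coerced to a length,
// so text input silently yields an empty array instead of the file bytes.
const fromText = new Uint8Array("PK\u0003\u0004");
console.log(fromText.length); // 0

// With a real ArrayBuffer the bytes survive; .xlsx (ZIP) files start with "PK".
const fromBuffer = toBinaryString(new Uint8Array([0x50, 0x4b, 0x03, 0x04]).buffer);
console.log(fromBuffer.slice(0, 2)); // "PK"
```

In the question's onload handler this would be called as toBinaryString(oReq.response).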

Here is a working codepen:

http://codepen.io/KevinWang15/pen/GZXJKj

Are you using Node.js?

Note: the code above uses the web browser's (Chrome) XMLHttpRequest, but I notice that you are using

XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest

Are you using something like Node.js? (Sorry, I'm not familiar with Meteor.)

More specifically, are you using driverdan/node-XMLHttpRequest?

I experimented with it and your code, and it produced exactly the same error message. I think this is because that XMLHttpRequest implementation still has compatibility problems with oReq.response and oReq.responseText.

If you are using Node.js, I recommend another library: ykzts/node-xmlhttprequest

Install it with

npm i w3c-xmlhttprequest

Replace your XMLHttpRequest import with

let XMLHttpRequest = require('w3c-xmlhttprequest').XMLHttpRequest;

And it instantly solves the problem!

#2


2  

A better idea might be to use Meteor's HTTP package to get the file. The docs are here.

Add the package using

meteor add http

And then use:

HTTP.get(input_file, function (error, result) {
  if (error) return console.error(error);
  // process result here
});

result.content will contain the body of your Excel file (result.data is only populated for JSON responses), which you can then parse using SheetJS.

However, make sure that you have enabled cross-origin access (CORS) on Amazon S3, or you'll receive an error of the form:

"No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'blah blah' is therefore not allowed access."
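
As a rough sketch of what allowing cross-origin reads looks like, the snippet below builds a rule set in the JSON shape used by S3's bucket CORS configuration. The helper name buildCorsRules and the origin https://app.example.com are placeholders, not values from this thread; substitute the origin your app is actually served from.

```javascript
// Hypothetical helper: build a read-only CORS rule set for one origin,
// using the key names of the S3 bucket CORS configuration (JSON form).
function buildCorsRules(allowedOrigin) {
  return [
    {
      AllowedHeaders: ["*"],
      AllowedMethods: ["GET"], // reading the spreadsheet only needs GET
      AllowedOrigins: [allowedOrigin],
      ExposeHeaders: [],
      MaxAgeSeconds: 3000, // how long browsers may cache the preflight result
    },
  ];
}

// Print JSON that can be pasted into the bucket's CORS settings.
console.log(JSON.stringify(buildCorsRules("https://app.example.com"), null, 2));
```

This error is raised by browsers, so it applies to requests made from client-side code.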

#3


1  

XMLHttpRequest is restricted by the Same Origin Policy, meaning you can only access content directly from your own domain unless the other server opts in via CORS.

But you can create a service on your server which would load the sheet for you and pass it back to the client.

Here is a straightforward tutorial.

But please be aware that a general-purpose proxy that loads arbitrary third-party files can be a serious security issue. So if the URL of your sheet is constant, consider loading only that specific link through a PHP script and rejecting any other URL.
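
The restriction described above can be sketched as a small allowlist check that a server-side proxy runs before fetching anything. It is written in JavaScript rather than PHP to match the rest of this thread. The host name, the extension list, and the function name isAllowedSheetUrl are all assumptions for illustration.

```javascript
// Assumed allowlist: the single place your spreadsheets live (placeholder values).
const ALLOWED_HOST = "my-bucket.s3.amazonaws.com";
const ALLOWED_EXTENSIONS = [".xlsx", ".xls"];

// Return true only for HTTPS spreadsheet URLs on the allowed host.
function isAllowedSheetUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (e) {
    return false; // not a valid absolute URL
  }
  const path = url.pathname.toLowerCase();
  return (
    url.protocol === "https:" &&
    url.hostname === ALLOWED_HOST && // exact match, so look-alike hosts fail
    ALLOWED_EXTENSIONS.some((ext) => path.endsWith(ext))
  );
}
```

The proxy would call isAllowedSheetUrl(requestedUrl) first and refuse the request when it returns false.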

#4


0  

I ended up using a combination of a few of these answers. I want to post it here just in case it helps anyone else.

I started by using the Meteor HTTP package, as mentioned by Achal.

meteor add http

I also added a package from the Meteor community that allows a response type to be specified.

meteor add aldeed:http

Then, I used the following code to convert the response to a binary string, after which I could read the sheet:

HTTP.get(input_file, {responseType: 'arraybuffer'}, function(error, result) {
  if (error) {
    console.error(error);
    return;
  }

  /* convert the ArrayBuffer response to a binary string */
  let data = new Uint8Array(result.content);
  let arr = new Array();
  for(let i = 0; i != data.length; ++i) arr[i] = String.fromCharCode(data[i]);
  let bstr = arr.join("");

  /* read the workbook and convert its first sheet to JSON rows */
  let workbook = XLSX.read(bstr, {type:"binary"});
  let first_sheet_name = workbook.SheetNames[0];
  let sheet = workbook.Sheets[first_sheet_name];
  let parsed = XLSX.utils.sheet_to_json(sheet);
});
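
The binary-string loop above may not be strictly necessary: SheetJS's read function also accepts a byte array when called with type "array". The sketch below assumes that option is available in your SheetJS version. The helper name firstSheetToJson is hypothetical, and the SheetJS object is passed in as a parameter so the helper does not depend on a global XLSX.

```javascript
// Hypothetical helper: parse an ArrayBuffer and return the first sheet as
// an array of row objects. `xlsx` is the SheetJS object you already have
// (for example the global XLSX, or the result of require("xlsx")).
function firstSheetToJson(xlsx, arrayBuffer) {
  // type "array" tells SheetJS the input is a sequence of byte values.
  const workbook = xlsx.read(new Uint8Array(arrayBuffer), { type: "array" });
  const firstSheetName = workbook.SheetNames[0];
  return xlsx.utils.sheet_to_json(workbook.Sheets[firstSheetName]);
}
```

Inside the HTTP.get callback this would be called as firstSheetToJson(XLSX, result.content).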
