Using R to download a gzip-compressed data file, then extract and import the data

Date: 2022-10-02 20:34:05

A follow up to this question: How can I download and uncompress a gzipped file using R? For example (from the UCI Machine Learning Repository), I have a file of insurance data. How can I download it using R?

Here is the data url: http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz.

3 answers

#1


18  

I like Ramnath's approach, but I would use temp files like so:

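# extract into R's per-session temporary directory
tmpdir <- tempdir()

url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
file <- basename(url)  # "tic.tar.gz", saved in the current working directory
# mode = 'wb' forces a binary download so the archive is not corrupted on Windows
download.file(url, file, mode = 'wb')

untar(file, compressed = 'gzip', exdir = tmpdir)
list.files(tmpdir)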

The list.files() should produce something like this:

[1] "TicDataDescr.txt" "dictionary.txt"   "ticdata2000.txt"  "ticeval2000.txt"  "tictgts2000.txt" 

which you could parse if you needed to automate this process for a lot of files.
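
For automating this over many archives, the steps above could be wrapped in a function. This is only a sketch: fetch_and_untar is a hypothetical helper name, and it assumes each URL points to a .tar.gz archive. Each archive is unpacked into its own fresh directory under tempdir() so the listings do not mix.

```r
# Hypothetical helper: download a .tar.gz archive, unpack it into a fresh
# directory under tempdir(), and return the paths of the extracted files.
fetch_and_untar <- function(url) {
  archive <- tempfile(fileext = ".tar.gz")
  on.exit(unlink(archive), add = TRUE)   # delete the downloaded archive afterwards
  exdir <- tempfile(pattern = "untar_")
  dir.create(exdir)
  download.file(url, archive, mode = "wb", quiet = TRUE)
  untar(archive, exdir = exdir)
  list.files(exdir, full.names = TRUE, recursive = TRUE)
}

# Usage (needs network access):
# urls <- c("http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz")
# extracted <- lapply(urls, fetch_and_untar)
```

The result is a list with one character vector of file paths per URL, which can then be passed on to whatever import step is needed.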

#2


7  

Here is a quick way to do it.

# create the download directory (no warning if it already exists)
.exdir = '~/Desktop/tmp'
dir.create(.exdir, showWarnings = FALSE, recursive = TRUE)
.file = file.path(.exdir, 'tic.tar.gz')

# download the archive in binary mode
url = 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
download.file(url, .file, mode = 'wb')

# untar it into the same directory
untar(.file, compressed = 'gzip', exdir = path.expand(.exdir))
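
Once the archive is extracted, the data files can be imported with read.table(). A minimal sketch, assuming ticdata2000.txt is tab-delimited with no header row (this is an assumption, not something the answers here confirm); read_tic is a hypothetical helper name.

```r
# Hypothetical helper: read one extracted data file into a data frame.
# Assumes tab-separated values and no header row.
read_tic <- function(dir, name = "ticdata2000.txt") {
  read.table(file.path(dir, name), sep = "\t", header = FALSE)
}

# Usage after extracting the archive into ~/Desktop/tmp:
# tic <- read_tic(path.expand("~/Desktop/tmp"))
# dim(tic)
```

If the file turns out to use a different separator, adjust the sep argument accordingly.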

#3


2  

Please see the content of help(download.file) for that. If the file in question is merely a gzipped but otherwise readable file, you can feed the complete URL to read.table() et al. too.
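
For the plain gzipped case, here is a hedged sketch. Local .gz files are decompressed transparently by read.table(), but whether a remote gzipped file is decompressed automatically may depend on the R version and platform, so this sketch wraps the URL in gzcon() explicitly. read_gz_table is a hypothetical helper name.

```r
# Hypothetical helper: read a gzipped text table from a local path or a URL.
read_gz_table <- function(path_or_url, ...) {
  is_url <- grepl("^(https?|ftp)://", path_or_url)
  if (is_url) {
    # Decompress the remote stream explicitly, then parse the text lines.
    con <- gzcon(url(path_or_url, open = "rb"))
    on.exit(close(con), add = TRUE)
    lines <- readLines(con)
    read.table(text = lines, ...)
  } else {
    # Local gzip-compressed files are detected and decompressed automatically.
    read.table(path_or_url, ...)
  }
}

# Usage with a local file:
# dat <- read_gz_table("mydata.txt.gz", header = TRUE)
```

Note that this applies to a single gzipped text file only; a .tar.gz archive such as tic.tar.gz still needs untar() first.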
