In R, I have a data frame with no columns and I want to split it into separate columns

Time: 2021-08-11 15:40:35

I am writing a script that goes into our machine and parses a trace file, which is a plain .txt file. I grep for a particular value ("RP" in this example) and create a data frame from the matching lines. Now I have all these rows, but no columns, and I want to split each row into columns. Here is what a row looks like after the grep:

1 2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004

What I want is:

Date                 Pressure
2016-03-14 09:52:38  rp+0004

options(warn=-1)
#Select Copy From Dir, Change \\ to /
copyfrom<-gsub("\\\\","/",choose.dir(default = "", caption = "Select folder you wish to copy files from"))
#File names
listfiles<-list.files(copyfrom)
#Total amount of files
totalfiles=length(listfiles)
#Select Copy To Dir, Change \\ to /
copyto<-gsub("\\\\","/",choose.dir(default = "", caption = "Select folder you wish to copy files to"))


#Loop through all files in the directory
for (i in seq_along(listfiles))
{
  #Build the path to the current file
  infile <- file.path(copyfrom, listfiles[i])
  #read file (readLines opens and closes the file itself)
  lines <- readLines(infile)
  #search file for particular value
  searched_entries <- grep("RP", lines, value = TRUE)
  #write file, remove .trc from file name and add _parsed
  outfile <- file.path(copyto, paste0(sub("\\.trc$", "", listfiles[i]), "_parsed.txt"))
  writeLines(searched_entries, con = outfile)
  #print the index of the file just parsed
  print(i)
}

Here is the data frame:

2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
etc.

I would like to end up with two columns: one with the date (2016-03-14 09:52:3) and the other with the RP number (rp+0000). Let me know if you would like me to clarify further.

Here is the trace file. You can copy and paste this into Notepad and save it as a .txt file. Name of file: StarLineDailyMaintenance_8715f3804819481aae1cae3a479556aa_Trace.trc

2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:40> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:40> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:40> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:41> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:41> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:41> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:42> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:42> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:42> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:43> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:43> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:43> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:44> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:44> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:45> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:45> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:45> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:46> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:46> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:46> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:47> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:47> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:47> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:48> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:48> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:48> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:49> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4067
2016-03-14 09:52:50> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4057
2016-03-14 09:52:50> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4028
2016-03-14 09:52:51> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4057
2016-03-14 09:52:52> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4082
2016-03-14 09:52:52> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4125
2016-03-14 09:52:53> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4082

1 Answer

#1

You can use str_match from stringr with a regex to parse the lines:

> library(stringr)

> df

#                                                                                               V1
# 1 2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004 
# 2 2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000 
# 3 2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000 
# 4 2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000

> f <- as.data.frame(str_match(df$V1, '^([^>]*)>[^>]*> RP: ([^ ]*) *$')[,-1])
> colnames(f) = c('Date', 'Pressure')
> f

#                  Date Pressure
# 1 2016-03-14 09:52:38  rp+0004
# 2 2016-03-14 09:52:39  rp+0000
# 3 2016-03-14 09:52:39  rp+0000
# 4 2016-03-14 09:52:39  rp+0000

The somewhat complex-looking regex basically captures everything up to the first > into column 1, and everything after "> RP: " into column 2.
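To see which capture group feeds which column, here is a minimal sketch in base R using the same regex with sub() and backreferences instead of str_match. The two sample lines are copied from the trace in the question; the variable names lines and pattern are only illustrative.

```r
# Two sample lines in the format shown in the question
lines <- c(
  "2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004",
  "2016-03-14 09:52:49> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4067 "
)

# Group 1: everything before the first '>'
# Group 2: the token after '> RP: ' (trailing spaces are allowed)
pattern <- "^([^>]*)>[^>]*> RP: ([^ ]*) *$"

f <- data.frame(
  Date     = sub(pattern, "\\1", lines),  # keep only group 1
  Pressure = sub(pattern, "\\2", lines),  # keep only group 2
  stringsAsFactors = FALSE
)
print(f)
```

Note that sub() returns a line unchanged when the pattern does not match, so this variant is only safe on lines that have already been filtered to the RP format.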

Assuming the file is line-feed separated with the format given, you could also parse the data straight from the file using read.pattern from gsubfn with the same regex:

> library(gsubfn)

> f = read.pattern('Test/test.txt', '^([^>]*)>[^>]*> RP: ([^ ]*) *$')
> colnames(f) = c('Date', 'Pressure')
> f

#                  Date Pressure
# 1 2016-03-14 09:52:38  rp+0004
# 2 2016-03-14 09:52:39  rp+0000
# 3 2016-03-14 09:52:39  rp+0000
# 4 2016-03-14 09:52:39  rp+0000
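If you would rather not add a package, the same idea can be sketched in base R with readLines, regexec and regmatches. This is only a sketch under some assumptions: parse_rp_file is a hypothetical helper name, the temporary file stands in for a real trace file, and the timestamps are parsed as UTC because the trace does not state a time zone.

```r
# Hypothetical helper: read one trace file and return Date/Pressure columns
parse_rp_file <- function(path) {
  raw <- readLines(path, warn = FALSE)
  pattern <- "^([^>]*)>[^>]*> RP: ([^ ]*) *$"
  m <- regmatches(raw, regexec(pattern, raw))
  m <- m[lengths(m) == 3]            # drop lines that did not match
  if (length(m) == 0) {
    return(data.frame(Date = as.POSIXct(character(0), tz = "UTC"),
                      Pressure = character(0), stringsAsFactors = FALSE))
  }
  parsed <- do.call(rbind, m)        # columns: full match, group 1, group 2
  data.frame(
    # tz = "UTC" is an assumption; the trace does not state a time zone
    Date     = as.POSIXct(parsed[, 2], format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    Pressure = parsed[, 3],
    stringsAsFactors = FALSE
  )
}

# Demo on a temporary file: two RP lines plus one unrelated line
path <- tempfile(fileext = ".trc")
writeLines(c(
  "2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004",
  "some unrelated trace line",
  "2016-03-14 09:52:49> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4067"
), path)

f <- parse_rp_file(path)
print(f)
```

Because non-matching lines are dropped inside the helper, it could be called on each raw .trc file directly, without a separate grep step.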
