How do I count the number of words in a text (string)?

Time: 2022-09-13 11:37:13

I have this string vector (for example):

str <- c("this is a string current trey",
    "feather rtttt",
    "tusla",
    "laq")

To count the number of words in each element of this vector, I used the following (taken from "Count the number of words in a string in R?", which is a possible duplicate, but my issue is different):

No_words <- sapply(gregexpr("\\W+", str), length) + 1

but it returns

6 2 2 2

The last two elements ("tusla" and "laq") each contain only one word,

so it should return

6 2 1 1

How do I get around this problem?

2 Solutions

#1


12  

You can try

sapply(gregexpr("\\S+", str), length)
## [1] 6 2 1 1

Or, as suggested in the comments, you can try

sapply(strsplit(str, "\\s+"), length)
## [1] 6 2 1 1
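
As a side note on why the original attempt returns 2 for the single-word elements: `gregexpr()` reports "no match" as the value -1, which is still a vector of length 1, so `length(...) + 1` gives 2. The sketch below illustrates this on the question's data; the names `words`, `m` and `n_sep` are made up for this example.

```r
words <- c("this is a string current trey",
           "feather rtttt",
           "tusla",
           "laq")

# Match runs of non-word characters, as in the question.
m <- gregexpr("\\W+", words)

# "tusla" contains no separator, so gregexpr() reports -1,
# which is still a vector of length 1.
m[[3]][1]
## [1] -1

# Count only real matches (positions > 0), then add 1.
n_sep <- sapply(m, function(pos) sum(pos > 0))
n_sep + 1
## [1] 6 2 1 1
```

Matching the words themselves with `\\S+`, as above, sidesteps the problem because every non-empty word produces at least one real match.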

#2


7  

Use the stringi package and stri_count:

library(stringi)
str <- c(
"this is a string current trey",
"nospaces",
"multiple    spaces",
"   leadingspaces",
"trailingspaces    ",
"    leading and trailing    ",
"just one space each")

stri_count(str, regex = "\\S+")
## [1] 6 1 2 1 1 3 4
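
If you would rather not depend on stringi, a base R sketch along these lines should give the same counts for the test vector above. The helper name `count_words` and the variable `words` are hypothetical, and an empty string is added as an extra case to show that it counts as 0 words. `lengths()` requires R 3.2.0 or later.

```r
words <- c("this is a string current trey",
           "nospaces",
           "multiple    spaces",
           "   leadingspaces",
           "trailingspaces    ",
           "    leading and trailing    ",
           "just one space each",
           "")

# Hypothetical helper: extract every run of non-whitespace characters
# and count the extracted pieces per element.
# regmatches() returns character(0) when nothing matches, so strings
# without any word count as 0.
count_words <- function(x) {
  lengths(regmatches(x, gregexpr("\\S+", x)))
}

count_words(words)
## [1] 6 1 2 1 1 3 4 0
```

Going through `regmatches()` matters for edge cases: taking `length()` of the raw `gregexpr()` result would count the empty string as 1, and splitting with `strsplit(x, "\\s+")` would count a leading empty piece for strings that start with whitespace.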
