If I'm not mistaken, Chinese (like some other languages) doesn't use the space character ' '
as a word delimiter.
So what would be a good word-counting algorithm that works across languages?
1 Solution
#1
3
The technique I've seen used most often is to simply count the characters and divide that count by the average number of characters per word in Chinese. A figure that is often used for this is 1.5.
If your Chinese text has 1500 characters, it's approximately 1000 words long.
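As a rough sketch of that heuristic in Python: the function names here are my own, and I'm assuming that only characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) should be counted, so punctuation, spaces and Latin letters are ignored. The 1.5 divisor is the rule-of-thumb figure mentioned above, not a measured value.

```python
def count_han_characters(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs block.

    Assumption: U+4E00..U+9FFF is enough for typical modern Chinese text;
    rarer ideographs in the extension blocks are not counted.
    """
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def estimate_chinese_words(text: str, chars_per_word: float = 1.5) -> int:
    """Estimate the word count as (Han characters) / (average characters per word)."""
    return round(count_han_characters(text) / chars_per_word)


if __name__ == "__main__":
    sample = "中" * 1500  # stand-in for a 1500-character Chinese text
    print(estimate_chinese_words(sample))
```

Making the divisor a parameter lets you swap in a different average if 1.5 doesn't suit your kind of text.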
I am not aware of a more accurate way of counting words, short of interpreting the text itself. That would mean actually understanding the context of the words used, since a Chinese character can sometimes be a word by itself, but can also be a component of a compound word.
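To illustrate why the same character can count as one word or as part of a longer one, here is a minimal sketch of dictionary-based forward maximum matching. This is not the approach described above and it does not understand context; it's just a simple segmentation technique that shows the ambiguity. The tiny dictionary, the function name and the `max_len` parameter are all made up for the example; a real segmenter would need a large lexicon or a statistical model.

```python
# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"我", "喜欢", "北京", "大学", "北京大学"}


def segment_forward_max_match(text, dictionary, max_len=4):
    """Greedily take the longest dictionary word at each position.

    Falls back to a single character when nothing in the dictionary matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    words = segment_forward_max_match("我喜欢北京大学", TOY_DICTIONARY)
    print(words, len(words))
```

With this dictionary, "大学" is treated as a word on its own in "我喜欢大学", but is absorbed into the compound "北京大学" in the sample sentence, so the word count depends on what the dictionary knows.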