Sorting occurrences of strings from a text file

I have stored strings from a file into an ArrayList, and used a HashSet to count the number of occurrences of each string.

I want to list the top 5 words along with their number of occurrences. I should be able to accomplish this without implementing a hashtable, TreeMap, etc. How can I go about achieving this?

Here is my ArrayList:

List<String> word_list = new ArrayList<String>();

while (INPUT_TEXT1.hasNext()) {
    String input_word = INPUT_TEXT1.next();
    word_list.add(input_word);
}

INPUT_TEXT1.close();

int word_list_length = word_list.size();

System.out.println("There are " + word_list_length + " words in the .txt file");
System.out.println("\n\n");

System.out.println("word_list's elements are: ");

for (int i = 0; i < word_list.size(); i++) {
    System.out.print(word_list.get(i) + "  ");
}

System.out.println("\n\n");

Here is my HashSet:

Set<String> unique_word = new HashSet<String>(word_list);

int number_of_unique = unique_word.size();

System.out.println("unique words are: ");

for (String e : unique_word) {
    System.out.print(e + " ");
}

System.out.println("\n\n");

// Parallel arrays: word[i] occurs freq[i] times in word_list
String[] word = new String[number_of_unique];
int[] freq = new int[number_of_unique];

int count = 0;

System.out.println("Frequency counts : ");

for (String e : unique_word) {
    word[count] = e;
    freq[count] = Collections.frequency(word_list, e);

    System.out.println(word[count] + " : " + freq[count] + " time(s)");
    count++;
}

Could it be that I am overthinking a step? Thanks in advance

2 Answers

#1

You can do this with a HashMap (holding each unique word as the key and its frequency as the value) and then sorting the entries by value in descending order, as outlined in the steps below:

(1) Load the words into word_list

(2) Find the unique words in word_list

(3) Store the unique words in a HashMap, with each unique word as the key and its frequency as the value

(4) Sort the HashMap entries by value (frequency)

You can refer to the code below:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// The class name is illustrative; the original answer showed only main()
public class TopWords {

    public static void main(String[] args) {

        List<String> word_list = new ArrayList<>();
        // Load your words into word_list here

        // Find the unique words in the list
        String[] uniqueWords = word_list.stream().distinct()
                                        .toArray(String[]::new);
        Map<String, Integer> wordsMap = new HashMap<>();

        // Put each unique word into the map as key, with its frequency as value
        for (String uniqueWord : uniqueWords) {
            int frequency = Collections.frequency(word_list, uniqueWord);
            System.out.println(uniqueWord + " occurred " + frequency + " times");
            wordsMap.put(uniqueWord, frequency);
        }

        // Sort the entries by frequency (the map's values) in descending order
        Stream<Map.Entry<String, Integer>> topWords = wordsMap.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(5);

        // Print the top 5 words to the console
        System.out.println("Top 5 Words:");
        topWords.forEach(System.out::println);
    }
}
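
Since the question asks for a way to avoid a hashtable or TreeMap, here is a minimal sketch that keeps the question's parallel word[] and freq[] arrays and sorts an array of indices by frequency instead. The class name TopWordsNoMap, the topWords helper and the alphabetical tie-break are assumptions made for illustration; they are not part of the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class TopWordsNoMap {

    // Returns up to `limit` lines of the form "word : n time(s)", most frequent first.
    static List<String> topWords(List<String> wordList, int limit) {
        // Same parallel arrays as in the question: word[i] occurs freq[i] times.
        String[] word = new HashSet<String>(wordList).toArray(new String[0]);
        int[] freq = new int[word.length];
        Integer[] order = new Integer[word.length];
        for (int i = 0; i < word.length; i++) {
            freq[i] = Collections.frequency(wordList, word[i]);
            order[i] = i;
        }

        // Sort the indices rather than the arrays, so word[] and freq[] stay aligned.
        // Ties are broken alphabetically (an assumption) to keep the output deterministic.
        Arrays.sort(order, (a, b) -> freq[a] != freq[b]
                ? Integer.compare(freq[b], freq[a])
                : word[a].compareTo(word[b]));

        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, order.length); i++) {
            top.add(word[order[i]] + " : " + freq[order[i]] + " time(s)");
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be", "to");
        topWords(words, 5).forEach(System.out::println);
    }
}
```

Like the question's loop, this calls Collections.frequency once per unique word, so it rescans the whole list each time. That is fine for a small text file, but a single-pass count would scale better on large inputs.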

#2

Using Java 8, with all of the code in a single pipeline:

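// Needs static imports of Collectors.groupingBy, Collectors.counting,
// Function.identity and Comparator.reverseOrder; words is the List<String> of input words.
Stream<Map.Entry<String, Long>> topWords =
        words.stream()
                .map(String::toLowerCase)
                .collect(groupingBy(identity(), counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(5);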

Then iterate over the stream:

// println puts each word on its own line; note the space before "time(s)"
topWords.forEach(m -> {
    System.out.println(m.getKey() + " : " + m.getValue() + " time(s)");
});
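
For anyone who wants to run this as-is, here is a self-contained sketch of the same pipeline with the static imports spelled out. The class name TopWordsStream, the topWords helper, and the choice to collect the result into a List are assumptions made for illustration; the answer above only shows the stream itself.

```java
import static java.util.Comparator.reverseOrder;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
class TopWordsStream {

    // Counts words case-insensitively and returns the `limit` most frequent entries.
    static List<Map.Entry<String, Long>> topWords(List<String> words, int limit) {
        return words.stream()
                .map(String::toLowerCase)
                // Group identical words and count the size of each group.
                .collect(groupingBy(identity(), counting()))
                .entrySet().stream()
                // Highest count first; equal counts fall back to alphabetical order.
                .sorted(Map.Entry.<String, Long>comparingByValue(reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("To", "be", "or", "not", "to", "be", "TO");
        topWords(words, 5).forEach(e ->
                System.out.println(e.getKey() + " : " + e.getValue() + " time(s)"));
    }
}
```

Returning a List instead of a Stream is a deliberate choice here: a stream can be consumed only once, while a list can be printed, inspected, or reused afterwards.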
