30. Substring with Concatenation of All Words

Problem:

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

Link: http://leetcode.com/problems/substring-with-concatenation-of-all-words/

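Solution:

My first instinct was to use a Trie and, like a spell checker, look up in the Trie whether each piece of the text is a word. (To be added.)

My second instinct was to build a state machine similar to a DFA. (To be added.)

My third instinct was to look at the answers. Quietly reading through them, I saw that everyone used HashMap, so I also wrote a version with two HashMaps, and it timed out on submission. Thinking it over again, a sliding window is still needed to cut down on repeated comparisons. For example, with text = abca and the words 'a', 'b' and 'c', both 0 and 1 are valid indices, and the two windows share 'b' and 'c', which should not be compared twice.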
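The timed-out version is not shown in this post, so the following is only a guess at what such a two-HashMap baseline might look like: it rebuilds the window's word counts from scratch at every start index. The class name BruteForceConcat and the map names need and seen are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BruteForceConcat {
    // Hypothetical baseline (not the author's code): re-count every candidate window.
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || words == null || words.length == 0) return res;
        int wordLen = words[0].length();
        if (wordLen == 0) return res;
        int total = wordLen * words.length;

        // Required count of each word.
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) need.merge(w, 1, Integer::sum);

        for (int start = 0; start + total <= s.length(); start++) {
            // A fresh map per start index: this is the repeated work.
            Map<String, Integer> seen = new HashMap<>();
            int k = 0;
            for (; k < words.length; k++) {
                int from = start + k * wordLen;
                String w = s.substring(from, from + wordLen);
                int used = seen.merge(w, 1, Integer::sum);
                if (used > need.getOrDefault(w, 0)) break; // unknown or surplus word
            }
            if (k == words.length) res.add(start);
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(findSubstring("barfoothefoobarman", new String[] {"foo", "bar"}));
        System.out.println(findSubstring("abca", new String[] {"a", "b", "c"}));
    }
}
```

Each start index may examine up to words.length words again, so overlapping windows such as "abc" and "bca" in abca are compared independently. The sliding window avoids exactly that.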

HashMap:

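First put every word and its count into a wordMap, then traverse the text. The traversal advances wordLength characters at a time, so covering the whole text takes only wordLength passes, one per starting offset. Within each pass we use a sliding window, because we need every index that satisfies the condition. For that we also need a current map (curMap) recording the words in the current window, a variable lo for the window's left boundary, and a count of the words currently in the window. In each pass, first check whether the current word exists in wordMap. If it does not, reset curMap, count and lo. If it does, add the current word to the window. After adding it we still have to handle duplicates: if the window's count for this word exceeds its count in wordMap, remove words one by one from the left edge of the window until the count for the current word equals the count in wordMap. If count == the total number of words, lo is a solution: add it to the result list, update lo and count, and move the window one word to the right. The code is rather long-winded, and I should refactor it properly when I have time.

Time Complexity - O(n), treating each substring and map operation as O(1). Space Complexity - O(m * l), where m is the number of words and l is the word length.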

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    HashMap<String, Integer> wordMap;

    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if(s == null || s.length() == 0 || words == null || words.length == 0)
            return res;
        wordMap = new HashMap<>();
        fillWordMap(words);         // put all words and their frequencies into wordMap
        int wordLen = words[0].length(), wordCount = words.length;
        HashMap<String, Integer> curMap = new HashMap<>();      // word counts of the current sliding window
        for(int i = 0; i < wordLen; i++) {            // s is processed word by word, so we need "wordLen" passes in total
            int j = i, lo = i, count = 0;           // lo is the left edge of the window, the index added to the result list
            curMap.clear();
            while(j <= s.length() - wordLen) {
                String curWord = s.substring(j, j + wordLen);
                if(!wordMap.containsKey(curWord)) {         // intervening characters found: restart after this word
                    curMap.clear();
                    count = 0;
                    lo = j + wordLen;
                } else {
                    if(curMap.containsKey(curWord))                     // put current word into current window
                        curMap.put(curWord, curMap.get(curWord) + 1);
                    else
                        curMap.put(curWord, 1);
                    count++;
                    while(curMap.get(curWord) > wordMap.get(curWord)) {   // remove words from left end of the window until valid
                        String rmvWord = s.substring(lo, lo + wordLen);
                        curMap.put(rmvWord, curMap.get(rmvWord) - 1);
                        count--;
                        lo += wordLen;
                    }
                    if(count == wordCount) {                    // target string found
                        res.add(lo);
                        String loWord = s.substring(lo, lo + wordLen);   // drop the leftmost word and keep sliding
                        curMap.put(loWord, curMap.get(loWord) - 1);
                        count--;
                        lo += wordLen;
                    }
                }
                j += wordLen;
            }
        }
        return res;
    }

    private void fillWordMap(String[] words) {
        for(String word : words) {
            if(wordMap.containsKey(word))
                wordMap.put(word, wordMap.get(word) + 1);
            else
                wordMap.put(word, 1);
        }
    }
}
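One way to sanity-check the O(n) figure for the HashMap solution is to count how many words the nested loops extract. The sketch below copies only the loop bounds from the code above; the class name PassCount and the method extractedWords are made up for illustration. Every start index from 0 to n - wordLen is visited exactly once, by the pass with i = j % wordLen, so the total is n - wordLen + 1 whenever n >= wordLen.

```java
class PassCount {
    // Counts the substring extractions made by the wordLen passes
    // over a text of length n (loop bounds copied from the solution).
    static int extractedWords(int n, int wordLen) {
        int extracted = 0;
        for (int i = 0; i < wordLen; i++) {
            for (int j = i; j <= n - wordLen; j += wordLen) {
                extracted++;
            }
        }
        return extracted;
    }

    public static void main(String[] args) {
        // "barfoothefoobarman" has 18 characters; the words have length 3.
        System.out.println(extractedWords(18, 3));
    }
}
```

Each extraction copies wordLen characters in current Java versions, so counted in characters the work is closer to O(n * l). The O(n) figure holds when a substring and a map lookup are treated as constant-time steps.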

Trie:

DFA:

Histogram:

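Second pass:

This is mostly the same as the first pass. First build a global wordMap containing the words and their counts. Then run a double loop. The outer loop goes from 0 up to the word length, advancing one character at a time, and we clear curMap before each inner loop starts. The inner loop starts at j = i and advances one word length L at a time. We also maintain the left boundary lo of a sliding window and count, the number of qualifying words currently in the window. Each time we first take the current word, s.substring(j, j + wordLen), and check whether it is in wordMap. If it is not, we can skip these L characters (the current word) and resume the search from the start of the next word (here we clear curMap, reset count and update lo). If the current word is in wordMap, we add it to curMap and then compare its value in curMap with its value in wordMap. If the curMap value does not exceed the wordMap value, we carry on with the steps below. If curMap.get(curWord) > wordMap.get(curWord), we have added a surplus word. Here, in a way similar to "Sliding Window Maximum", we use a while loop to poll words off the front of the window one by one. Polling means taking the front word s.substring(lo, lo + wordLen), decrementing its value in curMap, doing count--, and then updating lo = lo + wordLen to look at the next front word. We continue until the surplus word is gone, that is, until curMap.get(curWord) <= wordMap.get(curWord). Finally, when count == words.length we have found a solution, and we add its starting index lo to the result set. We then remove the front word of the window, do count--, and advance lo = lo + wordLen to continue with the checks that follow.

Java:

Ignoring the cost of substring, this is L passes, each visiting about n / L words, which should make it Time Complexity: O(n), Space Complexity - O(L * m), where L is the word length and m is the number of words.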

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || s.length() == 0 || words == null || words.length == 0) {
            return res;
        }
        Map<String, Integer> wordMap = new HashMap<>();  // required count of each word
        for (String word : words) {
            if (!wordMap.containsKey(word)) {
                wordMap.put(word, 1);
            } else {
                wordMap.put(word, wordMap.get(word) + 1);
            }
        }
        int wordLen = words[0].length();
        Map<String, Integer> curMap = new HashMap<>();   // counts of the words in the current window
        for (int i = 0; i < wordLen; i++) {  // start from each char
            int lo = i, count = 0;
            curMap.clear();
            for (int j = i; j <= s.length() - wordLen; j += wordLen) {
                String curWord = s.substring(j, j + wordLen);
                if (!wordMap.containsKey(curWord)) {  // not a word: restart the window after it
                    curMap.clear();
                    count = 0;
                    lo = j + wordLen;
                } else {
                    if (!curMap.containsKey(curWord)) {
                        curMap.put(curWord, 1);
                    } else {
                        curMap.put(curWord, curMap.get(curWord) + 1);
                    }
                    count++;
                    while (curMap.get(curWord) > wordMap.get(curWord)) {        // poll from front
                        String wordToRemove = s.substring(lo, lo + wordLen);
                        curMap.put(wordToRemove, curMap.get(wordToRemove) - 1);
                        lo += wordLen;
                        count--;
                    }
                    if (count == words.length) { // found one solution
                        res.add(lo);
                        String loWord = s.substring(lo, lo + wordLen);
                        curMap.put(loWord, curMap.get(loWord) - 1);
                        lo += wordLen;
                        count--;
                    }
                }
            }
        }
        return res;
    }
}

Sometimes the HashMap operations can also be written more compactly, for example:

curMap.put(curWord, curMap.get(curWord) == null ? 1 : curMap.get(curWord) + 1);
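Assuming Java 8 or later, the same increment can also be written with Map.getOrDefault or Map.merge, both of which are standard java.util.Map methods. The sketch below is a side note rather than part of the original solutions; the class name CountIdioms and its method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CountIdioms {
    // Increment via getOrDefault: read the old count (0 if absent), write back count + 1.
    static Map<String, Integer> countWithGetOrDefault(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Increment via merge: insert 1 if absent, otherwise combine old value and 1 with Integer::sum.
    static Map<String, Integer> countWithMerge(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"foo", "bar", "foo"};
        System.out.println(countWithGetOrDefault(words).equals(countWithMerge(words)));
    }
}
```

Both forms do a single pass over the words and avoid the separate containsKey branch. Which one reads better is largely a matter of taste.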

Third pass:

Essentially the same as the second pass.

Note that j ranges over [i, s.length() - wordLen], closed at both ends.
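To see why the upper bound has to be inclusive, here is a small sketch that lists the start indices one pass would visit under either bound. The class LoopBounds and the method starts are hypothetical helpers, not part of the solution. With s = "xfoo" and the single word "foo", the only match starts at index 1, which equals s.length() - wordLen.

```java
import java.util.ArrayList;
import java.util.List;

class LoopBounds {
    // Start indices visited by pass i over a text of length n.
    // inclusive = true uses j <= n - wordLen, false uses j < n - wordLen.
    static List<Integer> starts(int n, int wordLen, int i, boolean inclusive) {
        List<Integer> out = new ArrayList<>();
        int last = n - wordLen;
        for (int j = i; inclusive ? j <= last : j < last; j += wordLen) {
            out.add(j);
        }
        return out;
    }

    public static void main(String[] args) {
        // s = "xfoo" (n = 4), wordLen = 3, pass i = 1.
        System.out.println(starts(4, 3, 1, true));   // visits index 1
        System.out.println(starts(4, 3, 1, false));  // visits nothing, so the match is missed
    }
}
```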

Java:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || words == null || words.length == 0) return res;
        Map<String, Integer> wordsMap = new HashMap<>();  // required count of each word
        for (String word : words) {
            if (!wordsMap.containsKey(word)) wordsMap.put(word, 1);
            else wordsMap.put(word, wordsMap.get(word) + 1);
        }
        int wordLen = words[0].length();
        for (int i = 0; i < wordLen; i++) {  // one pass per starting offset
            Map<String, Integer> curMap = new HashMap<>();
            int lo = i;
            int count = 0;
            for (int j = i; j <= s.length() - wordLen; j += wordLen) {
                String word = s.substring(j, j + wordLen);
                if (!wordsMap.containsKey(word)) {  // not a word: restart the window after it
                    count = 0;
                    curMap.clear();
                    lo = j + wordLen;
                    continue;
                }
                if (!curMap.containsKey(word)) curMap.put(word, 1);
                else curMap.put(word, curMap.get(word) + 1);
                count++;
                while (curMap.get(word) > wordsMap.get(word)) {  // drop words from the front until valid
                    String loWord = s.substring(lo, lo + wordLen);
                    curMap.put(loWord, curMap.get(loWord) - 1);
                    lo += wordLen;
                    count--;
                }
                if (count == words.length) {  // found one solution
                    res.add(lo);
                    String loWord = s.substring(lo, lo + wordLen);
                    curMap.put(loWord, curMap.get(loWord) - 1);
                    lo += wordLen;
                    count--;
                }
            }
        }
        return res;
    }
}
