POJ 3693 Maximum repetition substring (suffix array)

Date: 2021-09-24 15:01:04

Description

The repetition number of a string is defined as the maximum number R such that the string can be partitioned into R same consecutive substrings. For example, the repetition number of "ababab" is 3 and "ababa" is 1.
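
To make the definition concrete, here is a small brute-force sketch (the helper name repetitionNumber is made up for illustration and is not part of the solution below). It tries every period length that divides the string length, smallest first:

```cpp
#include <string>

// Hypothetical helper: repetition number of a whole string, by brute force.
// A period length L is valid when L divides the length and str[i] == str[i - L]
// for every i >= L. The smallest valid L gives the largest R.
int repetitionNumber(const std::string& str) {
    int len = static_cast<int>(str.size());
    for (int L = 1; L <= len; ++L) {
        if (len % L != 0) continue;
        bool ok = true;
        for (int i = L; i < len && ok; ++i)
            if (str[i] != str[i - L]) ok = false;
        if (ok) return len / L;
    }
    return 0;  // only reached for the empty string
}
```

With this, repetitionNumber("ababab") is 3 and repetitionNumber("ababa") is 1, matching the examples in the statement.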

Given a string containing lowercase letters, you are to find a substring of it with maximum repetition number.

Input

The input consists of multiple test cases. Each test case contains exactly one line, which gives a non-empty string consisting of lowercase letters. The length of the string will not be greater than 100,000.

The last test case is followed by a line containing a '#'.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the substring of maximum repetition number. If there are multiple substrings of maximum repetition number, print the lexicographically smallest one.

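Problem summary: given a string, find the substring made of consecutive repeats with the largest repetition count; if there are several answers, print the lexicographically smallest one.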

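Idea: take a substring formed by repeating a string of length L exactly R times (R ≥ 2). Among the sample positions s[0], s[L], s[2*L], ..., this substring must contain at least two adjacent ones, because it is at least 2*L characters long.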

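So we enumerate the period length L from 1 to n.

For each adjacent pair s[i*L], s[(i+1)*L], we check how far the two positions keep matching each other, both backward and forward.

Call that matched length K; then K/L+1 (integer division) is the number of repetitions. (If a string consists of R copies of a string of length L, then lcp(suffix(0), suffix(L)) == L*(R-1).)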
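
As a minimal sketch of the K/L+1 rule, using forward matching only: the code below replaces the suffix-array query with a naive character-by-character LCP, so it is slow but easy to check by hand. The names naiveLcp and repeatsFrom are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Naive LCP of the suffixes starting at a and b (stand-in for the RMQ query).
int naiveLcp(const std::string& s, int a, int b) {
    int n = static_cast<int>(s.size()), k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// Number of consecutive copies of s[j .. j+L) that start exactly at position j.
int repeatsFrom(const std::string& s, int j, int L) {
    assert(L > 0 && j >= 0 && j + L <= static_cast<int>(s.size()));
    return naiveLcp(s, j, j + L) / L + 1;
}
```

For "ababab" with j = 0 and L = 2, the LCP of "ababab" and "abab" is 4, so the rule gives 4/2+1 = 3 copies.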

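Matching forward from the two positions is easy, but matching backward is awkward. One could reverse the string and build a second suffix array, but that is not very elegant.

A more elegant way: for a given repetition, suppose the first two sample positions it contains are s[i*L] and s[(i+1)*L], and let lcp be the forward match length lcp(i*L, (i+1)*L). When lcp mod L ≠ 0, set p = L - lcp % L.

Then it is enough to test lcp(i*L-p, (i+1)*L-p). A shift of exactly p is the smallest one that can complete one more copy, and since s[i*L] is the first sample position inside the repetition, the start lies less than L before it, so going back any further cannot increase the repetition count.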
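
The shift-by-p step can be sketched as follows, again with a naive LCP in place of the suffix-array query. The function name bestRepeatsAt is hypothetical; j plays the role of i*L.

```cpp
#include <algorithm>
#include <string>

// Naive LCP of the suffixes starting at a and b (stand-in for the RMQ query).
int naiveLcp(const std::string& s, int a, int b) {
    int n = static_cast<int>(s.size()), k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// Best repetition count for period L among repetitions whose first sample
// position is j: forward match first, then one retry shifted left by p.
int bestRepeatsAt(const std::string& s, int j, int L) {
    int t = naiveLcp(s, j, j + L);
    if (t % L != 0) {
        int p = L - t % L;  // characters missing to complete one more copy
        if (j >= p) t = std::max(t, naiveLcp(s, j - p, j + L - p));
    }
    return t / L + 1;
}
```

In "cabab" with j = 2 and L = 2, the forward match is only 1 character, which alone gives one copy. Shifting left by p = 1 compares "abab" with "ab", matches 2 characters, and yields two copies ("abab").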

This gives us the maximum repetition count.
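
A rough note on cost: each (L, sample pair) combination needs at most two lcp queries, and the number of combinations is a harmonic sum, roughly n ln n. The sketch below just counts the combinations for a string of length n; countSamplePairs is a name made up for this illustration.

```cpp
// Counts the (L, j) pairs visited when L runs over 1..n-1 and j over
// 0, L, 2L, ... while both j and j + L are inside a string of length n.
long long countSamplePairs(int n) {
    long long total = 0;
    for (int L = 1; L < n; ++L)
        for (int j = 0; j + L < n; j += L)
            ++total;
    return total;
}
```

For n = 100,000 this stays in the low millions, which is why enumerating every L is affordable once each lcp query takes O(1).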

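But the problem asks for the lexicographically smallest answer... Working that out while computing the repetition count looks hard (or rather, tedious; that is probably what the person in DISCUSS who mentioned three RMQs did...).

However, while computing the maximum repetition count R we can also record the total length of the repetition; call it K.

Then we just scan the string for every i with lcp(i, i + K / R) ≥ K - K / R. These are all valid starting positions, and since we have the suffix array we simply take the one with the smallest rank. If several period lengths K / R reach the same R, each of them has to be tried in this scan, because the smallest answer can belong to any of them.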
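
The scan can be sketched like this for one fixed pair (R, L), where L = K / R. It is a naive illustration: plain string comparison stands in for comparing suffix ranks, and the function name smallestRepetition is hypothetical.

```cpp
#include <string>

// Naive LCP of the suffixes starting at a and b (stand-in for the RMQ query).
int naiveLcp(const std::string& s, int a, int b) {
    int n = static_cast<int>(s.size()), k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// Among all starts i where s[i .. i+R*L) is R copies of a length-L block,
// return the lexicographically smallest such substring (empty if none).
std::string smallestRepetition(const std::string& s, int R, int L) {
    int n = static_cast<int>(s.size());
    std::string best;
    bool found = false;
    for (int i = 0; i + R * L <= n; ++i) {
        if (naiveLcp(s, i, i + L) < (R - 1) * L) continue;  // not a valid start
        std::string cand = s.substr(i, R * L);
        if (!found || cand < best) {
            best = cand;
            found = true;
        }
    }
    return best;
}
```

For "ababa" with R = 2 and L = 2, both i = 0 ("abab") and i = 1 ("baba") pass the test, and "abab" is the smaller one. In the real solution the substrings are never built; two valid starts for the same L are compared by rank.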

And that solves the problem.

Code (313MS):

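#include <cstdio>
#include <cstring>
#include <algorithm>

// Numeric constants below were restored from context. MAXN, the 20 RMQ levels
// and the alphabet size 128 are chosen from the problem limits.
// std:: is spelled out because a global array named rank clashes with
// std::rank under "using namespace std" on newer compilers.
const int MAXN = 100010;  // string length <= 100000, plus the '\0' sentinel

char s[MAXN];
int sa[MAXN], height[MAXN], rank[MAXN], c[MAXN], tmp[MAXN];
int per[MAXN];            // period lengths that reach the best repetition count
int n;                    // strlen(s) + 1: the '\0' counts as the smallest suffix

// Suffix array by prefix doubling with counting sort; m is the alphabet size.
void makesa(int m) {
    std::memset(c, 0, m * sizeof(int));
    for (int i = 0; i < n; ++i) ++c[rank[i] = s[i]];
    for (int i = 1; i < m; ++i) c[i] += c[i - 1];
    for (int i = 0; i < n; ++i) sa[--c[rank[i]]] = i;
    for (int k = 1; k < n; k <<= 1) {
        // c[r] is the first free slot of bucket r. Visiting suffixes in order of
        // their second half and dropping each into its first-half bucket sorts
        // by 2k characters.
        for (int i = 0; i < n; ++i) {
            int j = sa[i] - k;
            if (j < 0) j += n;
            tmp[c[rank[j]]++] = j;
        }
        // Re-rank; sa temporarily holds the new ranks, c the new bucket starts.
        int j = c[0] = sa[tmp[0]] = 0;
        for (int i = 1; i < n; ++i) {
            if (rank[tmp[i]] != rank[tmp[i - 1]] || rank[tmp[i] + k] != rank[tmp[i - 1] + k])
                c[++j] = i;
            sa[tmp[i]] = j;
        }
        std::memcpy(rank, sa, n * sizeof(int));
        std::memcpy(sa, tmp, n * sizeof(int));
    }
}

// height[r] = lcp of the suffixes ranked r - 1 and r.
void calheight() {
    // Stop before the sentinel suffix: it has rank 0 and no predecessor.
    for (int i = 0, k = 0; i < n - 1; height[rank[i++]] = k) {
        if (k > 0) --k;
        int j = sa[rank[i] - 1];
        while (s[i + k] == s[j + k]) ++k;
    }
}

int logn[MAXN];
int best[20][MAXN];

// Sparse table over height[1..n] for range-minimum queries.
void initRMQ() {
    logn[0] = -1;
    for (int i = 1; i <= n; ++i)
        logn[i] = (i & (i - 1)) == 0 ? logn[i - 1] + 1 : logn[i - 1];
    for (int i = 1; i <= n; ++i) best[0][i] = height[i];
    for (int i = 1; i <= logn[n]; ++i) {
        int ed = n - (1 << i) + 1;
        for (int j = 1; j <= ed; ++j)
            best[i][j] = std::min(best[i - 1][j], best[i - 1][j + (1 << (i - 1))]);
    }
}

// Longest common prefix of the suffixes starting at a and b (a != b).
int lcp(int a, int b) {
    a = rank[a], b = rank[b];
    if (a > b) std::swap(a, b);
    ++a;
    int t = logn[b - a + 1];
    return std::min(best[t][a], best[t][b - (1 << t) + 1]);
}

void solve() {
    int len = n - 1;      // real string length, without the sentinel
    int ansR = 1, cnt = 0;
    for (int i = 1; i < len; ++i) {             // i: period length
        for (int j = 0; j + i < len; j += i) {  // j, j + i: adjacent sample positions
            int t = lcp(j, j + i);
            if (t % i) {
                int p = i - t % i;              // shift left to complete one more copy
                if (j >= p) t = std::max(t, lcp(j - p, j + i - p));
            }
            int r = t / i + 1;
            if (r > ansR) ansR = r, cnt = 0;
            // Remember every period that reaches the best count, not just one.
            if (r == ansR && r > 1 && (cnt == 0 || per[cnt - 1] != i)) per[cnt++] = i;
        }
    }
    // With ansR == 1 the answer is the smallest single character, i.e. sa[1].
    int ans = sa[1], ansL = 1;
    // Otherwise walk the suffixes in rank order; the first start that is valid
    // for some recorded period (smallest period first) is the answer.
    for (int k = 1; k < n && ansR > 1 && ansL == 1; ++k) {
        for (int q = 0; q < cnt; ++q) {
            int i = sa[k], L = per[q];
            if (i + ansR * L <= len && lcp(i, i + L) >= (ansR - 1) * L) {
                ans = i, ansL = ansR * L;
                break;
            }
        }
    }
    for (int i = ans; i < ans + ansL; ++i) std::putchar(s[i]);
    std::puts("");
}

int main() {
    int kase = 0;
    while (std::scanf("%s", s) != EOF) {
        if (*s == '#') break;
        n = std::strlen(s) + 1;
        makesa(128);
        calheight();
        initRMQ();
        std::printf("Case %d: ", ++kase);
        solve();
    }
    return 0;
}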