一个简单算法题引发的思考<DNA sorting>（about cin／template／new etc）

首先是昨天在北京大学ｏｊ网上看到一个简单的算法题目，虽然简单，但是如何完成一段高效、简洁、让人容易看懂的代码对于我这个基础不好，刚刚进入计算机行业的小白来说还是有意义的。而且在写代码的过程中，会发现自己平时学习中不会发现的问题，所以想写下这个博客，主要是便于自己对算法的理解。

来，上题。

DNA Sorting

Time Limit: 1000MS		Memory Limit: 10000K
Total Submissions: 91599		Accepted: 36781

Description

One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted).

You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.

Input

The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.

Output

Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. Since two strings can be equally sorted, then output them according to the orginal order.

Sample Input

10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT

Sample Output

CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA

＃０１、这是一个很简单的题目，基本上就是看完题目，思路就出来的那种。这个题目主要要解决以下几个小麻烦。
0X001：存放DNA序列的数据结构，存放这种类型的数据，最好的当然是二维数组啦。m,n是在程序运行过程中用户输入的，显然不能直接char　N_DNA[n][m]来定义。
虽然题目中给出了ｍ和ｎ都要小于等于50，但是设一个太大的数组，还有很多空间浪费，作为一个较真的程序员，显然是心里过不去的。
如是，我们可以采用下面的方法定义数组，并分配空间。

 1 template <typename T>
 2 //定义Arr数据类型
 3 class Arr
 4 {
 5 public:
 6     Arr(int length)
 7     {
 8         this->length = length;
 9         data = new T[length];
10     }
11     T *data;
12     unsigned int length;
13 };
14 typedef Arr<char>* P_of_Arr;//P_of_Arr是一维Arr数组的指针类型
15 void main()
16 {
17     int m, n;
18     scanf("%d %d", &m, &n);
19     //构造N_DNA数组
20      Arr<P_of_Arr> N_DNA(n);
21    //给N_DNA的data数组的每个指针分配空间
22      int t;
23      for (t = 0; t < n; t++)
24      {
25          N_DNA.data[t] = new Arr<char>(m);
26      }
27 }

这里面主要用到两个c++的知识，写出来，加强一下自己的理解。

其一就是temptale的使用，关键字template在这里的作用让class Arr的类型多向化，就是让class Arr可以有多种理解，就是让class Arr成为一个模板，当在其他一些相似但是又不相同的环境下可以被再次使用。通俗点讲，就是先假装，T是已经定义的类型。让编译器认可它，不报错。在Arr定义变量的时候，再来补上T的类型。因此，这样用template模板定义的类型，可以在不同的类型T的环境中使用。

其二是new的使用，首先我们定义一个Arr数组Arr<P_of_Arr> N_DNA(n);我们可以看到Arr构造函数里面，this->length = length;data = new T[length];将n传给Arr域里面的length，并且分配一个T[n]空间的数组，并把指针传给data（注意，这里data是二重指针，也就是数组是data[n]，其中data[0,1....n]也是指针，因为定义Arr时，T是P_of_Arr,而typedef Arr<char>* P_of_Arr;//P_of_Arr是一维Arr数组的指针类型）。

 int t;
for (t = 0; t < n; t++)
     {
         N_DNA.data[t] = new Arr<char>(m);
     }

然后我们循环，依次给data[0,1...n]分配空间.每一个data[i]指向一个一维的Arr类型的数据。

至此，我们就分配了N_DNA.data[ｎ]->data[ｍ]的数组。

＃０２数据的输入，cin的理解，cin是C++库函数里面的一系列函数，cin>>/cin.get／cin.getline...这些函数的用法在这里不再赘述。

可以参考以下两篇博客，里面讲的比较清楚

个人体会就是在使用和理解这些函数时，了解两个方面问题。

第一、space，tab，Enter是否被从缓冲区中舍弃。＜space，tab，Enter＞

第二、cin.getline在超出范围时，通过怎样影响输入标志位，来影响后续的输入。＜failbit，eofbit，badbit／／goodbit＞

参考博客：　

　　　　　http://www.cnblogs.com/A-Song/archive/2012/01/29/2331204.html

　　　　　http://blog.****.net/dongtingzhizi/article/details/2299365

之后就是非常常规简单的代码了，定义一个数组rel记录DNA的逆值，然后依次按照逆值从小到大输出DNA序列。

完全的代码如下：

 1 #define _CRT_SECURE_NO_WARNINGS
 2 #include<iostream>
 3 using namespace std;
 4 #define MAX 12768
 5 template <typename T>
 6 class Arr
 7 {
 8 public:
 9     Arr(int length)
10     {
11         this->length = length;
12         data = new T[length];
13     }
14     T *data;
15     unsigned int length;
16 };
17 typedef Arr<char>* P_of_Arr;
18 int main(void)
19 {
20     int m, n;
21     scanf("%d %d", &m, &n);
22     //构造N_DNA数组
23      Arr<P_of_Arr> N_DNA(n);
24      int t;
25      for (t = 0; t < n; t++)
26      {
27          N_DNA.data[t] = new Arr<char>(m);
28      }
29      int i,j,k;
30      int *rel = new int[n];
31      //输入DNA序列
32      cin.getline(N_DNA.data[0]->data, m + 1);
33      for (i = 0; i < n; i++)
34          cin.getline(N_DNA.data[i]->data, m+1);
35      //循环遍历，用rel记录各个DNA序列的逆值
36      for (k = 0; k < n; k++)
37      {
38          rel[k] = 0;
39          for (i = 0; i < m; i++)
40              for (j = i; j < m; j++)
41                  if (N_DNA.data[k]->data[i]>N_DNA.data[k]->data[j])
42                      rel[k]++;
43      }
44      int *usedrel = new int[n];//标志rel是否被用
45      //初始化全为0
46      for (i = 0; i < n; i++)
47          usedrel[i] = 0;
48      //查找最小的逆值得DNA序列,并输出
49      for (i = 0; i < n; i++)
50      {
51          k = -1;//记录最小逆值的地址
52          int min=MAX;//记录最小的逆值
53          for (j = 0; j < n; j++)
54              if (rel[j] < min&&usedrel[j]==0)
55              {
56                  min = rel[j];
57                  k = j;
58              }
59          usedrel[k] = 1;//标记已经被访问
60          for (j = 0; j < m; j++)
61              cout << N_DNA.data[k]->data[j];
62          cout << endl;
63      }
64      return 0;
65 }

运行结果如下：

由于题目对于输入输出有要求，所以没有将输入输出分开。一个简单算法题引发的思考<DNA sorting>（about cin／template／new etc）

秒客网

一个简单算法题引发的思考（about cin／template／new etc）

相关文章