What is the best way in Python to get a denormalized array from this ordered array?

Time: 2023-01-14 11:17:20

I have this array:

>>> print raw_data
['LEVEL 1',
'SUBJECT A',
'GROUP X',
'COMMENT i',
'COMMENT ii',
'COMMENT iii',
'GROUP Y',
'COMMENT iv',
'COMMENT v',
'COMMENT vi',
'LEVEL 2',
'SUBJECT B',
'GROUP Z',
'COMMENT vii',
'COMMENT viii',
'COMMENT ix',
'SUBJECT C',
'GROUP X2',
'COMMENT x',
'COMMENT xi',
'COMMENT xii',
'COMMENT xiii',
'GROUP Y2',
'COMMENT xiv',
'COMMENT xv',
'COMMENT xvi']

Where the obvious hierarchy is:

  1. Level
    1. Subject
      1. Group
        1. Comments

My goal is to turn it into a denormalized array that can be stored in a database:

>>> print result
[
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']
]

I have been trying to solve this, but I am quite lost. I think this must be a common problem, so I would like to know whether someone has an efficient approach. It looks similar to nested sets, but I don't know much about handling them in Python. Getting the level of each item is easy, but I am getting "headaches" taking it any further.

>>> def addlevel(a):
    if a.startswith('LEVEL'):
        return [1, a]
    elif a.startswith('SUBJECT'):
        return [2, a]
    elif a.startswith('GROUP'):
        return [3, a]
    elif a.startswith('COMMENT'):
        return [4, a]
>>> map(addlevel, raw_data)
[[1, 'LEVEL 1'],
 [2, 'SUBJECT A'],
 [3, 'GROUP X'],
 [4, 'COMMENT i'],
 [4, 'COMMENT ii'],
 [4, 'COMMENT iii'],
 [3, 'GROUP Y'],
 [4, 'COMMENT iv'],
 [4, 'COMMENT v'],
 [4, 'COMMENT vi'],
 [1, 'LEVEL 2'],
 [2, 'SUBJECT B'],
 [3, 'GROUP Z'],
 [4, 'COMMENT vii'],
 [4, 'COMMENT viii'],
 [4, 'COMMENT ix'],
 [2, 'SUBJECT C'],
 [3, 'GROUP X2'],
 [4, 'COMMENT x'],
 [4, 'COMMENT xi'],
 [4, 'COMMENT xii'],
 [4, 'COMMENT xiii'],
 [3, 'GROUP Y2'],
 [4, 'COMMENT xiv'],
 [4, 'COMMENT xv'],
 [4, 'COMMENT xvi']]

I would appreciate any clues!

2 answers

#1


3  

You could try something like this:

raw_data = [ 'LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
'COMMENT iii', 'GROUP Y', 'COMMENT iv', 'COMMENT v', 'COMMENT vi', 'LEVEL 2',
'SUBJECT B', 'GROUP Z', 'COMMENT vii', 'COMMENT viii', 'COMMENT ix', 
'SUBJECT C', 'GROUP X2', 'COMMENT x', 'COMMENT xi', 'COMMENT xii', 
'COMMENT xiii', 'GROUP Y2', 'COMMENT xiv', 'COMMENT xv', 'COMMENT xvi' ]

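# Current ancestors; an empty string means "not seen yet in this branch".
level, subject, group, comment = '', '', '', ''

result = []

for item in raw_data:

    if item.startswith('COMMENT'):
        comment = item
    elif item.startswith('GROUP'):
        group = item
        comment = ''    # a new group starts with no comment
    elif item.startswith('SUBJECT'):
        subject = item
        group = ''      # a new subject invalidates the previous group
    elif item.startswith('LEVEL'):
        level = item
        subject = ''    # a new level invalidates the previous subject

    # Only a COMMENT leaves all four values set, so one row is added per comment.
    if level and subject and group and comment:
        result.append([level, subject, group, comment])

import pprint
pprint.pprint(result)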

Which would yield:

[['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
 ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
 ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
 ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']]
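
The same idea can be generalized so that the number of hierarchy levels is not hard-coded. The following is only a minimal sketch and not part of the original answer: the names `PREFIXES` and `denormalize` are made up for illustration, and it assumes that every item starts with one of the known prefixes and that the last prefix marks a leaf row.

```python
# Hypothetical names: PREFIXES and denormalize are illustrative only.
PREFIXES = ('LEVEL', 'SUBJECT', 'GROUP', 'COMMENT')  # outermost to innermost


def denormalize(items, prefixes=PREFIXES):
    """Turn a flat, ordered list into one row per leaf item."""
    path = [None] * len(prefixes)  # current ancestor chain, one slot per depth
    rows = []
    for item in items:
        # Work out the depth of this item from its prefix.
        for depth, prefix in enumerate(prefixes):
            if item.startswith(prefix):
                break
        else:
            raise ValueError('unrecognised item: %r' % item)
        path[depth] = item
        # Anything deeper than this item belonged to the previous branch.
        for deeper in range(depth + 1, len(prefixes)):
            path[deeper] = None
        # Emit a row only for a leaf with a complete ancestor chain.
        if depth == len(prefixes) - 1 and None not in path:
            rows.append(list(path))
    return rows


sample = ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
          'GROUP Y', 'COMMENT iv', 'LEVEL 2', 'SUBJECT B', 'GROUP Z',
          'COMMENT vii']
rows = denormalize(sample)
```

Adding a fifth level would then only require extending the prefix tuple, at the cost of a slightly less obvious loop than the explicit `if`/`elif` chain.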

#2


5  

Pseudocode, since I don't have a Python interpreter handy right now:

Set LEVEL, SUBJECT, GROUP to None, results to []

Loop over the list
  if it's a 'LEVEL', set LEVEL to it
  if it's a 'SUBJECT', set SUBJECT to it
  if it's a 'GROUP', set GROUP to it
  if it's a 'COMMENT', append [LEVEL, SUBJECT, GROUP, COMMENT] to results
Ta-da.

It just relies on the ordering...
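
A direct Python rendering of the pseudocode above might look like the following. This is a sketch rather than the answerer's own code; the function name `flatten` is made up for illustration. As noted, it trusts the input order and does not clear a stale SUBJECT or GROUP when a new LEVEL starts.

```python
def flatten(items):
    """Emit [level, subject, group, comment] for every COMMENT item."""
    level = subject = group = None
    results = []
    for item in items:
        if item.startswith('LEVEL'):
            level = item
        elif item.startswith('SUBJECT'):
            subject = item
        elif item.startswith('GROUP'):
            group = item
        elif item.startswith('COMMENT'):
            # The most recently seen ancestors describe this comment.
            results.append([level, subject, group, item])
    return results


data = ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
        'LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii']
results = flatten(data)
```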
