Extracting text from a text file and writing it out in a different format

Time: 2022-10-29 20:24:57

Hi, I'm trying to extract some lines of text from a file generated by a program and write them to another text file in a different format using Python.

Here is what I have so far:

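import os

# Raw string so the backslashes in the Windows path are taken literally.
path = r"D:\Programming\Python\Examples\Home\GainWizard\MassLynx\VxWorks\TargetRegistryFiles"
os.chdir(path)
print(os.getcwd())
print(os.listdir(path))

# Keep only regular files, then pick the most recently modified one.
filelist = os.listdir(os.getcwd())
filelist = filter(lambda x: not os.path.isdir(x), filelist)
newest = max(filelist, key=lambda x: os.stat(x).st_mtime)

print(newest)
f = open(newest, 'r')

# readlines() returns a list with one string per line of the file.
data = f.readlines()
f.close()
print(data)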

This reads every line of the file into a list.

What I have is:

Autotune Ion Energy:Fixed Ion Energy 1,2.000000,Autotune Ion Energy:Fixed Ion Energy      2,2.000000,Autotune Ion Energy:MS1-Neg Opt,0.3,Autotune Ion Energy:MS1-Pos Opt,-0.2,Autotune Ion Energy:MS2-Neg Opt,0.4,Autotune Ion Energy:MS2-Pos Opt,0.6,Autotune Ion Energy:MSMS Mode Fixed Ion Energy 1,0.500000,Autotune Ion Energy:MSMS Mode Fixed Ion Energy 2,2.000000,Autotune Ion Energy:OptimumValuesSet,true,Debug:Use old bunching method,true,Detector Gain Negative:High Gain,368.861012,Detector Gain Negative:Low Gain,73.523644,Detector Gain Negative:a,1.865677e-021,Detector Gain Negative:b,8.441605,Detector Gain Postitve:High Gain,613.662847,Detector Gain Postitve:Low Gain,124.065398,Detector Gain Postitve:a,4.973557e-021,Detector Gain Postitve:b,8.367407,DivertValve:ValveZone,0,Engineers Settings:MS1 DC Balance -,0.300000,Engineers Settings:MS1 DC Polarity,1,Engineers Settings:MS1 High Mass Position,174.000000,Engineers Settings:MS1 High Mass Resolution,1801.000000,Engineers Settings:MS1 Low Mass Position,519.000000,Engineers Settings:MS1 Low Mass Resolution,511.000000,Engineers Settings:MS1 Resolution Linearity,873.000000,Engineers Settings:MS2 DC Balance -,-0.200000,Engineers Settings:MS2 DC Polarity,0,Engineers Settings:MS2 High Mass Position,190.000000,Engineers Settings:MS2 High Mass Resolution,1744.000000,Engineers Settings:MS2 Low Mass Position,519.000000,Engineers Settings:MS2 Low Mass Resolution,514.000000,Engineers Settings:MS2 Resolution Linearity,857.000000,Engineers Settings:PIC MS Scan CE,4.000000,Engineers Settings:PIC Threshold Calc Scan Delay,3,Engineers Settings:PIC decreasing data points,3,Engineers Settings:PIC nonDefault Scan Speed,5000.000000,Engineers Settings:PMT Type,Hamamatsu,Engineers Settings:RF Offset Negative,0.000000,Engineers Settings:RF Offset Positive,0.000000,Failure:Gas failed state,OK,Failure:Leak detected state,Tripped,Fluidics:AcknowledgeCountThreshold,5,Fluidics:ActiveReservoir,2,Fluidics:Aspirate Rate,1000,Fluidics:Draw 
Rate,1000,Fluidics:Fill Volume,250,Fluidics:Flow Rate,10,Fluidics:Flow State,Waste,Fluidics:Inject-Flow Rate,400,Fluidics:Inject-MethodType,4,Fluidics:Inject-Pump Time1,5,Fluidics:Inject-Pump Time2,6,Fluidics:Inject-Pump Time3,10,Fluidics:Max Flow Rate,1500,Fluidics:Pending Active TimeOut,10,Fluidics:Pending Complete TimeOut,1200,Fluidics:Pending Response TimeOut,10,Fluidics:Power Cycle Delay,3.000000,Fluidics:Precompression Dispense Rate,300,Fluidics:Precompression Dispense Volume,30,Fluidics:Precompression Enable,TRUE,Fluidics:Precompression Max Fill Volume,280,Fluidics:Purge Delay Length,1,Fluidics:Refill Wait Time,60.000000,Fluidics:Sample Purge Count,0,Fluidics:Wash Purge Count,1,Instrument:Collision gas status,off,Instrument:EPC Version,Feb 15 2012,Instrument:Serial Number,QCA331,Instrument:Unique Name,,Ion Energy Settings:Fixed Ion Energy 1,3.000000,Ion Energy Settings:Fixed Ion Energy 2,3.000000,Maintenance Counters:DAYS_SINCE_LAST_SERVICE_THRESHOLD,0,Maintenance Counters:OPERATE_SWITCHES,28,Maintenance Counters:OPERATE_SWITCHES_THRESHOLD,0,Maintenance Counters:OPERATE_TIME,141233,Maintenance Counters:OPERATE_TIME_THRESHOLD,0,Maintenance Counters:POLARITY_SWITCHES,187,Maintenance Counters:POLARITY_SWITCHES_THRESHOLD,0,Maintenance Counters:VACUUM_TIME,763973,Maintenance Counters:VACUUM_TIME_THRESHOLD,0,Protective Actions:ENABLE_DIVERT_TO_WASTE,1,Scan Parameters:Interchannel Delay,0.020000,Scan Parameters:Interscan Delay,0.020000,Scan Parameters:Manual Mode,true,Scan Parameters:Polarity Switching Interscan Delay,0.020000,Scan Parameters:Scan Speed Options,1000\,2000\,5000\,10000,Scan speed adjust::DefaultsVersionLevel,2,Scan speed adjust:HIGH_SCALE_MASS_ADJUST_MS1_SETTING,-60.000000,Scan speed adjust:HIGH_SCALE_MASS_ADJUST_MS2_SETTING,-32.000000,Scan speed adjust:ION_ENERGY_1_RAMP_SETTING,2.000000,Scan speed adjust:ION_ENERGY_2_RAMP_SETTING,2.000000,Scan speed adjust:LINEARITY_ADJUST_MS1_SETTING,0.000000,Scan speed 
adjust:LINEARITY_ADJUST_MS2_SETTING,0.000000,Scan speed adjust:LOW_MASS_RESOLUTION_MS1_SETTING,10.000000,Scan speed adjust:LOW_MASS_RESOLUTION_MS2_SETTING,20.000000,Scan speed adjust:LOW_SCALE_MASS_ADJUST_MS1_SETTING,-15.000000,Scan speed adjust:LOW_SCALE_MASS_ADJUST_MS2_SETTING,-15.000000,Scan speed adjust:MS1_ION_ENERGY_SETTING,1.000000,Scan speed adjust:MS1_ION_ENERGY_WRITE_SETTING,1.000000,Scan speed adjust:MS2_ION_ENERGY_SETTING,0.700000,Scan speed adjust:MS2_ION_ENERGY_WRITE_SETTING,0.700000,Scan speed adjust:RESOLUTION_ADJUST_MS1_SETTING,-15.000000,Scan speed adjust:RESOLUTION_ADJUST_MS2_SETTING,0.000000

What I need is:

START_TARGET_REGISTRY
Detector Gain Negative:a,1.087668e-021
Detector Gain Negative:b,8.536190
Detector Gain Negative:High Gain,392.233021 
Detector Gain Negative:Low Gain,76.782164
Detector Gain Postitve:a,4.061385e-021 
Detector Gain Postitve:b,8.398445
Detector Gain Postitve:High Gain,610.368775
Detector Gain Postitve:Low Gain,122.669833
END_TARGET_REGISTRY

Thanks

1 solution

#1


Some things aren't quite clear, like whether you need more parameters than just the "Detector Gain" ones, or where the numbers in your desired output come from (they don't appear in your sample data).

However, this might get you to where you need to be:

from collections import OrderedDict

# data is the list returned by f.readlines(); join it into one string before splitting.
text = "".join(data)

D = OrderedDict()
for field in text.split(','):
    if ':' in field:
        k = field.strip()       # a CATEGORY:KEY field
    else:
        D[k] = field.strip()    # the value that follows that key

with open(r"C:\temp\detector_gain.txt", 'w') as outfile:
    print("START_TARGET_REGISTRY", file=outfile)
    for k, v in D.items():
        if "Detector Gain" in k:
            print(k, v, sep=',', file=outfile)
    print("END_TARGET_REGISTRY", file=outfile)

Since the format of the data seems to be CATEGORY_1:KEY_1,VALUE_1,CATEGORY_2:KEY_2,VALUE_2... we break the data into fields at each comma with the split method.
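
As a small illustration of that first step, here is a sketch that splits a shortened sample in the same CATEGORY:KEY,VALUE layout. The two pairs are copied from the sample data in the question; the variable names sample and fields are just for illustration.

```python
# Shortened sample in the same layout as the registry file.
sample = "Detector Gain Negative:a,1.865677e-021,Detector Gain Negative:b,8.441605"

# str.split(',') cuts the string at every comma, so keys and values alternate.
fields = sample.split(',')

for field in fields:
    print(field)
```

Note that split is a string method. If the file was read with readlines(), the result is a list of lines, so it has to be joined into one string first, for example with "".join(data).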

Then we loop through each field, looking for a : character, which tells us that we're reading a CATEGORY:KEY field.

Once we have the CATEGORY:KEY field, we know the next field will be the associated value. So we add that to a Python dictionary, which maps keys to values. I chose the OrderedDict dictionary in case the order of the fields is important.
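
The sample data also contains an empty value (Instrument:Unique Name,,) and a value with backslash-escaped commas (1000\,2000\,5000\,10000). The sketch below is a variation on the approach described above, not the same code: it pairs fields by position instead of looking for a colon, and it assumes that the file strictly alternates keys and values and that \, marks a comma inside a value. Both assumptions are guesses based on the sample, and parse_registry is a hypothetical helper name.

```python
import re
from collections import OrderedDict


def parse_registry(text):
    """Pair up alternating key and value fields (hypothetical helper)."""
    # Split on commas that are NOT preceded by a backslash,
    # so an escaped value such as 1000\,2000 stays in one piece.
    fields = re.split(r'(?<!\\),', text)
    pairs = OrderedDict()
    # Even positions are assumed to be CATEGORY:KEY, odd positions the value.
    for key, value in zip(fields[0::2], fields[1::2]):
        pairs[key.strip()] = value.strip()
    return pairs


# Pieces copied from the sample data in the question.
sample = (r"Instrument:Unique Name,,"
          r"Scan Parameters:Scan Speed Options,1000\,2000\,5000\,10000,"
          r"Detector Gain Negative:b,8.441605")
registry = parse_registry(sample)
for key, value in registry.items():
    print(key, '->', value)
```

Pairing by position also copes with a value that happens to contain a colon, at the cost of breaking if a field is ever missing from the file.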

At the end we read through the dictionary we constructed, looking for the "Detector Gain" fields. Then we print them to an outfile - you can see how we open it for writing with a context manager.
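
The writing step can be sketched on its own as a small function. This is only an illustration: write_block, its wanted parameter, and the temporary output file name are made up for the example, and it uses outfile.write instead of print so that it runs unchanged on Python 2 and Python 3.

```python
import os
import tempfile


def write_block(pairs, out_path, wanted="Detector Gain"):
    """Write the matching key,value pairs between the two marker lines."""
    # The with statement closes the file even if an error occurs while writing.
    with open(out_path, 'w') as outfile:
        outfile.write("START_TARGET_REGISTRY\n")
        for key, value in pairs.items():
            if wanted in key:
                outfile.write("{0},{1}\n".format(key, value))
        outfile.write("END_TARGET_REGISTRY\n")


# Illustrative input; the values are copied from the sample data in the question.
pairs = {
    "Detector Gain Negative:a": "1.865677e-021",
    "Fluidics:Flow Rate": "10",
}
out_path = os.path.join(tempfile.gettempdir(), "detector_gain_demo.txt")
write_block(pairs, out_path)

# Read the file back to check what was written.
with open(out_path) as infile:
    written = infile.read()
print(written)
```

The lines come out in whatever order the dictionary yields them, which for an OrderedDict is the order in the source file. In the sample data that is High Gain, Low Gain, a, b, whereas the desired output in the question lists a, b, High Gain, Low Gain, so an explicit key order would be needed if that ordering matters.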

If you're on Python 2, also add from __future__ import print_function at the top of the script.
