Python Notes (Reading Data from a txt File)

Date: 2022-05-20 10:53:01

In machine learning we often need to read data from a txt file. This note summarizes two ways of doing so.

Data contents

  • The data has four columns: the first three are feature values and the last is the class label.
40920	8.326976	0.953952	3
14488	7.153469	1.673904	2
26052	1.441871	0.805124	1
75136	13.147394	0.428964	1
38344	1.669788	0.134296	1
72993	10.141740	1.032955	1
35948	6.830792	1.213192	3
42666	13.276369	0.543880	3
67497	8.631577	0.749278	1
35483	12.273169	1.508053	3

Method 1: Reading the file manually

import numpy as np

def file2matrix(filename):
    # Read all lines once; the with-block closes the file afterwards
    with open(filename) as fr:
        lines = fr.readlines()
    numberOfLines = len(lines)                 # number of lines in the file
    returnMat = np.zeros((numberOfLines, 3))   # feature matrix to return
    classLabelVector = []                      # labels to return
    for index, line in enumerate(lines):
        listFromLine = line.strip().split('\t')
        # first three fields are the features, converted to float
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        # last field is the integer class label
        classLabelVector.append(int(listFromLine[-1]))
    return returnMat, classLabelVector

dataMat, dataLabel = file2matrix("datingTestSet2.txt")
print(dataMat, dataLabel)
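
The manual approach splits on a tab character, so it depends on the file being strictly tab-separated. As a minimal sketch (not part of the original note), the variant below splits on any whitespace and skips blank lines. The name `file2matrix_ws` and the temporary sample file are assumptions made purely for illustration.

```python
import os
import tempfile

import numpy as np


def file2matrix_ws(filename):
    """Hypothetical variant of file2matrix: split on any whitespace, skip blank lines."""
    features = []
    labels = []
    with open(filename) as fr:
        for line in fr:
            fields = line.split()  # no argument: splits on tabs or spaces
            if not fields:
                continue  # skip blank lines
            features.append([float(x) for x in fields[0:3]])
            labels.append(int(fields[-1]))
    return np.array(features), labels


# Three sample rows written to a temporary file:
# the first is tab-separated, the other two are space-separated.
sample = (
    "40920\t8.326976\t0.953952\t3\n"
    "14488 7.153469 1.673904 2\n"
    "26052 1.441871 0.805124 1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

dataMat, dataLabel = file2matrix_ws(tmp_path)
os.remove(tmp_path)

print(dataMat.shape)  # (3, 3)
print(dataLabel)      # [3, 2, 1]
```

Collecting rows in a list and converting once at the end avoids reading the file twice just to count its lines.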

Method 2: Using pandas

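import pandas as pd

# read_table splits on tabs by default; header=None because the file has no header row
df_news = pd.read_table('datingTestSet2.txt', header=None)
df_news  # in a notebook this displays the DataFrame; in a script, use print(df_news)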
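
To get the same feature matrix and label list that Method 1 returns, the DataFrame can be split by column position. The following is a rough sketch under stated assumptions: the in-memory sample stands in for `datingTestSet2.txt`, and the variable names `features` and `labels` are illustrative, not from the original note.

```python
import io

import pandas as pd

# Three tab-separated sample rows standing in for datingTestSet2.txt.
sample = (
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)

# header=None makes pandas number the columns 0..3 instead of using the first row as names.
df = pd.read_table(io.StringIO(sample), header=None)

features = df.iloc[:, 0:3].to_numpy()  # first three columns as a NumPy array
labels = df.iloc[:, -1].tolist()       # last column as a plain Python list

print(features.shape)  # (3, 3)
print(labels)          # [3, 2, 1]
```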

For details, see the documentation below.