Reading SO's data dump effectively

Date: 2022-02-14 10:44:57

I currently use Vim to read SO's data dump. However, my MacBook slows down when I scroll down just a few rows. This suggests to me that there must be more efficient ways to read the data.

I know little MySQL. The files are in .xml format, which is rather hard to read at the moment. It may be more efficient to convert the .xml files to MySQL and then read the data there. I know only the MS db tool for such tasks, but I would like to know of another tool too.

Problems

  1. To parse the .xml files into SQL queries that MySQL understands. For this we need to know the structure of the data.

  2. To load the data into MySQL.

  3. To find a tool similar to the MS db tool with which we can read the data effectively.
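
For the first problem, one way to learn the structure of a dump file without opening it in an editor is to stream through it and collect the attribute names that occur on its `row` elements. The following is only a minimal sketch using Python's standard library; the function name `collect_row_attributes` and the `max_rows` limit are illustrative assumptions, not part of the dump or of any tool.

```python
import xml.etree.ElementTree as ET


def collect_row_attributes(source, max_rows=1000):
    """Return the sorted attribute names seen on the first `max_rows` <row> elements.

    `source` may be a file name or an open binary file object.
    """
    names = set()
    seen = 0
    # iterparse reads the file incrementally instead of loading it all at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        names.update(elem.attrib.keys())
        elem.clear()  # drop the row's attributes once they have been inspected
        seen += 1
        if seen >= max_rows:
            break
    return sorted(names)
```

The returned names are candidates for the column names of the table that has to be created for that file. Sampling only the first rows keeps the run short, at the cost of possibly missing attributes that appear only later in the file.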

How do you read SO's data dump effectively?

--

[edit]

  1. How can you run the 523 SQL queries to create the database from your terminal? At the moment I have the commands in a text file.

  2. How can you "switch to [the recovery mode] to a simple recovery mode in the database"?
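
Regarding the first of these questions, a common approach is to feed the text file to the `mysql` command-line client on standard input. The sketch below wraps that in Python and assumes the `mysql` client is installed and on the PATH; the function names, and any user, database, and file names passed to them, are hypothetical.

```python
import subprocess


def build_mysql_command(user, database):
    """Build the argument list for the mysql command-line client."""
    # --password with no value makes the client prompt for the password.
    return ["mysql", "--user=" + user, "--password", database]


def run_sql_file(sql_path, user, database):
    """Feed every statement in `sql_path` to the mysql client via standard input."""
    with open(sql_path, "rb") as sql_file:
        # Same idea as the shell form: mysql --user=USER --password DATABASE < FILE
        subprocess.run(build_mysql_command(user, database), stdin=sql_file, check=True)
```

With `check=True`, a failing statement makes the client exit with an error and raises `subprocess.CalledProcessError`, so a broken query in the file does not go unnoticed.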

1 solution

#1


I made my first ever Python program to read them and output SQL insert statements for use with MySQL (it's ugly, but it worked). You'll need to create the tables by hand first, though.

import sys
import xml.sax
import xml.sax.handler


class SOHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.errParse = 0  # number of encoding errors seen while building rows

    def startElement(self, name, attributes):
        if name != "row":
            # Any element other than <row> (the root element of the dump file)
            # names the table and the output files.
            self.table = name
            self.outFile = open(name + ".sql", "w", encoding="utf-8")
            self.errfile = open(name + ".err", "w", encoding="utf-8")
        else:
            skip = False
            # The column list comes from the attribute names of the <row> element.
            currentRow = "insert into " + self.table + "("
            currentRow += ",".join(attributes.keys())
            currentRow += ") values ("
            for attr in attributes.keys():
                try:
                    # Escape backslashes and quotes so the value is a valid string literal.
                    value = attributes[attr].replace("\\", "\\\\").replace('"', '\\"').replace("'", "\\'")
                    currentRow += '"{0}",'.format(value)
                except UnicodeEncodeError:
                    # Kept from the original: log the partial row and skip it.
                    self.errParse += 1
                    skip = True
                    self.errfile.write(currentRow)
            if not skip:
                # Drop the trailing comma and close the statement.
                currentRow = currentRow[:-1] + ");"
                self.outFile.write(currentRow + "\n")
                self.outFile.flush()
                print(currentRow)

    def characters(self, data):
        pass

    def endElement(self, name):
        if name != "row":
            # The root element has ended, so the output files are complete.
            self.outFile.close()
            self.errfile.close()


if len(sys.argv) < 2:
    print("Give me an xml file argument!")
    sys.exit(1)

parser = xml.sax.make_parser()
handler = SOHandler()
parser.setContentHandler(handler)
parser.parse(sys.argv[1])
print(handler.errParse)
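
The script above escapes quotes by hand and writes the statements to a file. As an alternative, here is a rough sketch that inserts the rows directly through a database driver with parameterized queries, so the driver does the quoting. It is not the method used above: it uses `sqlite3` from the standard library as a stand-in for MySQL so that it runs without a server, and the function name `load_rows` and its parameters are assumptions made for illustration. MySQL drivers generally use a different placeholder style than `?`, so the statement would need adjusting there.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_rows(xml_source, connection, table, columns):
    """Insert the chosen attributes of every <row> element into `table`.

    `table` and `columns` are placed into the SQL text directly, so they
    must come from trusted code, not from the dump itself.
    """
    placeholders = ",".join("?" for _ in columns)
    sql = "INSERT INTO {0} ({1}) VALUES ({2})".format(
        table, ",".join(columns), placeholders
    )
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        # Missing attributes become NULL; the driver handles all quoting.
        connection.execute(sql, [elem.get(col) for col in columns])
        elem.clear()  # release the row's data once it has been inserted
        count += 1
    connection.commit()
    return count
```

As with the script, the table has to exist before the rows are loaded. The function returns the number of rows inserted, which can be compared against the row count of the dump file as a quick sanity check.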
