Fast relational database, simple to use with Python

Time: 2022-12-04 16:56:07

For my link scraping program (written in Python 3.3) I want to use a database to store around 100,000 websites:

  • just the URL,
  • a time stamp,
  • and for each website a list of several properties.

I don't know much about databases, but I found that the following may fit my purpose:

  • PostgreSQL
  • SQLite
  • Firebird

I'm interested in speed (to access the database and to get the wanted information). For example: for website x, does property y exist, and if so, read it. Write speed is of course also important.

My question: Are there big differences in speed or does it not matter for my small program? Maybe someone can tell which database fits my requirements (and is easy to handle with Python).

2 solutions

#1


4  

The size and scale of your database is not particularly large, and it's well within the scope of almost any off-the-shelf database solution.

Basically, what you're going to do is install the database server on your machine and it will come up on a given port. You then can install a library in Python to access it.

For example, if you want to use PostgreSQL, you'll install it on your machine and it will listen on a port, 5432 by default.

But if all you need is to store and retrieve the information you describe, you probably want to go with a NoSQL solution because it's very easy.

For example, you can install MongoDB on your server, then install PyMongo. The PyMongo tutorial will teach you pretty much everything you need for your application.
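
To make the MongoDB suggestion a bit more concrete, here is a rough sketch of how one website could be stored as a single document and how one property could be read back. It is only an illustration, not something taken from the PyMongo tutorial: the database name `scraper`, the collection name `websites`, the field names, and the helper functions are all made up for this example, and it assumes a MongoDB server running locally on its default port 27017.

```python
import datetime


def make_site_document(url, properties):
    """Build one document per website (field names are illustrative)."""
    return {
        "_id": url,  # using the URL as the key keeps it unique
        "fetched_at": datetime.datetime.now(datetime.timezone.utc),
        "properties": dict(properties),
    }


def save_site(collection, url, properties):
    """Insert the website, or overwrite it if the URL is already stored."""
    doc = make_site_document(url, properties)
    collection.replace_one({"_id": url}, doc, upsert=True)


def get_property(collection, url, name):
    """Return property `name` of website `url`, or None if either is missing."""
    doc = collection.find_one({"_id": url}, {"properties": 1})
    if doc is None:
        return None
    return doc.get("properties", {}).get(name)


def demo():
    """Needs `pip install pymongo` and a local MongoDB server (assumed)."""
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    websites = client["scraper"]["websites"]  # hypothetical names
    save_site(websites, "http://example.com", {"title": "Example"})
    print(get_property(websites, "http://example.com", "title"))
```

Calling `demo()` runs the round trip against the local server. Because the URL is the `_id`, the lookup "does website x have property y" is a single indexed read.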

#2


5  

If speed is the main criterion, then I would suggest going with an in-memory database. Take a look at http://docs.python.org/2/library/sqlite3.html

It can be used as a normal on-disk database too. For in-memory mode, use the code below: the database is created in RAM, which gives much faster run-time access. Keep in mind that an in-memory database is gone once the connection is closed.

import sqlite3
conn = sqlite3.connect(':memory:')
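
Building on the connection above, the following is a minimal sketch of how the data from the question (URL, time stamp, and a list of properties per website) might be laid out in SQLite. The table names, column names, and helper functions are assumptions chosen for illustration, not part of the `sqlite3` module. Passing a file name instead of `':memory:'` would keep the data on disk.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Open the database and create the two (illustrative) tables."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS websites (
            id         INTEGER PRIMARY KEY,
            url        TEXT NOT NULL UNIQUE,   -- UNIQUE also creates an index
            fetched_at REAL NOT NULL           -- Unix time stamp
        );
        CREATE TABLE IF NOT EXISTS properties (
            website_id INTEGER NOT NULL REFERENCES websites(id),
            name       TEXT NOT NULL,
            value      TEXT,
            PRIMARY KEY (website_id, name)     -- one value per property name
        );
    """)
    return conn


def save_site(conn, url, props, fetched_at=None):
    """Insert or update a website and its properties in one transaction."""
    if fetched_at is None:
        fetched_at = time.time()
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR IGNORE INTO websites (url, fetched_at) VALUES (?, ?)",
            (url, fetched_at))
        conn.execute(
            "UPDATE websites SET fetched_at = ? WHERE url = ?",
            (fetched_at, url))
        site_id = conn.execute(
            "SELECT id FROM websites WHERE url = ?", (url,)).fetchone()[0]
        conn.executemany(
            "INSERT OR REPLACE INTO properties (website_id, name, value) "
            "VALUES (?, ?, ?)",
            [(site_id, name, value) for name, value in props.items()])


def get_property(conn, url, name):
    """Return the value of property `name` for `url`, or None if absent."""
    row = conn.execute(
        "SELECT p.value FROM properties p "
        "JOIN websites w ON w.id = p.website_id "
        "WHERE w.url = ? AND p.name = ?",
        (url, name)).fetchone()
    return row[0] if row is not None else None


conn = open_db()
save_site(conn, "http://example.com", {"title": "Example", "lang": "en"})
print(get_property(conn, "http://example.com", "title"))
```

Grouping all writes for one website inside `with conn:` means one commit per website. With an on-disk file, batching many websites into a single transaction would likely matter more for write speed than the choice between the databases listed in the question.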
