mysql和postgresql中的诸多不同 logistic回归P 2016.04.13回顾

1、在现在数据库的基础上完善了v3_model的sql，不包括自己计算和商业规则部分，其中理解了一下logistic regression最后求值的时候用1-(1/1+exp(-Z))，我问同事这是good的概率还是bad的概率，同事说这个是互补的，带回到训练集去看，和训练集标记的1和0无关，说1-(1/(1+exp(-Z)))往往是bad的概率，我有些存疑！

2、主要着手修改parse.py以前是解析了写到postgresql，现在我要改了写到mysql，这个过程让我发现了，mysql和postgresql的诸多的区别，回顾一下

0x01. python用于mysql的库叫MySQLdb，psycopg2十分类似，包括他们的方法也很类似，都有connect、然后用conn生成一个游标cur，然后用cur.execute去执行SQL语句，用commit去提交变化（但是我发现创建表的话mysql不需要调用这个方法，而postgresql则需要），关闭等等方法也是相同的；

0x02. MySQLdb.connect的参数名称略微有一些不同，像这样

MySQLdb.connect(host=options.db_host, port=int(options.db_port), db=options.db_database, user=options.db_user, passwd=options.db_password)

port还必须转成int

psycopg2.connect形如这样：

psycopg2.connect(host=options.db_host, port=options.db_port, database=options.db_database, user=options.db_user, password=options.db_password)

要捕获的错误也不同了，但是都是类似的，MySQLdb对应的异常捕获就是MySQLdb.Error

0x03. 当然两个数据库的系统表上也有一些差异，

select ordinal_position,column_name,data_type from information_schema.columns where table_schema = '%s' and table_name = '%s'

information_schema.columns这个表的三个字段在两个数据库都是相同的

0x04. 创建表涉及到字段类型和自增字段的设置等有比较多的不同，postgresql是这样

create table {table_schema}.{table_name} (
                id serial primary key,
                created_at timestamp with time zone default now() not null,
                national_id text not null,
                requested_at timestamp with time zone not null,
                responded_at timestamp with time zone not null,
                filename text
            );
            create index {table_schema}_{table_name}__national_id on {table_schema}.{table_name} (national_id);
            create index {table_schema}_{table_name}__created_at on {table_schema}.{table_name} (created_at);
            create index {table_schema}_{table_name}__filename on {table_schema}.{table_name} (filename);
            create index {table_schema}_{table_name}__national_id__responded_at on {table_schema}.{table_name} (national_id, responded_at);
            create index {table_schema}_{table_name}__national_id__requested_at on {table_schema}.{table_name} (national_id, requested_at);

改成mysql对应是这样

create table {table_schema}.{table_name} (
                id int auto_increment primary key,
                created_at timestamp not null default now(),
                national_id VARCHAR(20) not null,
                requested_at datetime not null,
                responded_at datetime not null,
                filename VARCHAR(100)
            );
            create index {table_schema}_{table_name}__national_id on {table_schema}.{table_name} (national_id);
            create index {table_schema}_{table_name}__created_at on {table_schema}.{table_name} (created_at);
            create index {table_schema}_{table_name}__filename on {table_schema}.{table_name} (filename);
            create index {table_schema}_{table_name}__national_id__responded_at on {table_schema}.{table_name} (national_id, responded_at);
            create index {table_schema}_{table_name}__national_id__requested_at on {table_schema}.{table_name} (national_id, requested_at);

可以看到serial = auto_increment , timestamp with time zone = timestamp ，一般来说mysql表中只有一个timestamp字段，听说是系统缺省取当前时间，我并没测试，其他又含日期和时间的字段就用datetime，mysql中也有test这个字段，但是一般是存大文本的，而且我偷懒用text建立索引的时候会报错，原因不明，我最后只好用更合适的varchar跟长度，not null仍然是not null，default也是相同

对于另外一个系统information_schema.tables，mysql有各个表的类型，如base_table，但是没有这个表是否可写入的字段，我直接省去了这个检测，只检测teble_type，两个数据库，我昨天发现的不同以上！

秒客网

mysql和postgresql中的诸多不同 logistic回归P 2016.04.13回顾

相关文章