[转] Making GTFS query more convenient

时间:2022-09-13 14:30:58

url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

这简直说出了我的心声。

I have been spending a lot of time parsing the GTFS database. On the surface it is just a simple CSV files. But to extract useful information from GTFS is often unexpected difficult. For example, find the stops from a bus line in sequential order might sounds like basic thing to do. But it is actually non-trivial with GTFS.

One reason is transit service is more complex it seems. It might seems a bus service just hit all the stops in sequence. But the actual service has a lot of variables. The schedule is often different in weekend compare to weekdays. And so does the exact route that it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex case there can be branching where there is a common main trunk and then the buses split to serve two or more alternative destination.

This is the reason why in GTFS one “route” may associate with multiple “shapes”. To find out what shapes are associate with a route, we will have to make a query like this

SELECT
shape_id
FROM route
JOIN trips
JOIN shape
GROUP BY shape_id;

To find out the stops is even more complex. Here we need to join one more table the stop_times. It is also the biggest tables in the GTFS. So this is also the most computation intensive query to do.

SELECT
shape_id, stop_id
FROM route
JOIN trips
JOIN stop_times
JOIN stops
GROUP BY shape_id, stop_id;

Still most people have a clear concept of what a transit line is where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like below.

    GTFS             More Useful
  Structure           Structure     route              line
     |                   |
     |                   V
     |                 route*
     |                   | \
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     V  /                V
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     V      /            V
    stop_times         stop_times

Here a shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is service that people know of, like a numbered bus line or a metro line. Below that is routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find that by querying all unique shape_id using the first SQL. Another missing piece is the stops. If we can pre-compute all the route_stops using the second SQL once, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled time, this is a huge saver. The is one assumption my structure makes though. It is that different lines do not shape that same route. If should be a reasonable assumption. And if there is indeed share route and shape, we should just replicated them as two separate entities.

The original GTFS structure seems to have a transit operator centric view. It allows them maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. By adding the route and route_stops tables as indicated, it will greatly facilitate the query and operation of transit information.

[转] Making GTFS query more convenient的更多相关文章

  1. Spring Boot Reference Guide

    Spring Boot Reference Guide Authors Phillip Webb, Dave Syer, Josh Long, Stéphane Nicoll, Rob Winch,  ...

  2. Using dojo/query(翻译)

    In this tutorial, we will learn about DOM querying and how the dojo/query module allows you to easil ...

  3. Query classification; understanding user intent

    http://vervedevelopments.com/Blog/query-classification-understanding-user-intent.html What exactly i ...

  4. The 5th tip of DB Query Analyzer

    The 5th tip of DB Query Analyzer             Ma Genfeng   (Guangdong UnitollServices incorporated, G ...

  5. Data access between different DBMS and other txt/csv data source by DB Query Analyzer

        1 About DB Query Analyzer DB Query Analyzer is presented by Master Genfeng,Ma from Chinese Mainl ...

  6. How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer

    How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer 1 ...

  7. Install and run DB Query Analyzer 6.04 on Microsoft Windows 10

          Install and run DB Query Analyzer 6.04 on Microsoft Windows 10  DB Query Analyzer is presented ...

  8. DB Query Analyzer 6.04 is distributed, 78 articles concerned have been published

        DB Query Analyzer 6.04 is distributed,78 articles concerned have been published  DB Query Analyz ...

  9. FluentData -Micro ORM with a fluent API that makes it simple to query a database 【MYSQL】

    官方地址:http://fluentdata.codeplex.com/documentation MYSQL: MySQL through the MySQL Connector .NET driv ...

随机推荐

  1. 【Net跨平台第一步】逆天带你零基础Linux入门【更新完毕】

    部分讲义:(视频已删,后期以文档形式发布)

  2. vnc

    Xvnc, Xvnc-core, vncagent, vncinitconfig, vnclicense, vnclicensehelper, vnclicensewiz, vncpasswd, vn ...

  3. 有关git的换行符的处理问题

    签入签出时对换行符的操作: #签出时将LF转换为CRLF,签入时将CRLF转换为LF git config --global core.autocrlf true #签出时不转换,签入时转换为LF g ...

  4. Luogu3373【模板】线段树2

    P3373[模板]线段树2 题目描述 如题,已知一个数列,你需要进行下面两种操作: 1.将某区间每一个数加上x 2.将某区间每一个数乘上x 3.求出某区间每一个数的和 输入输出格式 输入格式: 第一行 ...

  5. Centos7.2 编译安装方式搭建 phpMyAdmin

    1. 下载 编译 安装 pcre tar zxvf pcre-8.41.tar.gz cd pcre-8.41 ./configure --prefix=/opt/local/pcre-8.41 ma ...

  6. JLINK(SEGGER)灯不亮 USB不识别固件修复、clone修改

    今天调SMT32插拔几下,JLINK竟然挂掉了网上找了这个教程,搞了半天才搞好,驱动没装好!WIN7系统,自动安装的驱动是GPS.COM10,郁闷,错误来的.应该是:atm6124.sys.要手动选择 ...

  7. VUE中使用geetest滑动验证码

    一,准备工作:服务端部署 下载文件gt.gs: https://github.com/GeeTeam/gt3-python-sdk 需要说明的是这里的gt.js文件,它用于加载对应的验证JS库. 1. ...

  8. .Net Framework 下运行项目提示.dll类库程序集未能加载

     咨询个问题..项目可以生成成功,运行时总提示未能加载程序集,而且这个程序集就是当前webApi项目的dll,这是怎么回事.. 还一个奇怪的现象,刚开始报缺失xxx.dll, 那个dll是本解决方案里 ...

  9. week1 - Python基础1 介绍、基本语法、流程控制

    知识内容: 1.python介绍 2.变量及输入输出 3.分支结构 4.循环结构 一.python介绍 Python主要应用领域: 云计算: 云计算最火的语言, 典型应用OpenStack WEB开发 ...

  10. 【转载】Linux常用命令

    Linux常用命令大全(非常全!!!) 转载出处:https://www.cnblogs.com/yjd_hycf_space/p/7730690.html 系统信息 arch 显示机器的处理器架构( ...