PostgreSQL regular expression to get the domain and subdomain name from a URL / website

Date: 2022-08-23 10:41:54

Basically, I need to get the domain and subdomain name from each row's URL, that is, the whole website name excluding the scheme and the leading www.

My DB table looks like this:

+----------+------------------------+
|    id    |    website             |
+----------+------------------------+
| 1        | https://www.google.com |
+----------+------------------------+
| 2        | http://www.google.co.in|
+----------+------------------------+
| 3        | www.google.com         |
+----------+------------------------+
| 4        | www.google.co.in       |
+----------+------------------------+
| 5        | google.com             |
+----------+------------------------+
| 6        | google.co.in           |
+----------+------------------------+
| 7        | http://google.co.in    |
+----------+------------------------+

Expected output:

google.com
google.co.in
google.com
google.co.in
google.com
google.co.in
google.co.in

My Postgres Query looks like this:

select id, substring(website from '.*://([^/]*)') as website_domain from contacts

But the above query returns blank values for some websites. How can I get the desired output?

2 Answers

#1


2  

You may use

SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tbl;

See the regex demo.

Details

  • ^ - start of string
  • (https?://)? - 1 or 0 occurrences of http:// or https://
  • (www\.)? - 1 or 0 occurrences of www.

See the PostgreSQL demo:

CREATE TABLE tb1
    (website character varying)
;

INSERT INTO tb1
    (website)
VALUES
    ('https://www.google.com'),
    ('http://www.google.co.in'),
    ('www.google.com'),
    ('www.google.co.in'),
    ('google.com'),
    ('google.co.in'),
    ('http://google.co.in')
;

SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tb1;

Result:

google.com
google.co.in
google.com
google.co.in
google.com
google.co.in
google.co.in
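Adapted to the shape of the table in the question, the same replacement might look like the sketch below. The inline VALUES list stands in for the contacts table, the sample rows are made up for illustration, and the 'i' flag (case-insensitive matching) is an addition that is not part of the original answer. It also assumes the default standard_conforming_strings = on, so the backslash in the pattern is taken literally.

```sql
-- Sketch only: "contacts" here is an inline stand-in for the real table.
SELECT id,
       -- Strip an optional scheme and an optional leading "www.";
       -- the 'i' flag makes the match case-insensitive.
       regexp_replace(website, '^(https?://)?(www\.)?', '', 'i') AS website_domain
FROM (VALUES
        (1, 'https://www.google.com'),
        (2, 'HTTP://WWW.google.co.in'),
        (3, 'google.com/maps')
     ) AS contacts(id, website);
```

Note that this only removes the prefix. Anything after the host, such as the /maps path in the third row, is left in place.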

#2


4  

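Use non-capturing groups, (?:...), and make the "http://" and "www." prefixes optional to cope with websites stored without "http://". Because substring() returns the text matched by the first capturing group, only the host part should be captured.

For example: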

select id,
       substring(website from '(?:.*://)?(?:www\.)?([^/]*)') as website_domain
from contacts

http://sqlfiddle.com/#!17/197fb/14

https://www.postgresql.org/docs/9.3/static/functions-matching.html#POSIX-ATOMS-TABLE
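
To see why the optional prefix matters, the sketch below runs the pattern from the question and the pattern from this answer side by side. The inline VALUES list and the column aliases are made up for illustration. substring() returns NULL when the pattern does not match at all, which is the likely source of the blank values mentioned in the question.

```sql
-- Sketch only: compares the two patterns on a few sample rows.
SELECT website,
       -- Pattern from the question: requires "://", so rows without a
       -- scheme do not match and yield NULL.
       substring(website from '.*://([^/]*)') AS original_pattern,
       -- Pattern from this answer: scheme and "www." are optional and
       -- non-capturing, so only the host part is returned.
       substring(website from '(?:.*://)?(?:www\.)?([^/]*)') AS optional_prefix
FROM (VALUES
        ('https://www.google.com'),
        ('www.google.co.in'),
        ('google.com')
     ) AS t(website);
```

For the first row the original pattern still returns www.google.com, since it never strips the www. prefix; for the other two rows it returns NULL, while the second column returns the bare domain in every case.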
