Split URL using SQL and add to database

Time: 2022-09-27 12:03:47

I am trying to split a URL into its parts (domain, category, subcategory, etc.) and insert each part into a table. For example:

"www.mydomain.com/toolsanddownloads/dailymealplanner.html?languageid=6"

The purpose is to do a 404 redirect if the page doesn't exist. I am trying to write a SQL statement using a CTE to get each part of the URL:

;with cte AS
(
 SELECT 
    CASE 
    WHEN RIGHT(RTRIM(URL),1) = '/' THEN LEFT(URL,LEN(URL)-1)
    WHEN RIGHT(RTRIM(URL),5) = '.html' THEN LEFT(URL,LEN(URL)-5)
    ELSE URL
    END AS URL1,
       StartPos = CharIndex('//', URL)+2
  FROM [dbo].[404RedirectTemp]
) 

SELECT URL1, SUBSTRING(URL1, 8, CHARINDEX('/', URL1, 9) - 8) AS DomainName,
       REVERSE(SUBSTRING(REVERSE(URL1), CHARINDEX('?', REVERSE(URL1)) + 1,
       CHARINDEX('/', REVERSE(URL1)) - CHARINDEX('?', REVERSE(URL1)) -1)) AS CategoryName,
       SUBSTRING(URL1, CHARINDEX('?', URL1) + 1, LEN(URL1)) AS QueryParameter
FROM cte;

I always get the last segment as the category name, which is wrong, because some URLs look like http://www.mydomain.com/toolsanddownloads/dailymealplanner.html?languageid=6

and some look like

"www.mydomain.com/toolsanddownloads"
"www.mydomain.com/toolsanddownloads/dailymealplanner.html"

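The behaviour described above can be reproduced on a single value. This is only a sketch, assuming SQL Server; the variables `@u`, `@r` and `@category` are made up for illustration, and the expression is the CategoryName expression from the query applied to the first example URL. Because the URL is reversed, CHARINDEX finds the last '?' and the last '/', so the result is always the final path segment.

```sql
-- Hypothetical single-value reproduction of the CategoryName expression.
DECLARE @u varchar(500) =
    'http://www.mydomain.com/toolsanddownloads/dailymealplanner.html?languageid=6';
DECLARE @r varchar(500) = REVERSE(@u);
DECLARE @category varchar(500);

-- In the reversed string, the first '?' and the first '/' are the LAST ones
-- of the original URL, so the text between them is the final path segment.
SET @category = REVERSE(SUBSTRING(@r,
                                  CHARINDEX('?', @r) + 1,
                                  CHARINDEX('/', @r) - CHARINDEX('?', @r) - 1));

SELECT @category AS CategoryName;
```

For this URL the query returns dailymealplanner.html rather than toolsanddownloads.
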
What I want to achieve is that, no matter how many sections a URL has, I get them all as columns: domain, categories, subcategories, brand, product.

If a URL has only categories, I want the categories; if it has categories and subcategories, I want the subcategories as well.

I have over 4000 URLs in a temp table. I want to loop through each one and update another table for the 404 redirect.

1 solution

#1


2  

How about converting each URL to rows and treating the row number like an array index? For example:

Let's set up a sample environment:

create table #url (id int, url varchar(500));
insert into #url select 1, 'http://*.com/questions/18660573/split-url-using-sql-and-add-to-database';
insert into #url select 2, 'www.mydomain.com/toolsanddownloads';
insert into #url select 3, 'www.mydomain.com/toolsanddownloads?test=2&b=4';
insert into #url select 4, 'www.mydomain.com/toolsanddownloads/dailymealplanner.html'

Clean the data up a bit (probably on a temp table, to leave the raw logs alone):

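-- Drop the scheme so the domain becomes the first '/'-separated segment.
update #url set url = replace(url, 'http://','');
-- Turn the query string into its own segment, marked with a leading '^'.
update #url set url = replace(url, '?','/^');
-- '&' is not valid inside the XML built in the next step, so replace it as well.
update #url set url = replace(url, '&','^');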
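
The cleanup above only removes http://. If the real data also contains https:// URLs or trailing slashes (neither appears in the sample rows, so this is an assumption), a variation along these lines might help. It is a sketch only; the table variable `@url` and its rows are made up so the snippet runs on its own.

```sql
-- Hypothetical sample rows in a table variable so the sketch runs on its own.
DECLARE @url TABLE (id int, url varchar(500));
INSERT INTO @url (id, url) VALUES
    (1, 'https://www.mydomain.com/toolsanddownloads/'),
    (2, 'http://www.mydomain.com/toolsanddownloads?test=2&b=4'),
    (3, 'www.mydomain.com/toolsanddownloads/dailymealplanner.html');

-- Remove a leading http:// or https:// only (not a '://' inside a query string).
UPDATE @url
SET url = SUBSTRING(url, CHARINDEX('://', url) + 3, LEN(url))
WHERE url LIKE 'http://%' OR url LIKE 'https://%';

-- Drop one trailing slash so it does not produce an empty last segment.
UPDATE @url
SET url = LEFT(url, LEN(url) - 1)
WHERE url LIKE '%/';

-- Same markers as the cleanup above: '?' starts a '^'-prefixed segment, '&' becomes '^'.
UPDATE @url
SET url = REPLACE(REPLACE(url, '?', '/^'), '&', '^');

SELECT id, url FROM @url ORDER BY id;
```

Row 1 ends up as www.mydomain.com/toolsanddownloads, row 2 as www.mydomain.com/toolsanddownloads/^test=2^b=4, and row 3 is left unchanged.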

Now the fun stuff:

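with rslt as (
    -- Wrap each '/'-separated segment in <i>...</i>, then shred the XML into one row per segment.
    -- ORDER BY (SELECT 1) relies on nodes() returning rows in document order, which is not guaranteed.
    SELECT a.id
         , depth = row_number() OVER (partition by a.id ORDER BY (SELECT 1))
         , value = y.i.value('.', 'nvarchar(4000)')
    FROM
    (
        SELECT id, x = CONVERT(XML, '<i>'
            + REPLACE(url, '/', '</i><i>')
            + '</i>').query('.')
        from #url
    ) AS a CROSS APPLY x.nodes('i') AS y(i)
)
select case
    when value like '^%' then 'querystring'
    when depth=1 then 'Domain'
    when depth=2 then 'categories'
    when depth=3 then 'subcategories'
    when depth=4 then 'brand'
    when depth=5 then 'product'
    end section
    -- Strip the file extension from path segments, but leave the query string intact.
    , case when depth>1 and value not like '^%' and charindex('.', value)>0
        then left(value,charindex('.', value)-1)
        else value end value
from rslt
order by id, depth;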

Results look like this:

Domain              *.com
categories          questions
subcategories       18660573
brand               split-url-using-sql-and-add-to-database
Domain              www.mydomain.com
categories          toolsanddownloads
Domain              www.mydomain.com
categories          toolsanddownloads
querystring         ^test=2^b=4
Domain              www.mydomain.com
categories          toolsanddownloads
subcategories       dailymealplanner
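
Since the question asks for the parts as columns in a table rather than as rows, the rows can be folded back with conditional aggregation. This is a sketch only: it reuses the XML split from the query above, while the table variables `@url` and `@result` and their column names are made up for illustration. It assumes the URLs have already been cleaned as shown earlier, and it does not strip file extensions.

```sql
-- Hypothetical, already-cleaned sample rows.
DECLARE @url TABLE (id int, url varchar(500));
INSERT INTO @url (id, url) VALUES
    (1, 'www.mydomain.com/toolsanddownloads'),
    (2, 'www.mydomain.com/toolsanddownloads/^test=2^b=4'),
    (3, 'www.mydomain.com/toolsanddownloads/dailymealplanner.html');

-- Hypothetical target table: one row per URL, one column per part.
DECLARE @result TABLE (
    id int, domain nvarchar(500), categories nvarchar(500),
    subcategories nvarchar(500), brand nvarchar(500),
    product nvarchar(500), querystring nvarchar(500));

WITH parts AS (
    -- Same XML split as above; the segment order is assumed, not guaranteed.
    SELECT a.id,
           ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY (SELECT 1)) AS depth,
           y.i.value('.', 'nvarchar(4000)') AS value
    FROM (
        SELECT id,
               CONVERT(XML, '<i>' + REPLACE(url, '/', '</i><i>') + '</i>').query('.') AS x
        FROM @url
    ) AS a
    CROSS APPLY x.nodes('i') AS y(i)
)
INSERT INTO @result (id, domain, categories, subcategories, brand, product, querystring)
SELECT id,
       MAX(CASE WHEN depth = 1 THEN value END),
       MAX(CASE WHEN depth = 2 AND value NOT LIKE '^%' THEN value END),
       MAX(CASE WHEN depth = 3 AND value NOT LIKE '^%' THEN value END),
       MAX(CASE WHEN depth = 4 AND value NOT LIKE '^%' THEN value END),
       MAX(CASE WHEN depth = 5 AND value NOT LIKE '^%' THEN value END),
       MAX(CASE WHEN value LIKE '^%' THEN value END)   -- query-string segment, if any
FROM parts
GROUP BY id;

SELECT * FROM @result ORDER BY id;
```

Parts a URL does not have come back as NULL, so a URL with only a category fills domain and categories and leaves the rest empty. Because everything is done in one set-based statement, no loop over the 4000 rows is needed.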
