Hive provides a built-in function, parse_url, for extracting parts of a URL directly in a query.
Running DESC FUNCTION parse_url returns the following description:
parse_url(url, partToExtract[, key]) - extracts a part from a URL
Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, FILE, AUTHORITY and USERINFO.
Usage:
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","HOST");
--i.cnblogs.com
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","PATH");
--/EditPosts.aspx
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","QUERY");
--postid=10489595
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","REF");
--NULL
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","PROTOCOL");
--https
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","FILE");
--/EditPosts.aspx?postid=10489595
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","AUTHORITY");
--i.cnblogs.com
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","USERINFO");
--NULL
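The optional key argument from the signature is not exercised in the examples above. In Hive it only takes effect together with QUERY: it returns the value of a single query-string parameter instead of the whole query string. The following calls reuse the same URL. The results in the comments reflect Hive's documented behavior (the third argument selects one key from QUERY) and should be checked against your Hive version. The key name "foo" is a hypothetical parameter that does not appear in the URL.

-- value of the postid parameter only
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","QUERY","postid");
--10489595
-- a key that is not present in the query string yields NULL
SELECT parse_url("https://i.cnblogs.com/EditPosts.aspx?postid=10489595","QUERY","foo");
--NULL

This avoids having to split the QUERY string with regexp_extract or split() when only one parameter is needed.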
The most commonly used parts are "HOST" and "PATH".
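A typical use of HOST and PATH is aggregating traffic per site and per page. The sketch below assumes a hypothetical table access_log with a string column url holding full request URLs. Neither name comes from the original text. Hive does not accept SELECT aliases in GROUP BY by default, so the parse_url expressions are repeated there.

-- page views per host and path; access_log and url are hypothetical names
SELECT parse_url(url, "HOST") AS host,
       parse_url(url, "PATH") AS path,
       COUNT(*) AS pv
FROM access_log
GROUP BY parse_url(url, "HOST"), parse_url(url, "PATH");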