How to filter DBpedia results in SPARQL

Time: 2023-01-13 23:34:38

I have a small problem. If I run this simple SPARQL query:

SELECT ?abstract 
WHERE {
<http://dbpedia.org/resource/Mitsubishi> <http://dbpedia.org/ontology/abstract> ?abstract.
FILTER langMatches( lang(?abstract), 'en')}

I get a result that contains non-English characters. Is there a way to remove them and retrieve just the English words?

1 Solution

#1


You'll need to define exactly which characters you do and don't want in your result, but you can use replace to replace characters outside a given range with, e.g., the empty string. If you want to exclude everything except the Basic Latin, Latin-1 Supplement, Latin Extended-A, and Latin Extended-B ranges (which together span \u0000–\u024f), you could do the following:

SELECT ?abstract ?cleanAbstract
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract 
  FILTER langMatches( lang(?abstract), 'en')
  bind(replace(?abstract,"[^\\x{0000}-\\x{024f}]","") as ?cleanAbstract)
}

Or even simpler:

SELECT (replace(?abstract_,"[^\\x{0000}-\\x{024f}]","") as ?abstract)
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract_
  FILTER langMatches(lang(?abstract_), 'en')
}

SPARQL results:

The Mitsubishi Group (, Mitsubishi Gurūpu) (also known as the Mitsubishi Group of Companies or Mitsubishi Companies) is a group of autonomous Japanese multinational companies covering a range of businesses which share the Mitsubishi brand, trademark, and legacy.The Mitsubishi group of companies form a loose entity, the Mitsubishi Keiretsu, which is often referenced in Japanese and US media and official reports; in general these companies all descend from the zaibatsu of the same name. The top 25 companies are also members of the Mitsubishi Kin'yōkai, or "Friday Club", and meet monthly. In addition the Mitsubishi.com Committee exists to facilitate communication and access of the Mitsubishi brand through a portal web site.

You may find the Wikipedia article "Latin script in Unicode" useful.
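
To check locally which characters the range \u0000–\u024f keeps and drops before running the query, the same character class can be tried outside SPARQL. The following is only a minimal sketch in Python, not part of the original answer: the function name strip_non_latin and the sample string are made up for illustration, and Python's re module is used in place of the endpoint's regex engine, so escape syntax differs from the \\x{...} form in the queries above.

```python
import re

# Matches any character outside U+0000..U+024F, i.e. outside
# Basic Latin, Latin-1 Supplement, Latin Extended-A and Latin Extended-B.
NON_LATIN = re.compile(r"[^\u0000-\u024f]")


def strip_non_latin(text: str) -> str:
    """Return text with every character outside U+0000..U+024F removed."""
    return NON_LATIN.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample resembling the start of the abstract.
    sample = "The Mitsubishi Group (三菱グループ, Mitsubishi Gurūpu)"
    print(strip_non_latin(sample))
```

Accented Latin letters such as ū (U+016B) fall inside the range and are kept, while the Japanese characters are removed. This is why the cleaned abstract still contains "Gurūpu" but leaves an empty slot before the comma.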
