Converting UTF-8 strings to 7-bit XML in PHP

Date: 2022-10-24 23:35:08

How can UTF-8 strings (i.e. 8-bit strings) be converted to/from XML-compatible 7-bit strings (i.e. printable ASCII with numeric entities)?

i.e. an encode() function such that:

encode("“£”") -> "&#8220;&#163;&#8221;"

decode() would also be useful:

decode("&#8220;&#163;&#8221;") -> "“£”"

PHP's htmlentities()/html_entity_decode() pair does not do the right thing:

htmlentities(html_entity_decode("&#8220;&#163;&#8221;")) ->
  "&amp;#8220;&pound;&amp;#8221;"

Laboriously specifying types helps a little, but still returns XML-incompatible named entities, not numeric ones:

htmlentities(html_entity_decode("&#8220;&#163;&#8221;", ENT_QUOTES, "UTF-8"), ENT_QUOTES, "UTF-8") ->
  "&ldquo;&pound;&rdquo;"

2 answers

#1


6  

mb_encode_numericentity does that exactly.
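
A minimal sketch of how this could look, assuming PHP 7+ with the mbstring extension. The encode()/decode() names come from the question; the constant name and the conversion map values are my own assumption, chosen to cover every code point above 7-bit ASCII. It only touches non-ASCII characters: &, < and > are left alone, so escape those separately (e.g. with htmlspecialchars()) if the text is not already XML-safe.

```php
<?php
// Conversion map for the mb_*_numericentity functions:
// [first code point, last code point, offset, mask].
// Assumption: cover U+0080 through U+10FFFF so that plain
// 7-bit ASCII passes through unchanged.
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// UTF-8 in, 7-bit ASCII with decimal numeric entities out.
function encode(string $utf8): string
{
    return mb_encode_numericentity($utf8, NON_ASCII_MAP, 'UTF-8');
}

// Numeric entities for non-ASCII code points in, UTF-8 out.
function decode(string $ascii): string
{
    return mb_decode_numericentity($ascii, NON_ASCII_MAP, 'UTF-8');
}

// U+201C, U+00A3, U+201D are the three characters from the question.
echo encode("\u{201C}\u{A3}\u{201D}"), "\n";   // &#8220;&#163;&#8221;
```

Both directions leave ASCII untouched, so a literal "&#8220;" that was already in the input cannot be told apart from an encoded character once it is decoded.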

#2


0  

This is a bit of a workaround. I read up on iconv(), and I don't think it will give you numeric entities (I have not tested that).

function decode( $string )
{
  $doc = new DOMDocument( "1.0", "UTF-8" );
  // Parse the string as the content of a dummy <x> element so that the XML
  // parser itself resolves the numeric entities. createTextNode() would treat
  // "&#8220;" as literal text and escape the ampersand instead of decoding it.
  $ok = $doc->loadXML( '<?xml version="1.0" encoding="UTF-8"?>'."\n".'<x>'.$string.'</x>', LIBXML_NOENT );
  if ( !$ok ) {
    // The input was not well-formed XML content (e.g. a bare "&" or "<")
    return false;
  }
  // textContent holds the decoded characters as a UTF-8 string
  return $doc->documentElement->textContent;
}

This, however, I have tested. I might do the reverse later, just don't hold your breath ;-)
