{"id":657,"date":"2006-08-27T22:12:46","date_gmt":"2006-08-27T19:12:46","guid":{"rendered":"http:\/\/www.mummila.net\/nuudelisoppa\/index.php\/?p=657"},"modified":"2006-08-27T22:12:46","modified_gmt":"2006-08-27T19:12:46","slug":"utf-8-substr-fot-php","status":"publish","type":"post","link":"https:\/\/mummila.net\/nuudelisoppa\/2006\/08\/27\/utf-8-substr-fot-php\/","title":{"rendered":"UTF-8 substr for PHP"},"content":{"rendered":"<blockquote><p>&#8220;[Pulling] out an arbitrary substring which happens to cut a 2 byte UTF-8 sequence breaks the string;<\/p>\n<pre><span class=\"kw2\">&lt;?php<\/span>\n<a target=\"_blank\" href=\"http:\/\/www.php.net\/header\"><span class=\"kw3\">header<\/span><\/a> <span class=\"br0\">(<\/span><span class=\"st0\">'Content-type: text\/html; charset=utf-8'<\/span><span class=\"br0\">)<\/span>;\n&nbsp;\n<span class=\"re0\">$haystack<\/span> = <span class=\"st0\">'I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n'<\/span>;\n&nbsp;\n<span class=\"co1\">\/\/ Position 13 is in the middle of the \u00f4 char<\/span>\n<span class=\"re0\">$substr<\/span> = <a target=\"_blank\" href=\"http:\/\/www.php.net\/substr\"><span class=\"kw3\">substr<\/span><\/a><span class=\"br0\">(<\/span><span class=\"re0\">$haystack<\/span>, <span class=\"nu0\">0<\/span>, <span class=\"nu0\">13<\/span><span class=\"br0\">)<\/span>;\n&nbsp;\n<a target=\"_blank\" href=\"http:\/\/www.php.net\/print\"><span class=\"kw3\">print<\/span><\/a> <span class=\"st0\">\"Substr: $substr&lt;br&gt;\"<\/span>;<\/pre>\n<p><code>$substr<\/code> now contains badly formed UTF-8 and your browser should display something wierd as a result (probably a ?)&#8221;<\/p>\n<p><a href=\"http:\/\/www.phpwact.org\/php\/i18n\/utf-8#strpos\">&#8220;Handling UTF-8 with PHP&#8221;<\/a><br \/>\n<a href=\"http:\/\/www.phpwact.org\/\">phpwact.org<\/a><\/p>\n<p><em>a comment moved due to layout issues<\/em><\/p><\/blockquote>\n<p>To work around this limitation, I used the following replacement substr code, 
which I extracted from <a href=\"http:\/\/www.google.fi\/search?hl=fi&amp;q=%22UTF-8+friendly+replacement+functions%22&amp;btnG=Google-haku&amp;meta=lr%3Dlang_en%7Clang_fi\">&#8220;UTF-8 friendly replacement functions&#8221;<\/a> v0.2, by Niels Leenheer &amp; Andy Matsubara. At the time of writing, Google only seems to find a PDF version of this document.<\/p>\n<pre><code>\/\/ Renamed from substr(): PHP cannot redeclare a built-in function.\nfunction utf8_substr($str, $start, $length = NULL) {\n    \/\/ Split the string into whole UTF-8 characters (1- to 4-byte\n    \/\/ sequences); bytes that fit none of the patterns are dropped.\n    preg_match_all('\/[\\x00-\\x7F]|[\\xC0-\\xDF][\\x80-\\xBF]|[\\xE0-\\xEF][\\x80-\\xBF]{2}|[\\xF0-\\xF7][\\x80-\\xBF]{3}\/', $str, $arr);\n\n    \/\/ Slice by character rather than by byte, then join the pieces.\n    if (is_int($length))\n        return implode('', array_slice($arr[0], $start, $length));\n    else\n        return implode('', array_slice($arr[0], $start));\n}<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;[Pulling] out an arbitrary substring which happens to cut a 2 byte UTF-8 sequence breaks the string; &lt;?php header (&#8216;Content-type: text\/html; charset=utf-8&#8217;); &nbsp; $haystack = &#8216;I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n&#8217;; &nbsp; \/\/ Position 13 is in the middle of the \u00f4 char $substr = substr($haystack, 0, 13); &nbsp; print &#8220;Substr: $substr&lt;br&gt;&#8221;; $substr now contains badly formed UTF-8 and your 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[30],"class_list":["post-657","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-bugsglitches"],"_links":{"self":[{"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/posts\/657","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/comments?post=657"}],"version-history":[{"count":0,"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/posts\/657\/revisions"}],"wp:attachment":[{"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/media?parent=657"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/categories?post=657"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mummila.net\/nuudelisoppa\/wp-json\/wp\/v2\/tags?post=657"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}