UTF-8 substr for PHP

“[Pulling] out an arbitrary substring which happens to cut a 2 byte UTF-8 sequence breaks the string;
<?php
header ('Content-type: text/html; charset=utf-8');
 
$haystack = 'Iñtërnâtiônàlizætiøn';
 
// Position 13 is in the middle of the ô char
$substr = substr($haystack, 0, 13);
 
print "Substr: $substr<br>";
$substr now contains badly formed UTF-8 and your browser should display something weird as a result (probably a ?)”

“Handling UTF-8 with PHP”
phpwact.org
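To see where the cut lands, the byte layout of the example string can be inspected directly. This is only a sketch, and it assumes the source file is saved as UTF-8 with precomposed characters, so that ô is the two bytes C3 B4; the variable names are mine.

```php
<?php
$haystack = 'Iñtërnâtiônàlizætiøn';

// 20 characters, 7 of which take two bytes each.
$byteLength = strlen($haystack);
echo $byteLength, "\n";                      // 27

// Bytes 12 and 13 (0-based) together form the ô.
$oCircumflex = bin2hex(substr($haystack, 12, 2));
echo $oCircumflex, "\n";                     // c3b4

// Taking 13 bytes keeps the lead byte C3 but drops its continuation byte B4.
$cut = substr($haystack, 0, 13);
$lastByte = substr(bin2hex($cut), -2);
echo $lastByte, "\n";                        // c3
```

The dangling C3 at the end is what makes the result invalid UTF-8.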


To work around this limitation, I used the following replacement substr code, extracted from “UTF-8 friendly replacement functions” v0.2 by Niels Leenheer & Andy Matsubara. At the time of writing, Google only seems to find a PDF version of this document.

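// Named utf8_substr() here: a global function called substr() would clash
// with PHP's built-in and fail with "Cannot redeclare".
function utf8_substr($str, $start, $length = NULL) {
    // Split the string into whole UTF-8 characters (1 to 4 bytes each).
    preg_match_all('/[\x01-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3}/', $str, $arr);

    // $start and $length now count characters, not bytes.
    if (is_int($length))
        return implode('', array_slice($arr[0], $start, $length));
    else
        return implode('', array_slice($arr[0], $start));
}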
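As a quick check, the sketch below runs the replacement against the example string from the quote. The function is called utf8_substr here, a name chosen for illustration because PHP will not let a script redeclare the built-in substr(). The pattern also gets an extra alternative for 4-byte sequences, which is my assumption rather than part of the extracted code. The validity test relies on preg_match() returning false for malformed input under the /u modifier.

```php
<?php
// Illustrative name; counts characters instead of bytes.
function utf8_substr($str, $start, $length = null) {
    // Match whole UTF-8 characters: 1-, 2-, 3- and 4-byte sequences.
    preg_match_all(
        '/[\x01-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3}/',
        $str,
        $m
    );
    // array_slice() treats a null length as "up to the end".
    return implode('', array_slice($m[0], $start, $length));
}

$haystack = 'Iñtërnâtiônàlizætiøn';

$bytes = substr($haystack, 0, 13);       // 13 bytes, cuts the ô in half
$chars = utf8_substr($haystack, 0, 13);  // 13 whole characters

var_dump(preg_match('//u', $bytes));     // bool(false): malformed UTF-8
var_dump(preg_match('//u', $chars));     // int(1): well formed
echo $chars, "\n";                       // Iñtërnâtiônàl
```

Negative offsets behave as they do with array_slice(), so utf8_substr($haystack, -2) returns the last two characters.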
