Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.
If you need to convert text from any encoding to any other encoding, look at iconv() instead.
utf8_encode
(PHP 4, PHP 5, PHP 7, PHP 8)
utf8_encode — Converts a string from ISO-8859-1 to UTF-8
This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.
Description
This function converts the string string from the ISO-8859-1 encoding to UTF-8.
Note:
This function does not attempt to guess the current encoding of the provided string; it assumes the string is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but it will not produce a useful string if a different encoding was intended.
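The effect of converting text that is not ISO-8859-1 can be illustrated with a short sketch. It uses mb_convert_encoding() in place of the deprecated utf8_encode() and therefore assumes the mbstring extension is available. The helper latin1_to_utf8_once() is a hypothetical name introduced only for this illustration; it is a heuristic, not a reliable encoding detector.

```php
<?php
// Hypothetical helper (not part of PHP): convert to UTF-8 only when the
// input is not already valid UTF-8.
function latin1_to_utf8_once(string $text): string
{
    // mb_check_encoding() reports whether the bytes form valid UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise assume ISO-8859-1, as utf8_encode() would.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$utf8 = "Zo\xC3\xAB"; // 'Zoë' already encoded as UTF-8

// Treating UTF-8 bytes as ISO-8859-1 encodes each byte a second time.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($garbled), "\n";                      // 5a6fc383c2ab

echo bin2hex(latin1_to_utf8_once($utf8)), "\n";    // 5a6fc3ab
echo bin2hex(latin1_to_utf8_once("Zo\xEB")), "\n"; // 5a6fc3ab
```

Note that pure ASCII text, and a few ISO-8859-1 byte sequences that happen to form valid UTF-8, pass the check unchanged, so knowing the real source encoding remains the safer approach.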
Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“”), instead of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
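As a rough illustration of the difference, the following sketch converts the single byte 0x80 under both assumptions. It uses mb_convert_encoding(), so it assumes the mbstring extension is installed; the variable names are chosen only for this example.

```php
<?php
// Byte 0x80 is the Euro sign in Windows-1252 but a control character in ISO-8859-1.
$byte = "\x80";

// Interpreted as ISO-8859-1 (what utf8_encode() assumes): U+0080, a control character.
$as_latin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
echo bin2hex($as_latin1), "\n"; // c280

// Interpreted as Windows-1252: U+20AC, the Euro sign.
$as_cp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
echo bin2hex($as_cp1252), "\n"; // e282ac
```

The first result is valid UTF-8 but encodes an invisible control character, which is why text converted this way can appear to lose its Euro signs and curly quotes.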
Parameters
string
An ISO-8859-1 string.
Return Values
Returns the UTF-8 translation of string.
Changelog
| Version | Description |
|---|---|
| 8.2.0 | This function has been deprecated. |
| 7.2.0 | This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed. |
Examples
Example #1 Basic example
<?php
// Convert the string 'Zoë' from ISO 8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
5a6fc3ab
Notes
Note: Deprecation and alternatives
This function is deprecated as of PHP 8.2.0, and will be removed in a future version. Existing uses should be checked and replaced with appropriate alternatives.
Similar functionality can be achieved with mb_convert_encoding(), which supports ISO-8859-1 and many other character encodings.
<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same string in ISO-8859-7 represents 'λ' (Greek lower-case lambda)
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (Euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
c3ab
cebb
e282ac
Other options which may be available depending on the extensions installed are UConverter::transcode() and iconv().
The following all give the same result:
<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab
See Also
- utf8_decode() - Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
- mb_convert_encoding() - Convert a string from one character encoding to another
- UConverter::transcode() - Convert a string from one character encoding to another
- iconv() - Convert a string from one character encoding to another
User Contributed Notes 3 notes
Here's some code that addresses the issue that Steven describes in the previous comment:
<?php
/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
the UTF-8 encoding of the non-control characters that Windows-1252 places
at the equivalent code points. */
$cp1252_map = array(
"\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
"\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
"\xc2\x83" => "\xc6\x92", /* LATIN SMALL LETTER F WITH HOOK */
"\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
"\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
"\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
"\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
"\xc2\x88" => "\xcb\x86", /* MODIFIER LETTER CIRCUMFLEX ACCENT */
"\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
"\xc2\x8a" => "\xc5\xa0", /* LATIN CAPITAL LETTER S WITH CARON */
"\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
"\xc2\x8c" => "\xc5\x92", /* LATIN CAPITAL LIGATURE OE */
"\xc2\x8e" => "\xc5\xbd", /* LATIN CAPITAL LETTER Z WITH CARON */
"\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
"\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
"\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
"\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
"\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
"\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
"\xc2\x97" => "\xe2\x80\x94", /* EM DASH */
"\xc2\x98" => "\xcb\x9c", /* SMALL TILDE */
"\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
"\xc2\x9a" => "\xc5\xa1", /* LATIN SMALL LETTER S WITH CARON */
"\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION */
"\xc2\x9c" => "\xc5\x93", /* LATIN SMALL LIGATURE OE */
"\xc2\x9e" => "\xc5\xbe", /* LATIN SMALL LETTER Z WITH CARON */
"\xc2\x9f" => "\xc5\xb8" /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
);
function cp1252_to_utf8($str) {
global $cp1252_map;
return strtr(utf8_encode($str), $cp1252_map);
}
?>
If you haven't guessed already: when converting in the other direction with utf8_decode(), a UTF-8 character that has no representation in the ISO-8859-1 codepage is returned as a ?. You might want to wrap a function around this to make sure you aren't saving a bunch of ???? into your database.
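A wrapper of the kind suggested in this note could look roughly like the following sketch. The function name fits_in_latin1() is hypothetical and exists only for this illustration; it assumes the input is meant to be UTF-8 and relies on the PCRE /u modifier.

```php
<?php
// Hypothetical helper (not part of PHP): true when every character of a
// UTF-8 string also exists in ISO-8859-1.
function fits_in_latin1(string $utf8): bool
{
    // With the /u modifier the class matches code points U+0000 to U+00FF,
    // which are the characters ISO-8859-1 can represent.
    // preg_match() returns false for invalid UTF-8, so that case yields false too.
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fits_in_latin1("Zo\xC3\xAB"));   // 'Zoë' in UTF-8: bool(true)
var_dump(fits_in_latin1("\xE2\x82\xAC")); // '€' in UTF-8: bool(false)
```

Calling such a check before converting to ISO-8859-1 lets the caller reject or log the input instead of silently storing question marks.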