RSS, UTF-8 - embedding problems - need help!


Tini72
7th of April 2008 (Mon), 14:42
I managed to get Exhibit Engine (EE) running with UTF-8 encoding. This works quite well within EE, including umlauts.

But now I tried embedding the RSS feed in my front page (which isn't EE but runs on the CMS Drupal), and there I discovered an encoding problem. Drupal is set up in UTF-8 as well, so there shouldn't be an encoding problem, but there is: all words from the EE RSS feed (latest exhibitions) display strange characters. They seem to be neither UTF-8 nor ISO-8859-1 (I tried to switch the encoding manually via the browser interface, but the characters are displayed incorrectly with both UTF-8 and ISO-8859-1). The titles are displayed correctly within EE as exhibition names (UTF-8). I wonder what is going wrong? Can't you embed EE RSS feeds into other pages/sites, as is possible with other RSS feeds? Is EE really using UTF-8 or some pseudo UTF-8?

The embedded RSS feed displays characters like this (UTF-8):

Latest Galleries
Potsdam
SchlÃ¶sser und HerrenhÃ¤user
HÃ¤user und GebÃ¤ude

when set to ISO-8859-1 it looks like this
SchlÃƒÂ¶sser und HerrenhÃƒÂ¤user
HÃƒÂ¤user und GebÃƒÂ¤ude

Unfortunately I can't use the snippets, since they cause problems with Drupal (probably because some variable names are used by both EE and Drupal).

By the way, to embed the EE RSS feed I tried the MagpieRSS PHP parser.

MMCM
8th of April 2008 (Tue), 08:43
It looks like the RSS feed parser assumes your feed is ISO-8859-1 and converts it from ISO-8859-1 to UTF-8, even though it is already UTF-8.
I converted the garbled text from UTF-8 to ISO-8859-1 using iconv, and after that it is displayed correctly as UTF-8.


# iconv -f utf-8 -t iso8859-1
SchlÃ¶sser und HerrenhÃ¤user
HÃ¤user und GebÃ¤ude


Output:

Schlösser und Herrenhäuser
Häuser und Gebäude
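
The conversion described above can be reproduced in a few lines. The following is only a minimal sketch in Python (not the parser's or EE's actual code; the function names are made up for illustration). It assumes the garbling comes from UTF-8 bytes being read as ISO-8859-1 and encoded to UTF-8 a second time, and it shows that reversing that one step restores the text:

```python
def double_encode(text: str) -> str:
    """Simulate a tool that misreads UTF-8 bytes as ISO-8859-1 (hypothetical helper)."""
    utf8_bytes = text.encode("utf-8")        # the correct UTF-8 bytes
    return utf8_bytes.decode("iso-8859-1")   # each byte misread as one Latin-1 character


def undo_double_encoding(garbled: str) -> str:
    """Same idea as `iconv -f utf-8 -t iso8859-1`, with the result read as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


original = "Schlösser und Herrenhäuser"
garbled = double_encode(original)

print(garbled)                        # SchlÃ¶sser und HerrenhÃ¤user
print(undo_double_encoding(garbled))  # Schlösser und Herrenhäuser
```

If the reverse step raises an error on some text, that text was probably not double-encoded in this particular way.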


Maybe you can adjust the RSS properties/parameters of your parser?

best regards
Martin

Tini72
8th of April 2008 (Tue), 08:57
I don't think it's the parser, because the problem also occurs when I call the RSS feed directly. I tried it with the RSS reader Sage (Firefox), and there it is displayed in the same incorrect way. Because of that I suppose it has to do with EE's RSS code, but I have no idea what is going wrong.

Tini72
8th of April 2008 (Tue), 15:40
A few hours of thinking and trying led me to the solution. As I suspected, the problem was caused by EE's code. EE can run with either UTF-8 or ISO 8859-1 encoding, but the RSS part of EE doesn't check which encoding is used. So if you already run EE in UTF-8, EE apparently encodes your text to UTF-8 a second time, because it expects ISO 8859-1 as the default. This should probably be changed in the next version of EE.

Meanwhile I solved the problem by changing the following lines in basecode/SCRIPT_rss2.php from
ob_start();
$description = strip_tags(utf8_encode(htmlspecialchars($latest_exhibitions_html)));
to
ob_start();
$description = strip_tags(htmlspecialchars($latest_exhibitions_html));
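
Removing the call fixes a UTF-8 installation, but the missing check mentioned above could also be added, so that both encodings keep working. Below is a rough sketch of that idea in Python, not EE's code: `to_utf8` is a hypothetical name, and it assumes the input is either valid UTF-8 or ISO-8859-1. In PHP, a comparable check could presumably be built on `mb_check_encoding`.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return UTF-8 bytes, converting from ISO-8859-1 only when needed.

    Assumption: the input is either valid UTF-8 or ISO-8859-1.
    """
    try:
        raw.decode("utf-8")   # strict validation; raises on invalid UTF-8
        return raw            # already UTF-8, so leave it untouched
    except UnicodeDecodeError:
        # Not valid UTF-8: treat it as ISO-8859-1 and encode exactly once
        return raw.decode("iso-8859-1").encode("utf-8")


title = "Häuser und Gebäude"
from_utf8_site = to_utf8(title.encode("utf-8"))         # unchanged
from_latin1_site = to_utf8(title.encode("iso-8859-1"))  # converted once

print(from_utf8_site == from_latin1_site)  # True
```

This is a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so reading the encoding from the site's configuration would be more reliable than guessing from the bytes.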