Unicode support
IMified doesn't seem to handle Unicode very well. I sent this (a hex dump follows in case the help forum also doesn't handle Unicode properly):
ùņîćȍḓḝ C3 B9 C5 86 C3 AE C4 87 C8 8D E1 B8 93 E1 B8 9D
and received this in Adium on OSX:
ùņîć�?ḓ�? C3 B9 C5 86 C3 AE C4 87 EF BF BD 3F E1 B8 93 EF BF BD 3F (the O and E from the original word are lost)
I have sent the same string between various Jabber clients without any mangling of character data. My bot is ***@bot.im, and you can elicit this response by sending 'unicodetest'.
Andrew
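A note on the hex dumps above: the two lost characters are exactly the ones whose UTF-8 encoding contains a byte that Windows-1252 leaves undefined (0x8D in ȍ, 0x9D in ḝ). That pattern is consistent with the message passing through a Windows-1252 conversion somewhere along the way, though it does not prove it. The Python sketch below checks the observation. The helper name `fragile_chars` is hypothetical, and the Windows-1252 explanation is an assumption that is not confirmed anywhere in this thread.

```python
# Bytes that Windows-1252 leaves undefined (Python's cp1252 codec rejects them).
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def fragile_chars(text):
    """Return the characters whose UTF-8 bytes include a cp1252-undefined byte."""
    return [ch for ch in text if CP1252_UNDEFINED & set(ch.encode("utf-8"))]


# Hex dumps copied from the post above.
SENT = bytes.fromhex("C3 B9 C5 86 C3 AE C4 87 C8 8D E1 B8 93 E1 B8 9D")
RECEIVED = bytes.fromhex(
    "C3 B9 C5 86 C3 AE C4 87 EF BF BD 3F E1 B8 93 EF BF BD 3F"
)

sent_text = SENT.decode("utf-8")
received_text = RECEIVED.decode("utf-8")

# Characters predicted to break under the (assumed) cp1252 round trip.
print(fragile_chars(sent_text))
# Number of U+FFFD replacement characters that actually arrived.
print(received_text.count("\ufffd"))
```

Applied to the Croatian test string posted later in this thread (ŠĐČĆŽšđčćž), the same check flags Đ and č, which are the two characters that come back as �? there as well.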
2 Posted by Anthony Webb on 07 Jun, 2009 02:05 PM
Thanks for the heads up, Andrew. We're looking into the issue now. The Unicode test you wrote will come in handy as well.
Anthony Webb closed this discussion on 07 Jun, 2009 02:05 PM.
System re-opened this discussion on 07 Jun, 2009 02:44 PM
3 Posted by System on 07 Jun, 2009 02:44 PM
A Lighthouse ticket was created for this discussion
4 Posted by Andrew on 08 Jun, 2009 06:40 AM
It's also worth mentioning that you can send the request directly to the backend over HTTP:
http://turmeric.assanka.com/andrew/imified/?msg=unicodetest
This returns valid Unicode. It also works perfectly on the debug console.
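For anyone who wants to repeat that direct check programmatically, here is a minimal Python sketch. It assumes the backend answers with a UTF-8 body; `check_utf8` and `fetch_reply` are hypothetical helper names, and the URL in the post above may no longer be live.

```python
from urllib.request import urlopen


def check_utf8(body):
    """Decode bytes strictly as UTF-8 and reject U+FFFD replacement characters."""
    text = body.decode("utf-8")  # strict mode: raises UnicodeDecodeError on bad bytes
    if "\ufffd" in text:
        raise ValueError("body contains U+FFFD replacement characters")
    return text


def fetch_reply(url):
    """Fetch a bot reply over HTTP and validate its encoding."""
    with urlopen(url, timeout=10) as resp:
        return check_utf8(resp.read())
```

Calling `fetch_reply` with the backend URL should return the test string if the bot itself is emitting clean UTF-8. The same `check_utf8` call fails on the bytes received through the IM gateway, because they contain U+FFFD.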
Support Staff 5 Posted by Adam Kalsey on 09 Jun, 2009 04:31 AM
This is fixed in development and will be in the next release.
6 Posted by Andrew on 09 Jun, 2009 01:22 PM
Any idea when that might be? :-)
7 Posted by Dave Hoff on 09 Jun, 2009 01:25 PM
Most likely this Saturday evening, Andrew. We'll let you know on the blog and Twitter.
8 Posted by mpahic on 09 Jun, 2009 01:54 PM
A couple of days ago I got an idea for a bot. It is basically a chat room where everyone receives everything the other users type. I found a lot of errors. First, the Croatian characters (šđčćž) are not pushed correctly, nor are they shown correctly if I return them (šđ�?ćž). Also, HTML tags are not shown correctly, and empty messages are pushed. Bot's name: ***@bot.im. I tried to push messages before this update, and as far as I can remember it worked fine for Croatian characters.
Then I tried to test the encoding: echo "ŠĐČĆŽšđčćž
ŠĐČĆŽšđčćž
bold
italic";
And received: Š�?ČĆŽšđ�?ćž Š�?ČĆŽšđ�?ćž ŠŠ�?ČĆŽšđ�?ćž ŠĐŠ�?ČĆŽšđ�?ćž ŠĐČŠ�?ČĆŽšđ�?ćž ŠĐČĆŠ�?ČĆŽšđ�?ćž ŠĐČĆŽŠ�?ČĆŽšđ�?ćž ŠĐČĆŽšŠ�?ČĆŽšđ�?ćž ŠĐČĆŽšđŠ�?ČĆŽšđ�?ćž ŠĐČĆŽšđčŠ�?ČĆŽšđ�?ćž ŠĐČĆŽšđčćŠ�?ČĆŽšđ�?ćž ŠĐČĆŽšđčćž bold italic
I tried this on Meebo and Pidgin and the result is the same (for both MSN and Jabber). Apparently the HTML encoding breaks the text and tries to type it again. Hope I didn't give you too much to work on :/ EDIT: It looks like HTML tags get interpreted here and are not shown in the post. The text is not bolded there, and there are line breaks in the code.
Support Staff 9 Posted by Adam Kalsey on 09 Jun, 2009 02:58 PM
I've confirmed that the HTML entities cause a repeating string. That's
a bug. The rest of your test message appears correctly in our
development servers. This will be in our next release, tentatively
scheduled for this weekend.
10 Posted by Andrew on 16 Jun, 2009 04:27 PM
Has this gone out yet? I'm now seeing a change in behaviour, which does not solve the problem.
Previously, if I submitted URL-encoded text, it was decoded before transmission to the bot. Now everything is encoded. This is good, because it means my bot can tell the difference between a URL-encoded input and a non-URL-encoded one (my bot likes to talk about text encoding a lot!), but it would be useful to get confirmation that this change was intended.
It has not, however, fixed the issue raised by this thread: if you send 'unicodetest' to my bot (***@bot.im), you get an incorrectly encoded response. Any thoughts on that?
Andrew
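To illustrate the distinction described above between URL-encoded input and plain input, here is a minimal Python sketch using the standard library's `urllib.parse`. The function `looks_url_encoded` is a hypothetical heuristic written for this note. It is not part of IMified or of the bot discussed here, and it would misclassify a plain message that happens to contain something like "%20".

```python
import re
from urllib.parse import quote, unquote

# A percent sign followed by two hex digits, e.g. %C3 or %20.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def looks_url_encoded(msg):
    """Heuristic: True if the message contains at least one %XX escape."""
    return PERCENT_ESCAPE.search(msg) is not None


raw = "ùņîćȍḓḝ"
encoded = quote(raw)  # percent-encodes the UTF-8 bytes of each character
decoded = unquote(encoded)  # decoding restores the original text

for msg in (raw, encoded):
    print(looks_url_encoded(msg), msg)
```

If the service passes messages through without decoding them, a bot can apply a check like this and decode only when needed. If the service decodes first, the bot never sees the difference.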
11 Posted by Anthony Webb on 17 Jun, 2009 04:54 AM
Hi Andrew, can you test this again? We've fixed a ton of Unicode and encoding issues today. We wanted to know if you are good now.
Anthony Webb closed this discussion on 17 Jun, 2009 04:54 AM.
Andrew re-opened this discussion on 17 Jun, 2009 07:16 AM
12 Posted by Andrew on 17 Jun, 2009 07:16 AM
Sorry, no. Aren't you able to see the problem if you send 'unicodetest' to my bot using Adium? If you don't have Adium to hand, use Meebo: just add ***@bot.im as a friend and say 'unicodetest' to it.
Support Staff 13 Posted by Adam Kalsey on 17 Jun, 2009 07:24 AM
If I send ùņîćȍḓḝ to one of my test bots, I get a proper response. See attached image for a screenshot of it in Adium.
If I send the string
ŠĐČĆŽšđčćž<br/> ŠĐČĆŽšđčćž<br/> <b>bold</b><br/> <i>italic</i>
to the same bot, I get the results in the other attached image. Both of these results are what I would expect.
The only possible difference here is that I replaced your <br> tags with the XHTML version, <br/>. Without that replacement, I get the same text, just without the bold and italics appearing. This is because messages that aren't valid XHTML are converted to plain text.
Support Staff 14 Posted by Adam Kalsey on 17 Jun, 2009 07:28 AM
Also, I attached your bot URL to my test bot so I could watch the actual network traffic, and sent "unicodetest" to it. Attached is a screenshot of what I received in Adium.
Looks correct to me.
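On the point in post 13 that messages which aren't valid XHTML are converted to plain text: a bot author can check a reply for well-formedness before sending it. The Python sketch below wraps the fragment in a dummy root element and runs it through the standard library's XML parser. The helper name `is_well_formed_fragment` is hypothetical, and how IMified itself validates messages is not stated in this thread, so treat this as an approximation only.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError:
        return False
    return True


# The XHTML form closes the empty element, so it parses.
print(is_well_formed_fragment("<b>bold</b><br/> <i>italic</i>"))
# The HTML form leaves <br> open, so the parser reports a mismatched tag.
print(is_well_formed_fragment("<b>bold</b><br> <i>italic</i>"))
```

Note that this check is stricter than a browser: named HTML entities such as `&nbsp;` are not defined in plain XML and would also be rejected.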
Support Staff 15 Posted by Adam Kalsey on 17 Jun, 2009 07:32 AM
Whoops, sorry. The long string of HTML wasn't yours, Andrew. That was mpahic's.
17 Posted by Andrew on 17 Jun, 2009 07:37 AM
Adam - do you get the expected response if you use my bot directly? I just configured it to send back whatever it receives, and I get the same problem. So that we are testing in the same environment, I've taken a screenshot of Meebo, but the effect is still evident for me in Adium.
This is all my bot does:
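<?php
// Echo back any message that starts with "parrot" (URL-decoded first).
if (strpos($_REQUEST['msg'], 'parrot') === 0)
    exit(urldecode($_REQUEST['msg']));
// Reply to "unicodetest" with a fixed test string.
if ($_REQUEST['msg'] == 'unicodetest')
    exit('Unicode test: ùņîćȍḓḝ');
?>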
18 Posted by Anthony Webb on 17 Jun, 2009 03:59 PM
I think I know the issue here. Adam, you are testing the API push; he is talking about the two-way stuff. Let me take a look at fixing this.
Anthony Webb closed this discussion on 17 Jun, 2009 03:59 PM.
Andrew re-opened this discussion on 17 Jun, 2009 05:21 PM
19 Posted by Andrew on 17 Jun, 2009 05:21 PM
OK, let's be a bit more scientific about this. There are four test cases - send the following strings to ***@bot.im to replicate:
So it looks like you can achieve correct encoding on messages going from bot to user by pushing them rather than replying directly to the request.
Support Staff 20 Posted by Adam Kalsey on 17 Jun, 2009 05:27 PM
We're on it, Andrew. We have several dozen test cases for various
character encodings. Everything works fine in QA, but your specific
case is not working in production. We're currently trying to determine
if there's something different in the production systems.
21 Posted by mpahic on 22 Jun, 2009 10:59 AM
I found the broken text again. This only happens when I use it through MSN; Gtalk is fine. Also, when I echo the test string šđčćž... it is shown correctly, and I have no idea why I'm getting a response message like this:
DobrodoDobrodošli na ***@bot.im!!! Samim dodavanjem ovog kontakta meDobrodošli na ***@bot.im!!! Samim dodavanjem ovog kontakta među svoje kontakte potvrDobrodošli na ***@bot.im!!! Samim dodavanjem ovog kontakta među svoje kontakte potvrđujete da se slaDobrodošli na ***@bot.im!!! Samim dodavanjem ovog kontakta među svoje kontakte potvrđujete da se slažete sa pravilima privatnosti koje moDobrodošli na ***@bot.im!!! Samim dodavanjem ovog kontakta među svoje kontakte potvrđujete da se slažete sa pravilima privatnosti koje možete dobiti utipkavanjem naredbe "privatnost". Ukoliko trebate pomo. Ukoliko trebate pomoć sa naredbama utipkajte "pomoc", a ukoliko imate primjedbe, prijedloge ili pitanje ostavite komentar na mojim stranicama http://marko.pahic.co.cc ili mi po, a ukoliko imate primjedbe, prijedloge ili pitanje ostavite komentar na mojim stranicama http://marko.pahic.co.cc ili mi pošaljite mail na ***@gmail.com
Support Staff 22 Posted by Adam Kalsey on 23 Jun, 2009 06:28 AM
Is that Hungarian? Could you send me a sample of the text you're sending that causes this?
23 Posted by mpahic on 23 Jun, 2009 11:43 AM
No, it's Croatian. The text is:
Dobrodošli na ***@bot.im!!!
Samim dodavanjem ovog kontakta među svoje kontakte potvrđujete da se slažete sa pravilima privatnosti koje možete dobiti utipkavanjem naredbe "privatnost".
Ukoliko trebate pomoć sa naredbama utipkajte "pomoc", a ukoliko imate primjedbe, prijedloge ili pitanje ostavite komentar na mojim stranicama http://marko.pahic.co.cc ili mi pošaljite mail na ***@gmail.com
24 Posted by Anthony Webb on 23 Jun, 2009 03:33 PM
Hi mpahic, can you try your tests again?
Anthony Webb closed this discussion on 23 Jun, 2009 03:33 PM.