public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] utf8_general_ci
@ 2015-05-05 15:32 Joseph
  2015-05-05 16:32 ` Fernando Rodriguez
  2015-05-06 22:14 ` [gentoo-user] MySQL utf8 support (was Re: utf8_general_ci) Harm Geerts
  0 siblings, 2 replies; 5+ messages in thread
From: Joseph @ 2015-05-05 15:32 UTC
  To: gentoo-user

I have my MySQL database "Collation" set to: utf8_general_ci

but when a customer from, for example, Japan places an order, all I see is:

&amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;

Do I need to change the "Collation" setting, or is the problem somewhere else?

-- 
Joseph



* Re: [gentoo-user] utf8_general_ci
  2015-05-05 15:32 [gentoo-user] utf8_general_ci Joseph
@ 2015-05-05 16:32 ` Fernando Rodriguez
  2015-05-05 17:03   ` Joseph
  2015-05-06 22:14 ` [gentoo-user] MySQL utf8 support (was Re: utf8_general_ci) Harm Geerts
  1 sibling, 1 reply; 5+ messages in thread
From: Fernando Rodriguez @ 2015-05-05 16:32 UTC
  To: gentoo-user

On Tuesday, May 05, 2015 9:32:15 AM Joseph wrote:
> I have my mysql database "Collation" set as: utf8_general_ci
> 
> but when a customer from for example Japan places an order all I see is:
> 
> &amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;
> 
> Do I need to change "Collation" setting to something else or something else?
> 
> 

I think that's because the web application runs the data through something
like PHP's htmlspecialchars() or similar to help prevent SQL injections. So
you'll need to either decode it before using it (I think you can use
app-text/recode), or use a different method to filter anything that could be
malicious SQL.
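
A minimal Python sketch of how text can end up in that form. It assumes (this is a guess about the application, not something confirmed in this thread) that the non-ASCII characters are first turned into numeric character references and the result is then HTML-escaped once more. `to_numeric_refs` is a hypothetical helper, and `html.escape` only stands in for PHP's htmlspecialchars().

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


original = "\u7af9\u9f3b\u7acb\u539f\u753a"  # the first five characters of the address
once = to_numeric_refs(original)   # first encoding step
twice = html.escape(once)          # second step escapes every '&' as '&amp;'

print(once)
print(twice)
```

If the assumption holds, `twice` has the same shape as the text stored in the database, which is why a single decoding pass would not be enough.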



-- 
Fernando Rodriguez



* Re: [gentoo-user] utf8_general_ci
  2015-05-05 16:32 ` Fernando Rodriguez
@ 2015-05-05 17:03   ` Joseph
  2015-05-05 17:53     ` Fernando Rodriguez
  0 siblings, 1 reply; 5+ messages in thread
From: Joseph @ 2015-05-05 17:03 UTC
  To: gentoo-user

On 05/05/15 12:32, Fernando Rodriguez wrote:
>On Tuesday, May 05, 2015 9:32:15 AM Joseph wrote:
>> I have my mysql database "Collation" set as: utf8_general_ci
>>
>> but when a customer from for example Japan places an order all I see is:
>>
>> &amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;
>>
>> Do I need to change "Collation" setting to something else or something else?
>>
>>
>
>I think that's because the web application runs the data through something
>like PHP's htmlspecialchars() or similar to help prevent SQL injections. So
>you'll need to either decode it before using it (I think you can use
>app-text/recode), or use a different method to filter anything that could be
>malicious SQL.

I've saved the relevant information into a TXT file (address.txt) and tried to run: recode ISO-8859-9..UTF8 < address.txt > address2.txt

&amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;
&amp;#23665;&amp;#31185;&amp;#21306;
&amp;#20140;&amp;#37117;&amp;#24066;, 601-8015
&amp;#20140;&amp;#37117;&amp;#24220;, Japan

It didn't help. How do you run "recode" correctly?
Yes, the customer is using the osCommerce PHP application to provide the information.

-- 
Joseph



* Re: [gentoo-user] utf8_general_ci
  2015-05-05 17:03   ` Joseph
@ 2015-05-05 17:53     ` Fernando Rodriguez
  0 siblings, 0 replies; 5+ messages in thread
From: Fernando Rodriguez @ 2015-05-05 17:53 UTC
  To: gentoo-user

On Tuesday, May 05, 2015 11:03:38 AM Joseph wrote:
> On 05/05/15 12:32, Fernando Rodriguez wrote:
> >On Tuesday, May 05, 2015 9:32:15 AM Joseph wrote:
> >> I have my mysql database "Collation" set as: utf8_general_ci
> >>
> >> but when a customer from for example Japan places an order all I see is:
> >>
> >> &amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;
> >>
> >> Do I need to change "Collation" setting to something else or something else?
> >>
> >>
> >
> >I think that's because the web application runs the data through something
> >like PHP's htmlspecialchars() or similar to help prevent SQL injections. So
> >you'll need to either decode it before using it (I think you can use
> >app-text/recode), or use a different method to filter anything that could be
> >malicious SQL.
> 
> I've saved the relevant information into a TXT file (address.txt) and tried to run: recode ISO-8859-9..UTF8 < address.txt > address2.txt
>
> &amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;
> &amp;#23665;&amp;#31185;&amp;#21306;
> &amp;#20140;&amp;#37117;&amp;#24066;, 601-8015
> &amp;#20140;&amp;#37117;&amp;#24220;, Japan
> 
> It didn't help.  How do you run "recode" correctly?
> Yes, the customer is using the osCommerce PHP application to provide the information.

It looks like they ran it through the encoding function twice. This worked for 
me:

recode html..utf8 < test.txt | recode html..utf8
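
For anyone without recode installed, here is a rough Python equivalent as a sketch only. It uses the standard-library `html.unescape`, and the sample string is the first line of the address quoted above. It also shows why the ISO-8859-9 to UTF-8 conversion had no effect: the stored text is plain ASCII, so a charset conversion leaves it byte-for-byte unchanged.

```python
import html

stored = (
    "&amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;"
    "&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;"
)

# The stored text contains only ASCII bytes, so converting it from
# ISO-8859-9 to UTF-8 produces exactly the same bytes.
raw = stored.encode("ascii")
recoded = raw.decode("iso8859_9").encode("utf-8")
charset_conversion_is_noop = recoded == raw

# Two decoding passes, mirroring the two recode invocations in the pipeline.
first_pass = html.unescape(stored)       # '&amp;#31481;' becomes '&#31481;'
second_pass = html.unescape(first_pass)  # '&#31481;' becomes the character itself

print(charset_conversion_is_noop)
print(first_pass)
print(second_pass)
```

Note that a single pass only peels off the outer `&amp;` layer, which matches the guess that the encoding function was applied twice.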


-- 
Fernando Rodriguez


* [gentoo-user] MySQL utf8 support (was Re: utf8_general_ci)
  2015-05-05 15:32 [gentoo-user] utf8_general_ci Joseph
  2015-05-05 16:32 ` Fernando Rodriguez
@ 2015-05-06 22:14 ` Harm Geerts
  1 sibling, 0 replies; 5+ messages in thread
From: Harm Geerts @ 2015-05-06 22:14 UTC
  To: gentoo-user

On Tuesday 05 May 2015 09:32:15 Joseph wrote:
> I have my mysql database "Collation" set as: utf8_general_ci
> 
> but when a customer from for example Japan places an order all I see is:
> 
> &amp;#31481;&amp;#40763;&amp;#31435;&amp;#21407;&amp;#30010;&amp;#65301;&amp;#65293;&amp;#65301;
> 
> Do I need to change "Collation" setting to something else or something else?

I'm not sure which character codes are used for Japanese, but it's worth noting
that MySQL's utf8 encoding is a partial implementation that supports at most 3
bytes per character.

For full UTF-8 support you'll need to use the utf8mb4 encoding.

https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html
https://mathiasbynens.be/notes/mysql-utf8mb4

Note that this is not relevant to your problem, which is covered by
Fernando Rodriguez's reply.
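
A small Python sketch for checking whether a given string would hit the 3-byte limit. `needs_utf8mb4` is a hypothetical helper name, and the check only looks at how many bytes each character takes in UTF-8; it does not talk to MySQL.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character takes 4 bytes in UTF-8 (outside the BMP)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


# The first five characters of the address from this thread.
address = "\u7af9\u9f3b\u7acb\u539f\u753a"
# U+20BB7 is a CJK ideograph outside the Basic Multilingual Plane.
rare_kanji = "\U00020BB7"

print([len(ch.encode("utf-8")) for ch in address])
print(needs_utf8mb4(address))
print(needs_utf8mb4(rare_kanji))
```

The characters in this particular address each take 3 bytes, so they fit in MySQL's utf8; only characters outside the Basic Multilingual Plane would need utf8mb4.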

