Encoding conversion problem

 
Roedy Green

(Msg. 16) Posted: Wed Feb 13, 2008 4:36 pm
Post subject: Re: Encoding conversion problem
Archived from groups: comp.lang.java.databases

On Wed, 13 Feb 2008 06:22:22 -0800 (PST), Andrea
wrote, quoted or indirectly quoted
someone who said :

>Hi Silvio,
>the settings are taken from the DB2 instance of a customer (and I
>can't change them). The very same code works, of course, without
>problems with a DB2 instance configured with ISO-8859-1.

Is it possible to ask the database driver to do the conversions for
you? Perhaps internally it is Unicode or some other encoding that can
deal with Euros. We have the clue that C++ programs seem to store
Euros and get them back out.

I am puzzled. I thought JDBC always talked to you in Unicode.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green

(Msg. 17) Posted: Wed Feb 13, 2008 4:38 pm

On Wed, 13 Feb 2008 16:39:59 +0100, Silvio Bierman
wrote, quoted or indirectly quoted
someone who said :

>I do not really understand why a Euro sign would work with 8859-1 since
>that does not contain that character as far as I am aware of.

You could do an experiment. Try feeding your database all possible
Unicode chars in a set of 1-char records, and see which ones come back
unmangled. This is a kludge, but you could preconvert your Euro to
one of those invariant unused chars.
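
A minimal sketch of that experiment, with an in-memory charset round trip standing in for the database. That stand-in is an assumption: a real run would INSERT each character over JDBC and SELECT it back through the customer's driver and code page. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripSurvey {
    /** True if c comes back unchanged after encoding to bytes and decoding again. */
    static boolean survives(char c, Charset cs) {
        String original = String.valueOf(c);
        // getBytes(Charset) silently substitutes unmappable characters,
        // so a mangled character shows up as a mismatch here.
        String roundTripped = new String(original.getBytes(cs), cs);
        return original.equals(roundTripped);
    }

    /** Counts the BMP characters (lone surrogates skipped) that survive the round trip. */
    static int countSurvivors(Charset cs) {
        int count = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (Character.isSurrogate((char) cp)) {
                continue;
            }
            if (survives((char) cp, cs)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1 survivors: "
                + countSurvivors(StandardCharsets.ISO_8859_1));
        System.out.println("Euro survives ISO-8859-1: "
                + survives('\u20AC', StandardCharsets.ISO_8859_1));
    }
}
```

With ISO-8859-1 only the 256 characters U+0000 to U+00FF survive, and the Euro sign (U+20AC) is not among them.
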
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Andrea

(Msg. 18) Posted: Thu Feb 14, 2008 8:00 am

Hi everyone,
sorry for my previous double-post (a mistake).

>Is it possible to ask the database driver to do the conversions for
>you? Perhaps internally it is Unicode or some other encoding that can
>deal with Euros.
I've checked the properties of the JDBC driver I use
(http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/rjvdsprp.htm)
but there's nothing concerning encoding conversions.

>We have the clue that C++ programs seem to store Euros and get them back out.
Yes, we have C and COBOL programs that can store and retrieve non-IBM850
chars without problems too.
As pointed out by Sabine in her post, the reason may be that C programs
work with the pure sequences of bytes, without performing any encoding
conversion.

>>I do not really understand why a Euro sign would work with 8859-1 since
>>that does not contain that character as far as I am aware of.

SORRY SORRY SORRY SORRY SORRY
I tried to insert (through JDBC) the EURO character in a DB2
configured with
....
Database territory = C
Database code page = 819
Database code set = ISO8859-1
....
and I can neither write nor read the EURO character correctly in
Java :(
A COBOL program, by contrast, works correctly.

Then I tried the same thing on a SQL Server 2000 instance with
collation compatibility_51_409_30003 (corresponding to code page 1252,
i.e. Windows Latin 1) and I can store and read the EURO character via
Java and JDBC.

That doesn't work in Java with Oracle 10g configured with
....
NLS_LANGUAGE = AMERICAN
NLS_TERRITORY = AMERICA
NLS_CHARACTERSET = US7ASCII
NLS_LENGTH_SEMANTICS = BYTE
....
Storing and reading through COBOL is OK, and in Java I can even write and
read accented vowels... even though those characters are outside US7ASCII...

>You could do an experiment. Try feeding your database all possible
>unicode chars in a set of 1-char records, and see which ones come back
>unmangled. This is a kludge, but you could preconvert your Euro to
>one of those invariant unused chars.
The EURO character is just an example and only part of the problem, so I
can't use this kind of kludge.
The specific problem is much more complex: a password is encrypted and
stored in the DB by a C program, but the encrypted chars fall outside
the IBM850 range and in Java I'm unable to read and decrypt the
string... This works if the database is ISO-8859-1 (that's why I
thought I was able to write another 'weird' char, the Euro char, to an
ISO-8859-1 DB, sorry...). I also have the more general problem of data
entry: I don't know which characters users will enter, so I can't
substitute chars.
I've found a workaround for my encryption problem but I'm still trying to
understand the cause of the problem.

Now it's clear to me that with a CHAR field Java performs an encoding
conversion using the encodings of the JVM and of the DBMS: if some
characters fall outside the destination encoding then they are lost
(i.e. converted into something completely different).
The only 'mysterious' thing for me now is the behavior on Oracle (JDBC
can read and write accented vowels even though they are outside
US7ASCII)... any idea? Maybe the Oracle driver is smarter than the DB2
Universal Driver...
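
The silent loss described above can be reproduced without a database. The sketch below, whose class and method names are hypothetical, contrasts String.getBytes, which substitutes '?' for a character the target charset cannot represent, with a CharsetEncoder set to REPORT, which throws instead. Whether a given JDBC driver substitutes or reports is driver-specific and is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictEncode {
    /** Encodes text, throwing instead of substituting when a character has no mapping. */
    static byte[] encodeStrict(String text, Charset target) throws CharacterCodingException {
        ByteBuffer buf = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String price = "100 \u20AC";
        // Lenient path: the Euro sign is silently replaced by '?' (0x3F).
        byte[] lossy = price.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("last byte, lenient: 0x"
                + Integer.toHexString(lossy[lossy.length - 1] & 0xFF));
        try {
            encodeStrict(price, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            // Strict path: the loss is reported rather than hidden.
            System.out.println("strict encoder refused: " + e);
        }
    }
}
```
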

Thanks everyone,
Andrea

Lothar Kimmeringer

(Msg. 19) Posted: Thu Feb 14, 2008 5:01 pm

Andrea wrote:

>> What is source? How do you create that from the JDBC-
>> resultset?
>
> I tried:
> InputStream source = new
> ByteArrayInputStream(stringFetchedFromDB.getBytes());

getBytes() uses the system-encoding for generating the
byte-array. Why do you generate an InputStream anyway?

What you want to do is
OutputStreamWriter osw = new OutputStreamWriter(
output, "8859_1");
osw.write(resultset.getString("mycolumn"));

BTW: The Euro is not part of ISO-8859-1, so it will get
lost that way anyway.
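
A self-contained sketch of the suggestion above, writing to a ByteArrayOutputStream so the resulting bytes can be inspected. The sample string stands in for resultset.getString("mycolumn"); the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetWrite {
    /** Writes text to an in-memory stream using the given charset and returns the bytes. */
    static byte[] write(String text, Charset cs) throws IOException {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try (Writer osw = new OutputStreamWriter(output, cs)) {
            osw.write(text);
        } // closing the writer flushes any buffered bytes
        return output.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String fromDb = "caf\u00E9"; // stand-in for resultset.getString("mycolumn")
        System.out.println("default charset: " + Charset.defaultCharset());
        // The no-argument getBytes() depends on the platform default charset.
        System.out.println("getBytes() length: " + fromDb.getBytes().length);
        System.out.println("ISO-8859-1 length: "
                + write(fromDb, StandardCharsets.ISO_8859_1).length);
        System.out.println("UTF-8 length: "
                + write(fromDb, StandardCharsets.UTF_8).length);
    }
}
```

Naming the charset explicitly makes the output independent of the machine the code runs on, which the no-argument getBytes() is not.
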


Regards, Lothar
--
Lothar Kimmeringer E-Mail: spamfang.RemoveThis@kimmeringer.de
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!

Silvio Bierman

(Msg. 20) Posted: Thu Feb 14, 2008 5:01 pm

Andrea wrote:
> [...]
> The only 'mysterious' thing for me now is the behavior on Oracle (JDBC
> can read&write accented vowels even if they are outside ascii7)... any
> idea? Maybe the Oracle driver is smarter than the DB2 Universal
> Driver...


Hello Andrea,

Even if you set a database encoding to ASCII it is very unlikely that
the DB will strip non-ASCII characters. Actually, most databases treat
every byte-sized (i.e. 8-bit) encoding almost identically internally. They
may sometimes have different default collations but that is about it.
The codepage attribute mostly matters to programs interfacing with
the DB. As most of those (especially older ones) are encoding-unaware,
bytes pass in and out unharmed. In the end all 8-bit encodings are
equal until actually interpreted to represent characters, aren't they?
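
To illustrate the point that a byte only becomes a character once some code page is applied to it, this sketch decodes the same stored byte under three charsets. Only ISO-8859-1 is guaranteed to exist in every JDK, so the others are checked first; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

final class SameBytesDifferentText {
    /** Decodes one byte under the given charset and returns the resulting code point. */
    static int decodeSingleByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        int stored = 0x80; // one byte as it might sit in a CHAR column
        for (String name : new String[] {"ISO-8859-1", "windows-1252", "IBM850"}) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this JDK does not ship
            }
            int cp = decodeSingleByte(stored, Charset.forName(name));
            System.out.printf("%-12s -> U+%04X%n", name, cp);
        }
    }
}
```

Where all three charsets are available, the byte 0x80 comes out as U+0080 (a control character), U+20AC (the Euro sign) and U+00C7 (Ç) respectively.
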

I have seen applications running on cp-1252 platforms using 8859-1
encoded databases for years without anyone noticing. Same for cp-1257 on
a cp-1252 database. Nobody really cares as long as the same data that was
put in comes out again.

This is not unlike SMTP, which is supposed to be 7-bit only, but since the
transport encoding passes 8-bit characters freely people are used to
sending non-ASCII characters in plain-text emails although this is not
supported. This all works great until someone from Lithuania sends me an
email (I am in the Netherlands).

Regards,

Silvio

Andrea

(Msg. 21) Posted: Fri Feb 15, 2008 3:26 am

Hi Lothar,
>> I tried:
>> InputStream source = new
>> ByteArrayInputStream(stringFetchedFromDB.getBytes());
>getBytes() uses the system-encoding for generating the
>byte-array. Why do you generate an InputStream anyway?
It was just a "desperate" attempt... :)
I tried almost everything...

>What you want to do is
>OutputStreamWriter osw = new OutputStreamWriter(output, "8859_1");
>osw.write(resultset.getString("mycolumn"));
doesn't work...

Hi Silvio,
>Hello Andrea,
>Even if you set a database encoding to ASCII it is very unlikely that
>the DB will strip non-ASCII characters.
> ....
Yes now this is clear to me, thanks!

I was thinking only about the DB encoding while the problem is mainly
in the JVM encoding (now it's clear to me that Java can't handle
characters outside the encoding of the JVM, I wasn't thinking about
it, sorry...).

I've made another test: I exported the content of the table with
the encrypted password and found that the password I can't decrypt
contains characters between 0x80 and 0x9F, which are control
characters in ISO-8859-1. Java
- reads garbage with DB2 configured with IBM850 and the JVM with ISO-8859-1
- reads correctly with both DB2 and the JVM configured with ISO-8859-1

I understand the first behavior but the last point is strange... Java
(with some "magic" :) ) is doing the proper conversion if the DB is
ISO-8859-1 but I can't understand how... I will test it again and let
you know if I find something.
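
One possible explanation, not confirmed anywhere in this thread: Java's ISO-8859-1 charset maps every byte value 0x00 to 0xFF onto the code point with the same number, including 0x80 to 0x9F. If the driver decodes and encodes with that charset on an ISO-8859-1 database, the C program's bytes would come back unchanged. The sketch only demonstrates the charset behaviour, not what the DB2 driver actually does; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1Identity {
    /** Decodes bytes as ISO-8859-1 and encodes the resulting String back to bytes. */
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println("all 256 byte values preserved: "
                + Arrays.equals(all, roundTrip(all)));
    }
}
```
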

Thanks again everyone!

Andrea

Andrea

(Msg. 22) Posted: Fri Feb 15, 2008 6:51 am

Hi Sabine,
>I'm guessing, but maybe if the database tells the JDBC driver it's
>ISO-8859-1 *and* your application tells it the same encoding, it won't
>bother trying to transform anything...
Yes that's what I was thinking too... but I tried to change the
encoding of the JVM (tried Cp850, ...) but it keeps on working...

Hi Lew,
>> I was thinking only about the DB encoding while the problem is mainly
>> in the JVM encoding (now it's clear to me that Java can't handle
>> characters outside the encoding of the JVM, I wasn't thinking about
>> it, sorry...).
>"The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
>character is representable in the JVM, including the Euro character. There is
>no Unicode character that the JVM cannot represent.
With "encoding of the JVM" I was referring to the file.encoding
property used by the JVM. If the JVM runs with:
- ISO-8859-1 then I can't read or write the EURO character to DB (it
becomes garbage) and ISO-8859-1 doesn't include that character;
- Cp1252 then I can read and write the EURO character to DB and Cp1252
includes that character.
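
The two cases above can be checked directly against the charsets involved. This sketch prints the JVM's default charset and asks each charset's encoder whether it has a mapping for the Euro sign; it assumes nothing about the JDBC driver, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

final class EuroSupport {
    /** True if the named charset is available in this JDK and has a mapping for c. */
    static boolean canEncode(String charsetName, char c) {
        return Charset.isSupported(charsetName)
                && Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char euro = '\u20AC';
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        for (String name : new String[] {"ISO-8859-1", "Cp1252", "IBM850", "UTF-8"}) {
            System.out.println(name + " can encode Euro: " + canEncode(name, euro));
        }
    }
}
```
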

Andrea

Lew

(Msg. 23) Posted: Fri Feb 15, 2008 8:10 am

Andrea wrote:
> I was thinking only about the DB encoding while the problem is mainly
> in the JVM encoding (now it's clear to me that Java can't handle
> characters outside the encoding of the JVM, I wasn't thinking about
> it, sorry...).

"The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
character is representable in the JVM, including the Euro character. There is
no Unicode character that the JVM cannot represent.
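
A small sketch of what UTF-16 with surrogate pairs means in practice: the Euro sign fits in a single char, while a code point outside the Basic Multilingual Plane needs two. The class name, the helper method and the choice of U+1D11E as the example are illustrative.

```java
final class Utf16Demo {
    /** Number of 16-bit char units Java uses for the given code point. */
    static int charUnits(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                               // EURO SIGN, inside the BMP
        String clef = new String(Character.toChars(0x1D11E)); // MUSICAL SYMBOL G CLEF, outside the BMP
        System.out.println("Euro: " + euro.length() + " char, "
                + euro.codePointCount(0, euro.length()) + " code point");
        System.out.println("Clef: " + clef.length() + " chars, "
                + clef.codePointCount(0, clef.length()) + " code point");
        System.out.println("clef is a surrogate pair: "
                + (Character.isHighSurrogate(clef.charAt(0))
                        && Character.isLowSurrogate(clef.charAt(1))));
    }
}
```
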

--
Lew

Andrea

(Msg. 24) Posted: Fri Feb 15, 2008 9:02 am

Hi Lew,

> I understand my confusion now - it stemmed from the phrase "the encoding of
> the JVM". The JVM itself only uses one encoding; it translates to and from
> other encodings on I/O. So to make sure I understood you correctly, were you
> referring to the encoding specified by the I/O call?
Maybe I don't understand :(
In my posts I tried to specify the encoding of the DBMS and the JVM
encoding (i.e. the system property file.encoding) in the different
cases and, as you stated, the JVM performs the necessary translations.
In my JDBC calls I don't specify/force any encoding.


> Generally if the encoding you specify for I/O is different from the encoding
> in your data store, it will cause trouble. This is not limited to Java. Over
> in the Postgres newsgroups one finds people have trouble with character
> encoding from all sorts of platforms, mostly stemming from trying to store
> characters in a column that are not part of the specified character encoding
> for the DB. If such things don't match, then problems will hatch.
I disagree...
For our application we keep the DBMS at a fixed encoding and the
application performs the necessary conversions. For instance, for
Polish installations we use an ISO-8859-1 database and an application
server configured with ISO-8859-2, and we store Polish characters
without problems.

Andrea

Lew

(Msg. 25) Posted: Fri Feb 15, 2008 10:00 am

Andrea wrote:
> Hi Sabine,
>> I'm guessing, but maybe if the database tells the JDBC driver it's
>> ISO-8859-1 *and* your application tells it the same encoding, it won't
>> bother trying to transform anything...
> Yes that's what I was thinking too... but I tried to change the
> encoding of the JVM (tried Cp850, ...) but it keeps on working...
>
> Hi Lew,
>>> I was thinking only about the DB encoding while the problem is mainly
>>> in the JVM encoding (now it's clear to me that Java can't handle
>>> characters outside the encoding of the JVM, I wasn't thinking about
>>> it, sorry...).
>> "The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
>> character is representable in the JVM, including the Euro character. There is
>> no Unicode character that the JVM cannot represent.
> With "encoding of the JVM" I was referring to the file.encoding
> property used by the JVM. If the JVM runs with:
> - ISO-8859-1 then I can't read or write the EURO character to DB (it
> becomes garbage) and ISO-8859-1 doesn't include that character;
> - Cp1252 then I can read and write the EURO character to DB and Cp1252
> includes that character.

I understand my confusion now - it stemmed from the phrase "the encoding of
the JVM". The JVM itself only uses one encoding; it translates to and from
other encodings on I/O. So to make sure I understood you correctly, were you
referring to the encoding specified by the I/O call?

Generally if the encoding you specify for I/O is different from the encoding
in your data store, it will cause trouble. This is not limited to Java. Over
in the Postgres newsgroups one finds people have trouble with character
encoding from all sorts of platforms, mostly stemming from trying to store
characters in a column that are not part of the specified character encoding
for the DB. If such things don't match, then problems will hatch.

--
Lew

Sabine Dinis Blochberger

(Msg. 26) Posted: Fri Feb 15, 2008 12:02 pm

Andrea wrote:

>
> I've made another test: I've exported the content of the table with
> the encrypted password and I've found that the password I can't decrypt
> back contains characters between 0x80 and 0x9F, which are control
> characters in ISO-8859-1 and Java
> - reads garbage with DB2 configured with IBM850 and JVM ISO-8859-1
> - reads correctly with both DB2 and JVM configured with ISO-8859-1
>
> I understand the first behavior but the last point is strange... Java
> (with some "magic" :) ) is doing the proper conversion if the db is
> iso-8859-1 but I can't understand how... I will test it again and let
> you know if I find something.
>
> Thanks again everyone!
>
> Andrea
>
I'm guessing, but maybe if the database tells the JDBC driver it's
ISO-8859-1 *and* your application tells it the same encoding, it won't
bother trying to transform anything...

--
Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

Silvio Bierman

(Msg. 27) Posted: Fri Feb 15, 2008 6:16 pm

Andrea wrote:
> Hi Lew,
>
>> I understand my confusion now - it stemmed from the phrase "the encoding of
>> the JVM". The JVM itself only uses one encoding; it translates to and from
>> other encodings on I/O. So to make sure I understood you correctly, were you
>> referring to the encoding specified by the I/O call?
> Maybe I don't understand :(
> In my posts I tried to specify the encoding of the DBMS and the JVM
> encoding (i.e. the system property file.encoding) in the different
> cases and, as you stated, the JVM performs the necessary translations.
> In my JDBC calls I don't specify/force any encoding.
>
>
>> Generally if the encoding you specify for I/O is different from the encoding
>> in your data store, it will cause trouble. This is not limited to Java. Over
>> in the Postgres newsgroups one finds people have trouble with character
>> encoding from all sorts of platforms, mostly stemming from trying to store
>> characters in a column that are not part of the specified character encoding
>> for the DB. If such things don't match, then problems will hatch.
> I disagree...
> For our application we keep the DBMS with a fixed encoding and the
> application performs the necessary conversions. For instance for
> Polish installations we use an ISO-8859-1 database and an application
> server configured with ISO-8859-2 where we store Polish characters
> without problems.
>
> Andrea

Hello Andrea,

Using an incomplete database encoding that does not match the
application is a dangerous practice that only works by coincidence. It
is only because the encodings you mention are 8-bit complete that you
can pass 8-bit values in and out untouched, even though they may
mean something different inside the DB from how you interpret them in
your application.
With few exceptions it is never a good idea to use a database encoding
other than a complete encoding like UTF-8 (to be honest, you should
always use this in a DB if you are left the choice).
If the applications are Java based you will have a perfect match that
way, and only when you do stuff like generating emails, plain-text files
or Office documents (shivers), where incomplete encodings (code pages)
come into the game, do you have to take extra measures. Using the platform
encoding then is usually a good idea unless it is a server application,
in which case you have to make an educated guess.
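
A sketch of why the Polish setup round-trips even though the database holds different characters from the ones the user typed. It assumes ISO-8859-2 is available in the running JDK (only a handful of charsets are guaranteed) and that the database passes bytes through untouched; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MismatchedCodePages {
    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String asSeenBy(String text, Charset writerCharset, Charset readerCharset) {
        return new String(text.getBytes(writerCharset), readerCharset);
    }

    public static void main(String[] args) {
        Charset app = Charset.forName("ISO-8859-2"); // application side, as in the Polish setup
        Charset db = StandardCharsets.ISO_8859_1;     // what the database believes it stores
        String typed = "\u0142\u00F3d\u017A";         // the Polish word for "boat"
        String insideDb = asSeenBy(typed, app, db);
        String readBack = asSeenBy(insideDb, db, app);
        System.out.println("application wrote: " + typed);
        System.out.println("database thinks:   " + insideDb);
        System.out.println("round trip intact: " + typed.equals(readBack));
    }
}
```

The database sees "łódź" as "³ód¼", so sorting, case conversion or any client that trusts the declared ISO-8859-1 code page will work on the wrong characters, even though the original application reads its own data back intact.
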

Regards,

Silvio

All times are Pacific Time (US & Canada).