Welcome to dbForumz.com!
Encoding conversion problem

 
Andrea

External


Since: Dec 19, 2007
Posts: 12



(Msg. 1) Posted: Mon Feb 11, 2008 4:03 am
Post subject: Encoding conversion problem
Archived from groups: comp.lang.java.databases

Hi,
I have a J2EE application which connects to a DB2 configured with code
set IBM-850. The application works with encoding ISO-8859-1.
If I save characters outside the range supported by IBM-850 (i.e. the
euro currency character EURO) then I read garbage...

I tried encoding conversions with InputStreamReader and
OutputStreamWriter:
....
BufferedReader reader = new BufferedReader(new InputStreamReader(source, "IBM850"));
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(output, "ISO-8859-1"));
....

but that didn't work...
My JVM Charset.availableCharsets() includes IBM850.

What can I do?

Thanks, in advance,
Andrea

Andrea

External


Since: Dec 19, 2007
Posts: 12



(Msg. 2) Posted: Tue Feb 12, 2008 12:25 am
Post subject: Re: Encoding conversion problem

Hi Lothar,

> > I have a J2EE application which connects to a DB2 configured with code
> > set IBM-850. The application works with encoding ISO-8859-1.
>
> In general the JDBC-driver is aware of the encoding, the database
> is using and is doing the conversion already if you access the
> column by getString(columnName/index).
Yes, I fetch the string with ResultSet.getString(index). I use the DB2
Universal Driver with a type 4 connection.


> > I tried encoding conversions with InputStreamReader and OutputStreamWriter:
> > ...
> > BufferedReader reader = new BufferedReader(new
> > InputStreamReader(source, "IBM850"));
>
> What is source? How do you create that from the JDBC-
> resultset?

I tried:
InputStream source = new ByteArrayInputStream(stringFetchedFromDB.getBytes());

Thanks,
Andrea

Andrea

External


Since: Dec 19, 2007
Posts: 12



(Msg. 3) Posted: Tue Feb 12, 2008 3:22 am
Post subject: Re: Encoding conversion problem

> > ...
> > If I save characters outside the range supported by IBM-850 (i.e. the
> > euro currency character EURO) then I read garbage...
>
> Yes, the Euro symbol is not part of the encodings, so your database
> can't contain it.
I've found a strange thing: C and COBOL applications can write and read
(using embedded SQL) characters outside the accepted range without
problems... So the database can contain those characters without
losing any information, but I can't understand how...

> If you need it, you would have to change the databases
> encoding (ISO-8859-15 includes the Euro symbol).
> Otherwise, you have to take care not to try to write unsupported
> character into string/character fields.
>
> One solution could be to parse all strings and replace the symbol with
> the shorthand "EUR", but it might not be acceptable to your client.
Actually the EURO character is just an example, I have more complex
strings to handle (and I can't change the encoding of the database).
If my problem has no solution at all then I'd like to understand why
other languages don't have this problem...

Thanks,
Andrea
Lothar Kimmeringer

External


Since: Nov 06, 2007
Posts: 3



(Msg. 4) Posted: Tue Feb 12, 2008 5:00 am
Post subject: Re: Encoding conversion problem

Andrea wrote:

> I have a J2EE application which connects to a DB2 configured with code
> set IBM-850. The application works with encoding ISO-8859-1.

In general, the JDBC driver is aware of the encoding the database
is using and already does the conversion if you access the
column via getString(columnName/index).

> I tried encoding conversions with InputStreamReader and
> OutputStreamWriter:
> ...
> BufferedReader reader = new BufferedReader(new
> InputStreamReader(source, "IBM850"));

What is source? How do you create that from the JDBC-
resultset?

> BufferedWriter writer = new BufferedWriter(new
> OutputStreamWriter(output, "ISO-8859-1"));

That looks OK.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: spamfang DeleteThis @kimmeringer.de
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
Andrea

External


Since: Dec 19, 2007
Posts: 12



(Msg. 5) Posted: Tue Feb 12, 2008 6:33 am
Post subject: Re: Encoding conversion problem

Hi Sabine,
thank you for your explanation, now the overall situation is much
clearer to me.

Thanks,
Andrea
Sabine Dinis Blochberger

External


Since: Apr 24, 2007
Posts: 7



(Msg. 6) Posted: Tue Feb 12, 2008 8:00 am
Post subject: Re: Encoding conversion problem

Andrea wrote:

> Hi,
> I have a J2EE application which connects to a DB2 configured with code
> set IBM-850. The application works with encoding ISO-8859-1.
> If I save characters outside the range supported by IBM-850 (i.e. the
> euro currency character EURO) then I read garbage...

Yes, the Euro symbol is not part of either encoding, so your database
can't contain it. If you need it, you would have to change the database's
encoding (ISO-8859-15 includes the Euro symbol).

Otherwise, you have to take care not to write unsupported
characters into string/character fields.

One solution could be to parse all strings and replace the symbol with
the shorthand "EUR", but it might not be acceptable to your client.
--
Sabine Dinis Blochberger

Op3racional
www.op3racional.eu
Sabine Dinis Blochberger

External


Since: Apr 24, 2007
Posts: 7



(Msg. 7) Posted: Tue Feb 12, 2008 11:01 am
Post subject: Re: Encoding conversion problem

Andrea wrote:

> > > ...
> > > If I save characters outside the range supported by IBM-850 (i.e. the
> > > euro currency character EURO) then I read garbage...
> >
> > Yes, the Euro symbol is not part of the encodings, so your database
> > can't contain it.
> I've found a strange thing: C and COBOL applications can write and read
> (using embedded SQL) characters outside the accepted range without
> problems... So the database can contain those characters without
> losing any information, but I can't understand how...
>
Yes, in theory you can store any value (0 - 255 in the case of single-byte
strings) in a string, but how that is interpreted (i.e. the encoding) is
where it gets hairy. Also, multibyte characters would break the
interpretation.

> > If you need it, you would have to change the databases
> > encoding (ISO-8859-15 includes the Euro symbol).
> > Otherwise, you have to take care not to try to write unsupported
> > character into string/character fields.
> >
> > One solution could be to parse all strings and replace the symbol with
> > the shorthand "EUR", but it might not be acceptable to your client.
> Actually the EURO character is just an example, I have more complex
> strings to handle (and I can't change the encoding of the database).
> If my problem has no solution at all then I'd like to understand why
> other languages don't have this problem...
>
Ah, there are always hacks around limitations. But they aren't usually
pretty. The problem is to funnel a string with these "unsupported"
characters through the JDBC driver (both ways).

You might get around it by using typeless fields (you can put any byte
sequence there), like BLOBS maybe...

Or you write a parser that substitutes the impossible characters with
acceptable replacements. Of course, this is most likely not feasible.
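
For illustration, the substitution idea above could be sketched roughly as follows. This is only a minimal sketch, not something the driver or DB2 provides: the class name, the method name and the replacement table are hypothetical, and the target charset is passed in by the caller (for this thread it would be the database code set, if the JVM supports it).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Map;

// Hypothetical helper: replaces characters the target charset cannot encode.
final class UnsupportedCharReplacer {

    // Assumed replacement table; extend it for the characters your data contains.
    private static final Map<Character, String> REPLACEMENTS = Map.of('\u20AC', "EUR");

    static String replaceUnsupported(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                // Characters without a table entry become "?".
                // Surrogate pairs are handled one char at a time, so they yield "??".
                out.append(REPLACEMENTS.getOrDefault(c, "?"));
            }
        }
        return out.toString();
    }
}
```

Note that the substitution is lossy: once "EUR" is stored, nothing tells a reader whether the original text contained the symbol or the three letters.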

But the customer has to be aware that a database with encoding X can
only hold strings encoded in X. If they need UTF-8 now, for example, they
will eventually have to change their database. And it would be better to
migrate to a suitable encoding than to hack around it and, in a few
years, have to do it all over again (and then some) when they finally do
want to change the database encoding.

On other languages not having the problem, in C, you can treat a string
just like an array of bytes and use those for whatever you like, the
compiler won't complain. Even interpreting them as memory addresses is
possible, adding and subtracting etc...

> Thanks,
> Andrea

--
Sabine Dinis Blochberger

Op3racional
www.op3racional.eu
Roedy Green

External


Since: Nov 04, 2007
Posts: 35



(Msg. 8) Posted: Tue Feb 12, 2008 2:01 pm
Post subject: Re: Encoding conversion problem

On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
wrote, quoted or indirectly quoted
someone who said :

>I have a J2EE application which connects to a DB2 configured with code
>set IBM-850. The application works with encoding ISO-8859-1.
>If I save characters outside the range supported by IBM-850 (i.e. the
>euro currency character EURO) then I read garbage...

First, make sure the data are truly encoded in IBM-850.
See http://mindprod.com/applet/encodingrecogniser.html

If there are characters in that file outside the range of IBM-850,
then by definition the file is not encoded in IBM-850 and you SHOULD
expect garbage.

You can write your own translate program to handle the excess chars.

see http://mindprod.com/jgloss/encoding.html

I don't know how to hook it in as an official encoding, but that is
not necessary.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy Green

External


Since: Nov 04, 2007
Posts: 35



(Msg. 9) Posted: Tue Feb 12, 2008 2:01 pm
Post subject: Re: Encoding conversion problem

On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
wrote, quoted or indirectly quoted
someone who said :

>BufferedReader reader = new BufferedReader(new
>InputStreamReader(source, "IBM850"));
>BufferedWriter writer = new BufferedWriter(new
>OutputStreamWriter(output, "ISO-8859-1"));

Your first task is to find out just what you are being handed before
you start fooling around with translations.

Unicode, IBM850, ISO-8859-1, something else?
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Andrea

External


Since: Dec 19, 2007
Posts: 12



(Msg. 10) Posted: Wed Feb 13, 2008 3:22 am
Post subject: Re: Encoding conversion problem

Hi Roedy,
the database (DB2) has this configuration:
....
Database territory = US
Database code page = 850
Database code set = IBM-850
....

I've exported to a file the content of a table with a CHAR(N) field
containing the EURO currency character, then I've opened the file with
EncodingRecognizer: if I choose IBM850 I see a strange character (like
a small X), if I choose ISO-8859-1 I see a square.

I tried a translation with:

String problematicString = rs.getString(index);
problematicString = new String(problematicString, "IBM850"); // Am I correct?

but I still get garbage :(


Thanks,
Andrea
Andrea

External


Since: Dec 19, 2007
Posts: 12



(Msg. 11) Posted: Wed Feb 13, 2008 6:22 am
Post subject: Re: Encoding conversion problem

Hi Silvio,
the settings are taken from the DB2 instance of a customer (and I
can't change them). The very same code works, of course, without
problems with a DB2 instance configured with ISO-8859-1.

The problem also arises when a C program stores to the DB a string with
characters that are not valid in IBM850: another C program can read the
string without problems while Java can't; so the string is not corrupted
when saved to the DB, but something (JDBC driver? Java I/O?) loses
information when I read the field with ResultSet.getString(int index) and
I can't convert it correctly (or I haven't found the right way to do it
yet, if it exists...).

BTW: a test has been made on DB2 with a table with a field declared
CHAR(n) FOR BIT DATA and Java code works without problems reading and
writing non-IBM850 characters.

Having read your feedback (THANK YOU EVERYONE!) I would say that
there's no way to read back those characters in Java in my
application.

Thanks again,
Andrea
Silvio Bierman

External


Since: Sep 27, 2007
Posts: 10



(Msg. 13) Posted: Wed Feb 13, 2008 9:00 am
Post subject: Re: Encoding conversion problem

Andrea wrote:
> Hi Roedy,
> the database (DB2) has this configuration:
> ...
> Database territory = US
> Database code page = 850
> Database code set = IBM-850
> ...
>
> I've exported to a file the content of a table with a CHAR(N) field
> containing the EURO currency character, then I've opened the file with
> EncodingRecognizer: if I choose IBM850 I see a strange character (like
> a small X), if I choose ISO-8859-1 I see a square.
>
> I tried a translation with:
>
> String problematicString = rs.getString(index);
> problematicString = new String(problematicString, "IBM850"); // Am I
> correct?
>
> but I still get garbage :(
>
>
> Thanks,
> Andrea

Those are quite interesting database configuration parameters. Sounds
like a pre-Unicode setup to me...

Silvio Bierman
Silvio Bierman

External


Since: Sep 27, 2007
Posts: 10



(Msg. 14) Posted: Wed Feb 13, 2008 9:00 am
Post subject: Re: Encoding conversion problem

Andrea wrote:
> Hi Roedy,
> the database (DB2) has this configuration:
> ...
> Database territory = US
> Database code page = 850
> Database code set = IBM-850
> ...
>
> I've exported to a file the content of a table with a CHAR(N) field
> containing the EURO currency character, then I've opened the file with
> EncodingRecognizer: if I choose IBM850 I see a strange character (like
> a small X), if I choose ISO-8859-1 I see a square.
>
> I tried a translation with:
>
> String problematicString = rs.getString(index);
> problematicString = new String(problematicString, "IBM850"); // Am I
> correct?
>
> but I still get garbage :(
>
>
> Thanks,
> Andrea

You should look at the numeric byte values in problematicString. That
could give you an idea of what you are dealing with, although it will
only disclose what your JDBC driver has made of it. It might have
already done an incorrect interpretation of a byte sequence.
Things could be even worse: the data could already have been mutilated
during insertion in the database when some program (possibly the same
program + JDBC driver?) put the data in. If the database encoding does
not support all characters that were in the original data then that is
what most likely happened.
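
As a rough illustration of that inspection step, something like the following could print the numeric values of the characters the driver returned. It is only a sketch; the class and method names are made up for this example, and it shows what the driver produced, not the bytes stored in the database.

```java
// Hypothetical helper: prints each Unicode code point of a string in U+XXXX form.
final class StringDump {

    static String dumpCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // In the real application the argument would be rs.getString(index).
        System.out.println(dumpCodePoints("A\u20AC")); // prints "U+0041 U+20AC"
    }
}
```

If the dump shows U+FFFD (the replacement character Java decoders emit by default for bytes they cannot map), that would suggest the information was already lost before the string reached the application.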

Really, as I said in my previous post, you should consider (a) going to a
different database that supports Unicode, (b) refraining from
using/supporting non-ASCII characters in your application, or (c) doing
what others have suggested and writing your own translation from
Unicode -> ASCII -> DB -> ASCII -> Unicode. The latter option is only
realistic if you have wrapped all your JDBC code in some generic wrappers
(which is usually a good idea) so you can handle this locally.
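
Option (c) could look something like the sketch below. It is one possible scheme among many, with hypothetical class and method names: every non-ASCII UTF-16 unit is written as a backslash-u escape with four hex digits, and a literal backslash is doubled so the mapping can be reversed.

```java
// Hypothetical reversible ASCII escaping for text stored in a non-Unicode column.
final class AsciiEscaper {

    // Unicode -> ASCII: call before handing the string to the JDBC driver.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                out.append("\\\\");                       // double literal backslashes
            } else if (c < 0x80) {
                out.append(c);                            // plain ASCII passes through
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // ASCII -> Unicode: call on the string returned by the JDBC driver.
    static String unescape(String stored) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < stored.length()) {
            char c = stored.charAt(i);
            if (c == '\\' && i + 1 < stored.length() && stored.charAt(i + 1) == '\\') {
                out.append('\\');
                i += 2;
            } else if (c == '\\' && i + 5 < stored.length() && stored.charAt(i + 1) == 'u') {
                out.append((char) Integer.parseInt(stored.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Two caveats with this kind of scheme: each escaped character takes six column positions instead of one, so CHAR(n) limits are reached sooner, and other programs reading the same column (such as the C and COBOL applications mentioned earlier) would see the escapes rather than the characters.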

Good luck,

Silvio
Silvio Bierman

External


Since: Sep 27, 2007
Posts: 10



(Msg. 15) Posted: Wed Feb 13, 2008 12:00 pm
Post subject: Re: Encoding conversion problem

Hello Andrea,


Andrea wrote:
> Hi Silvio,
> the settings are taken from the DB2 instance of a customer (and I
> can't change them). The very same code works, of course, without
> problems with a DB2 instance configured with ISO-8859-1.
>
I already expected you could not change this but it was worth suggesting...

I do not really understand why a Euro sign would work with 8859-1, since
that does not contain that character as far as I am aware.

> The problem arises also when a C program stores to DB a string with
> non-IBM850 valid characters: another C program can read the string
> without problems while Java can't; so the string is not corrupted when
> saved to DB but someone (JDBC driver? Java I/O?) looses something when
> I read the field with Resultset.getString(int index) and I can't
> convert it correctly (or I haven't found the right way to do it yet,
> if it exists...).
>

It will probably depend on how you access the DB from C (which I have
never done with DB2) but it does not sound surprising that the C binding
would just pass through byte sequences. As long as you stay inside 8-bit
character encoding and always interpret them in the same codepage then
this usually works (which is why many programmers are totally unaware of
character encoding issues).

Since Java programs handle strings as sequences of characters from the
Unicode character set, all interfaces with external character storage
need to be encoding-aware. This means that on input, Java
String objects are created from byte sequences ALWAYS assuming an
encoding; if none is explicitly specified, the platform default is used.
The other way around, String objects are converted to byte
sequences on output, again ALWAYS using an encoding and again
defaulting to the platform default.
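
A small sketch of what that means in practice (the class name is hypothetical, and whether IBM850 is available depends on the JVM, so the code checks first): the same one-character string turns into different bytes depending on the charset, and a charset that lacks the character silently substitutes something else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the bytes produced for a String depend on the charset used.
final class EncodingDemo {

    // Formats a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";

        // UTF-8 can represent the character: prints "E2 82 AC".
        System.out.println("UTF-8:      " + hex(euro.getBytes(StandardCharsets.UTF_8)));

        // ISO-8859-1 cannot, so getBytes substitutes '?': prints "3F".
        System.out.println("ISO-8859-1: " + hex(euro.getBytes(StandardCharsets.ISO_8859_1)));

        // IBM850 is an optional charset, so only use it if this JVM provides it.
        if (Charset.isSupported("IBM850")) {
            System.out.println("IBM850:     " + hex(euro.getBytes(Charset.forName("IBM850"))));
        }

        // Used whenever no charset is given, e.g. by the no-argument getBytes().
        System.out.println("Default:    " + Charset.defaultCharset());
    }
}
```

Once the substitution has happened, the original character cannot be recovered from the bytes, whichever charset is used to decode them afterwards.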

> BTW: a test has been made on DB2 with a table with a field declared
> CHAR(n) FOR BIT DATA and Java code works without problems reading and
> writing non-IBM850 characters.
>

That makes sense. What I said before means that if you do not specify
different encodings the same default will be used for input and output.
If the storage medium leaves the intermediate bytes alone (which any
binary database column type would do, just like a binary file would)
then output will again match input, again allowing encoding unaware
programmers to get away with it.
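
To make the binary-column route less dependent on the platform default, the application could do the conversion itself with an explicit charset and move only bytes through JDBC. The sketch below assumes a column declared FOR BIT DATA that the driver exposes through getBytes/setBytes; the class and method names are hypothetical, and handling of padding in fixed-length columns is left out.

```java
import java.nio.charset.Charset;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical wrapper: stores text in a binary column using one explicit charset.
final class BitDataColumn {

    private final Charset appCharset;

    BitDataColumn(Charset appCharset) {
        this.appCharset = appCharset;
    }

    // String -> bytes with the application's charset, not the platform default.
    static byte[] encode(String value, Charset charset) {
        return value.getBytes(charset);
    }

    // Bytes -> String with the same charset that was used for writing.
    static String decode(byte[] raw, Charset charset) {
        return raw == null ? null : new String(raw, charset);
    }

    // Binds the encoded bytes to a statement parameter.
    void write(PreparedStatement ps, int index, String value) throws SQLException {
        ps.setBytes(index, encode(value, appCharset));
    }

    // Reads the raw bytes of a column and decodes them.
    String read(ResultSet rs, int index) throws SQLException {
        return decode(rs.getBytes(index), appCharset);
    }
}
```

The design point is that the database never interprets the bytes, so reading and writing stay consistent as long as every program touching the column agrees on the charset.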

> Having read your feedback (THANK YOU EVERYONE!) I would say that
> there's no way to read back those characters in Java in my
> application.
>

The problem is probably that the default encoding in Java turns your
Unicode character into something the database is unwilling to store as
is in this encoding, so the database mutilates it while trying to fix it.
When queried, the database then returns a byte sequence different from
what was put in.
After that modification the JDBC driver has no chance of restoring the
original value.

> Thanks again,
> Andrea


Best regards,

Silvio