Google Translate API messing up Cyrillic
Thread poster: Susan Welsh
Susan Welsh
Susan Welsh  Identity Verified
United States
Local time: 12:20
Russian to English
Dec 27, 2011

I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it has started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). The Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

 
esperantisto
esperantisto  Identity Verified
Local time: 19:20
Member (2006)
English to Russian
SITE LOCALIZER
No problem with Anaphraseus Dec 28, 2011

Please provide more details on your environment. I experience no problems with the latest build of Anaphraseus in LibreOffice 3.4.4/OpenOffice.org 3.3.0 on Windows 7 and openSUSE 11.3/11.4 when translating ENG→RUS. Previously, Anaphraseus returned strings of incorrectly decoded UTF-8 characters, but Ole solved the problem. (BTW, your description makes me think your problem may be the same, but you need to provide a sample of what you get.)

 
Didier Briel
Didier Briel  Identity Verified
France
Local time: 18:20
English to French
Are you translating long segments? Dec 28, 2011

Susan Welsh wrote:
I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it has started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). The Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

Do your "garbage" segments begin with
Server returned HTTP response code: 414 for URL:

I can reproduce the issue in OmegaT if I try and translate long segments from Russian to English. Short segments are translated fine, but I get the 414 error for long segments.

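That's because Russian characters have to be percent-encoded in the URL: each Cyrillic letter takes two bytes in UTF-8, and each byte becomes three characters (such as %D0), so the request is much longer than for languages written in ASCII characters.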

E.g.,
googleapis.com/language/translate/v2?key=xxxxx&source=RU&target=EN&q=%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83+%D0%BF%D0%B5%D1%80%D0%B5%D0%B1%D1%80%D0%B0%D0%BB%D1%81%D1%8F
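To see how much the encoding inflates a segment, here is a minimal Python sketch. It only uses the standard library's `quote_plus`; the helper name `encoded_length` and the English comparison phrase are made up for illustration, and nothing here claims to be what OmegaT does internally.

```python
from urllib.parse import quote_plus


def encoded_length(text: str) -> int:
    """Return the number of characters text occupies once URL-encoded."""
    return len(quote_plus(text))


# The Russian phrase behind the q= parameter in the URL above.
russian = "В 1526 году перебрался"
# An arbitrary English phrase of similar size, for comparison only.
english = "In 1526 he moved"

# Each Cyrillic letter is 2 bytes in UTF-8, and each byte becomes "%XX".
print(quote_plus(russian))
print("Russian:", len(russian), "chars ->", encoded_length(russian), "encoded")
print("English:", len(english), "chars ->", encoded_length(english), "encoded")
```

The Cyrillic letters grow sixfold, while ASCII letters and digits stay at one character each, which is why a Russian segment hits a URL length limit far sooner than an English one of the same size.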

I know there is another method, which allows sending slightly longer strings. I'll check with Alex (he's more concerned than I am), but in the end the problem will always exist for lengthy segments.
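One candidate for such a method (an assumption on my part, the post does not say which one is meant) is to send the same parameters in a POST body instead of the URL. As far as I recall, the v2 API documentation described this, with an `X-HTTP-Method-Override: GET` header and its own limit on the size of `q`. The sketch below only builds the request and does not send it; the host, the header, and the function name `build_post_request` are assumptions to be checked against the current documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint, modelled on the URL quoted above.
API_URL = "https://www.googleapis.com/language/translate/v2"


def build_post_request(text, api_key, source="ru", target="en"):
    """Build (but do not send) a POST request carrying the segment in its body."""
    params = {"key": api_key, "source": source, "target": target, "q": text}
    body = urlencode(params).encode("ascii")  # still percent-encoded, but not in the URL
    # Assumed header: asks the service to treat the POST as a GET.
    headers = {"X-HTTP-Method-Override": "GET"}
    return Request(API_URL, data=body, headers=headers, method="POST")


req = build_post_request("В 1526 году перебрался", api_key="xxxxx")
print(req.get_method(), req.full_url)
print(len(req.data), "bytes in the body")
```

The text is encoded just as before, so the body is no shorter; the difference is that the URL itself stays short, so a limit on URL length is no longer what the segment runs into.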

Didier


 
Susan Welsh
Susan Welsh  Identity Verified
United States
Local time: 12:20
Russian to English
TOPIC STARTER
example Dec 28, 2011

Hi Didier and esperantisto,

Didier, you seem to have identified the problem (although I would not say that this segment is terribly long), because it does give that code (below). I am working with OmegaT 2.5.2 on Ubuntu Linux, OOo 3.2.0.

(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Thanks,
Susan

PS - After some editing, the garbage no longer displays in this message the way I see it on my screen. It consists entirely of %D0%BE%D0%B4 and the like, with no Cyrillic words. I'm going to delete the example, except for the source text and the error code.

Выросши в холодной Сибири, постоянно с величайшим вниманием следя за описаниями полярных путешествий и многое узнав о них от покойного моего друга Норденшильда, совершившего ряд славных экспедиций в области льдов, я получил полное убеждение в возможности решительной победы над полярными льдами при помощи соответственных для того приспособлений и, главное, - ясного понимания сил, до сих пор препятствовавших кораблям проникнуть в неведомую околополюсную область, занимающую пространство около 4 млн кв.
Server returned HTTP response code: 414 ...

[Edited at 2011-12-28 14:49 GMT]


 
Dominique Pivard
Dominique Pivard  Identity Verified
Local time: 19:20
Finnish to French
Anaphraseus Dec 29, 2011

Susan Welsh wrote:
(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Anaphraseus (http://anaphraseus.sourceforge.net/) is a Wordfast (Classic) "clone". It works in OpenOffice instead of MS Office, is considerably slower than Wordfast and has a much smaller feature set.


 

