Convert ae, oe, ue, ss to ä, ö, ü, ß where applicable Thread poster: Hans Lenting
I have a list with about 40K (1) entries where ä, Ä, ö, Ö, ü, Ü and ß have been transcribed as ae, Ae, oe, Oe, ue, Ue and ss. But the list also contains (2) entries where ae, Ae, oe, Oe, ue, Ue and ss are not transcriptions of ä, Ä, ö, Ö, ü, Ü and ß.

Question: How can I correct entries of type (1) but leave entries of type (2) unmodified?

Ablesegeraet
Ablieferungspruefung (1)
Ablieferungspruefungen
abmeisseln
Abmessen
Abmessung
Abschaltfrequenz (2)
Abschaltreaktivitaet
Abschaltsteuerung
Abschaltverstaerker
abschiessen
Abschirmbehaelter (1)
Abschirmungsschlauch (2)
Abschlaege
Abschlaeger
Abschlaglaenge
Abschlagschuss
abschliessen

esperantisto
First, batch replace ae with ä, oe with ö, etc. Then revert the most obvious wrong replacements, such as ßch back to ssch. Finally, run a spellchecker and correct as suggested.

Erik Freitag, Germany
esperantisto wrote: First, batch replace ae with ä, oe with ö etc. Then replace most obvious wrong replacements such as ßch to ssch. Run a spellchecker and correct as suggested.

That'd be my advice, too. Type 1 errors (wrong replacements) with umlauts will be few and far between anyway. You'll have most of them covered by re-replacing "qü" with "que", "Qü" with "Que", "eü" with "eue", and "Eü" with "Eue". Then, as esperantisto suggests, do ßch->ssch. Correct what's left over with a spellchecker (preferably a good one; the old Duden spellchecker comes to mind). You may be left with fewer manual corrections than one would think at first glance. Good luck!

Samuel Murray, Netherlands
Hans Lenting wrote: I have a list with about 40 000 entries... How can I correct entries of type (1) but leave entries of type (2) unmodified?

I'm afraid you're going to have to use a spell-checker, and it would have to be one capable of checking compound nouns. Do you have such a spell-checker? I would be surprised if MS Word's spell-checker can't do this sort of thing.

Then it's a matter of extracting the misspelled words from the list, doing the conversions on those misspelled words only, and spell-checking the converted words again. What you're left with is a list of words that your spell-checker doesn't recognise with or without the conversion, which you'd have to check manually.

One possible downside to this method (that you can work around, if you know of it) is that only one variant of a word will end up in the final list. So if, for example, both "ass" and "aß" are valid German words, then only one of them will end up in your list.

I use a macro in MS Word from editorium.com that makes a list of misspelled words, although the macro does not remove those words from the original list (so you'd have to find a way of doing that). On a large document with many misspellings, your display could freeze until the macro has run its entire course. You can try to increase the speed by temporarily replacing line breaks with spaces.

You may also benefit from a different macro (or second macro) that highlights misspelled words in the original list. I googled for it and found one that works for me, here. In addition, I confirm that this macro works in Excel 365 (at least, it works in French). It highlights whole cells, so you'd have to ensure you have one word per cell.

Samuel
[Edited at 2021-01-02 12:00 GMT]
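The spell-checker triage described in the post above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `to_umlauts`, `triage` and the tiny `KNOWN` set are hypothetical names, and the set merely stands in for a real spell-checker that can handle German compounds.

```python
def to_umlauts(word: str) -> str:
    """Blindly convert every ae/oe/ue/ss transcription in a word."""
    pairs = (("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"),
             ("ae", "ä"), ("oe", "ö"), ("ue", "ü"), ("ss", "ß"))
    for plain, special in pairs:
        word = word.replace(plain, special)
    return word


def triage(words, is_known):
    """Split words into (kept as-is, converted, needs manual check)."""
    kept, converted, manual = [], [], []
    for word in words:
        if is_known(word):
            # The spell-checker accepts the word: leave it alone (type 2).
            kept.append(word)
        elif is_known(to_umlauts(word)):
            # Only the converted form is accepted: use it (type 1).
            converted.append(to_umlauts(word))
        else:
            # Neither form is accepted: a human has to look at it.
            manual.append(word)
    return kept, converted, manual


# Assumption: a tiny word set standing in for a real spell-checker.
KNOWN = {"Abmessung", "Abschaltfrequenz", "Ablesegerät", "abschießen"}

kept, converted, manual = triage(
    ["Abmessung", "Ablesegeraet", "abschiessen",
     "Abschaltfrequenz", "Abschlagschuss"],
    KNOWN.__contains__,
)
```

Because a word that is already accepted is never converted, this sketch shows the downside mentioned above: only one variant of a word survives. Words that mix both types (a genuine ss next to a transcribed ae, say) are converted wholesale, fail the second check and end up in the manual list.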
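The batch-replace advice given earlier in the thread (replace everything, then revert the obvious mistakes) might look like this in Python. The revert rules are the ones named in the posts; the names `BLANKET`, `REVERTS` and `batch_convert` are made up for this sketch, and a spell-check pass is still assumed to follow.

```python
# Step 1: blanket substitutions of the transcribed digraphs.
BLANKET = [("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"),
           ("ae", "ä"), ("oe", "ö"), ("ue", "ü"), ("ss", "ß")]

# Step 2: reverts named in the thread (qü, Qü, eü, Eü and ßch).
REVERTS = [("qü", "que"), ("Qü", "Que"),
           ("eü", "eue"), ("Eü", "Eue"),
           ("ßch", "ssch")]


def batch_convert(word: str) -> str:
    """Apply the blanket substitutions, then undo the obvious mistakes."""
    for old, new in BLANKET + REVERTS:
        word = word.replace(old, new)
    return word


for entry in ["Ablieferungspruefung", "Abschaltfrequenz",
              "Abschirmungsschlauch", "Abmessung"]:
    print(entry, "->", batch_convert(entry))
```

The last entry comes out as "Abmeßung", which is wrong. That is the kind of leftover the spell-check step is meant to catch.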
Heinrich Pesch, Finland: Replace qu/Qu and ssch | Jan 2, 2021
Replace these with special characters first, then carry out the general replacement of ue -> ü, ss -> ß, etc. Afterwards, convert the special characters back. I am happy with Word's spell-checker.

With ß you naturally have to bear in mind that ß follows a diphthong, but only rarely a single vowel. So I would generally convert iess to ieß, etc. Or the list is meant for Switzerland, in which case there is no ß at all. In the end you will still have to check the list manually.

Hans Lenting, Netherlands (topic starter): It was a lot of work | Jan 3, 2021
Heinrich Pesch wrote: In the end you will still have to check the list manually.

That is exactly how I did it. And I learned a new word in the process: https://iate.europa.eu/search/standard/result/1609653594195/1

Rebate on the rebate. I think that says it all. This German word is perhaps doomed to perish. Curiously, there's no entry for "good riddance".

Hans Lenting, Netherlands (topic starter)
Samuel Murray wrote: One possible downside to this method (that you can work around, if you know of it) is that only one variant of a word will end up in the final list. So if for example both "ass" and "aß" are valid German words, then only one of them will end up in your list.

I used this list to fix misspellings in my downloaded copy of the IATE de_nl. Since I added the term pairs with the corrected spelling, the old ones, probably from the beginning of IATE, are still available. On the other hand, there will be many term pairs where I incorrectly replaced an ae with ä, etc. For my purposes, that doesn't matter: the correct spelling forms are still available.

I wonder whether the IATE will ever be corrected in this regard. Probably not, since that would be a gigantic operation.

Hans Lenting, Netherlands (topic starter): Another approach | Jan 9, 2021
In order to reduce the number of words that I would have to check manually, I came up with this other approach:

From various sources I collected lists with correctly spelled German words. I placed them in one file of about 500K words.

From this list I extracted all words with an ä, Ä, ö, Ö, ü, Ü or ß, resulting in a new list of about 76K words.

I changed all words in this list to lowercase and copied them to the second column of a spreadsheet.

I then replaced all ä, ö, ü and ß in the 76K list with ae, oe, ue and ss and copied the result to the first column of the spreadsheet.

Finally, I used this spreadsheet to make case-adaptive replacements in the original list of 40K words with incorrect spelling.

So, using the 76K list I have entries like:

and:

And with this, I can correct words like:

Fuehrungsgelaende
Gelaendefuehrung
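For readers who would rather script this than use a spreadsheet, here is a minimal Python sketch of the same idea. It is not the exact workflow used above. The names `build_table` and `fix` are hypothetical, the three sample words are assumed stand-ins for the 500K list, and table entries are assumed to be matched as substrings so that the parts of a compound get corrected.

```python
import re

UMLAUT_TO_PLAIN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def build_table(correct_words):
    """Map the transcribed lowercase form of each word to its correct form."""
    table = {}
    for word in correct_words:
        word = word.lower()
        if any(ch in word for ch in UMLAUT_TO_PLAIN):
            plain = word
            for special, digraph in UMLAUT_TO_PLAIN.items():
                plain = plain.replace(special, digraph)
            table[plain] = word
    return table


def fix(entry, table):
    """Replace every table key found in entry, keeping an initial capital."""
    if not table:
        return entry
    # Longest keys first, so a long match wins over its own substrings.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)), re.IGNORECASE)

    def swap(match):
        found = match.group(0)
        fixed = table.get(found.lower())
        if fixed is None:  # unusual case folding: leave the text untouched
            return found
        return fixed.capitalize() if found[0].isupper() else fixed

    return pattern.sub(swap, entry)


# Assumption: three sample words standing in for the collected word lists.
table = build_table(["Führung", "Gelände", "Abmessung"])
print(fix("Fuehrungsgelaende", table))
print(fix("Gelaendefuehrung", table))
```

Since the table is applied to substrings, a short entry can also match inside an unrelated word, so some wrong replacements should still be expected.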
[Edited at 2021-01-09 09:07 GMT]