How to fix corrupted character encoding (corrupted text) in Microsoft Word
What is text character corruption?
People who regularly work with plain text files (files with the .txt extension) will sometimes encounter documents that show garbled text instead of the expected content. This most often happens when the document is written in a language that does not use the Latin alphabet, but it can happen to any file if the encoding used to save it differs from the encoding used to open it.
Character corruption occurs when a file is saved in one encoding and the program that opens it assumes a different one. Most modern programs use UTF-8 by default, but many languages also have one or more legacy encodings of their own. For example, East Asian languages are often stored in multi-byte encodings, and when such a document is opened by a program that expects a different encoding (for example, UTF-8 or a single-byte Western code page), the text is replaced with garbled characters.
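As a rough illustration of this mismatch, the following Python sketch encodes a short Russian word in one encoding and decodes the same bytes with another. The sample word and the choice of Windows-1251 and Windows-1252 are assumptions made for the example only.

```python
# Illustrative only: the sample text and the two code pages are assumptions.
original = "Привет"

# The file is saved with a Cyrillic code page (Windows-1251) ...
raw_bytes = original.encode("cp1251")

# ... but the reading program assumes a Western European one (Windows-1252).
garbled = raw_bytes.decode("cp1252")
print(garbled)  # Ïðèâåò

# The bytes themselves were never changed, so undoing the wrong
# decoding step and applying the right one restores the text.
restored = garbled.encode("cp1252").decode("cp1251")
print(restored == original)  # True
```

The garbled output is the same data read through the wrong table, which is why the text can be recovered.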
Rest assured, the damaged text is not lost. There are many ways to fix a corrupted character encoding, including dedicated software written for this exact scenario. However, if you only need to fix one or two documents, downloading and installing new software is more trouble than it is worth. Here I will show you how to fix these corrupted text files in Microsoft Word, which is probably already installed on computers running the Windows operating system.
Microsoft Word has a built-in character encoding converter that can open a file in one encoding and save it in another.
This fix works with Microsoft Word 2003 and later.
By default, Windows opens plain text files (with the .txt extension) in Notepad. To open a damaged document in Microsoft Word instead:
Right-click the document
Select "Open with" and choose Microsoft Word
The "Convert File" dialog box should open automatically when Word detects a file with an unexpected encoding. Select "Encoded Text" from the list of options and click OK.
If the dialog box does not appear, you must enable it manually. Go to "File" -> "Options" -> "Advanced" and scroll down to the "General" section. There, select the "Confirm file format conversion on open" checkbox. Close Word and reopen the damaged document, and the dialog box will appear.
The encoding selection dialog box should automatically suggest the correct encoding. If it does not, you can select the encoding manually from the list.
Select "Automatic Selection" if you are not sure of the source encoding, or pick an encoding from the list if you know the language in which the file is written. The preview window lets you check whether the text is now readable.
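The preview-and-choose step can be approximated outside Word. The sketch below is only one assumed way to do it in Python: the function name `preview_decodings`, the candidate list, and the sample bytes are all hypothetical. It tries each encoding and shows the result so that a person can pick the readable one.

```python
def preview_decodings(data, candidates):
    """Return {encoding: decoded text, or None if decoding fails}."""
    previews = {}
    for name in candidates:
        try:
            previews[name] = data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding at all.
            previews[name] = None
    return previews

# Hypothetical sample: Russian text that was saved as Windows-1251.
sample = "Текст".encode("cp1251")

for name, text in preview_decodings(sample, ["utf-8", "cp1251", "cp1252"]).items():
    print(f"{name:8} -> {text!r}")
```

A failed decode rules an encoding out, but a successful one does not prove it is right: the Western European code page decodes these bytes without error and still produces nonsense, which is why a visual preview is needed.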
The recovered text can now be read in Microsoft Word, but it may still appear corrupted in other plain text software, since many such programs do not handle language-specific encodings. To avoid this, it is best to save the document as plain text in a Unicode encoding such as UTF-8 or UTF-16.
To do this, click the "File" tab in the upper-left corner of the window and select "Save As" from the list. Choose the destination folder and select "Plain Text Document" as the file format. Click "Save".
A new File Conversion dialog box opens. From the list, select the encoding for the final document. In the preview field, text that will not be saved correctly is highlighted in red, so choose an encoding that can represent everything in the document. When in doubt, it is best to use a Unicode encoding, since Unicode is designed with all the world's writing systems in mind.
Finally, click OK to save the corrected document.
Your document should now be displayed correctly in your chosen plain text processing program, such as Notepad.
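For readers who prefer a script, the same save-as-Unicode step might look roughly like the Python below. The function names and the sample string are hypothetical, and the source encoding has to be known or guessed by the user. The first helper imitates the red-highlight preview by listing the characters that a target encoding cannot represent.

```python
from pathlib import Path

def unencodable_chars(text, encoding):
    """Return the distinct characters that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad

def convert_to_utf8(src, dst, source_encoding):
    """Read `src` using its actual encoding and write a UTF-8 copy to `dst`."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Cyrillic letters do not exist in Windows-1252, so they would be flagged.
print(unencodable_chars("Word и Notepad", "cp1252"))  # ['и']
# UTF-8 can represent every character, so nothing is flagged.
print(unencodable_chars("Word и Notepad", "utf-8"))   # []
```

A call such as `convert_to_utf8("broken.txt", "fixed.txt", "cp1251")` would then write the corrected copy. Both file names are placeholders.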
What is an encoding and what does it depend on?
Encodings vary significantly from region to region. To understand what an encoding is, you need to know that the text in a document is stored as a sequence of numeric values. The computer converts these numbers into characters using a particular encoding table. In the CIS countries, files are typically saved in the encoding listed as "Cyrillic", while other regions use their own, such as "Western European (Windows)" in Western Europe. If a text document was saved in the Cyrillic encoding and is opened as Western European, the characters are displayed incorrectly, as a meaningless jumble.
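To make the idea of text stored as numbers concrete, here is a small Python sketch. The sample letter is an arbitrary choice for illustration. `cp1251` and `cp1252` are Python's codec names for the Windows Cyrillic and Western European code pages.

```python
letter = "Я"  # arbitrary sample character

# The same letter is stored as different numbers in different encodings.
print(list(letter.encode("cp1251")))  # [223]
print(list(letter.encode("utf-8")))   # [208, 175]

# The single byte 223 means "Я" in the Cyrillic code page,
# but a different character in the Western European one.
print(bytes([223]).decode("cp1251"))  # Я
print(bytes([223]).decode("cp1252"))  # ß
```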
How to change the encoding in Word
A document saved in one encoding cannot be read correctly when it is opened in another.
To avoid such misunderstandings and make work easier, developers created a single encoding standard for all alphabets: Unicode. This widely accepted standard contains the characters of almost all written languages on the planet. It also prevails on the Internet, where such unification is needed to reach as many users as possible.
Word 2013 works natively with Unicode, which lets you exchange text files without third-party programs and without adjusting encodings in the settings. Even so, users often find that opening a seemingly simple file shows only meaningless symbols instead of text. In that case, Word has incorrectly detected the original encoding of the text.
Note: some encodings are tied to specific languages. Shift JIS was developed specifically for Japanese, EUC-KR for Korean, and ISO-2022 and EUC variants for Chinese.
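Several of these language-specific encodings are available as Python codecs, which gives a quick way to see that they are not interchangeable. The sample words are arbitrary choices. `shift_jis` and `euc_kr` are Python's names for the codecs.

```python
japanese = "日本語".encode("shift_jis")
korean = "한국어".encode("euc_kr")

# Each byte sequence reads correctly with its own codec.
print(japanese.decode("shift_jis"))  # 日本語
print(korean.decode("euc_kr"))       # 한국어

# Using the wrong codec does not give the original text back.
# errors="replace" substitutes a placeholder for bytes that cannot be decoded.
wrong = japanese.decode("euc_kr", errors="replace")
print(wrong == "日本語")  # False
```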
Changing the encoding directly in the browser
Every browser has an option for re-encoding an individual page. In Google Chrome, for example, you go to the "Tools" menu and specify the required encoding. On the Russian-language web (Runet), CP1251 (sometimes written with the prefix "Windows" or "Microsoft") and UTF-8 are considered standard. The latter is the most common and is used by sites by default. Opera, Mozilla Firefox, and other browsers have a similar feature, and the option is usually not hard to find. There is little point in giving detailed instructions for each browser, because updates are released often and the location of menu items may change. In Google Chrome, however, the interface has stayed about the same for a long time.
The ability to change the encoding in Word or other applications is a very useful feature. Thanks to it, even if you find yourself in an alien environment (a document full of incomprehensible symbols), you will quickly reach an understanding with the text. If only it worked that way abroad: you want to show off in a foreign language, you flip a switch in your head, and you are already fluent.
Saving with a specified encoding
Sometimes a user needs to save a document in a specific encoding, for example because the recipient of the document requires it. In this case, save the document as plain text through the File menu. For its other file formats, Word uses encodings tied to the global system settings, but for "Plain Text" no such link exists. Word will therefore ask you to choose the encoding, showing the file conversion window that is already familiar to us. Choose the encoding you need, save the file, and you can send the document. Keep in mind that the recipient will need to select the same encoding in their text editor in order to read your text.
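The requirement that sender and recipient agree on the encoding can be sketched in Python as follows. The requested encoding, the sample text, and the file name are all hypothetical values chosen for the example.

```python
from pathlib import Path
import tempfile

requested_encoding = "cp1251"  # hypothetical: whatever the recipient asked for
text = "Отчёт готов"

# Hypothetical file name in the system's temporary folder.
path = Path(tempfile.gettempdir()) / "report_example.txt"
path.write_text(text, encoding=requested_encoding)

# Reading with the agreed encoding gives back the same text.
print(path.read_text(encoding=requested_encoding) == text)  # True

# Reading with a different encoding silently produces different text.
print(path.read_text(encoding="cp1252") == text)  # False
```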
Ordinary users do not often need to change the encoding of Word documents. As a rule, the word processor automatically determines the character set required and shows the text in a readable form. But every rule has exceptions, so it is useful to know how to do it yourself. Fortunately, the process in Word is quite simple.
What we have covered also applies to other programs in the Office suite. They can have similar problems due to, say, incompatible saved file formats. The user performs the same steps there, so this article can help more than just Word users. The consistent settings across all Microsoft Office programs help you avoid confusion when working with any kind of document, be it text, spreadsheets, or presentations.
Finally, it is not always the encoding that is to blame. The cause may be much simpler. Many users, in pursuit of "prettiness", forget about standardization. If an author picks a font they installed themselves, types a document in it, and saves it, the text displays correctly on their machine. But when the document reaches someone who does not have that font installed, an unreadable set of characters appears on the screen. This looks very much like a broken encoding, so it is easy to misdiagnose. Therefore, before trying to re-decode the text in Word, first try simply changing the font.