D's standard library Phobos has a number of modules for converting strings between encodings, for example std.utf and std.encoding.
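One of these Phobos modules is std.encoding. Here is a minimal sketch that converts raw ISO-Latin-1 bytes into a UTF-8 string with std.encoding.transcode. The helper name latin1ToUtf8 is hypothetical. std.encoding covers ASCII, Latin-1 and a few Windows codepages, but it does not appear to include codepage 850. Files in that encoding therefore need a hand-written mapping.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

/// Converts raw ISO-Latin-1 bytes into a UTF-8 D string (hypothetical helper).
string latin1ToUtf8(immutable(ubyte)[] bytes)
{
    // Latin-1 code units are one byte each, so the bytes can be reinterpreted.
    auto latin1 = cast(Latin1String) bytes;
    string utf8;
    transcode(latin1, utf8); // std.encoding performs the actual conversion
    return utf8;
}

void main()
{
    // "Bär" in ISO-Latin-1: 'B' = 0x42, 'ä' = 0xE4, 'r' = 0x72
    immutable(ubyte)[] raw = [0x42, 0xE4, 0x72];
    writeln(latin1ToUtf8(raw)); // shows "Bär" on a UTF-8 terminal
}
```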
D's string type (an alias for immutable(char)[]) is meant to hold UTF-8. The related types wstring and dstring are also array aliases rather than classes. They hold UTF-16 (2-byte code units) and UTF-32 (4-byte code units) respectively. A string can still contain bytes in another encoding, but Phobos functions that decode it (for example std.utf.validate) will then throw a UTFException. In the Western world ISO-Latin-1 (a 1-byte encoding) is probably the most commonly used single-byte encoding. In the DOS world (pre-Windows) codepage 850 was a frequently used encoding. If you work with old texts (some of mine date from the pre-DOS world), or if you use VIM in the console, then codepage 850 is probably the base encoding. This encoding uses 1 byte per character. Codes 0-127 represent ASCII, and codes above 127 represent, among other characters, all the usual French accents and German umlauts. In the Windows 7 world other codepages such as 1250 are used for external files (internally Windows uses UTF-16).
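Here is a short sketch of the point about D's string types. The same character occupies a different number of code units in string, wstring and dstring. A raw codepage-850 byte is rejected by std.utf.validate. The helper isValidUtf8 is hypothetical and simply wraps validate.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Returns true if `s` is well-formed UTF-8 (hypothetical helper).
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The same character "ä" (U+00E4) in the three D string types:
    writeln("ä".length, " ", "ä"w.length, " ", "ä"d.length); // 2 1 1

    // In codepage 850, "ä" is the single byte 0x84, which is not valid UTF-8.
    writeln(isValidUtf8("ä"), " ", isValidUtf8("\x84")); // true false
}
```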
Today the usual external encoding for text files is UTF-8 (a multibyte encoding) or ISO-Latin-1 (a single-byte encoding). UTF-8 encodes the ASCII characters (codes 0-127) as single bytes. All other characters, including German umlauts and French accents, need 2 to 4 bytes. The usual German umlauts and French accents are 2-byte sequences whose first byte is 0xC3.
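To see these 2-byte sequences for yourself, the following sketch prints the raw UTF-8 bytes of a few strings in hex. utf8Hex is a hypothetical helper. It relies on std.string.representation, which views a string as bytes without decoding it. D source files are UTF-8, so the literals below are UTF-8 as well.

```d
import std.format : format;
import std.stdio : writeln;
import std.string : representation;

/// Returns the raw UTF-8 bytes of `s` as space-separated hex pairs.
string utf8Hex(string s)
{
    // "%(%02X %)" formats each byte as two hex digits, separated by spaces.
    return format("%(%02X %)", s.representation);
}

void main()
{
    writeln(utf8Hex("A"));    // 41 (ASCII: one byte)
    writeln(utf8Hex("äöüß")); // C3 A4 C3 B6 C3 BC C3 9F
    writeln(utf8Hex("éèç"));  // C3 A9 C3 A8 C3 A7
}
```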
I have written a simple conversion program in D that transcodes such old texts into UTF-8. It is only a partial solution, not a complete conversion: it handles only German umlauts and French accents. Here is my conversion program written in D. Hints for compilation and usage are given in the source code. You may test it with this short text file encoded in codepage 850 (your browser will probably display some of its characters wrongly). You can check the conversion result with SciTE or any other modern programming editor.
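The following is a minimal sketch of such a byte-by-byte conversion, not the program linked above. It assumes a partial mapping table that covers only the codepage 850 positions of the German umlauts, ß, the common lowercase French accents, Ç and É. ASCII bytes are copied unchanged. Every other byte above 127 is replaced with U+FFFD, which includes the box-drawing characters. The names cp850Char and cp850ToUtf8 are hypothetical.

```d
import std.array : appender;
import std.file : read, write;
import std.stdio : stderr;

/// Maps one codepage-850 byte >= 0x80 to Unicode.
/// Partial table (assumption): umlauts, ß and common French accents only.
/// Any other byte becomes U+FFFD (the Unicode replacement character).
dchar cp850Char(ubyte b)
{
    switch (b)
    {
        case 0x80: return 'Ç';  case 0x81: return 'ü';
        case 0x82: return 'é';  case 0x83: return 'â';
        case 0x84: return 'ä';  case 0x85: return 'à';
        case 0x87: return 'ç';  case 0x88: return 'ê';
        case 0x89: return 'ë';  case 0x8A: return 'è';
        case 0x8B: return 'ï';  case 0x8C: return 'î';
        case 0x8E: return 'Ä';  case 0x90: return 'É';
        case 0x93: return 'ô';  case 0x94: return 'ö';
        case 0x96: return 'û';  case 0x97: return 'ù';
        case 0x99: return 'Ö';  case 0x9A: return 'Ü';
        case 0xE1: return 'ß';
        default:   return '\uFFFD';
    }
}

/// Transcodes codepage-850 bytes into a UTF-8 string.
string cp850ToUtf8(const(ubyte)[] input)
{
    auto result = appender!string();
    foreach (b; input)
    {
        if (b < 0x80)
            result.put(cast(char) b); // ASCII bytes are identical in both encodings
        else
            result.put(cp850Char(b)); // Appender encodes the dchar as UTF-8
    }
    return result.data;
}

int main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writeln("usage: cp850conv <input-cp850> <output-utf8>");
        return 1;
    }
    auto raw = cast(const(ubyte)[]) read(args[1]);
    write(args[2], cp850ToUtf8(raw));
    return 0;
}
```

Assuming the source is saved as cp850conv.d, it could be built with `dmd cp850conv.d` and run as `cp850conv old.txt new.txt`. A `switch` keeps the table readable. A 128-entry lookup array would work just as well.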
This conversion module is also used in a more complex program called dictionary.d. It is a rather simple dictionary program based on an associative array (the dictionary). It reads its data from a file (which may be encoded in codepage 850) and, for testing, accepts keys typed at the Windows console.
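The associative-array idea can be sketched as follows. This is not dictionary.d itself. Several details are hypothetical:

- the file format (one "key = value" pair per line);
- the function name parseDictionary;
- the assumption that the file is already UTF-8.

On Windows, console input may arrive in the console's codepage (often 850), which this sketch does not handle.

```d
import std.array : array;
import std.stdio : File, readln, stderr, writeln;
import std.string : indexOf, strip;

/// Builds the dictionary from lines of the hypothetical form "key = value".
/// Lines without '=' are skipped.
string[string] parseDictionary(string[] lines)
{
    string[string] dict;
    foreach (line; lines)
    {
        auto pos = line.indexOf('=');
        if (pos < 0)
            continue;
        dict[line[0 .. pos].strip] = line[pos + 1 .. $].strip;
    }
    return dict;
}

int main(string[] args)
{
    if (args.length != 2)
    {
        stderr.writeln("usage: dict <dictionary-file>");
        return 1;
    }
    // Assumes the file is already UTF-8 (e.g. converted from codepage 850 first).
    auto dict = parseDictionary(File(args[1]).byLineCopy.array);

    // Read keys from the console until end of input and look them up.
    string key;
    while ((key = readln()) !is null)
    {
        key = key.strip;
        if (auto value = key in dict)
            writeln(*value);
        else
            writeln("not found: ", key);
    }
    return 0;
}
```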
A similar program written in standard C is here.
Copyright for all images, texts and software on this page: Dr. E. Huckert. If you want to contact me: this is my mail address