Monday 30 May 2016

Haskell: mkTextEncoding: Handling illegal characters

‘mkTextEncoding’ takes the name of an encoding as a string and returns the corresponding TextEncoding inside the IO monad.

Prelude System.IO> :t mkTextEncoding
mkTextEncoding :: String -> IO TextEncoding
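
Because the result lives in IO, you bind it with `<-` before passing it to functions such as hSetEncoding. A minimal sketch, assuming the encoding name "UTF-8" is available on your platform:

```haskell
import System.IO

main :: IO ()
main = do
    -- mkTextEncoding is an IO action, so bind its result with <-
    enc <- mkTextEncoding "UTF-8"
    -- TextEncoding has a Show instance that displays the encoding's name
    print enc
    -- The same value can be applied to any handle, here stdout
    hSetEncoding stdout enc
    putStrLn "stdout now uses the encoding above"
```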


FileUtil.hs
import System.IO

main :: IO ()
main = 
    do
        putStrLn "Enter file name (Including full path) to write"
        fileName <- getLine

        -- Write the text to the file encoded as UTF-32LE
        fileHandle <- openFile fileName WriteMode
        encoding <- mkTextEncoding "UTF-32LE"
        hSetEncoding fileHandle encoding

        hPutStrLn fileHandle "我愛PTR"

        hClose fileHandle

        -- Read the file back using the same encoding
        fileHandle <- openFile fileName ReadMode
        hSetEncoding fileHandle encoding
        info <- hGetContents fileHandle

        putStrLn info

$ runghc FileUtil.hs
Enter file name (Including full path) to write
temp.txt
我愛PTR

When you open ‘temp.txt’ in a hex viewer, you can see the following bytes. Each character occupies four bytes, least significant byte first.

1162 0000 1b61 0000 5000 0000 5400 0000
5200 0000 0a00 0000
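
To see where those bytes come from, the sketch below computes the UTF-32LE bytes of each character by hand. The helper names utf32le and hexByte are made up for this illustration and are not part of System.IO.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Numeric (showHex)

-- Split one character into its four UTF-32LE bytes,
-- least significant byte first (hypothetical helper).
utf32le :: Char -> [Int]
utf32le c = [ (ord c `shiftR` s) .&. 0xFF | s <- [0, 8, 16, 24] ]

-- Render a byte as two lowercase hex digits (hypothetical helper).
hexByte :: Int -> String
hexByte b = (if b < 16 then "0" else "") ++ showHex b ""

main :: IO ()
main =
    -- One line of four bytes per character, including the trailing newline
    mapM_ (putStrLn . unwords . map hexByte . utf32le) "我愛PTR\n"
```

The first line printed is 11 62 00 00, which is the code point U+6211 (我) with its bytes reversed, matching the start of the dump above.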

mkTextEncoding also accepts a suffix on the encoding name that controls how illegal characters are handled.

Notation      Description
//IGNORE      Ignores all illegal input sequences. For example, UTF-8//IGNORE causes all illegal sequences on input to be ignored, and on output drops all code points that have no representation in the target encoding.
//TRANSLIT    Chooses a replacement character for illegal sequences or code points.
//ROUNDTRIP   Uses a PEP383-style escape mechanism to represent any invalid bytes in the input as Unicode code points. On output, these special code points are detected and turned back into the corresponding original bytes.
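
The suffixes in the table can be tried out with a short program. This is a rough sketch, not a definitive test: the file name illegal-bytes.tmp and the helper readWith are made up for the illustration, and the exact characters produced by //ROUNDTRIP depend on the GHC version.

```haskell
import System.IO

-- Hypothetical scratch file used only for this demonstration.
samplePath :: FilePath
samplePath = "illegal-bytes.tmp"

-- Read the whole file with the named encoding (hypothetical helper).
-- hGetContents is lazy, so the contents are forced before closing.
readWith :: String -> IO String
readWith name = do
    enc <- mkTextEncoding name
    h <- openFile samplePath ReadMode
    hSetEncoding h enc
    s <- hGetContents h
    length s `seq` hClose h
    return s

main :: IO ()
main = do
    -- In binary mode each Char below 256 is written as one raw byte.
    -- The byte 0xFF can never appear in valid UTF-8.
    withBinaryFile samplePath WriteMode $ \h -> hPutStr h "a\255b"

    -- //IGNORE should drop the illegal byte and keep the rest
    ignored <- readWith "UTF-8//IGNORE"
    print ignored

    -- //ROUNDTRIP should keep a placeholder character for the illegal byte
    roundTripped <- readWith "UTF-8//ROUNDTRIP"
    print (length roundTripped)
```

Reading the same file with plain "UTF-8" would instead raise an exception at the illegal byte, which is the case these suffixes are meant to handle. The sketch leaves the scratch file behind, so delete it afterwards.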


