Programming for beginners: Haskell: mkTextEncoding: Handling illegal characters

‘mkTextEncoding’ takes the name of a character encoding (for example "UTF-8" or "UTF-32LE") and returns an IO action that yields the corresponding TextEncoding.

Prelude System.IO> :t mkTextEncoding

mkTextEncoding :: String -> IO TextEncoding
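Because the result lives in IO, mkTextEncoding can fail at run time: it raises an IOException when it does not recognise the name. The minimal sketch below wraps the lookup in try. The helper names lookupEncoding and describe, and the deliberately bogus name "NO-SUCH-ENCODING", are hypothetical. It relies on the Show instance of TextEncoding, which prints the encoding's name.

```haskell
import Control.Exception (IOException, try)
import System.IO

-- Look an encoding up by name. mkTextEncoding throws an IOException
-- for names it does not recognise, so 'try' turns that into a Left.
lookupEncoding :: String -> IO (Either IOException TextEncoding)
lookupEncoding name = try (mkTextEncoding name)

-- Print one lookup result; 'show' on a TextEncoding gives its name.
describe :: String -> IO ()
describe name = do
  result <- lookupEncoding name
  putStrLn $ name ++ " -> " ++ either (\e -> "error: " ++ show e) show result

main :: IO ()
main = mapM_ describe ["UTF-32LE", "UTF-8//IGNORE", "NO-SUCH-ENCODING"]
```

The first two names should print back unchanged; the last one should print an error instead of crashing the program.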

FileUtil.hs

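import System.IO

main :: IO ()
main =
    do
        -- Encode console output as UTF-8 so printing the Chinese
        -- characters does not fail under a non-UTF-8 locale.
        hSetEncoding stdout utf8

        putStrLn "Enter file name (Including full path) to write"
        fileName <- getLine

        -- Write the text to the file using UTF-32 little-endian.
        encoding <- mkTextEncoding "UTF-32LE"
        fileHandle <- openFile fileName WriteMode
        hSetEncoding fileHandle encoding
        hPutStrLn fileHandle "我愛PTR"
        hClose fileHandle

        -- Read the file back with the same encoding and print it.
        readHandle <- openFile fileName ReadMode
        hSetEncoding readHandle encoding
        info <- hGetContents readHandle
        putStrLn info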

$ runghc FileUtil.hs
Enter file name (Including full path) to write
temp.txt
我愛PTR

If you open ‘temp.txt’ in a hex editor, you can see the following bytes. In UTF-32LE every character takes four bytes, stored least significant byte first.

1162 0000 1b61 0000 5000 0000 5400 0000 5200 0000 0a00 0000
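The bytes follow directly from the code points: 我 is U+6211, 愛 is U+611B, and P, T, R and the newline that hPutStrLn appends are 0x50, 0x54, 0x52 and 0x0A. The pure sketch below rebuilds the dump from those code points. The helper names utf32le, fileBytes, hex2 and dump are hypothetical, and it assumes no CRLF newline translation, as on Linux or macOS.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Split one code point into four bytes, least significant first (UTF-32LE).
utf32le :: Char -> [Word8]
utf32le c = [fromIntegral ((ord c `shiftR` s) .&. 0xFF) | s <- [0, 8, 16, 24]]

-- hPutStrLn appends '\n'; this assumes it is written as a single 0x0A.
fileBytes :: String -> [Word8]
fileBytes s = concatMap utf32le (s ++ "\n")

-- Two-digit lowercase hex for one byte.
hex2 :: Word8 -> String
hex2 b = let h = showHex b "" in if length h < 2 then '0' : h else h

-- Join the bytes in groups of two, like the dump above.
dump :: [Word8] -> String
dump = unwords . pairs . map hex2
  where
    pairs (a : b : rest) = (a ++ b) : pairs rest
    pairs rest = rest

main :: IO ()
main = putStrLn (dump (fileBytes "我愛PTR"))
```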

The encoding name passed to mkTextEncoding can end with a special suffix that controls how illegal characters are handled.

Suffix	Description
//IGNORE	Ignores all illegal input sequences. For example, with UTF-8//IGNORE, illegal sequences on input are ignored, and on output all code points that have no representation in the target encoding are dropped.
//TRANSLIT	Chooses a replacement character for illegal sequences or code points.
//ROUNDTRIP	Uses a PEP 383-style escape mechanism to represent any invalid bytes in the input as Unicode code points. On output, these special code points are detected and turned back into the corresponding original bytes.
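To see the suffixes in action, the sketch below writes a three-byte file containing 'A', the byte 0xFF (which is never valid in UTF-8) and 'B', then decodes it with and without each suffix using GHC's built-in UTF-8 codec. It also checks the //ROUNDTRIP claim by writing the decoded text back and comparing bytes. The file names and the helpers writeSample, readAll, decodeWith and roundTripKeepsBytes are hypothetical.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Char (chr, ord)
import System.IO

-- Hypothetical file names used only by this sketch.
samplePath, copyPath :: FilePath
samplePath = "illegal.bin"
copyPath = "illegal-copy.bin"

-- Write 'A', the byte 0xFF, then 'B'.
-- A binary handle writes each Char as one byte.
writeSample :: IO ()
writeSample = withBinaryFile samplePath WriteMode $ \h ->
  hPutStr h (map chr [0x41, 0xFF, 0x42])

-- Read a whole file after running 'setup' on the handle. Forcing the
-- lazy read makes any decoding error surface inside this function.
readAll :: (Handle -> IO ()) -> FilePath -> IO String
readAll setup path = withFile path ReadMode $ \h -> do
  setup h
  s <- hGetContents h
  _ <- evaluate (length s)
  return s

-- Decode the sample with one encoding name; Left holds the error text,
-- Right holds the decoded code points.
decodeWith :: String -> IO (Either String [Int])
decodeWith name = do
  enc <- mkTextEncoding name
  result <- try (readAll (`hSetEncoding` enc) samplePath)
  return (either (Left . showError) (Right . map ord) result)
  where
    showError :: SomeException -> String
    showError = show

-- Decode with //ROUNDTRIP, write the text back with //ROUNDTRIP,
-- and check that the copy has exactly the original bytes.
roundTripKeepsBytes :: IO Bool
roundTripKeepsBytes = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  text <- readAll (`hSetEncoding` enc) samplePath
  withFile copyPath WriteMode $ \h -> hSetEncoding h enc >> hPutStr h text
  original <- readAll (`hSetBinaryMode` True) samplePath
  copy <- readAll (`hSetBinaryMode` True) copyPath
  return (original == copy)

main :: IO ()
main = do
  writeSample
  mapM_ (\n -> decodeWith n >>= \r -> putStrLn (n ++ ": " ++ show r))
    ["UTF-8", "UTF-8//IGNORE", "UTF-8//TRANSLIT", "UTF-8//ROUNDTRIP"]
  roundTripKeepsBytes >>= print
```

With no suffix the read should fail with an invalid byte sequence error. //IGNORE should give [65,66], //TRANSLIT should give [65,65533,66] (U+FFFD in place of the bad byte), and //ROUNDTRIP should give [65,56575,66], where U+DCFF is the escape for byte 0xFF; the final check should print True. The output side of //IGNORE and //TRANSLIT (characters a target encoding such as ASCII cannot represent) is not exercised here, because non-Unicode encodings may go through the platform's iconv and results can vary.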


Monday, 30 May 2016
