Monday 30 May 2016

Haskell: Specify encoding and decoding

Encoding is the process of converting a sequence of characters into a sequence of bytes in a specific format, for efficient transmission and storage. Decoding is the reverse process: it converts those bytes back into a sequence of characters. Haskell's System.IO module provides the following functions to set and get the encoding that a handle uses for this conversion.

| Function | Signature | Description |
|---|---|---|
| hSetEncoding | hSetEncoding :: Handle -> TextEncoding -> IO () | Sets the text encoding of the given handle to the given encoding. A newly created handle uses localeEncoding by default. Files opened in binary mode (for example with openBinaryFile) use no encoding at all. |
| hGetEncoding | hGetEncoding :: Handle -> IO (Maybe TextEncoding) | Returns the current TextEncoding of the given handle, or Nothing if the handle is in binary mode. |
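
As a minimal sketch of how the two functions interact, the program below queries a handle's encoding before and after changing it, and once more after switching the handle to binary mode with hSetBinaryMode. The file name encoding-demo.txt and the helper describeEncoding are hypothetical names chosen for this illustration.

```haskell
import System.IO

-- Hypothetical helper: report the encoding of a handle as text.
-- hGetEncoding returns Nothing when the handle is in binary mode.
describeEncoding :: Handle -> IO String
describeEncoding h = do
    enc <- hGetEncoding h
    return (maybe "binary (no encoding)" show enc)

main :: IO ()
main = do
    -- "encoding-demo.txt" is a hypothetical scratch file
    h <- openFile "encoding-demo.txt" WriteMode

    -- A fresh text handle starts with the locale's encoding
    describeEncoding h >>= putStrLn

    -- Switch the handle to UTF-16 (little-endian)
    hSetEncoding h utf16le
    describeEncoding h >>= putStrLn

    -- In binary mode there is no encoding, so hGetEncoding yields Nothing
    hSetBinaryMode h True
    describeEncoding h >>= putStrLn

    hClose h
```

The first line printed depends on the locale of the machine, so it will differ between systems.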

System.IO provides the following TextEncoding values.

| Encoding | Description |
|---|---|
| latin1 | The ISO 8859-1 (Latin-1) encoding. It maps bytes directly to the first 256 Unicode code points, so writing a character greater than '\255' to a Handle that uses latin1 results in an error. |
| utf8 | The UTF-8 Unicode encoding. |
| utf8_bom | The UTF-8 Unicode encoding with a byte-order mark (BOM, the byte sequence EF BB BF), which lets a reader identify the file as UTF-8. It behaves like utf8, except that a leading BOM is skipped on input and a BOM is written at the start of output. |
| utf16 | The UTF-16 Unicode encoding (a byte-order mark indicates the endianness). |
| utf16le | The UTF-16 Unicode encoding (little-endian). |
| utf16be | The UTF-16 Unicode encoding (big-endian). |
| utf32 | The UTF-32 Unicode encoding (a byte-order mark indicates the endianness). |
| utf32le | The UTF-32 Unicode encoding (little-endian). |
| utf32be | The UTF-32 Unicode encoding (big-endian). |
| localeEncoding | The Unicode encoding of the current locale. |
| char8 | An encoding in which Unicode code points are translated to bytes by taking the code point modulo 256. When decoding, bytes are translated directly into the equivalent code point. This encoding never fails in either direction. However, encoding discards information, so encoding followed by decoding is not the identity. |
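
The difference between a complete encoding (utf8), a lossy one (char8) and a partial one (latin1) can be illustrated with a small round trip. This is only a sketch: roundTrip and the file name roundtrip-demo.txt are hypothetical names introduced for the example, and the test string uses the escape '\955' (Greek lambda, U+03BB) so the source file does not depend on any particular editor encoding.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Hypothetical helper: write a string with the given encoding,
-- then read it back with the same encoding.
roundTrip :: TextEncoding -> FilePath -> String -> IO String
roundTrip enc path text = do
    withFile path WriteMode $ \h -> do
        hSetEncoding h enc
        hPutStr h text
    withFile path ReadMode $ \h -> do
        hSetEncoding h enc
        contents <- hGetContents h
        -- hGetContents is lazy: force the whole string before the handle closes
        _ <- evaluate (length contents)
        return contents

main :: IO ()
main = do
    let path = "roundtrip-demo.txt"   -- hypothetical scratch file
        text = "\955x"                -- U+03BB followed by 'x'

    -- utf8 can represent every code point, so the text survives unchanged
    viaUtf8 <- roundTrip utf8 path text
    putStrLn ("utf8 preserved the text:  " ++ show (viaUtf8 == text))

    -- char8 keeps only the code point modulo 256 (955 mod 256 = 187)
    viaChar8 <- roundTrip char8 path text
    putStrLn ("char8 preserved the text: " ++ show (viaChar8 == text))

    -- latin1 cannot encode code points above 255; the write is expected to fail
    result <- try (roundTrip latin1 path text) :: IO (Either IOException String)
    putStrLn (either (const "latin1: write failed") (const "latin1: write succeeded") result)
```

Wrapping the latin1 call in try keeps the program running after the expected error, so the three outcomes can be compared side by side.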


FileUtil.hs
import System.IO

main :: IO ()
main = do
    putStrLn "Enter file name (Including full path) to write"
    fileName <- getLine

    -- Look up the encoding by name; the predefined value utf32le could be used instead
    encoding <- mkTextEncoding "UTF-32LE"

    -- Write the text using UTF-32LE
    writeHandle <- openFile fileName WriteMode
    hSetEncoding writeHandle encoding
    hPutStrLn writeHandle "我愛PTR"
    hClose writeHandle

    -- Read the file back with the same encoding
    readHandle <- openFile fileName ReadMode
    hSetEncoding readHandle encoding
    info <- hGetContents readHandle
    putStrLn info
    hClose readHandle

$ runghc FileUtil.hs
Enter file name (Including full path) to write
temp.txt
我愛PTR

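When you open the ‘temp.txt’ file in a hex editor, you can see the following bytes. Each character occupies four bytes, least significant byte first: for example, 我 (U+6211) is stored as 11 62 00 00.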

1162 0000 1b61 0000 5000 0000 5400 0000

5200 0000 0a00 0000
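
The same byte layout can be checked without a hex editor. The sketch below writes two characters as UTF-32LE and then reads the file back in binary mode, where each Char returned is one raw byte. The helper hexByte and the file name utf32-demo.txt are hypothetical names used only for this illustration.

```haskell
import Numeric (showHex)
import System.IO

-- Hypothetical helper: render one byte (a Char in '\0'..'\255') as two hex digits.
hexByte :: Char -> String
hexByte c = replicate (2 - length digits) '0' ++ digits
  where
    digits = showHex (fromEnum c) ""

main :: IO ()
main = do
    let path = "utf32-demo.txt"   -- hypothetical scratch file

    -- Write U+6211 followed by 'P' using UTF-32 (little-endian)
    withFile path WriteMode $ \h -> do
        hSetEncoding h utf32le
        hPutStr h "\25105P"

    -- Read the raw bytes back; a binary handle performs no decoding
    bytes <- withBinaryFile path ReadMode $ \h -> do
        s <- hGetContents h
        length s `seq` return s   -- force the lazy read before the handle closes

    putStrLn (unwords (map hexByte bytes))   -- 11 62 00 00 50 00 00 00
```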


