Sunday 1 May 2016

Haskell: Bangpatterns


Haskell is a lazy language, but lazy programming is not always good. Lazy evaluation needs to maintain track of the expressions. Haskell keep track of the expressions using thunk. I explained about this in my previous post “How Lazy evaluation works in Haskell”.

What is thunk?
Thunk represents a value that yet to be evauated. Haskell don’t evaluate thunk unless it has to.
Prelude> let fun x y = x
Prelude> fun 10 [1..]
10


In above case thunk represents [1..], since it is not used any where, Haskell don’t evaluate this infinite list.

Problem with thunks
If you keep building complex expressions to solve later, it keeps on consuming memory, obviously affects performance and you may end up in memory overflow error.
foldl (+) 0 [1, 2, 3, 4]
==> foldl (+) (0 + 1) [2, 3, 4]
==> foldl (+) ((0 + 1) + 2) [3, 4]
==> foldl (+) (((0 + 1) + 2) + 3) [4]
==> foldl (+) ((((0 + 1) + 2) + 3) + 4) []
==> (((0 + 1) + 2) + 3) + 4
==> ((1 + 2) + 3) + 4
==> (3 + 3) + 4
==> 6 + 4
==> 10

 
Explicit strictness
To solve above kind of problems, Haskell provides a way to enforce strict evaluation on expressions, instead of maintaining long thunk.
By using Bang pattern we can easily write strict programs in Haskell.

For example, let me try to show you the memory consumption of a program which calculates the sum of elements of a list.

Without Bang pattern

listUtil.hs
sumList :: [Integer] -> Integer
sumList xs = sumOfEle xs 0

sumOfEle :: [Integer] -> Integer -> Integer
sumOfEle [] acc = acc
sumOfEle (x:xs) acc = sumOfEle xs (acc+x)

main = print (sumList [1..1000000])


Following commands are used to calculate the memory usage of above program.

ghc listUtil.hs -rtsopts
./listUtil +RTS -s -h -i0.001
$  ghc listUtil.hs -rtsopts
[1 of 1] Compiling Main             ( listUtil.hs, listUtil.o )
Linking listUtil ...
$ ./listUtil +RTS -s -h -i0.001
500000500000
     161,344,208 bytes allocated in the heap
   1,533,989,880 bytes copied during GC
      65,033,816 bytes maximum residency (54 sample(s))
         776,072 bytes maximum slop
              98 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       266 colls,     0 par    0.090s   0.094s     0.0004s    0.0010s
  Gen  1        54 colls,     0 par    2.510s   2.587s     0.0479s    0.1063s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.049s  (  0.053s elapsed)
  GC      time    2.600s  (  2.681s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.650s  (  2.734s elapsed)

  %GC     time      98.1%  (98.0% elapsed)

  Alloc rate    3,264,689,261 bytes per MUT second

  Productivity   1.9% of total user, 1.8% of total elapsed


Now update the program using bangpattern, and rerun the profiling.

listUtil.hs
{-# LANGUAGE BangPatterns #-}
sumList :: [Integer] -> Integer
sumList xs = sumOfEle xs 0

sumOfEle :: [Integer] -> Integer -> Integer
sumOfEle [] acc = acc
sumOfEle (x:xs) !acc = sumOfEle xs (acc+x)

main = print (sumList [1..1000000])


Re run profiling again.
$ ./listUtil +RTS -s -h -i0.001
500000500000
     128,051,920 bytes allocated in the heap
         157,128 bytes copied during GC
          44,312 bytes maximum residency (34 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       213 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1        34 colls,     0 par    0.002s   0.005s     0.0002s    0.0035s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.029s  (  0.032s elapsed)
  GC      time    0.002s  (  0.006s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.032s  (  0.038s elapsed)

  %GC     time       6.2%  (15.4% elapsed)

  Alloc rate    4,342,656,763 bytes per MUT second

  Productivity  93.5% of total user, 79.7% of total elapsed


Following table summarizes the memory usage.

Name
With out Bang pattern
With Bang pattern
Bytes allocated in the heap
161,344,208
128,051,920
Bytes copied during GC
1,533,989,880
157,128
Bytes maximum residency
65,033,816
44,312
Bytes maximum slop
776,072
21,224
Total memory in use
98 MB
1 MB







Previous                                                 Next                                                 Home

No comments:

Post a Comment