Saturday, 30 April 2016

Haskell: Lazy Vs Eager evaluation


a. Most traditional programming languages like C, C++, Java use eager evaluation strategy, where an expression is evaluated as soon as it is bound to a variable, argument to a function etc., Lazy evaluation is completely opposite to eager evaluation, where expression is evaluated when it is required.

For example,
let fun x y = x

Above function returns the value of x, it don’t depend on the value y. No point in evaluating the second argument. In case of eager evaluation strategy, both arguments x and y are evaluated, before calling the function. But lazy programming evaluates the value only when it is required, so it don’t evaluate the second argument at all in this case.
Prelude> let fun x y = x
Prelude> 
Prelude> fun 10 20
10
Prelude> fun 10 [1..]
10
Prelude> fun 10 (2^123456789)
10
Prelude> 


Observe above snippet,
fun 10 [1..] : Second argument is infinite list, eager programming languages end up in calculating the infinite list, where as lazy programs return first argument and exit.

fun 10 (2^123456789) : Second argument is 2^123456789, eager programming languages calculate the value 2^123456789 even though it is not required, where as lazy programs return first argument and exit.

b. One advantage of eager evaluation is that it eliminates the need to track and schedule the evaluation of expressions. Where as lazy evaluation needs to maintain track of the expressions. Haskell keep track of the expressions using thunk. I explained about this in my previous post “How Lazy evaluation works in Haskell”.

What is thunk?
Thunk represents a value that yet to be evauated. Haskell don’t evaluate thunk unless it has to.
Prelude> let fun x y = x
Prelude> fun 10 [1..]
10


In above case thunk represents [1..], since it is not used any where, Haskell don’t evaluate this infinite list.

Problem with thunks
If you keep building complex expressions to solve later, it keeps on consuming memory, obviously affects performance and you may end up in memory overflow error.
foldl (+) 0 [1, 2, 3, 4]
==> foldl (+) (0 + 1) [2, 3, 4]
==> foldl (+) ((0 + 1) + 2) [3, 4]
==> foldl (+) (((0 + 1) + 2) + 3) [4]
==> foldl (+) ((((0 + 1) + 2) + 3) + 4) []
==> (((0 + 1) + 2) + 3) + 4
==> ((1 + 2) + 3) + 4
==> (3 + 3) + 4
==> 6 + 4
==> 10


Explicit strictness
To solve above kind of problems, Haskell provides a way to enforce strict evaluation on expressions, instead of maintaining long thunk.

By using bang(!) operator, we can make an argument strict explicitly.

 For example, let me try to show you the memory consumption of a program which calculates the sum of elements of a list.

Without Bang pattern
listUtil.hs
sumList :: [Integer] -> Integer
sumList xs = sumOfEle xs 0

sumOfEle :: [Integer] -> Integer -> Integer
sumOfEle [] acc = acc
sumOfEle (x:xs) acc = sumOfEle xs (acc+x)

main = print (sumList [1..1000000])


Following commands are used to calculate the memory usage of above program.

ghc listUtil.hs -rtsopts
./listUtil +RTS -s -h -i0.001
$  ghc listUtil.hs -rtsopts
[1 of 1] Compiling Main             ( listUtil.hs, listUtil.o )
Linking listUtil ...
$ ./listUtil +RTS -s -h -i0.001
500000500000
     161,344,208 bytes allocated in the heap
   1,533,989,880 bytes copied during GC
      65,033,816 bytes maximum residency (54 sample(s))
         776,072 bytes maximum slop
              98 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       266 colls,     0 par    0.090s   0.094s     0.0004s    0.0010s
  Gen  1        54 colls,     0 par    2.510s   2.587s     0.0479s    0.1063s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.049s  (  0.053s elapsed)
  GC      time    2.600s  (  2.681s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.650s  (  2.734s elapsed)

  %GC     time      98.1%  (98.0% elapsed)

  Alloc rate    3,264,689,261 bytes per MUT second

  Productivity   1.9% of total user, 1.8% of total elapsed


Now update the program using bangpattern, and rerun the profiling.

listUtil.hs
{-# LANGUAGE BangPatterns #-}
sumList :: [Integer] -> Integer
sumList xs = sumOfEle xs 0

sumOfEle :: [Integer] -> Integer -> Integer
sumOfEle [] acc = acc
sumOfEle (x:xs) !acc = sumOfEle xs (acc+x)

main = print (sumList [1..1000000])


Re run profiling again.
$ ./listUtil +RTS -s -h -i0.001
500000500000
     128,051,920 bytes allocated in the heap
         157,128 bytes copied during GC
          44,312 bytes maximum residency (34 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       213 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1        34 colls,     0 par    0.002s   0.005s     0.0002s    0.0035s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.029s  (  0.032s elapsed)
  GC      time    0.002s  (  0.006s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.032s  (  0.038s elapsed)

  %GC     time       6.2%  (15.4% elapsed)

  Alloc rate    4,342,656,763 bytes per MUT second

  Productivity  93.5% of total user, 79.7% of total elapsed


Following table summarizes the memory usage.

Name
With out Bang pattern
With Bang pattern
Bytes allocated in the heap
161,344,208
128,051,920
Bytes copied during GC
1,533,989,880
157,128
Bytes maximum residency
65,033,816
44,312
Bytes maximum slop
776,072
21,224
Total memory in use
98 MB
1 MB




Previous                                                 Next                                                 Home

No comments:

Post a Comment