Saturday, 30 April 2016

How Lazy evaluation works in Haskell


In this post, I am going to explain how Haskell programs are executed on a machine. By knowing the execution sequence on a machine, you can able to write efficient programs. As a functional programming language, all Haskell implementations are lazy. In the following sections, I am going to explain about how the reduction of expression performed, Normal form, lazy evaluation etc.,

Reductions
Functions are the first class entities in Haskell. Suppose, you want to implement (a+b)2 function, your code looks like below.
square x = x * x
twiceAB a b = 2 * a * b
aPlusbSquare a b = (square a) + (square b) + (twiceAB a b)

aPlusbSquare a b = (square a) + (square b) + (twiceAB a b)

aPlusbSquare 2 3 = (square 2) + (square 3) + (twiceAB 2 3)
   = (2 * 2) + (square 3) + (twiceAB 2 3)
   = 4 + (square 3) + (twiceAB 2 3)
   = 4 + (3 * 3) + (twiceAB 2 3)
   = 4 + 9 + (twiceAB 2 3)
   = 4 + 9 + (2 * 2 * 3)
   = 4 + 9 + (4 * 3)
   = 4 + 9 + 12
   = 13 + 12
   = 25


As you observe, above evaluation sequence, every reduction replaces a sub expression called reducible expression (redex). An expression without reducible expression is said to be in normal form.

Following are standard definitions
a.   redex: Reducible Expression
b.   leftmost redex : is the one whose abstraction is textually to the left of all other redexes.
For example, in the exporession (x+2)(y+10), left most redex is (x+2)


c.    rightmost redex : is the one whose abstraction is textually to the right of all other redexes.
For example, in the exporession (x+2)(y+10), left most redex is (y+10)

d.   Innermost Redex: is one that contains no other redex.
e.   Outermost Redex: is one that is contained within no other redex.

For example, In the expression x * ((x+ (y * (z + 10)))), Innermost Redex is z+10, outermost exprsession is x * (....)

Reduction strategies
Reduction strategy specifies a way that a complex expression is reduced to a simple expression by successive reduction steps. Following are the popular reduction strategies.

Strategy
Description
Full beta reductions
Any redex can be reduced at any time. It don’t follow any particular reduction strategy.
Applicative order
The rightmost, innermost redex is always reduced first. In case of functions evaluation, function arguments are evaluated first.
Normal order
The leftmost, outermost redex is always reduced first.
Call by name
As normal order, but no reductions are performed inside abstractions. The arguments to a function are not evaluated before the function is called — rather, they are substituted directly into the function body
Call by value
Only the outermost redexes are reduced. A redex is reduced only when its right hand side has reduced to a value. The argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function
Call by need
Perform reduction only when it is needed, also called as lazy evaluation.

Is reduction strategies matter?
Yes, of course reduction strategies matter.

For example, will try to evaluate the expression ‘fst (square 10, square 20)’ using innermost, outermost reduction strategies.

If you use innermost reduction strategy, the evaluation looks like below.

fst (square 10, square 20)    = fst (square 10, 20 * 20)
                                             = fst (square 10, 400)
                                             = fst (10*10, 400)
                                             = fst (100, 400)
                                             = 100
If you use outermost reduction strategy, the evaluation looks like below.

fst (square 10, square 20)    = square 10
                                             = 10 * 10
                                             = 100

In above scenario, outermost reduction takes lesser reductions that innermost reduction strategy, it is because, fst function doesn’t require second argument evaluation.

Some times innermost reduction strategy end up in performing unnecessary calculations, which are not required to get final result.

For example,
infLoop = 1 + infLoop
fst (42, infLoop)

If you use innermost reduction strategy, system endup in processing infLoop, which is not at all required to get the first element of a tuple.

That means outermost reduction better than innermost reduction strategy?
No, it depends on the expression.

For example,
cube x = x * x * x

Using innermost reduction strategy
cube (1+2+3)      = cube (1 + 5)
                           = cube (6)
                           = 6 * 6 * 6
                           = 36 * 6
                           = 216
Innermost reduction strategy requires 5 reduction steps to evaluate the expression.

Using outermost reduction strategy
cube (1+2+3)      = (1 + 2 + 3) * (1 + 2 + 3) * (1 + 2 + 3)
                           = (3 + 3) * (1 + 2 + 3) * (1 + 2 + 3)
                           = 6 * (1 + 2 + 3) * (1 + 2 + 3)
                           = 6 * (3 + 3) * (1 + 2 + 3)
                           = 6 * 6 * (1 + 2 + 3)
                           = 36 * (1 + 2 + 3)
                           = 36 * (3 + 3)
                           = 36 * 6
                           = 216
Outermost reduction strategy requires 9 reduction steps to evaluate the expression; it is because we are performing duplicate reduction ‘(1 + 2 + 3)’.

We can solve above problem of outermost reduction using outermost graph reduction technique.

Outermost graph reduction
Outermost graph reduction = Reduction + Sharing

In outermost graph reduction, each argument is evaluated at most once.

(1 + 2 + 3)          = (3 + 3)
                           = 6

cube (1 + 2 + 3)  = (1 + 2 + 3) * (1 + 2 + 3) * (1 + 2 + 3)  
                           = 6 * 6 * 6
                           = 36 * 6
                           = 216

It evaluates the argument (1 + 2 + 3), once and reuses it in other places. As I said Haskell uses lazy evaluation (don't do anything unless you have to.), lets see how lazy evaluation works in Haskell.

How Lazy evaluation works in Haskell?
Before going to explain about lazy evaluation, I want to explain about some examples of strict evaluation, which is opposite to lazy. In strict evaluation strategy, arguments to a function are evaluated before calling the function. You may ask me, what is wrong in this approach. For example, Observe following Java language snippet.
// Perform some calculation and return the result
long getData() {
 int data = 0;
 for (int i = 0; i < 1000000; i++) {
  data = data + rand.nextInt();
 }
 return data;
}

boolean isDataValid(boolean b, long data) {
 return b && (data > 1000000);
}


getData()’ function perform some calculation and return some value. ‘isDataValid’ function takes two arguments and return whether data is valid (or) not.

Suppose if you call the method ‘isDataValid(false, getData());’, Do you think is it really required to evaluate the function getData(), no it is not, Since first argument to the function isDataValid is false, false && getData() is any way false, we no need to evaluate. But when you run above snippet in strict languages like Java, getData () function will be evaluated before calling isDataValid(false, getData()). In case of functional programming, getData() will not be evaluated at all.

Lazy evaluation is an execution strategy, which delays the evaluation of an expression until the value is needed and which also avoids repeated evaluations of same expression (I discussed about this in Outermost graph reduction).  

Technically Lazy evaluation = call-by-name +  Sharing.

Sharing means if same expression is repeated in multiple places, the value of the expression is calculated on demand, saved the result and the result is used in multiple places whenever required (We are eliminating reevaluating the same expression).

Prelude> let fun x y = x
Prelude> fun 10 [1..]
10

fun x y = x : It simply returns the value of x, So when I called ‘fun 10  [1..]’ it return 10 immediately, It never evaluate the second argument [1..], which is an infinite list.

When evaluating fun 10 [1..], the second argument will simply be packaged up into a thunk without doing any actual computation, and fun will be called immediately. Since fun never uses its second argument the thunk will just be thrown away by the garbage collector.

What is thunk?
Thunk represents a value that yet to be evauated. Haskell don’t evaluate thunk unless it has to.

Prelude> let fun x y = x
Prelude> fun 10 [1..]
10

In above case thunk represents [1..], since it is not used any where, Haskell don’t evaluate this infinite list.

Problem with thunks
If you keep building complex expressions to solve later, it keeps on consuming memory, obviously affects performance and you may end up in memory overflow error.

foldl (+) 0 [1, 2, 3, 4]
==> foldl (+) (0 + 1) [2, 3, 4]
==> foldl (+) ((0 + 1) + 2) [3, 4]
==> foldl (+) (((0 + 1) + 2) + 3) [4]
==> foldl (+) ((((0 + 1) + 2) + 3) + 4) []
==> (((0 + 1) + 2) + 3) + 4
==> ((1 + 2) + 3) + 4
==> (3 + 3) + 4
==> 6 + 4
==> 10

Explicit strictness
To solve above kind of problems, Haskell provides a way to enforce strict evaluation on expressions, instead of maintaining long thunk.

By using bang(!) operator, we can make an argument strict explicitly.

 For example, let me try to show you the memory consumption of a program which calculates the sum of elements of a list.

Without Bang pattern
listUtil.hs
sumList :: [Integer] -> Integer
sumList xs = sumOfEle xs 0

sumOfEle :: [Integer] -> Integer -> Integer
sumOfEle [] acc = acc
sumOfEle (x:xs) acc = sumOfEle xs (acc+x)

main = print (sumList [1..1000000])


Following commands are used to calculate the memory usage of above program.

ghc listUtil.hs -rtsopts
./listUtil +RTS -s -h -i0.001

$  ghc listUtil.hs -rtsopts
[1 of 1] Compiling Main             ( listUtil.hs, listUtil.o )
Linking listUtil ...
$ ./listUtil +RTS -s -h -i0.001
500000500000
     161,344,208 bytes allocated in the heap
   1,533,989,880 bytes copied during GC
      65,033,816 bytes maximum residency (54 sample(s))
         776,072 bytes maximum slop
              98 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       266 colls,     0 par    0.090s   0.094s     0.0004s    0.0010s
  Gen  1        54 colls,     0 par    2.510s   2.587s     0.0479s    0.1063s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.049s  (  0.053s elapsed)
  GC      time    2.600s  (  2.681s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.650s  (  2.734s elapsed)

  %GC     time      98.1%  (98.0% elapsed)

  Alloc rate    3,264,689,261 bytes per MUT second

  Productivity   1.9% of total user, 1.8% of total elapsed


Now update the program using bangpattern, and rerun the profiling.

listUtil.hs
{-# LANGUAGE BangPatterns #-}
sumList :: [Integer] -> Integer
sumList xs = sumOfEle xs 0

sumOfEle :: [Integer] -> Integer -> Integer
sumOfEle [] acc = acc
sumOfEle (x:xs) !acc = sumOfEle xs (acc+x)

main = print (sumList [1..1000000])


Re run profiling again.
$ ./listUtil +RTS -s -h -i0.001
500000500000
     128,051,920 bytes allocated in the heap
         157,128 bytes copied during GC
          44,312 bytes maximum residency (34 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       213 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1        34 colls,     0 par    0.002s   0.005s     0.0002s    0.0035s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    0.029s  (  0.032s elapsed)
  GC      time    0.002s  (  0.006s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.032s  (  0.038s elapsed)

  %GC     time       6.2%  (15.4% elapsed)

  Alloc rate    4,342,656,763 bytes per MUT second

  Productivity  93.5% of total user, 79.7% of total elapsed

 
Following table summarizes the memory usage.

Name
With out Bang pattern
With Bang pattern
Bytes allocated in the heap
161,344,208
128,051,920
Bytes copied during GC
1,533,989,880
157,128
Bytes maximum residency
65,033,816
44,312
Bytes maximum slop
776,072
21,224
Total memory in use
98 MB
1 MB



Previous                                                 Next                                                 Home

No comments:

Post a Comment