Understanding a Python Closure Oddity

9 days ago (utcc.utoronto.ca)

Typical workaround:

  def loop():
      for number in range(10):
          def closure(number=number):
              return number
          yield closure

Many languages have surprising behaviour when closing over iteration variables.

That's why JavaScript introduced the "let" keyword years ago.

This logs from 0 to 9:

for (let i = 0; i < 10; ++i) setTimeout(() => console.log(i));

This logs 10 ten times:

for (var i = 0; i < 10; ++i) setTimeout(() => console.log(i));

  • Yep, with exactly the same semantics as perl5's 25+ years old 'my' -

      foreach my $x (@input) {
        push @callbacks, sub { $x };
      }
    

    works how you'd expect.

    I was spoiled by this and driven nuts by function level scoping like python's and javascript's 'var', and the adoption of 'let' made me very happy.

    My current annoyance with JS is its lack of -

      my $foo = do {
        ...;
        ...;
        <expression>
      };
    

    but there's a proposal for that, and the 'match' proposal (which would be glorious) depends on it on the basis it's a shoo-in, so I am very hopeful.

    Note that I know:

      let foo; {
        ...;
        ...;
        foo = <expression>
      }
    

    already works but it's Not The Same (although significantly less upsetting than (() => { ... })() is ;).

  • Do you think you could elaborate on why the results differ? I'm somewhat unfamiliar with how JavaScript works.

    • 'var' is JavaScript's older variable declaration construct. Variables created this way are live from the beginning of the function that contains them (or globally if there isn't one). So a block with braces (such as you'd use for a for or while loop body) doesn't actually restrict the scope of var `v` below:

          console.log(v); // <-- v is a variable here, we can access its value even though it is only declared below
          // prints 'undefined'
          {
              var v = 1;
              console.log(v); // prints 1
          }
          console.log(v); // prints 1
      

      You used to (and might still) see a workaround to recover more restrictive scoping, known as the "IIFE" (Immediately Invoked Function Expression): (function () { var v; ... code using v here ... })() creates a function (and thus a more restrictive scope for v to live in) and invokes it once; this is a sort of poor man's block scoping.

      `let` and `const` were created to fill this gap. They have block scope and are special-cased in for loops (well, `let` is; you can't reassign to a `const` variable so nontrivial for loops won't work):

          console.log(l); // <-- throws ReferenceError: l is not defined
          {
              // pretend the console.log line above is commented out, so we can reach this line
              let l = 1;
              console.log(l); // prints 1, as expected
          }
          console.log(l); // throws ReferenceError: l is not defined
          // ^^ l was only in scope for the block above
      
          
      

      The interaction with `for` is explained well on MDN, including the special casing: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

      "More precisely, let declarations are special-cased by for loops..." (followed by a more detailed explanation of why)

      See also https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... and https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Further proof that Python is not fit for functional programming and that Python's namespaces and default lookup by reference are confusing.

In safe languages like ML you would have to create an explicit reference to get that effect, and then it is obvious to the reader.

Python is not built for complex abstractions, but unfortunately it is (ab)used for that, particularly in machine learning.

  • No. The code is completely consistent: the closures are different instances of a function that returns the value of 'number' at the moment it gets called. Not the value of 'number' when it gets created. And it gets called differently when a list comprehension calls it each step of the loop or when the list function loops completely through and then the list comprehension calls the instances. Nothing to see here.
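The article's code is not quoted in this thread. As a minimal sketch, assuming it resembles the generator below (the names `loop`, `closure`, `eagerly` and `lazily` are borrowed from other comments here), the two call patterns described above look like this:

```python
def loop():
    for number in range(10):
        def closure():
            # 'number' is looked up when closure() is called,
            # not when closure is defined.
            return number
        yield closure

# Eager: each closure is called while the generator is still paused
# on the iteration that produced it.
eagerly = [each() for each in loop()]

# Lazy: list() runs the generator to completion first, so by the time
# any closure is called, 'number' is already bound to 9.
lazily = [each() for each in list(loop())]

print(eagerly)
print(lazily)
```

Both comprehensions call the same kind of closure; only the timing of the calls relative to the loop differs.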

    • > No. The code is completely consistent: the closures are different instances of a function that returns the value of 'number' at the moment it gets called. Not the value of 'number' when it gets created.

      Isn't that exactly what GP said? And gave an example how to create that effect explicitly in functional languages, using refs?


    • Naively I would expect that each iteration of a for loop creates a new loop variable (all with the same name, but effectively each in its own scope), so that each closure holds a reference to a variable named number whose value never changes, with a distinct number variable per closure.


    • It's consistent, but unexpected, especially if you don't know Python well. And it sometimes leads to ugly effects, even for experienced Pythonistas.


  • If you want to "close over" the value in the closure then there's a simple trick you can use:

    Pass a keyword argument with the default value equal to the value you want to close over.

    Voila!

    • functools.partial can also supply captures in that way, and is slightly more obvious (and less likely to generate linter complaints if the kwarg is mutable).
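A minimal sketch of the functools.partial approach, assuming a generator shaped like the ones elsewhere in this thread; `identity` is a hypothetical helper introduced only for illustration:

```python
from functools import partial

def identity(n):
    # Hypothetical helper: returns whatever it is given.
    return n

def loop():
    for number in range(10):
        # partial() stores the object currently bound to 'number'
        # at the moment the partial object is created.
        yield partial(identity, number)

# Exhaust the generator first, then call each partial.
partials_lazily = [each() for each in list(loop())]
print(partials_lazily)
```

Because the argument is stored on the partial object itself, later rebinding of `number` has no effect on it.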


  • All one needs to understand is that, in Python, there are names that refer to objects. Nothing is passed by value.

    The closure returns whatever object the name refers to at the moment it is called. That is all.

        def loop():
            for i in range(10):
                yield lambda j=i: j
    

    `j` refers to the same object (int) `i` refers to at the time of the lambda creation (the desired result). `j` is local to the lambda (each lambda has its own local namespace).

    Just `lambda: i` code would result in an object that `i` refers to at the time of calling the lambda (`i` name is not local to the lambda).

    https://docs.python.org/3/faq/programming.html#why-do-lambda...

This is less "wat" when you write it like this:

   # eager: call each closure as soon as it is yielded
   for item in loop():
       item()

   # lazy: collect all the closures first, then call them
   items = []
   for item in loop():
       items.append(item)

   for item in items:
       item()

Each next() into loop() advances a single shared variable, so when you print as you go you get 0, 1, 2, ... but when you print only after calling next() a bunch of times you get 9, 9, 9.

You can put an extra 'level' in, to make both options return the same 0,1...9

    def loop():
        for number in range(10):

            def outer(n):
                def inner():
                    return n

                return inner

            yield outer(number)

Is there a neater way?

  • Yes. The usual way to do this is to bind them as defaults to an argument, for example:

        def loop():
            for number in range(10):
    
                def func_w_closure(_num=number):
                    return _num
    
                yield func_w_closure
    

    This works because default arguments in python are evaluated exactly once, at function definition time. So it's a way of effectively copying the ``number`` out of the closure[1] and into the function definition.

    [1] side note, closures in python are always late-binding, which is what causes the behavior in OP
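The evaluate-once behaviour can be observed directly through the function's `__defaults__` attribute. This is a small sketch using a shortened range; the variable names `funcs`, `defaults`, `results` and `overridden` are only illustrative:

```python
def loop():
    for number in range(3):
        def func_w_closure(_num=number):
            return _num
        yield func_w_closure

# Exhaust the generator before calling anything.
funcs = list(loop())

# Each function object carries its own default, fixed at definition time.
defaults = [f.__defaults__ for f in funcs]

# Calling with no arguments uses the stored default.
results = [f() for f in funcs]

# The extra parameter is part of the signature, so a caller can override it.
overridden = funcs[0](99)

print(defaults, results, overridden)
```

The last line shows the trade-off of this trick: the captured value is exposed as an ordinary parameter.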

    • I’ve never thought that leaking this type of implementation detail into the return value (and return type!) was a nice solution. I like the double closure better, and one can shorten it a bit with a lambda.

      For those who prefer a functional style, functools.partial can also solve this problem.

      (I use Python, and I like a lot of things about Python, but I don’t like its scoping rules at all, nor do I like the way that Python’s closures work. I would use a double lambda.)
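A sketch of what the double lambda might look like, assuming the same generator shape as the examples above; the `inspect` call is only there to show that no extra parameter appears in the returned function's signature:

```python
import inspect

def loop():
    for number in range(10):
        # The outer lambda is called immediately; its parameter n is a
        # fresh binding for each call, and the inner lambda closes over it.
        yield (lambda n: lambda: n)(number)

funcs = list(loop())
lazily = [each() for each in funcs]

# The returned functions take no parameters at all.
param_count = len(inspect.signature(funcs[0]).parameters)

print(lazily, param_count)
```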

  • I would have just done something like

    ``` def loop(): for number in range(10): fixed_number = number def inner(): return fixed_number yield inner ```

    • Correctly formatted (two spaces preceding each line, one blank line before the first code line, no extra lines needed between code lines):

        def loop():
          for number in range(10):
            fixed_number = number
            def inner():
              return fixed_number
            yield inner

      The output:

        eagerly = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        lazily  = [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
      

      It doesn't work because of the way Python's variables are scoped. Your fixed_number variable is still shared across all instances of inner. Python doesn't have any sort of block scoping like you seem to think it has.
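The lack of block scoping is easy to check in isolation. In this small sketch (`demo` is a hypothetical name), both the loop variable and the variable assigned inside the loop body are ordinary locals of the enclosing function and remain accessible after the loop ends:

```python
def demo():
    for number in range(3):
        fixed_number = number
    # Both names are locals of demo(), not of the loop body,
    # so they are still bound here with their last values.
    return number, fixed_number

result = demo()
local_names = demo.__code__.co_varnames

print(result)
print(local_names)
```

Since there is only one `fixed_number` per call of the enclosing function, every `inner` defined in the loop sees the same variable.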

I'll give you one better:

    >>> strings = tuple((lambda: f'{i}') for i in range(10))
    >>> strings[1]()
    '9'

The f-strings are supposed to be evaluated at... what time exactly?

And some more stuff:

    >>> class Elephant:
    ...   def __init__(self, i):
    ...     self.i = i
    ...   def __call__(self):
    ...     return self.i
    ... 
    >>> elephants = tuple(Elephant(i) for i in range(10))
    >>> elephants[0]()
    0
    >>> elephants[5]()
    5

  • The answer is simple. f strings are evaluated at run time. The behaviour is correct. Up until you call the lambda, they are just a function definition.

    What's so interesting about the Elephant case? You store the value of i at run time, so it can be accessed later. lambda doesn't have that capability, as it's inheriting i from the block.

    • > The answer is simple. f strings are evaluated at run time.

      You didn't understand the question. Everything in Python is evaluated at runtime. Bytecode is just a form of caching.

      > The behaviour is correct.

      Python doesn't have a standard. Nobody knows what behavior is correct. That's the only behavior Python has, so, again, calling it correct is just not saying anything.

      > What's so interesting about the Elephant case?

      This class, in principle, shouldn't be different from lambda from the previous example, yet it acts differently.

      You don't need to lecture me on why things behave the way they are. Since I found these examples, you can trust me, I know why they behave like this. The idea is to drive your attention to the inconsistencies that a "naive" reader would discover.


but is it a closure oddity? Looks to me like it's a generator/comprehension oddity :)

  • It’s a closure oddity. The closure captures a reference to the variable, not the value of the variable at the time of its creation. Given that, the results are entirely predictable.
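In CPython, the "reference to the variable" is visible as a cell object on the function's `__closure__` attribute. This sketch (with the hypothetical name `make_closures`) suggests that all closures created in the loop share one cell:

```python
def make_closures():
    closures = []
    for i in range(3):
        closures.append(lambda: i)
    return closures

first, second, third = make_closures()

# Every lambda holds the very same cell object for 'i'.
same_cell = first.__closure__[0] is third.__closure__[0]

# The cell holds whatever 'i' was last bound to.
cell_value = first.__closure__[0].cell_contents

print(same_cell, cell_value)
```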

    • If we're really going to argue, I'd argue for a 'for' oddity. The for statement could've been defined with a fresh binding per iteration instead of a mutation.
