Weird Binding Stuff
May 22, 2012
Voice Over This man is Ernest Scribbler… writer of jokes. In a few moments, he will have written the funniest joke in the world… and, as a consequence, he will die … laughing.
Ernest stops writing, pauses to look at what he has written… a smile slowly spreads across his face, turning very, very slowly to uncontrolled hysterical laughter… he staggers to his feet and reels across room helpless with mounting mirth and eventually collapses and dies on the floor.
Summary: id(), copy, copy.copy(), copy.deepcopy()
A short tutorial this week on some oddities in the way Python stores and references (“binds to”) data. You might remember, a long time ago, we talked about how storing data in a variable is like putting your stuff in a bucket so that you can access it later. Variables in Python actually turn out to be references to objects. A side effect of this is that, in some cases, Python doesn’t work out how you think it will. Typically this happens when you have a list or dictionary that you think you have copied, but you actually haven’t. (It applies to any object, but you only notice the effect with objects that can be changed in place, such as lists and dictionaries.)
In particular, you can do this with ‘plain’ variables:
>>> a = 5
>>> b = a
>>> a = 6
>>> b
5
You can also do this with lists:
>>> c = [1, 2]
>>> d = c
>>> c = [5, 6]
>>> d
[1, 2]
But there's a gotcha with lists where you change one of the list's entries:
>>> c = [1, 2]
>>> d = c
>>> d
[1, 2]
>>> c[0] = 3
>>> d
[3, 2]
Can you see that, even though we only changed the first entry in the list c (that is, c[0]), the first entry of d has also changed? That's because there is an underlying list object that both c and d are pointing to. That is, they are both pointing to the same thing. In a sense they are both windows to the same room (the list object). Looking in either window allows you to "see" the changes made in the room. You can see that the objects are the same because you can check their location in memory using the id() function, which is built in to Python (try help(id)):
>>> id(c)
139636641421288
>>> id(d)
139636641421288
>>> id(c) == id(d)
True
The number (139636641421288) is where in the computer's memory the object is stored (at least in CPython, the standard version of Python). It will probably change each time you run the program. If we assign a different list to d, it will have a different id, even though the values in the list are the same:
>>> d = [3, 2]  # note this new list has the same values as the old one
>>> id(c) == id(d)
False
>>> id(d)
139636640593824
We can see that this other list is stored in a different location because the id() of the two lists is different. It turns out that this referencing behaviour is actually what you want to happen in most cases. However, every so often you want your lists to be separate. For that there is a special module called copy. The copy module has a function (also called copy) which allows you to copy across the values of an object, rather than simply referencing (called "binding to") an existing object:
>>> import copy
>>> d = copy.copy(c)
>>> d
[3, 2]
>>> c[0] = 1
>>> d
[3, 2]
>>> c
[1, 2]
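Here is the same idea as a small script rather than an interactive session, putting a plain second name and a copy side by side. This is only a sketch: the names alias and duplicate are made up for illustration. The is operator asks whether two names refer to the same object, which is the same question as comparing their id() values.

```python
import copy

c = [3, 2]
alias = c                 # a second name bound to the same list object
duplicate = copy.copy(c)  # a new list object holding the same values

c[0] = 1                  # change the list in place

print(alias)              # [1, 2]: the alias sees the change
print(duplicate)          # [3, 2]: the copy does not
print(alias is c)         # True: same object
print(duplicate is c)     # False: different object
```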
When you use copy.copy() the two lists will be separate and can be used independently. Changes to the entries of one won't show up in the other. The copy is only one level deep, though: where a compound object like a list or a dictionary has values which are themselves compound objects (for example, a list where each entry is itself a list), the inner objects are still shared between the original and the copy. In that case use the copy.deepcopy() function, which copies the inner objects as well. deepcopy() keeps track of what it has already copied, so objects which refer to themselves are handled. It cannot copy everything (the copy module's documentation lists modules, files and sockets among the types it does not copy), but generally you will be fine.
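The difference between copy.copy() and copy.deepcopy() on a list of lists can be sketched like this. The names grid, shallow, deep and loop are made up for illustration. The last few lines try a list that contains itself; as far as I know, deepcopy() remembers what it has already copied, so in current versions of Python this case is handled.

```python
import copy

grid = [[1, 2], [3, 4]]       # a list whose entries are themselves lists

shallow = copy.copy(grid)     # new outer list, but the same inner lists
deep = copy.deepcopy(grid)    # new outer list and new inner lists

grid[0][0] = 99               # change one of the inner lists in place

print(shallow[0])             # [99, 2]: the inner list is shared
print(deep[0])                # [1, 2]: the deep copy is independent
print(shallow[0] is grid[0])  # True
print(deep[0] is grid[0])     # False

loop = [1, 2]
loop.append(loop)             # the list now contains itself
loop_copy = copy.deepcopy(loop)
print(loop_copy is loop)          # False: a new object
print(loop_copy[2] is loop_copy)  # True: the copy refers to itself, not to loop
```

If your lists only hold numbers or strings, copy.copy() is enough; the deep version only matters once there are lists or dictionaries inside.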