Forum Archive

Python Question - class cleanup - file handles etc

Phuket2

Sorry to ask this here, but there are a lot of talented guys here. What I have been reading on the net seems like mumbo jumbo... nonsensical! But it's possible it's true. I would just like a way for an object to be able to do some cleanup after it falls out of scope, or at worst when the application finishes. I see the problem with del and garbage collection, ref counts etc., but I want to take action on a specific object/class. Here I am mainly thinking about closing files, although the subject is broader than that. I have read about weak references and atexit; they appear to have many caveats attached. A context manager's scope appears to be too limited to be useful, i.e. I have an object with an open file for the life of the app.
So basically, for a class instance I would like to have a callback, you_are_closing_clean_up_idiot. Yes, you could say, clean up yourself explicitly, you idiot. But it just seems so 90's or something. I have struggled with this concept for a long time. It drives me crazy that there is no explicit destructor.
I just wanted to ask you guys at the forefront: is there a solution to this problem, or a PEP that addresses it? Maybe someone has written some super smart decorator that can facilitate a class cleanup callback, without all the "but maybes". I listened to one 'Talk Python To Me' podcast about how far they have come along in regards to the performance issue of removing the GIL. I think they are as close as ~2x without a GIL now. So in my mind, this destructor (instance level) issue seems quite trivial compared to that problem.
Sorry, I know I have been ranting! But I find it so frustrating. I also accept that maybe there is an elegant solution staring me in the face that I just don't comprehend. I see that py3.7 alpha or beta is circulating now; maybe there is some hope there. I guess if I had my wish, del would be instance based, not ref-count based. I know they will not change that, too many things would break I guess. But a new magic method, something like finalising.
Look, sorry, this is not a Pythonista issue, of course. But if anyone has some magic to share, I would love to hear it! Again, I don't think it's unreasonable to think an object/class could be totally self-sufficient in regards to cleaning up file handles etc... fingers crossed 🤞

JonB

One approach is to be fastidious in avoiding cyclic references, so that __del__ will work, though that can be difficult.

weakref also has a finalize which lets you avoid the drawbacks with __del__.
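
For example, a minimal sketch of the weakref.finalize idea (the class name and file path are just placeholders):

import weakref

class Owner:
    def __init__(self, path):
        self.f = open(path, 'w')
        # The callback is a bound method of the file object, not of self,
        # so the finalizer keeps no reference to the instance it guards.
        weakref.finalize(self, self.f.close)

o = Owner('example.txt')  # placeholder path
o = None  # in CPython the finalizer fires here and the file is closed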

mikael

@Phuket2, I would really like a good answer to this as well. JonB already replied, so now the smart people are out of the way, and I can ask: given that you often have UI code, will will_close help?

Phuket2

@mikael, lol. There are still some smart ones around. Ever since I started with Pythonista (and hence Python), @JonB has continually warned me about the dangers of using/depending on del. At the time I found it very difficult to understand. As time has gone on, I have a better idea of why it's not a good idea to rely on it. I am not even sure @JonB's suggestion about avoiding cyclic references would solve the problem, because by the time your del gets called other things may have already fallen out of scope or become unreachable. Also, as he suggests, avoiding cyclic references in the first place can be difficult for your average novice like me. I remember @JonB posting some examples in the past that seemed innocent enough, but could in fact stop the garbage collector from decrementing the ref count, as sometimes it cannot be sure whether an object can be cleaned up. Look, I am sure those really familiar with Python understand these gotchas; it only appears like black magic when you don't really understand all that's going on under the hood.

I am not entirely new to memory management, mind you it was some 30 years ago. I was programming in C on Macs that were using the Motorola 68k chips. Memory management was basically handled through the use of handles. The handles would always be valid, but the data they pointed to could be moved around. If you wanted to access the underlying data you would normally lock the handle and could then dereference it to the data safely. Alternatively you could just double dereference the handle to a ptr and work with the ptr; this would work as long as you didn't call an Apple API/Toolbox call that could move memory (these were all documented). I don't know if you think that sounds complicated or not, but I used to find it OK. I never had to do a lot of debugging looking for memory leaks or dangling pointers. Anyway, my point is that even though it was 30 years ago, my memory (no pun intended) is still fairly fresh on it.
Sorry, this is turning into a novel. Regarding your comment about will_close from the ui, that would definitely be a way to help you clean things up. But it's just a band-aid and ties your class's behaviour to a ui element, which can't be good. Of course the class could just have an explicit close() or clean_up() method that you call. Kill me now, but I just don't like that I have to do that; it seems ugly. I was just writing a test catalog database with TinyDB the other day. The init takes a filename, and then the object and the file are open and ready to do adds/updates/deletes etc... And let's say I wanted to share that as a module for others to use. This is when I started looking at this cleanup issue again. I thought that maybe, now that I am a little further along, I might be able to find a way to do it, rather than telling the user: you must call close, otherwise your data may not get written to disk or the file might get corrupted. BTW, this does not seem to happen if I don't close the file, but you could imagine that it could. Also, there may be some magic I just don't know about.
Sorry, still going :)
So that brings me to my point: why don't I hear discussions about adding a real destructor to Python classes? I feel stupid, like I am the only one in the world that doesn't get it. It appears everyone else (I mean in my own little world that I get to examine) is OK with the fact there is no destructor.
I am only really bold enough to bring it up after listening to vids/podcasts about how they are trying to fix/reimagine the GIL, how typed data is starting to be used in different CPython implementations, how they changed the underlying dict algorithm in py3.6, and a host of other complicated things. I would have thought a proper destructor would be high on the list of things to tackle. OK, that's my Monday afternoon rant. Just something I find personally frustrating. But I am glad I got that off my chest :) Have a good day all

technoway

One way to clean up an object when it leaves scope is to use the following class pattern and implement the __init__, __enter__, and __exit__ methods.
This script just prints some text. It's just to show how the __enter__ method is used in a with statement, and how the __exit__ method is called automatically when leaving scope.

class MyExampleClass:
    """ Class to demonstrate cleanup when leaving scope. """

    def __init__(self):
        """ Initialize this class instance. """
        # self.my_obj stands in for any resource that needs cleanup,
        # e.g. a file handle or a database connection.
        self.my_obj = 'some resource'

    def __enter__(self):
        """ This method is called at the start of a "with" statement to
            return this instance.
        """
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        """ This method is called when this class instance leaves the
            scope of a "with" statement, even if an exception was raised.
        """
        self.do_cleanup()
        return False  # don't suppress any exception from the with block

    def do_cleanup(self):
        """ Release the resource. """
        print('cleaning up.\n')
        self.my_obj = None

    def do_something(self):
        """ Do something. """
        print('doing something.\n')

    def do_something_else(self):
        """ Do something else. """
        print('doing something else.\n')

if __name__ == "__main__":
    # Instead of the usual:  "obj = MyExampleClass()", do this:
    with MyExampleClass() as obj:
        obj.do_something()
        obj.do_something_else()
    # On leaving the scope, obj.__exit__() is automatically called, and
    # the __exit__ method calls self.do_cleanup()

Of course, you could use del in the do_cleanup() method, but I would typically just set the variable to None and let the garbage collector handle cleaning up the memory for the object either when it needs to allocate more memory, or when it decides that it's time.

Adding the __enter__ and __exit__ methods doesn't preclude using the class without a "with" statement, but then you'll have to explicitly call the do_cleanup method when you're done with the object, i.e.:

obj = MyExampleClass()
obj.do_something()
obj.do_something_else()
obj.do_cleanup()

The __enter__ and __exit__ methods are implemented in Python file objects, so you can write:

with open('filename.ext', 'w') as outfile:
    outfile.write('Do not read this.\n')
    outfile.write('Do not read this either.\n')
    outfile.write('Stop it!\n')
# outfile.close() is called here after leaving the scope above.

Unfortunately, this automatic cleanup only happens with a with statement; simply leaving the body of an if-statement or a while loop won't trigger it. However, at the cost of extra indentation, you can add a with statement almost anywhere in a function or method.
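
For example, a small illustration of nesting a with statement inside an if branch (the file name here is just a placeholder):

def maybe_log(message, enabled):
    if enabled:
        # The with block can sit inside the if; the file is closed as
        # soon as the block is left, before the function returns.
        with open('log.txt', 'a') as logfile:  # placeholder file name
            logfile.write(message + '\n')
    return message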

JonB

This is where I always end up, maybe because it is the first Google hit.

One of the comments leads here, which has code for a wrapper that adds a weakref destructor that calls close() on the stream.

This led me to try a few more ideas. Basically:
1) A metaclass that automatically calls close on any attributes that have a close method. So if you have a class with a file object and a db object that are open, close() gets called on those objects when the weakref finalizer is called. (A rough sketch of this idea follows after the list below.)
The problem with that approach is that you don't have much control, and error handling might be tricky (what if an object is already closed?).
2) A metaclass that requires a class to explicitly define a _closing_fcn_, which returns a function that does the cleanup. The trick is that the returned function cannot reference self at all, to avoid cyclic references.

def _closing_fcn_(self):
    db = self.db
    def close():
        # gets called when self dies
        db.close()
    return close

3) Finally, it is possible to do all this without a metaclass, just by defining this stuff in __init__, hence the manual method. This is the most compact, but hard to remember:
class TestManual(object):
    def __init__(self):
        self.stream = SomeStream()
        stream = self.stream
        def close():
            stream.close()
        self._finalizer_ = weakref.finalize(self, close)
Honestly, I was a little surprised that storing the ref to the weakref inside the object itself would work, since it doesn't get called until the obj is already dead, but it does. But you have to take care not to use self in the finalizer method.
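
As a rough sketch of idea 1 above (written here as a class decorator rather than a metaclass, just to keep it short; auto_close and close_all are made-up names):

import weakref

def auto_close(cls):
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        # Collect every attribute that has a close() method, right after
        # __init__ has run. The list, not self, is captured by the closure.
        closeables = [v for v in vars(self).values() if hasattr(v, 'close')]

        def close_all():
            for obj in closeables:
                try:
                    obj.close()
                except Exception:
                    pass  # e.g. already closed -- the tricky error handling

        self._finalizer_ = weakref.finalize(self, close_all)

    cls.__init__ = __init__
    return cls

# usage: put @auto_close above any class whose __init__ opens files/dbs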

zrzka

Cleanup

Theory

Context manager

One way to clean up resources is a context manager. Here's an example of a class-based context manager.

class File:
    def __init__(self, path, mode = 'r'):
        self.path = path
        self.mode = mode
        self.handle = None

    def __enter__(self):
        self.handle = open(self.path, self.mode)
        return self.handle

    def __exit__(self, *args):
        self.handle.close()


with File(__file__) as f:
    # __file__ - contains script name, basically opens itself
    # __enter__ was called, file is opened and handle is stored in f (== File.handle)
    print(f.read())
    # __exit__ will be called

A shorter way to achieve the same thing is the contextmanager decorator. Example:

from contextlib import contextmanager

@contextmanager
def file(path, mode='r'):
    handle = open(path, mode)
    try:
        yield handle  # yields handle -> as f and the nested with block is executed
    finally:
        handle.close()  # runs even if the with block raises an exception

with file(__file__) as f:
    print(f.read())

There's also contextlib.ContextDecorator, which allows you to use it as a function decorator or with a with statement. If you don't know yield, search for Python generators, for example. Head over to the contextlib documentation to learn more. Also check contextlib.closing if the only thing you'd like to do is call close().
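
For instance, a minimal sketch using contextlib.closing with the TinyDB case discussed here (the file name is just a placeholder):

from contextlib import closing
from tinydb import TinyDB

# closing() turns any object with a close() method into a context manager
with closing(TinyDB('catalog.json')) as db:  # placeholder path
    db.insert({'name': 'example'})
# db.close() was called here, even if the block raised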

try / ... / finally

One way is to use finally:

handle = None
try:
    handle = open(__file__, 'r')
    print(handle.read())
finally:
    if handle:
        handle.close()

Hmm, what if we do not want to catch a handle.read() exception, for example? else is our friend:

handle = None
try:
    handle = open(__file__, 'r')
except OSError:
    # Executed if try: raises OSError
    print('Failed to open file')
else:
    # Executed only and only if try: doesn't raise exception
    print(handle.read())
finally:
    # Always executed
    if handle:
        handle.close()

Read Defining Clean-up Actions to learn more.

finalizer

Another way to clean up is to use weakref.finalize. Here's an example:

import weakref
from tinydb import TinyDB


class TinyDBWrapper:
    def __init__(self, path):
        self._db = TinyDB(path)
        self._finalizer = weakref.finalize(self, self._db.close)

Finalizer is fired when:

  • object (TinyDBWrapper in this case) is garbage collected,
  • program exits.

What if you'd like to provide a close method too? You can add it this way:

class TinyDBWrapper:
    def __init__(self, path):
        self._db = TinyDB(path)
        self._finalizer = weakref.finalize(self, self._db.close)

    def close(self):
        self._finalizer()

And now finalizer is fired when:

  • object (TinyDBWrapper in this case) is garbage collected,
  • TinyDBWrapper.close is called,
  • program exits.
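
Note that weakref.finalize runs its callback at most once, so calling close repeatedly (or closing and then letting the object be collected) is harmless:

w = TinyDBWrapper('aa.json')
w.close()
w.close()  # no-op: the finalizer has already run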

del

And here comes the never ending story of del, __del__, ... What's the problem with __del__?

  • You're responsible for calling __del__ in any superclass
  • Exceptions raised in __del__ are ignored
  • __del__ is called when the GC collects the object, not when the last reference is lost
  • Prior to Python 3.4, __del__ prevented collection of objects involved in reference cycles (see PEP 442); even >= 3.4 still has some issues
  • It's not guaranteed that __del__ is called when the interpreter exits.

Head over to object.__del__(self) to learn more.
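
A minimal sketch of the second point, that an exception raised in __del__ never propagates to your code (CPython just prints a warning to stderr):

class Broken:
    def __del__(self):
        raise RuntimeError('boom')

b = Broken()
b = None  # __del__ runs here in CPython; the RuntimeError is ignored,
          # only an "Exception ignored in ..." warning appears on stderr
print('still running')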

What's the problem with del?

People think that del is the equivalent of GC - collect me now - but it isn't. del removes the binding and decreases the reference counter. That's all it does. When is the object GCed? It's implementation specific and depends on the GC scheme. CPython collects it when the reference counter reaches zero, but other implementations can do it later.

Do you want to rely on this? A DB wrapper where you have no idea when close will be called (if ever)? Data loss? Here's my rule of thumb - hard to explain? Bad idea.
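
A minimal sketch of the difference (the class name is made up):

class Noisy:
    def __del__(self):
        print('__del__ called')

a = Noisy()
b = a        # a second reference to the same object
del a        # removes the name 'a' only; nothing is printed yet
print('object is still alive')
b = None     # last reference gone; in CPython __del__ runs right here,
             # other implementations may run it (much) later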

Practically

Simplicity

And let's say I wanted to share that as a module for others to use. This is when I started looking at this cleanup issue again. I thought that maybe, now that I am a little further along, I might be able to find a way to do it, rather than telling the user: you must call close, otherwise your data may not get written to disk or the file might get corrupted.

To make it short - you're trying to be too smart with your module. What you should do is provide a simple API, which allows the consumer to:

  • explicitly open catalog (can be __init__),
  • explicitly close catalog (like close method),
  • use your wrapper as context manager with with.

And leave the rest to the developer. You have no idea how your module will be used, when it will be used, or whether it's really required to keep the wrapper around for the whole lifecycle of the application, ...
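
In other words, something as simple as this (Catalog is a made-up name; substitute your own wrapper):

from tinydb import TinyDB

class Catalog:
    def __init__(self, path):
        # explicit open
        self._db = TinyDB(path)

    def close(self):
        # explicit close
        self._db.close()

    # context manager support: `with Catalog('catalog.json') as c: ...`
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()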

If you'd like to help, add a finalizer in case the consumer forgets to call close, but I wouldn't do it; close is enough. This is not kindergarten :)

will_close

@mikael asked about will_close. Don't use will_close for stuff like closing streams, databases, ... This should be done at a lower level. What if will_close is not going to be called? It's like depending on __del__ - you don't know when, or if, it will be called.

How to check if finalizer was called

#!python3

import weakref
from tinydb import TinyDB


class TinyDBWrapper:
    def __init__(self, path):
        self._db = TinyDB(path)
        self._finalizer = weakref.finalize(self, self._db.close)

    def close(self):
        self._finalizer()


# Instantiate wrapper
wrapper = TinyDBWrapper('aa.json')
# Cheat, get reference to finalizer
fin = wrapper._finalizer
# Trash the wrapper
wrapper = None  # or replace with `del wrapper`
# Get finalizer status, if alive is False, finalizer was already called
print(fin.alive)  # prints False, because of CPython, ref count == 0, ... can differ on desktop / other implementations ...

Conclusion

Yeah, I know, I'm too pedantic sometimes. But simple stuff works and doesn't require you to write long documentation, explanations, ... And that's good. Remember my rule of thumb? Hard to explain? Then it's a bad idea.

Just provide a way to do it explicitly and leave the rest to the developer. They have lots of ways (context manager, ...) to handle this, and only they know how and when your module is going to be used. You don't, and you can't prepare your module for all possible use case scenarios.

All these examples were written in Python 3 and tested in Pythonista 311015 (latest beta).

Phuket2

Guys, thanks for the excellent answers. I have started testing a few of the things proposed here, but it's a lot to take in for me, so it will be a few days before I get back to this thread. I want to understand what is happening. I have already had inconsistencies using weakref.finalize, but I need to try to understand why first.