Forum Archive

Script crashes when making bulk upload into local server (Synology NAS)

aldoblack

I am writing a script to upload files and photos to my Synology NAS server, and during testing the script crashes after a period of time. I have around 1400 photos on my phone and it always crashes somewhere between photos 900-1000. At first I thought some corrupt file might be causing the crash, so I tested starting from file 800 onwards, but it does not crash.

Any idea what causes this crash when uploading files between 900-1000? There are no error codes; the app just closes.

ccc

Pythonista might not be releasing memory that you think you are freeing up. You could add an explicit del variable_name to encourage the garbage collector to free large allocations such as images and videos.
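For what it's worth, here is a minimal, platform-neutral sketch of that idea in plain CPython (no Pythonista APIs): del drops a reference, and gc.collect() sweeps anything caught in a cycle. BigBuffer is just a made-up stand-in for image data.

```python
import gc
import weakref

class BigBuffer:
    """Stand-in for a large object such as image or video data."""
    def __init__(self):
        self.data = bytearray(10 * 1024 * 1024)  # ~10 MB

buf = BigBuffer()
ref = weakref.ref(buf)  # observe the object without keeping it alive

del buf       # drop the last reference...
gc.collect()  # ...and ask the collector to sweep any cycles

print(ref() is None)  # -> True: the buffer has been reclaimed
```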

aldoblack

@ccc tried but still crashes. This is the code:

import appex
import photos
import requests
import os
import editor
import sys
from objc_util import *

from datetime import datetime


class Synology(object):
    def __init__(self, url, port, account, passwd):
        """

        :param url: IP/URL or Synology. So far it only supports HTTP.
        :param port:
        :param account:
        :param passwd:
        :returnv:
        """
        self.url = url
        self.port = port

        self.account = account
        self.passwd = passwd

        self.sid = None

        # self.session = requests.Session()

    def login(self):
        print("Logging in...")

        param = {
            "version": "2",
            "method": "login",
            "account": self.account,
            "passwd": self.passwd,
            "session": "FileStation",
            "format": "cookie"
        }

        url = f"http://{self.url}:{self.port}/webapi/auth.cgi?api=SYNO.API.Auth"

        r = requests.get(url, params=param, verify=False)
        r = r.json()

        if not r["success"]:
            raise Exception("Authenticatoin failed. Note that account with 2FA enabled does not work.")

        print("Logging successful.")

        self.sid = r["data"]["sid"]
        return self.sid

    def upload_file(self, files):
        upload_url = f"http://{self.url}:{self.port}/webapi/entry.cgi?api=SYNO.FileStation.Upload&version=2&method=upload&_sid={self.sid}"

        args = {
            "path": "/home/b",
            "create_parents": "true",
            "overwrite": "true"
        }

        # Ignore this ugly code. I am trying to figure out if I can do a bulk upload.
        file = {'file': (files[0]["fname"], files[0]["payload"], 'application/octet-stream')}
        # file = [('files', (x["fname"], x["payload"], "application/octet-stream")) for x in files]

        r = requests.post(upload_url, data=args, files=file, verify=False)

        del upload_url
        del args
        del file
        del r
        del files



def main():
    # if not appex.is_running_extension():
    #   print("Sript is intended to be un from sharing extension.")
    #   return

    print("Starting...")
    s = Synology(url="1.1.1.1", port="1000", account="1", passwd="1")
    s.login()


    startTime = datetime.now()

    all_photos = photos.get_assets()

    # images = []

    for idx, asset in enumerate(all_photos):

        if idx % 100 == 0:
            print(f"Uploading... {idx}/{len(all_photos)}")

        #fname = str(ObjCInstance(asset).valueForKey_('filename'))
        #payload = asset.get_image_data(original=True)

        # images = [{"fname": fname, "payload": payload}]

        s.upload_file([{"fname": str(ObjCInstance(asset).valueForKey_('filename')), "payload": asset.get_image_data(original=True)}])
        del asset
        # print(fname)

    print(datetime.now() - startTime)

if __name__ == "__main__":
    main()
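Regarding the commented-out bulk-upload line in upload_file: requests does accept a list of (field_name, (filename, payload, mimetype)) tuples and encodes each entry as a separate multipart part. Whether SYNO.FileStation.Upload accepts more than one file per request I cannot confirm, so treat this as a sketch of the client-side shape only; the filenames and payload bytes below are made up.

```python
# Dummy in-memory stand-ins for the per-photo dicts built in main().
files = [
    {"fname": "IMG_0001.JPG", "payload": b"\xff\xd8 fake jpeg 1"},
    {"fname": "IMG_0002.JPG", "payload": b"\xff\xd8 fake jpeg 2"},
]

# requests encodes each ("files", (...)) tuple as its own multipart part,
# repeating the field name -- the shape the commented-out line was building:
multipart = [
    ("files", (f["fname"], f["payload"], "application/octet-stream"))
    for f in files
]

# The post itself would then be:
#   requests.post(upload_url, data=args, files=multipart, verify=False)
print(len(multipart), multipart[0][0])  # -> 2 files
```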
JonB

I believe you will need to do the following:
1) ObjCInstance(asset) should be on its own line, assigned to a variable, so that you can del the instance after it is created. You might also need to delete the instance's cached methods (if I recall, the release method has a reference cycle of some sort).

2) when dealing with many images, you MUST have a with objc_util.autorelease_pool(): wrapping the code that grabs the asset and data; otherwise the asset won't actually get released (even if you del it).

JonB

Aha,
https://forum.omz-software.com/topic/3844/photos-asset-get_image_data-is-there-a-memory-leak/6

I am not on my iPad, so I'm not sure if this bug was ever fixed... But when the ObjCInstance is retained, the cached retain method creates a reference cycle. It is necessary to modify objc_util to avoid the leak by deleting the cached methods of the ObjC instance.

So, what I think you need to do is:
A) wrap your upload line in an autorelease_pool() context handler
B) inside that handler, also add:

ObjCInstance(asset)._cached_methods = {} 
del asset # probably not needed

I think it is possible that making the change to your objc_util.py directly effectively does the same thing, but I have not tried it recently.
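The cycle described above can be reproduced in plain CPython without objc_util: a cache of bound methods points back at the instance, so reference counting alone never frees it; only the cyclic collector (or clearing the cache) does. Instance here is a toy analogue, not the real ObjCInstance.

```python
import gc
import weakref

gc.disable()  # make the demonstration deterministic; collect() still works

class Instance:
    """Toy analogue of an ObjCInstance whose method cache refers back to it."""
    def __init__(self):
        # A bound method holds a reference to self -> reference cycle.
        self._cached_methods = {"release": self.release}

    def release(self):
        pass

obj = Instance()
ref = weakref.ref(obj)
del obj
print(ref() is not None)  # -> True: the cycle keeps it alive past del
gc.collect()
print(ref() is None)      # -> True: only the cyclic collector reclaims it

# Clearing the cache first breaks the cycle, so refcounting frees it at once:
obj2 = Instance()
ref2 = weakref.ref(obj2)
obj2._cached_methods.clear()
del obj2
print(ref2() is None)     # -> True, no gc.collect() needed
```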

JonB

ok. It turns out that autoreleasepool() does not work -- it works once, but not on repeated script runs. The following appears to be fairly robust and does not leak:


from objc_util import *
#from objc_util import autoreleasepool #this doesnt work properly
import photos
import sys, os

try:
   sys.path.append(os.path.expanduser('~'))
   from objc_hacks.memstatus import _get_taskinfo
except ImportError:
   _get_taskinfo = None  # memstatus not installed, or unsupported device

import gc
from PIL import Image
all_photos=photos.get_assets()
for idx, asset in enumerate(all_photos):
      pool=ObjCClass('NSAutoreleasePool').new()
      try:
         if idx % 10 == 0:
            print(f"Uploading... {idx}/{len(all_photos)}")
            if _get_taskinfo:
               print(_get_taskinfo().resident_size/1024/1024)
         fname = str(ObjCInstance(asset).valueForKey_('filename'))
         payload = asset.get_image_data(original=True)
         #do the actual upload....
         # upload(....)

         # this is the important bit, all objcinstances must have cached methods cleared manually in order to get gc'd
         ObjCInstance(asset)._cached_methods.clear()

      finally:
         pool.drain()
         pool._cached_methods.clear()
         del pool
         ObjCClass('NSAutoreleasePool').releaseAllPools()



The keys:

1) using NSAutoreleasePool manually, which requires a context manager or try/finally to ensure releaseAllPools gets called at the end. (The releaseAllPools could maybe go at the end of the script, but it needs to happen before gc runs, or before the script is run again, I think.)

2) inside the pool, any ObjCInstance's that get created must have
._cached_methods.clear() called to break the ref cycle.
I tried with and without del on the asset and payload; that is not necessary, but not harmful either.
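The try/finally requirement in point 1 is the standard guaranteed-cleanup pattern, and contextlib.contextmanager packages it up. A platform-neutral sketch (toy_pool is a stand-in, not a real NSAutoreleasePool):

```python
import contextlib

events = []

@contextlib.contextmanager
def toy_pool():
    # Stand-in for ObjCClass('NSAutoreleasePool').new()
    events.append("new")
    try:
        yield
    finally:
        # Runs even if the body raises -- the drain is never skipped.
        events.append("drain")

try:
    with toy_pool():
        raise RuntimeError("upload failed mid-loop")
except RuntimeError:
    pass

print(events)  # -> ['new', 'drain']
```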


JonB

Just to close out my investigation here, here is a cleaned-up version with a proper context manager to replace the objc_util version, with some extra fluff removed.


from objc_util import *
#from objc_util import autoreleasepool #this doesnt work properly
import photos
import sys, os,gc
import contextlib


try:
   sys.path.append(os.path.expanduser('~'))
   from objc_hacks.memstatus import _get_taskinfo
except ImportError:
   _get_taskinfo = None  # memstatus not installed, or unsupported device

from PIL import Image
all_photos=photos.get_assets()

'''Replacement for objc_util.autoreleasepool that doesn't crash when rerunning the script.

objc_util's version has two issues. First, a bug in ObjCInstance.__del__ means that NSAutoreleasePool will have release called on it, which is never proper for an NSAutoreleasePool. This wouldn't be an issue while the object is alive, since those calls seem to get ignored. The second issue affects all ObjCInstances: the method cache prevents timely garbage collection due to a reference cycle, so ObjCInstances stick around -- and thus never have release called -- until gc runs. The problem is that once the pool has been drained, we are not guaranteed to have it around later. Since objc_util doesn't retain it, the pool might get released after draining, and when gc finally breaks the reference cycle, release gets called on a pointer that might point to a new object by then, causing a crash.'''
@contextlib.contextmanager
def autoreleasepool():
   pool = ObjCClass('NSAutoreleasePool').new()
   try:
      yield
   finally:
      pool.drain()
      pool._cached_methods.clear() #break ref cycle
      pool.__del__=None #prevent release from being called


for idx, asset in enumerate(all_photos):
      with autoreleasepool():
         ''' debugging '''
         if _get_taskinfo:
            print(f'\nSTART {_get_taskinfo().resident_size/1024/1024:3.2f} MB')

         ''' get the data'''
         fname = str(ObjCInstance(asset).valueForKey_('filename'))
         payload = asset.get_image_data(original=True)

         ''' Do some actual work here
              ... insert upload, save, etc, code
         ''' 

         '''Clean up'''
         del payload # not actually needed         
         ObjCInstance(asset)._cached_methods.clear() #required !!!
         del asset # not actually needed

         ''' debugging '''
         if _get_taskinfo:
            print(f'END.  {_get_taskinfo().resident_size/1024/1024:3.2f} MB')

As written, the memory is basically constant each cycle:

START 76.20 MB
END.  76.52 MB

START 76.20 MB
END.  76.52 MB

START 76.20 MB
END.  76.94 MB

START 76.20 MB
END.  76.79 MB

START 76.20 MB
END.  76.85 MB

If I comment out the with autoreleasepool() line, essentially not using a release pool, it becomes obvious why looping over PHAssets never works -- despite deleting the objc objects, the actual PHAsset data does not get cleared without draining the pool, and we gain several MB per cycle:

START 94.90 MB
END.  96.81 MB

START 96.81 MB
END.  98.68 MB

START 98.68 MB
END.  99.94 MB

START 99.94 MB
END.  100.99 MB

START 101.09 MB
END.  102.03 MB

START 102.02 MB
END.  102.89 MB

If I use the pool, but comment out the two del lines:

START 63.28 MB
END.  65.14 MB

START 63.70 MB
END.  64.77 MB

START 63.51 MB
END.  62.46 MB

START 62.36 MB
END.  62.43 MB

START 62.35 MB
END.  66.07 MB

START 64.17 MB
END.  66.01 MB

we see that there is no net growth, although there is somewhat more variation from cycle to cycle (because the BytesIO payload object is still in scope at the start of each iteration -- if the meat of the loop is moved to a function, the results are the same with or without the del). This shows that reference-counting gc does its job: deleting the BytesIO object (payload) is not necessary (this was a point of contention with @ccc in a previous thread) -- when it falls out of scope, the memory is reclaimed.
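That scope-exit behavior is easy to demonstrate in plain Python (it relies on CPython's reference counting, so other interpreters may defer the free). Payload is a hypothetical BytesIO subclass, used only so a weak reference can observe collection:

```python
import io
import weakref

class Payload(io.BytesIO):
    """Subclassed only so we can attach a weak reference."""

def process(data: bytes):
    payload = Payload(data)
    ref = weakref.ref(payload)
    # ... the upload would happen here; note: no explicit del payload ...
    return ref  # payload goes out of scope as the function returns

ref = process(b"fake image bytes")
print(ref() is None)  # -> True: refcounting freed it at scope exit
```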

Finally, to see the effect of the objc_util bug that prevents ObjCInstances from being collected when they fall out of scope, I comment out the
ObjCInstance(asset)._cached_methods.clear() line:

Since ObjCInstance objects themselves are not that big, we see a slow leak, growing maybe 0.1 MB every few hundred images. I seemed to get more crashes when starting and stopping the script many times, but that could be my imagination. So I conclude this is probably not needed unless you are creating many tens of thousands of ObjCInstances, or you need to ensure that an object's release is called at a specific time for some other reason.

ccc

Awesome detailed work, @JonB. The small leak might become significant after hundreds of loops in an appex environment.