Forum Archive

Using continue with yield

Phuket2

Sorry, this is not a Pythonista question; it's a Python question. I have looked this up, but I don't get it.

The code below just walks a directory structure. I copied it from Stack Overflow, except that I added the yield because I wanted to return a generator.
What I am having difficulty with is filtering. I want to introduce an ignore-dir list as well as define which file types (extensions) I want returned. The filter conditions are not a problem. I am just not sure what to do if I want to ignore a file based on either its extension or its directory.
My simple idea was to use continue if an item failed my filter test, and otherwise yield.
From what I can ascertain, this does not work: the generator terminates, i.e. it produces no more values. I am not even sure it's possible to have conditional tests inside a generator to skip items. I know I could return some flag, but that's ugly.

The code without filtering

# Method of a class that keeps the directory to walk in self.root_dir (needs "import os").
def allfiles(self):
    for path, subdirs, files in os.walk(self.root_dir):
        for filename in files:
            f = os.path.join(path, filename)
            yield f

dgelessus

There's no need to do any filtering in allfiles; you can do that afterwards with a list comprehension (or a normal for loop):

print([filename for filename in self.allfiles() if os.path.splitext(filename)[1] == ".py"])

for filename in self.allfiles():
    if os.path.dirname(filename) == os.path.expanduser("~/Documents"):
        print(filename)
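
If building a complete list up front is not wanted, the same after-the-fact filter can stay lazy by using a generator expression. A minimal sketch: allfiles here is a hypothetical stand-in that yields a few made-up paths so the example runs without touching the disk. Note that os.path.splitext keeps the leading dot in the extension.

```python
import os

def allfiles():
    # Hypothetical stand-in for self.allfiles(): yields fixed, made-up paths
    # instead of walking a real directory.
    for name in ["notes/todo.txt", "src/main.py", "src/util.py", "README.md"]:
        yield name

# Generator expression: no path is examined until the result is iterated.
py_files = (f for f in allfiles() if os.path.splitext(f)[1] == ".py")

py_list = list(py_files)
print(py_list)
```
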
Phuket2

@dgelessus, yes, I realised that, and that's what I do at the moment; it just seemed redundant to me, that's all. Also, if other methods are calling allfiles (or what would become something like _allfiles with a filter param), the intention becomes more explicit, as per import this.
Maybe it's not possible to skip over items in a generator; I really don't know.

omz

I'm not sure if I understand the question correctly, but you can simply not execute a yield statement for the items that you want your generator to skip, i.e. something like if some_condition: yield item.

Below is a complete allfiles function that accepts two regular expressions for filtering the results; one that is matched against subdirectory names (matches are skipped), and one that is matched against file extensions (matches are included).

The example iterates over all py/txt/md files in ~/Documents, except for files in site-packages or .Trash.

import os
import re

def allfiles(root_dir, skip_dirs_re=None, file_ext_re=None):
    for path, subdirs, files in os.walk(root_dir):
        if skip_dirs_re:
            new_subdirs = []
            for subdir in subdirs:
                if not re.match(skip_dirs_re, subdir):
                    new_subdirs.append(subdir)
            # Assign in place: os.walk only descends into the names left in subdirs.
            subdirs[:] = new_subdirs
        for filename in files:
            # Extension without the leading dot, e.g. 'py' for 'script.py'.
            ext = os.path.splitext(filename)[1][1:]
            if (file_ext_re is None) or (re.match(file_ext_re, ext, re.IGNORECASE)):
                full_path = os.path.join(path, filename)
                yield full_path

skip_dirs = r'\.Trash|site-packages'
# re.match only anchors at the start, so the trailing $ keeps 'pyc' from matching 'py'.
exts = r'(py|md|txt)$'
root_dir = os.path.expanduser('~/Documents')

for file_path in allfiles(root_dir, skip_dirs, exts):
    print(file_path)
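
The subdirs[:] = new_subdirs assignment in the function above is what actually prevents os.walk from entering the skipped directories: in its default top-down mode, os.walk only descends into the names still present in that list, so the list has to be modified in place rather than rebound to a new one. A small sketch of that behaviour on a throwaway temporary tree (walk_pruned and all directory and file names are invented for illustration):

```python
import os
import tempfile

def walk_pruned(root, skip_names):
    """Yield file paths under root without descending into directories in skip_names."""
    for path, subdirs, files in os.walk(root):
        # Slice assignment mutates the very list os.walk is holding,
        # so the removed directories are never visited.
        subdirs[:] = [d for d in subdirs if d not in skip_names]
        for filename in files:
            yield os.path.join(path, filename)

with tempfile.TemporaryDirectory() as root:
    # Build a tiny made-up tree with one directory that should be skipped.
    for rel in ["keep/a.py", "site-packages/b.py", "keep/nested/c.txt"]:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    # Consume the generator while the temporary tree still exists.
    found = sorted(os.path.relpath(p, root).replace(os.sep, "/")
                   for p in walk_pruned(root, {"site-packages"}))

print(found)
```

Writing subdirs = [...] instead would only rebind the local name, and os.walk would still visit every directory.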

Phuket2

@omz, OK, thanks. You answered my question and gave me a nice allfiles 😜
My main problem was not understanding whether I could skip files with the generator. It sort of makes sense that you can't. I just thought Python might do some under-the-hood tricks if it sees a continue in a generator, but it appears not.
Anyway, thanks guys. I know it was not a Pythonista question. I just had trouble tracking down the answer.

JonB

You can filter within a generator: you would simply wrap your yield in a conditional.

if somecondition:
    yield item

The outer loop keeps running until it hits a yield, then returns one yielded value. continue should be unnecessary, though it seems to work OK, as long as you pay attention to which loop you are continuing, etc. One way to think of things: replace yield with print, and think about what gets printed; those are the items that will get yielded.

import os

def all_f_files(root):
    for path, subdirs, files in os.walk(root):
        for filename in files:
            f = os.path.join(path, filename)
            # Only files whose names start with 'f' are yielded; the rest are skipped.
            if filename.startswith('f'):
                yield f

for f in all_f_files('.'):
    print(f)
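
To see the "keeps running until it hits a yield" behaviour step by step, the generator can be driven by hand with next(). This is only an illustrative sketch: the filtered function, the trace list and the sample names are made up, and a plain list stands in for the directory walk.

```python
trace = []  # records every item the generator looks at, yielded or not

def filtered(items):
    for item in items:
        trace.append("checking " + item)
        if item.startswith("f"):
            yield item

gen = filtered(["bar", "foo", "baz", "fig"])

first = next(gen)                 # loop runs past "bar" and pauses at the yield for "foo"
trace_after_first = list(trace)   # snapshot: only "bar" and "foo" have been checked so far
second = next(gen)                # resumes, skips "baz", pauses at the yield for "fig"
rest = list(gen)                  # nothing left, so the generator finishes normally

print(first, second, rest)
print(trace_after_first)
```

Skipped items never reach the caller; the generator just keeps looping internally until the next yield or the end of the loop.
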
JonB

@Phuket2
Just think of using print instead of yield. Anything that gets printed would be yielded. If you wanted to skip printing an item you would use

if some_condition:
    print(item)

or alternatively

if some_skip_condition:
    continue
print(item)

Now replace print with yield and you have a generator. I don't think there are any restrictions on the type of control structures you can use in a generator, only that your logic must be correct in the first place. If you had tried something like that and it didn't work, it wouldn't have worked with print instead of yield either!
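
As a quick check that continue does not end a generator, here is a minimal sketch with both styles side by side (the function names and sample data are invented for illustration):

```python
def evens_with_if(items):
    # Style 1: only yield when the condition holds.
    for item in items:
        if item % 2 == 0:
            yield item

def evens_with_continue(items):
    # Style 2: skip unwanted items with continue, then yield whatever is left.
    for item in items:
        if item % 2 != 0:
            continue
        yield item

data = [1, 2, 3, 4, 5, 6]
result_if = list(evens_with_if(data))
result_continue = list(evens_with_continue(data))
print(result_if, result_continue)
```

Both versions produce the same values. If a generator like this seems to stop early, a likely culprit is a return or break being hit, or the continue applying to a different loop than intended.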