Forum Archive

re (Regular Expression) module "caret" character not working?

TutorialDoctor

In the code below, the action should return all words that begin with "the" as a list, but it returns an empty list.

# Extracts words from input text and outputs it as a list
import re
import editor
import workflow

params = workflow.get_parameters()
sentence = workflow.get_input()

list = []

expression = '^the'
pattern = re.compile(expression)
matches = re.findall(pattern,sentence)

for word in matches:
    list.append(word)

workflow.set_output('\n'.join(list))

This code works with other special characters, but the caret isn't working. Any tips?

JonB

The caret matches only at the start of the string, i.e. only if "the" is the first word in the sentence.
The expression you are looking for probably looks like

expression='\bthe'

\b matches at a word boundary without consuming any characters.

http://regex101.com/r/gC6nN8/1
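
To make the difference concrete, here is a minimal sketch comparing the two anchors. The sample sentences are made up for illustration, and the patterns are written as raw strings so that Python passes the backslash through to the regex engine unchanged.

```python
import re

# Hypothetical sample input, not taken from the original workflow.
sentence = "the cat saw them near the theatre"

# '^' anchors at the start of the string, so at most one match is possible.
caret_matches = re.findall(r'^the', sentence)

# r'\bthe' matches "the" wherever a word starts with it.
boundary_matches = re.findall(r'\bthe', sentence)

# If the text does not start with "the", the caret pattern finds nothing,
# which would explain the empty list in the original action.
no_matches = re.findall(r'^the', "a cat saw the dog")

print(caret_matches)     # ['the']
print(boundary_matches)  # ['the', 'the', 'the', 'the']
print(no_matches)        # []
```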

ccc

Remember that you can also simplify things by using list comprehensions:

# Instead of this:
list = []

for word in matches:
    list.append(word)

workflow.set_output('\n'.join(list))

# You can just write this:
workflow.set_output('\n'.join([word for word in matches]))

omz

@ccc The list comprehension seems redundant here, '\n'.join(matches) would do the same, as far as I can see.

Btw, it's not a good idea to use list as a variable name. It shadows the built-in, so you'll run into problems when you try to use the built-in function list().
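
A small sketch of both points, using a made-up matches list in place of the re.findall result. The function name shadowing_demo is only for illustration; it keeps the shadowing confined to a local scope.

```python
# Hypothetical stand-in for the list that re.findall would return.
matches = ['the', 'the', 'the']

def shadowing_demo():
    list = []            # this local name now hides the built-in list type
    try:
        list('abc')      # tries to call the empty list object, not the built-in
    except TypeError as err:
        return str(err)  # the error says a 'list' object is not callable

error_message = shadowing_demo()

# str.join accepts any iterable of strings, so no copy or comprehension is needed.
output = '\n'.join(matches)

print(error_message)
print(output)
```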

TutorialDoctor

I am still not getting a match on the word "them" nor on an email that begins with "the."

# Extracts words from input text and outputs it as a list
import re
import editor
import workflow

params = workflow.get_parameters()
sentence = workflow.get_input()

match_list = []

expression = '\bthe'
pattern = re.compile(expression)
matches = re.findall(pattern,sentence)

for word in matches:
    match_list.append(word)

workflow.set_output('\n'.join(match_list))

omz

You need to escape the backslash in the pattern or use a raw string, i.e. use either '\\bthe' or r'\bthe'.
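
Here is a short sketch of why the plain string fails: in an ordinary Python string literal, \b is the backspace character, so the regex engine never sees a word-boundary token. The sample sentence is invented, and the trailing \w* in the last pattern is an extra assumption (not part of the pattern above) for the case where the whole word, such as "them", is wanted rather than just the "the" prefix.

```python
import re

# Hypothetical sample input containing a word and an email address starting with "the".
sentence = "I told them that theo@example.com is the one"

plain = '\bthe'    # '\b' is a single backspace character (0x08) here
raw = r'\bthe'     # backslash and 'b' are kept, so the regex sees \b

print(len(plain), len(raw))   # 4 5
print(plain[0] == '\x08')     # True

plain_matches = re.findall(plain, sentence)   # looks for a literal backspace
raw_matches = re.findall(raw, sentence)       # matches at word boundaries

# Assumed extension: \w* also captures the rest of each word.
whole_words = re.findall(r'\bthe\w*', sentence)

print(plain_matches)   # []
print(raw_matches)     # ['the', 'the', 'the']
print(whole_words)     # ['them', 'theo', 'the']
```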

TutorialDoctor

Thanks, Ole. That did help.