Forum Archive

Problem with (what I thought was) a simple regex

tguillemin

I want to capitalize the first letter of each line in a text, so I came up with the following regular-expression Find/Replace:

Find: ^([a-z])

Replace with: \U$1

This works in Regex101 (Python flavor) but not in Editorial: there I get a capital U at the beginning of each line that starts with a lowercase character.

Where did I go wrong?

Thanks in advance.

omz

Hmm, this doesn't work in Regex101 for me, and I've never heard of \U as an "uppercase modifier" (I'm not a regex expert though).
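For comparison, here is how Python's own re module (which may not be the engine behind Editorial's Find/Replace) treats that replacement. \U case conversion is a Perl-style feature; Python's replacement templates have no \U escape, and since Python 3.7 an unknown backslash-letter escape there raises re.error. Python also writes group references as \1 or \g<1>, so $1 is copied literally. The sample text is made up for illustration.

```python
import re

PATTERN = r"^([a-z])"
text = "apple\nbanana\nCherry"  # made-up sample input

# \U is not a valid escape in a Python replacement template, so re.sub raises re.error.
try:
    re.sub(PATTERN, r"\U\1", text, flags=re.M)
    template_error = None
except re.error as exc:
    template_error = exc
print("rejected:", template_error)

# $1 is not a group reference in Python; it is inserted as literal text.
print(re.sub(PATTERN, "$1", text, flags=re.M))

# \1 (or \g<1>) is Python's syntax for "the text captured by group 1".
print(re.sub(PATTERN, r"[\1]", text, flags=re.M))
```

Since the template itself cannot change case, the usual Python workaround is to pass a function as the replacement.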

You can achieve the same effect using a Python script action with something like this in Editorial:

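import re, workflow
# Read the text passed in by the previous workflow action
text = workflow.get_input()
# re.M makes ^ match at the start of every line; the lambda uppercases the captured letter
text = re.sub(r'^([a-z])', lambda m: m.group(1).upper(), text, flags=re.M)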
workflow.set_output(text)
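The flags=re.M argument in the script is what makes ^ match at the start of every line rather than only at the start of the whole text. A minimal sketch that runs in plain Python (without Editorial's workflow module), using a made-up three-line sample and a helper named upper_first purely for illustration:

```python
import re

text = "first line\nsecond line\nthird line"  # made-up sample input

def upper_first(match):
    # Uppercase the single captured letter
    return match.group(1).upper()

# Without re.M, ^ anchors only at the very start of the string.
print(re.sub(r"^([a-z])", upper_first, text))

# With re.M (re.MULTILINE), ^ also matches just after each newline.
print(re.sub(r"^([a-z])", upper_first, text, flags=re.M))
```

The first call changes only "first"; the second changes the first letter of all three lines.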

omz

Wait, it does work in Regex101 (no idea why it didn't before), but I don't know which regex feature or flavor is responsible for it. Anyway, I hope the script helps.

tguillemin

It works nicely.
Thank you very much, and thanks again for this app.

ccc

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski, 1997

lines = '\n'.join(line[:1].upper() + line[1:] for line in lines.splitlines())
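Wrapped in a function (the name capitalize_lines is only for illustration), this one-liner can be checked on a made-up sample that contains a blank line. Slicing with line[:1] returns '' for an empty line, so blank lines pass through instead of raising IndexError as line[0] would. Note that splitlines() followed by '\n'.join() drops a trailing newline.

```python
def capitalize_lines(text):
    # line[:1] is '' for an empty line, so blank lines are kept as they are;
    # splitlines() discards the final newline, so the result has none.
    return '\n'.join(line[:1].upper() + line[1:] for line in text.splitlines())

sample = "alpha\n\nbeta\n"  # made-up input: blank line in the middle, trailing newline
print(repr(capitalize_lines(sample)))
```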

dgelessus

@ccc You might want to use .title() instead of .upper(). For Unicode characters that stand for more than one letter (like the ﬁ ligature), .upper() does not do what you expect:

>>> s = "ﬁnish"
>>> s[0].upper() + s[1:]
'FInish'
>>> s[0].title() + s[1:]
'Finish'

To be fair, it's unlikely that you'll ever encounter a character like that in practice, but it doesn't hurt to use the proper method.

ccc

@dgelessus Watch out for those blank lines...

lines = '\n'.join(line[:1].title() + line[1:] for line in lines.splitlines()) + '\n'

Solves the ligature issue and deals gracefully with blank lines (except at end of file).
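A quick check of this .title() version, wrapped in a function whose name is hypothetical and fed made-up input: it turns a leading ﬁ ligature (U+FB01) into "Fi", keeps blank lines, and, because of the + '\n', always ends the result with a newline, even when the input had none.

```python
def capitalize_lines_title(text):
    # title() maps U+FB01 to "Fi", whereas upper() would give "FI";
    # the trailing + '\n' restores the newline that splitlines() drops.
    return '\n'.join(line[:1].title() + line[1:] for line in text.splitlines()) + '\n'

print(repr(capitalize_lines_title("\ufb01nish\n\nsecond\n")))
print(repr(capitalize_lines_title("no newline")))
```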

The regex approach does not handle leading ligatures; it leaves them lowercase because the ligature is not in [a-z].
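If you would rather keep a regex, one possible variant (a sketch for a Python script action like omz's, not for the Find/Replace dialog) is to match any letter instead of [a-z] and call .title() in the replacement function. In Python 3, str patterns are Unicode-aware by default, so [^\W\d_] (a word character that is not a digit or underscore) matches roughly any letter, including the ligature. The names FIRST_LETTER and capitalize_first_letters are hypothetical.

```python
import re

# "Word character, but not a digit or underscore": roughly any Unicode letter.
FIRST_LETTER = re.compile(r"^([^\W\d_])", flags=re.M)

def capitalize_first_letters(text):
    # title() rather than upper(), so a leading U+FB01 ligature becomes "Fi"
    return FIRST_LETTER.sub(lambda m: m.group(1).title(), text)

sample = "\ufb01nish\nend"  # made-up input

# The [a-z] version leaves the ligature line alone and only fixes "end".
print(re.sub(r"^([a-z])", lambda m: m.group(1).upper(), sample, flags=re.M))
print(capitalize_first_letters(sample))
```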

tguillemin

@ccc said:

lines = '\n'.join(line[:1].title() + line[1:] for line in lines.splitlines()) + '\n'

I had not tried it with blank lines (my file doesn't contain any).
Nevertheless, I inserted some blank lines and tried your solution, and it worked.

Thank you. That will most certainly be useful some day…!