The re module is probably fine, but most likely you're storing the Japanese text in a problematic way. This is not something you're doing wrong, but something that Python doesn't do right by default.
Python originally had no support for Unicode. In case you're not familiar with it, Unicode is a standard that assigns a number to practically every character of every writing system in the world, not just the English letters that ASCII covers. Python 2 does have Unicode support, but for compatibility reasons the default str type and plain string literals don't use it.
A single character in a str is one byte, a number from 0 to 255. That is enough if you're only working with ASCII letters and maybe a few other Latin letters, but not enough to fully support Unicode. So if you type, for example, a Japanese character in a string, it is stored as multiple bytes (three per character in UTF-8 for typical Japanese text). Python thinks that 1 byte is 1 character, so it reads one Japanese character as several characters. The re module receives this string, treats the bytes of your Japanese character as separate characters, and this is where things go wrong.
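To make the byte/character mismatch concrete, here is a small sketch. The sample text and variable names are made up for illustration; it encodes a Unicode string to UTF-8 to imitate what a plain Python 2 str would hold, then lets re count "characters" in both versions.

```python
# -*- coding: utf-8 -*-
import re

text = u"日本語"              # three Japanese characters (illustrative sample)
raw = text.encode("utf-8")   # the bytes a plain Python 2 str would contain

char_count = len(text)       # counts characters
byte_count = len(raw)        # counts bytes

# "." matches one unit: a single byte in the byte string,
# a whole character in the Unicode string.
byte_matches = re.findall(b".", raw)
char_matches = re.findall(u".", text)

print(char_count, byte_count)
print(len(char_matches), len(byte_matches))
```

The Unicode string gives 3 characters and 3 matches, while the byte string gives 9 of each, which is why a pattern written with Japanese characters in mind misbehaves on a plain str.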
If you want to properly use non-ASCII characters in your strings, there are a few things you need to change in your code:
- Add the comment `# -*- coding: utf-8 -*-` as the first line of your program (or the second, if the first is a `#!` line). This line tells Python what encoding the file uses, i.e. how non-ASCII characters are stored. UTF-8 is an encoding that is compatible with ASCII, but can also encode any Unicode character. It's also what the Pythonista editor uses by default.
- Add the prefix `u` before every string literal, e.g. `"this is a string"` becomes `u"this is a string"`. This makes the string a Unicode string, which properly supports all Unicode characters.
- Instead of converting objects to `str` if you want to convert them to text, convert them to `unicode`. For example `str(mylist)` becomes `unicode(mylist)`. The output might look the same, but it is now a Unicode string.
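Put together, the three changes might look roughly like the sketch below. The sentence, the hiragana character range and the `text_type` helper are illustrative assumptions, not part of your code; `text_type` is only there so the snippet also runs on Python 3, where `unicode` no longer exists. In Python 2 you would simply call `unicode(...)` directly.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical helper: use unicode on Python 2, str on Python 3
# (where str is already a Unicode string).
try:
    text_type = unicode
except NameError:
    text_type = str

sentence = u"私は日本語を勉強します"   # u prefix: a Unicode string

# Both the pattern and the text are Unicode, so each match
# is one whole character (here: every hiragana character).
hiragana = re.findall(u"[\u3041-\u3096]", sentence)

# Convert other objects to Unicode text instead of str.
message = u"数: " + text_type(42)

print(len(hiragana))
print(message)
```

With a plain str pattern and a plain str sentence, the same character class would be matched against individual bytes instead, which is the failure described above.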
(Python 3 has much better Unicode support, but Pythonista still uses Python 2, and switching to Python 3 would be a lot of work and would break user code.)