Forum Archive

Recognize text from picture

mikael

This script recognizes text from a camera or photo library picture. Sharing it since iOS 13 has made it this easy, and Apple Shortcuts do not have support for it (yet, I bet).

Adjust languages on the first row.

```
language_preference = ['fi','en','se']

import photos, ui, dialogs
import io
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

ui_image = None

if selection == 1:
    pil_image = photos.capture_image()
    if pil_image is not None:
        ui_image = pil2ui(pil_image)
elif selection == 2:
    ui_image = photos.pick_asset().get_ui_image()

if ui_image is not None:
    print('Recognizing...\n')

    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.setRecognitionLanguages_(language_preference)
    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')
```
pavlinb

I think VNRecognizeText works only for English.

cvp

It works perfectly in French, thanks to @mikael

mikael

@pavlinb, works perfectly for Finnish, too.

But the version above is slower than it needs to be, due to an unnecessary roundtrip to ui.Image. Here’s a faster version:

```
language_preference = ['fi','en','se']

import photos, ui, dialogs
import io
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

pil_image = None

if selection == 1:
    pil_image = photos.capture_image()
elif selection == 2:
    pil_image = photos.pick_asset().get_image()

if pil_image is not None:
    print('Recognizing...\n')

    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    image_data = buffer.getvalue()

    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.setRecognitionLanguages_(language_preference)
    handler = VNImageRequestHandler.alloc().initWithData_options_(image_data, None).autorelease()

    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')
```

mikael

@pavlinb, ah, but you were right. This does not recognize the Scandinavian letters ä and ö, substituting them with a and o. @cvp, are you getting é, ô and all the others?

cvp

@mikael
é yes
à no

mikael

@cvp, checked, usesLanguageCorrection is true and recognitionLevel set to ”accurate” by default, so no help there.

pavlinb

Doesn’t work for Cyrillic (Bulgarian).

mikael

Eh.

```
revision = VNRecognizeTextRequest.currentRevision()
supported = VNRecognizeTextRequest.supportedRecognitionLanguagesForTextRecognitionLevel_revision_error_(0, revision, None)
```

Returns ”en-US”.
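
For reference, a sketch that checks both levels (assuming the Vision classes from the script above are loaded):

```
revision = VNRecognizeTextRequest.currentRevision()
for level, name in ((0, 'accurate'), (1, 'fast')):
    langs = VNRecognizeTextRequest.supportedRecognitionLanguagesForTextRecognitionLevel_revision_error_(level, revision, None)
    print(name, [str(lang) for lang in langs])
```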

cvp

@mikael I had also seen that, but it supports French, thus... buggy?

pavlinb

BTW, I'm impressed by the accuracy (for Latin-based texts).

JonB

@mikael @cvp have you tried setting customWords attrib of the request? Or, turn off usesLanguageCorrection? (Since the language is en-US you DON'T want language correction when trying to detect other languages!)

I gather they are looking for words you'd find in an English dictionary. So perhaps façade, or tête-à-tête might recognize, while other examples wouldn't?
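
A minimal sketch of both knobs via objc_util (assuming req is the VNRecognizeTextRequest from the script above):

```
# turn off language correction and feed in custom words
req.setUsesLanguageCorrection_(False)
req.setCustomWords_(['façade', 'tête-à-tête'])
```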

cvp

@JonB I didn't try, but we are not alone with this problem, see here.

I've tried with unknown language codes like xx and yy in setRecognitionLanguages_, and the result is the same. It seems that characters are recognized in any language.
My last test on a French text was entirely correct:

Asie-Pacifique
La mission économique belge en Chine cible de
cyberattaques massives

cvp

@JonB said:

usesLanguageCorrection

Tried with False: à still recognized as a

Edit : even with

    req.setCustomWords_(['à']) 
sodoku

Are there any code examples of how to recognize text on iOS 12.4.3 for the iPad mini 2? It would be cool to add this to my sudoku app game.

mikael

@sodoku, yes, but it seems a bit more involved. Check this thread where @cvp does all kinds of magic.

ccc

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

is memory-leaking buffer, which has been proven to crash Pythonista when multiple images are processed. A better approach is to use a context manager to force the .close():

def pil2ui(pil_image):
    with io.BytesIO() as buffer:
        pil_image.save(buffer, format='PNG')
        return ui.Image.from_data(buffer.getvalue())
mikael

Revisiting this.

Regardless of language restrictions, I have found the simple and reliable ability to pick text from paper to be useful for me almost weekly - URLs, email addresses, reservation codes, laptop serial numbers, etc.

With the use, I noted that the original script had some issues:

  1. Difficult to find and open when quickly needed.
  2. Slow to get from the picked photo to recognized text.
  3. Results are a pain to copy from the Console as it likes to jump around just as you’ve selected the text to copy.

Point #1 was fixed with a simple Apple Shortcuts shortcut to make the script easy to run.

Point #3 was resolved by presenting the recognized text in a TableView, with tap to copy.

Point #2 took a bit more doing.

Pythonista's photos module wants to return PIL images, and that results in two very slow conversions - first the module converts the UIImage to PIL, and then I converted that back to a PNG image for recognition. I found some @cvp code in this thread and replaced the photos module with objc_util pickers, which return PNG data almost directly.

And hey presto! Not just faster recognition, but instantaneous - and with much better quality than with the only contender app I could find (Prizmo Go).

Updated script here.

cvp

@mikael Thanks for your great 🎁 for New Year

cvp

Question for a specialist of this forum.
In the last post of @mikael, I see my username as @cvp but in black text, not clickable blue, although I've received a notification "Mikael mentioned you...".
How is that possible?

mikael

Happy last day of the decade to everyone who shares my calendar!

I finessed the script a bit with the ability to select, copy or share multiple items, and nicer icons.

@cvp, noted and wondered about the lack of the link for your handle, no idea why.

sodoku

Does this work with the new iPadOS?

mikael

@sodoku, do you mean if the latest versions have included robust support for non-English characters? Not to my knowledge.

sodoku

So, good news: I got an iPhone 11 and I'm testing this on it for sudoku. Do the pictures taken get saved anywhere when used? Just curious if I have to delete them, because after I use it the pictures don't show up in my Photos app.

sodoku

Also what’s the updated code posted by Mikael on GitHub used for is it same as this one posted here or not because it’s so much longer and bigger then this code posted on the forum, is it a better version then this one on the forum

mikael

@sodoku, the code on GitHub is more of a tool, and much faster than the version at the beginning of this thread. For your purposes, you probably just want pieces of it.

It supports taking a picture normally and then selecting it from the photo library when you use the tool, or just snapping a quick ”disposable” in-tool image which is not saved.

pavlinb

@mikael Is it possible to specify a region of the screen to capture, not the whole screen?

sodoku

So there are a few edits in this thread - I don't know how to piece them together to get the best edited version of this?

mikael

@sodoku, the one on GitHub is the latest version.

sodoku

I will test it for sudoku in the console - well, I will try to convert it for use in the console for the sudoku solver. If I need help, I'll post a message.

sodoku

I need help adapting this script to read the numbers from a picture of a sudoku and insert the starting numbers into a console script. It does not work well for recognizing ones and sevens.

Example of sudoku solver

In my version, I want the board to start as all zeros; when you take a picture, it will add the numbers and then solve. This combines the two programs: (sudoku solver) & (OCR text recognition).

```
board=[
[5,8,4,1,0,0,0,0,0],
[0,0,6,8,0,0,5,1,0],
[0,0,0,0,5,4,7,0,6],
[0,5,3,0,1,0,0,6,7],
[0,0,0,0,2,0,0,0,0],
[4,6,0,0,9,0,8,3,0],
[7,0,8,5,4,0,0,0,0],
[0,2,9,0,0,3,4,0,0],
[0,0,0,0,0,1,3,7,9]
]

def solve(bo):
    find = find_empty(bo)
    if not find:
        return True
    else:
        row,col = find

    for i in range(1,10):
        if valid(bo,i,(row,col)):
            bo[row][col] = i

            if solve(bo):
                return True

            bo[row][col] = 0

    return False

def valid(bo,num,pos):
    # check row
    for i in range(len(bo[0])):
        if bo[pos[0]][i] == num and pos[1] != i:
            return False
    # check column
    for i in range(len(bo[0])):
        if bo[i][pos[1]] == num and pos[0] != i:
            return False
    # check quadrant
    box_x = pos[1] // 3
    box_y = pos[0] // 3

    for i in range(box_y * 3, box_y * 3 + 3):
        for j in range(box_x * 3, box_x * 3 + 3):
            if bo[i][j] == num and (i,j) != pos:
                return False

    return True

def print_board(bo):
    for i in range(len(bo)):
        if i % 3 == 0 and i != 0:
            print('------+-------+------')

        for j in range(len(bo[0])):
            if j % 3 == 0 and j != 0:
                print('|', end=' ')
            if j == 8:
                print(bo[i][j])
            else:
                print(str(bo[i][j]) + ' ', end='')

def find_empty(bo):
    for i in range(len(bo)):
        for j in range(len(bo[0])):
            if bo[i][j] == 0:
                return (i,j)  # row, col
    return None

print_board(board)
solve(board)
print('=====================')
print_board(board)

language_preference = ['fi','en','se']

import photos, ui, dialogs
import io
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

ui_image = None

if selection == 1:
    pil_image = photos.capture_image()
    if pil_image is not None:
        ui_image = pil2ui(pil_image)
elif selection == 2:
    ui_image = photos.pick_asset().get_ui_image()

if ui_image is not None:
    print('Recognizing...\n')

    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.setRecognitionLanguages_(language_preference)
    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')
```

mikael

@sodoku, I tried something similar as well a while ago, first recognizing rectangles and then trying to recognize the numbers, but I hit the same issue of very poor recognition of the numbers. I wonder if we would need a number-specific recognizer for that.

Spitfire

Hi, since the sudoku is a square of many squares, I think it is more robust to slice the cells evenly so that each small image contains only one number.

The downside, of course, is that you use one recognition call per image, so you go from 1 image to 81 - that can get expensive.

But it would be more robust.
Best regards, Tommy

mikael

@Spitfire, thanks. I did try all kinds of approaches, finally resorting to manual cropping, and it still was not reliable enough.

pavlinb

@mikael Could you share a picture of a sudoku where recognition fails?

ccc

Multiple sample Sudoku puzzles would help to achieve a robust solution.

sodoku

I have a few questions about the very first text recognition code posted in this thread.

The example video I am referring to is https://developer.apple.com/videos/play/wwdc2019/234

1. How do you change the recognition level from fast to accurate?
Example code from the Apple website (I am not sure if it's written in Swift or Objective-C) looks like this:

myTextRecognitionRequest.recognitionLevel = VNRequestTextRecognitionLevel.accurate

and another example of setting the recognition level, shown in the Apple video:

 request.recognitionLevel = .fast 

Question 2
How do you ensure that numbers don't get mistaken for letters without the language corrector active, to avoid mistaking the number 5 for an S, or I for 1?
An example of this from the video is:

```
extension Character {

    func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
        let conversionTable = [
            's': '5',
            'S': '5',
            'i': '1',
            'I': '1',
        ]

```

Question 3
How do you set up the custom words feature mentioned in the video?

JonB

@sodoku
See https://developer.apple.com/documentation/vision/vnrequesttextrecognitionlevel/fast

Try req.recognitionLevel=1 for fast, or 0 for accurate.

Re fixing characters... I gather you might set req.usesLanguageCorrection=False (or maybe 0), then make your own replacement map and use str.translate.

Custom words is handled by
req.customWords = ['customword1', 'etc']

See apple docs for VNRecognizeTextRequest
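
A sketch of that replacement-map idea (the mapping here is hypothetical; extend it for your own confusions):

```
req.usesLanguageCorrection = False  # avoid 'correcting' digits into words

# post-process with a translation table
fix_chars = str.maketrans({'s': '5', 'S': '5', 'i': '1', 'I': '1'})
for result in req.results():
    print(str(result.text()).translate(fix_chars))
```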

sodoku

I've seen the Apple documentation code for the Vision framework; I just don't know how to convert it to Python.

Question 1
What about setting the minimum text height - how do you translate either of these declarations to Python?
@property(readwrite, nonatomic, assign) float minimumTextHeight; (Objective-C)
var minimumTextHeight: Float { get set } (Swift)

Question 2
I am also interested in learning how to recognize the individual boxes of a sudoku puzzle to extract the numbers. Is there a way to do that, possibly with VNRecognizedTextObservation (a request that detects and recognizes regions of text in an image), or with the bounding box technique shown in the video https://developer.apple.com/videos/play/wwdc2019/234? Also, can you use multiple bounding boxes to recognize text from a sudoku card?

This is Mikael's code that I am trying to insert the code into, but I don't know how to convert the code shown in the Apple documentation into Python:

```
language_preference = ['fi','en','se']

import photos, ui, dialogs
import io
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

ui_image = None

if selection == 1:
    pil_image = photos.capture_image()
    if pil_image is not None:
        ui_image = pil2ui(pil_image)
elif selection == 2:
    ui_image = photos.pick_asset().get_ui_image()

if ui_image is not None:
    print('Recognizing...\n')

    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.recognitionLevel=1
    req.setRecognitionLanguages_(language_preference)
    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')
```

JonB

@sodoku For things like enumerations, you can usually check the swift version of docs, which tells you the value. Otherwise, you can often look up source code.

For minimumTextHeight, both swift and ObjC say this is a float. The fact that it is readwrite/nonatomic/assign is not important.
So, usually this would just be
req.minimumTextHeight = 32.5
or whatever you want...

It is often helpful to explore objects in the console, since this can tell you what you're working with. For instance, if you type req. in the console, you will see autocomplete of all known attributes. Usually you need to treat objc properties as function calls -- so to check minimumTextHeight, you'd use req.minimumTextHeight(). But to set, you can treat the property as a python attribute and assign directly. In some cases, you may need to use the setPropertyName_(value) convention.
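
For example (a sketch; assumes a req object exists in the console):

```
print(req.minimumTextHeight())   # read: call the objc property like a function
req.minimumTextHeight = 0.1      # write: plain attribute assignment...
req.setMinimumTextHeight_(0.1)   # ...or the explicit setter convention
```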

Where things get tricky is where the declared type is another object (in which case you have to provide the right type of object), or a structure. Structures can be tricky because objc_util often screws up the type encodings, and you have to manually override. Structures get turned into python STRUCTUREs, and you access fields normally like you would with a python object (no () needed).

Re question 2:
Per the docs, the results of a request will be VNRecognizedTextObservation objects. This is a subclass of VNRectangleObservation.

@interface VNRecognizedTextObservation : VNRectangleObservation <-- colon here means inherits from

If you look up VNRectangleObservation, you will see it has the following attributes
bottomLeft
bottomRight
topLeft
topRight
Which are declared as CGPoint, a structure that has .x and .y fields.

for result in req.results():
    x = result.bottomLeft().x
    y = result.bottomLeft().y
    w = result.topRight().x-x
    h = result.topRight().y-y
    print('({},{},{},{}) {}'.format(x, y, w, h, result.text()))

You could draw the image into an image context, and then also stroke a rectangle.. something like this...(not tried).

with ui.ImageContext(ui_image.size()) as ctx:
   ui_image.draw()
   for result in req.results():
      vertecies = [(p.x, p.y) 
                               for p in [result.bottomLeft()
                                        result.TopLeft()
                                        result.TopRight()
                                        result.BottomRight()
                                        result.bottomLeft()]
      pth = ui.Path.moveTo(*vertecies[0]) %initial point
      for p in vertecies[1:]:
         pth.line_to(*p)  
      ui.set_color('red')
      pth.stroke()
      x,y = vertecies[0]
      w,h =(vertecies[2].x-x), (vertecies[2].y-y)
      ui.draw_string(result.text(), rect=(x,y,w,h), font=('<system>', 12), color='red')
   marked_img = ctx.get_image()
   marked_img.show()
JonB

I realized that result will also have a .boundingBox() attribute, which would make some of this a little simpler.
That is a CGRect, consisting of .origin (with .x and .y fields) and .size (with .width and .height).
In that case you could use ui.Path.rect.
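
A sketch of that route (untested; assumes req holds completed results and ui_image is the source image):

```
# stroke each result's boundingBox; Vision's normalized rect has its
# origin at the bottom-left, so flip y for ui's top-left origin
w, h = ui_image.size
with ui.ImageContext(w, h) as ctx:
    ui_image.draw()
    ui.set_color('red')
    for result in req.results():
        bb = result.boundingBox()
        x = bb.origin.x * w
        y = (1 - bb.origin.y - bb.size.height) * h
        ui.Path.rect(x, y, bb.size.width * w, bb.size.height * h).stroke()
    ctx.get_image().show()
```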

JonB

Okay, my previous reply was full of errors... here is a working version, which adds red boxes around each result, along with the text

```
language_preference = ['fi','en','se']

import photos, ui, dialogs
import io
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

ACCURATE = 0
FAST = 1

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

ui_image = None

if selection == 1:
    pil_image = photos.capture_image()
    if pil_image is not None:
        ui_image = pil2ui(pil_image)
elif selection == 2:
    ui_image = photos.pick_asset().get_ui_image()

if ui_image is not None:
    print('Recognizing...\n')

    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.recognitionLevel = ACCURATE
    req.setRecognitionLanguages_(language_preference)
    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')

with ui.ImageContext(*tuple(ui_image.size)) as ctx:
    ui_image.draw()
    for result in req.results():
        cgpts = [result.bottomLeft(),
                 result.topLeft(),
                 result.topRight(),
                 result.bottomRight(),
                 result.bottomLeft()]
        # Vision returns normalized coordinates with origin at bottom-left;
        # scale to pixels and flip y for ui's top-left origin
        vertices = [(p.x * ui_image.size.w, (1 - p.y) * ui_image.size.h) for p in cgpts]
        pth = ui.Path()
        pth.move_to(*vertices[0])
        for p in vertices[1:]:
            pth.line_to(*p)
        ui.set_color('red')
        pth.stroke()
        x, y = vertices[0]
        w, h = (vertices[2][0] - x), (vertices[2][1] - y)
        ui.draw_string(str(result.text()), rect=(x, y, w, h), font=('<system>', 12), color='red')
    marked_img = ctx.get_image()
    marked_img.show()
```

mikael

@JonB and @sodoku, just a note that I tried a different route, where I first used a rectangle recognizer to isolate the numbers, and only then used text recognition. The results were not impressive, but I can try to find the code for reference, if you think it might be useful.

JonB

Here is another solution... I use rectangle detection and a perspective correction to crop the puzzle. This gives much better detection, though not perfect. The recognition is pretty good, though it has trouble with 1's on their own... they turn into Ts, of all things. Some additional work in the clean function might fix common problems.

I’m using images from https://github.com/prajwalkr/SnapSudoku/tree/master/train

I suspect doing some CIFiltering first will probably improve things.

```
from objc_util import *
import ui

# C function for converting Vision's normalized coordinates to image pixels
VNImagePointForNormalizedPoint = c.VNImagePointForNormalizedPoint
VNImagePointForNormalizedPoint.argtypes = [CGPoint, c_int, c_int]
VNImagePointForNormalizedPoint.restype = CGPoint

ui_image = ui.Image.named('image2.jpg')
ui_image.show()

CIImage = ObjCClass('CIImage')
ci_image = CIImage.imageWithCGImage_(ui_image.objc_instance.CGImage())

CIPerspectiveCorrection = ObjCClass('CIPerspectiveCorrection')
f = CIPerspectiveCorrection.perspectiveCorrectionFilter()
f.inputImage = ci_image
o = f.outputImage()

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNDetectRectanglesRequest = ObjCClass('VNDetectRectanglesRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

# first pass: find the puzzle's outline
req = VNDetectRectanglesRequest.alloc().init().autorelease()
req.maximumObservations = 2
req.minimumSize = 0.5
req.minimumAspectRatio = 0.7
req.quadratureTolerance = 30

handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

success = handler.performRequests_error_([req], None)
try:
    result = req.results()[0]
    nm = lambda p: VNImagePointForNormalizedPoint(p, int(ui_image.size.w), int(ui_image.size.h))
    f.topLeft = nm(result.topLeft())
    f.topRight = nm(result.topRight())
    f.bottomLeft = nm(result.bottomLeft())
    f.bottomRight = nm(result.bottomRight())
    o = f.outputImage()

    with ui.ImageContext(o.extent().size.width, o.extent().size.height) as ctx:
        UIImage.imageWithCIImage_(o).drawAtPoint_(CGPoint(0, 0))
        ui_image2 = ctx.get_image()
    ui_image2.show()
except:
    print('bounding rect not found... results wont work')
    ui_image2 = ui_image

# second pass: recognize text on the perspective-corrected crop
handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image2.to_png(), None).autorelease()
req0 = VNRecognizeTextRequest.alloc().init().autorelease()
req0.recognitionLevel = 0  # accurate
req0.usesLanguageCorrection = True
req0.customWords = [str(a) for a in range(10)]

# req0.maximumObservations = 81
# req0.minimumSize = .1
success = handler.performRequests_error_([req0], None)
with ui.ImageContext(*tuple(ui_image2.size)) as ctx:
    ui_image2.draw()
    for result in req0.results():
        cgpts = [result.bottomLeft(),
                 result.topLeft(),
                 result.topRight(),
                 result.bottomRight(),
                 result.bottomLeft()]
        vertices = [(p.x * ui_image2.size.w, (1 - p.y) * ui_image2.size.h) for p in cgpts]
        pth = ui.Path()
        pth.move_to(*vertices[0])
        for p in vertices[1:]:
            pth.line_to(*p)
        ui.set_color('red')
        pth.stroke()
        x, y = vertices[0]
        w, h = (vertices[2][0] - x), (vertices[2][1] - y)
        ui.draw_string(str(result.text()), rect=(x, y, w, h), font=('<system>', 12), color='red')
    marked_img = ctx.get_image()
    marked_img.show()

def bbcenter(bb):
    # map a normalized bounding box center to an approximate (col, row) in the 9x9 grid
    return ((9 * (bb.origin.x + bb.size.width / 2) - 0.5),
            (9 * (bb.origin.y + bb.size.height / 2) - 0.5))

def clean(results):
    cleaned = []
    for r in results:
        col, row = bbcenter(r.boundingBox())
        approx_num_ch = (r.boundingBox().size.width * 9)
        txt = str(r.text()).replace(' ', '')
        if approx_num_ch <= 1:
            if len(txt) == 1:
                cleaned.append(((round(col), round(row)), txt))
            else:
                cleaned.append(((round(col), round(row)), '-1'))
        else:  # more than one char
            col -= (len(txt) - 1) / 2
            col = round(col)
            row = round(row)
            for ch in txt:
                if ch in [str(a) for a in range(10)]:
                    cleaned.append(((col, row), ch))
                else:
                    cleaned.append(((col, row), '-1'))
                col += 1
    return cleaned

import numpy as np
puzzle = np.zeros([9, 9])
for c, v in clean(req0.results()):
    puzzle[c] = int(v)
print(np.flipud(puzzle.T))
```

mikael

@JonB, thanks, very nice. I have also noted and wondered about how difficult the number 1 is to recognize... Not very exotic, is it? But in my experiments it looked like the simple heuristic of ”if the result is anything other than 1-9, assume it is a 1” would work pretty well for Sudoku.
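
A sketch of that heuristic (hypothetical helper name):

```
def digit_or_one(txt):
    # anything that is not a single digit 1-9 is assumed to be a misread 1
    txt = str(txt).strip()
    return int(txt) if len(txt) == 1 and txt in '123456789' else 1
```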

mikael

@JonB, can you open up this one a little bit?

approx_num_ch=(r.boundingBox().size.width*9)
JonB

The *9 is because, if the initial rectangle detection and crop work, each square is approx 1/9 of the width. So the approximate number of squares a rectangle covers tells us how many characters it should have... I was getting many cases where 1 got read as Te, or some other two-character value, even though the width was less than one box... so I wanted to have special handling for narrow boxes, as that is probably a 1, while wide boxes could have multiple characters because the bounding box legitimately spans adjacent boxes.

mikael

@JonB, there’s something in the math here I do not quite get. I would expect something like:

num_char = r.bbox/(full_bbox/9) = r.bbox * 9 / full_bbox

Thus it looks like you are missing the division?

JonB

The results of vision are always provided as normalized coordinates — meaning the full box is always 1.
For drawing, you have to then multiply by image width/height.

Since the perspective correction both fixes perspective and crops — 1/9 is the size, roughly, of a single cell.
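
In numbers (a hypothetical 1000x800 image):

```
img_w, img_h = 1000, 800
# a normalized point (0.5, 0.5) lands at the pixel-space center;
# flip y because Vision's origin is bottom-left and ui's is top-left
x_px = 0.5 * img_w          # 500
y_px = (1 - 0.5) * img_h    # 400
```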

mikael

@JonB, now I understand, thank you.

sodoku

I have a quick question in regards to the original OCR post: how do I print the text as one single CSV list? I tried, but I have been getting a list of lists instead of one single list.

This is a snippet of the code example that I think needs to be altered:

    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')
JonB

results=[str(result.text()) for result in req.results()]

print(results)

Or maybe

print(','.join(results))
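
If any recognized string can itself contain a comma, the csv module handles the quoting (a sketch):

```
import csv, io

texts = [str(result.text()) for result in req.results()]
buf = io.StringIO()
csv.writer(buf).writerow(texts)
print(buf.getvalue().strip())
```
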
ccc

There is a lot of good code here... It would be really awesome if there was a GitHub repo to stitch it all together into an app.

mikael

@ccc, do you mean this one? PRs are always welcome.

chrillek

Hi,
I'm aware that this thread is about a year old, but maybe someone can nevertheless enlighten me. I'm trying to do a similar thing in JavaScript for Automation (JXA), and I see this line in your example:

for result in req.results():
    print(result.text())

translated to JavaScript, that's

results.forEach(r => {
    console.log(r.text);
})

and that works like a charm. I'm just wondering why, since according to Apple's documentation, the results object doesn't even have a text property, only string (cf. https://developer.apple.com/documentation/vision/vnrecognizedtext?language=objc)

I was first wondering if text is perhaps a nice Python thing, but since the same works in JavaScript, I'm sure that I'm missing something obvious in Apple's documentation. Does anyone know what (and where I should be looking)?

Thanks a lot in advance
Christian

JonB

Does string not work?

Often there are undocumented or deprecated features available in objc objects. Often we just poke around using autocomplete, which ultimately uses the introspection features of the objc runtime (these let you get a list of methods or instance vars, etc.).

chrillek

Does string not work?

It does, but only in a very convoluted way, like so:

results.forEach(r => {
      console.log(r.topCandidates(1).js[0].string.js)
})

The js in the middle is required to convert the ObjC array returned by topCandidates to a JavaScript array (and again to convert the NSString returned by string to a JS string). But using string directly on r does not work.

we just poke around using autocomplete

I guess that happens in Xcode (the poking around)?

JonB

No, the exploration happens in Pythonista, in the console. Once you have an object, dir(variable) lists the methods and such, or frankly just typing a letter and letting autocomplete do its thing.

If you're not using a bridging library like
https://github.com/TooTallNate/NodObjC
I'd suggest that you do, since it might take care of a lot of the annoying bits like converting every type to JS equivalents, and lets you access some of the dynamic introspection stuff that makes objc pretty neat.

Under the hood, there are objc runtime functions that let you get lists of method names. For instance, see
https://github.com/jsbain/objc_hacks/blob/master/print_objc.py
for how you can do it in Python. Or, look at the NodObjC code for class.js and core.js -- it looks like it does something similar, using the objc class_copyMethodList etc., and adds those as JS-callable functions to the prototype. Then your favorite JS debugger ought to show you what is there...

JonB

In this case, looking at the header for VNRecognizedTextObservation shows the text attribute.

https://github.com/xybp888/iOS-Header/blob/master/13.0/Frameworks/Vision.framework/VNRecognizedTextObservation.h
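
In Pythonista terms, both routes look like this (a sketch; text() is the undocumented convenience from that header, topCandidates_ the documented API):

```
for result in req.results():
    print(result.text())                       # undocumented convenience used in this thread
    top = result.topCandidates_(1)             # documented API: array of VNRecognizedText
    if top.count() > 0:
        print(top.objectAtIndex_(0).string())  # the same string via the documented path
```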

chrillek

Thanks a lot for that. Apple's documentation doesn't mention any of these properties :-(
