One of my relatives uses an assistive device for their eyesight, an interesting example of a set of technologies (hardware together with a variety of software capabilities) being used to augment the senses as an assistive aid.

Mobile devices are ubiquitous, and whilst not as specialised as the hardware my relative was using (something that seemed to incorporate a Google Glass-like device), the mobile platform may offer some opportunities for assistive technologies.

With Apple’s release of CoreML in its SDK, there is now a toolkit that places machine learning capabilities at the edge of the architecture, embedded in mobile devices rather than necessarily executing on a server platform. This does not mean the data needs to reside on those devices; a centralised architecture can still be used for data storage, modelling and assessment, and once a model is produced it can be distributed out to the edge.
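As a rough illustration of that distribution pattern (not part of the project itself), CoreML lets a device compile a model file that was produced elsewhere and then run it locally. The model name and path below are placeholders.

```swift
import CoreML

// Hypothetical sketch: "TextRegionClassifier" is a placeholder for a model
// trained and assessed centrally, then downloaded to the device.
func loadDistributedModel(from modelURL: URL) throws -> MLModel {
    // compileModel(at:) converts the downloaded .mlmodel into the on-device
    // .mlmodelc format, so models can be updated after the app is installed.
    let compiledURL = try MLModel.compileModel(at: modelURL)
    return try MLModel(contentsOf: compiledURL)
}
```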

Alongside this type of capability there also appears to be a growing trend towards sharing pretrained models. The ability to distribute the same models across a variety of platforms is interesting, as is the promotion of an open source approach to distributing and sharing models. In some cases the associated model assessment is distributed as well, either in the form of reference articles or as extracts that reside alongside the model repository.

So there seems to be a lot of opportunity for synthesising these capabilities in a relatively cost-effective manner, with the mobile platform a natural target for disseminating them, where they may be translated into a variety of different applications.

So I thought I’d spend a weekend and quickly see if I could put together something that uses the iOS SDK to capture a video stream, segments the images to identify text areas, uses a pretrained set of models in the form of an open source OCR toolkit to convert from image to text, and leverages the TTS capability to read out the result. A rough sketch of that pipeline is shown below.
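The sketch below is only an outline of the general shape of the pipeline, assuming Vision’s VNDetectTextRectanglesRequest for the text-area segmentation step and AVSpeechSynthesizer for the TTS step; the actual toolkits used are listed in the README, and the OCR stage (handled by the open source toolkit) is left as a placeholder here.

```swift
import AVFoundation
import Vision

// Sketch of a capture -> segment -> OCR -> speak pipeline.
final class TextReader: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Segment the frame into candidate text regions.
        let request = VNDetectTextRectanglesRequest { [weak self] request, _ in
            guard let regions = request.results as? [VNTextObservation],
                  !regions.isEmpty else { return }
            // Placeholder: crop each region, pass it through the OCR toolkit,
            // then speak the recognised text.
            let recognisedText = "…" // result of the OCR stage
            self?.speak(recognisedText)
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }

    private func speak(_ text: String) {
        synthesizer.speak(AVSpeechUtterance(string: text))
    }
}
```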

Of course this is a toy project to play with some of the ideas around the possibilities of assistive tech for sensory augmentation. In this case it unfortunately does require the use of a visual UI, but it incorporates audio feedback.

There are obvious problems with relying on a visual interface when providing an assistive UI for the visually impaired, but it is potentially tenable for those with partial visual impairment.

Project Link: recognisetextobject

The README muses on the application as an assistive tool and identifies some of the more difficult problems in the domain, especially in the area of image preprocessing prior to the OCR stage of the pipeline.

It also lists the toolkits used.

What was interesting is that, with relatively little time, it was possible to put together a combination of capabilities on the platform in an attempt at realising the goal of the application. Under good lighting conditions with appropriate contrast the application worked reasonably well. It is, however, very far from a reasonable MVP, since more attention would be required in the image preprocessing area, which is non-trivial.
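To give a sense of what even the simplest preprocessing step might look like (this is an illustrative sketch using Core Image, not the project’s actual approach), stripping colour and boosting contrast before the OCR stage is about as basic as it gets; proper preprocessing (binarisation, deskewing, perspective correction) is considerably more involved.

```swift
import CoreImage

// Rough sketch: convert to greyscale and raise contrast before OCR.
// Filter values are illustrative, not tuned.
func preprocess(_ image: CIImage) -> CIImage {
    let filter = CIFilter(name: "CIColorControls")!
    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(0.0, forKey: kCIInputSaturationKey) // drop colour
    filter.setValue(1.5, forKey: kCIInputContrastKey)   // boost contrast
    return filter.outputImage ?? image
}
```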