DPI and you?

I have been experimenting with Tesseract and occasionally get really terrible results when I know I should not. When I grab the same images and change the DPI (that is, save them at a different resolution), I get great results.

Tesseract wants 300 DPI. When I google the DPI of digital images, or how to change DPI on Android, most of the information is along the lines of “you don’t want DPI, you don’t know what you are talking about, your life is a lie and nobody cares about you”. So what is going on here?

Maybe some more details from Tesseract will help:

Is there a Minimum Text Size? (It won’t read screen text!)
There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be “noise removed”.

So it looks like they want the letters to be within a certain range of pixel heights. They only give a lower bound here: an x-height of about 20 pixels for good accuracy, with very little chance of accurate results below 10 pixels. But I believe there is also a maximum height they will look for. In my previous post, ‘tesseract’, small unclear text was picked up reliably but large, very clear text was ignored. From what I can find on Google there does not appear to be a configured maximum, but rather an algorithm that tries to determine it based on the picture height.

My current image is 115 x 59 pixels and the text is about 1/3 of that height, or about 19 to 20 pixels high. It seems silly to me that doubling the pixels in the picture with no quality improvement would yield better results, but I guess I can give it a try.
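
To put numbers on the rule of thumb quoted above, here is a minimal Python sketch of the arithmetic. It relies on one point being 1/72 of an inch, and it assumes an x-height of roughly 0.48 of the point size. That ratio is chosen only because it reproduces the “about 20 pixels at 10pt x 300dpi” figure; real fonts vary. The function names are made up for illustration.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch

# Assumed ratio of x-height to point size. Picked so that 10pt at 300dpi
# comes out near the ~20 px figure quoted above; it differs between fonts.
ASSUMED_X_HEIGHT_RATIO = 0.48


def x_height_px(point_size, dpi, ratio=ASSUMED_X_HEIGHT_RATIO):
    """Estimate the x-height in pixels of text at a given point size and DPI."""
    em_px = point_size / POINTS_PER_INCH * dpi
    return em_px * ratio


def scale_for_target(current_px, target_px=20):
    """Factor to resize an image by so the text becomes target_px tall."""
    return target_px / current_px


if __name__ == "__main__":
    for pt in (12, 10, 8):
        print(f"{pt}pt at 300dpi -> about {x_height_px(pt, 300):.0f} px x-height")
    # Text that is only 10 px tall would need to be doubled to reach 20 px.
    print(scale_for_target(10))
```

Keep in mind that the 19 to 20 pixels I measured is the full character height, not the x-height, so it is not directly comparable to the quoted numbers.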

So my next steps will be

  1. Make sure my text is above the minimum x-height in pixels by a safe margin
  2. Study up more on the max height algorithm. Maybe I just need to zoom out?

building AnySoftKeyboard

Trying to build AnySoftKeyboard:

https://github.com/AnySoftKeyboard/AnySoftKeyboard/wiki/How-to-Build-AnySoftKeyboard/8d1a075333ff9a649f237aba35b0a4da74821d81

android update project -p .
Error: . is not a valid project (AndroidManifest.xml not found).

Cool, the Android manifest is actually a couple of directories deep. I’ll skip this for now.

Next step:


gradle build
:buildSrc:clean
:buildSrc:compileJava UP-TO-DATE
:buildSrc:compileGroovy

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileGroovy'.
Cause: You must assign a Groovy library to the groovy configuration!

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 2.374 secs

Well, that went well. Probably because the version of Gradle that Ubuntu installs by default is super old and doesn’t include the Groovy configuration.

Installing the latest and greatest gradle fixed it. Overall I enjoy apt-get but this seems to be a recurring problem!

trying out (another) android-ocr app

After getting my feet wet with the simple app in my last post I wanted to try a slightly more advanced app.

Getting it running was pretty straightforward.

  1. git clone https://github.com/rmtheis/android-ocr.git android-ocr
  2. made sure it was pointing at tess-two
  3. right clicked on project and ran as android app

android_ocr_Screenshot

It works pretty well. Thoughts:

  • I don’t need the translation stuff so I will rip that out.
  • Zoom would be nice for when you are a little farther away.
  • There is some code for continuous recognition. That sounds neat but I’m not sure how to enable it.
  • It would be nice to ask the user whether the result is correct and then save the files.

Starting out with Simple-Android-OCR on ubuntu

Pretty boring post. I basically followed this tutorial:

  1. make sure you have the build tools
    sudo apt-get install build-essential
    sudo apt-get install ia32-libs
    sudo apt-get update
    sudo apt-get install ia32-libs
    sudo apt-get install openjdk-6-jdk
    sudo apt-get install icedtea-plugin
  2. install android sdk
  3. install android ndk
  4. add adt tools to your path and .bashrc
  5. build tess-two
    git clone git://github.com/rmtheis/tess-two tess
    cd tess/tess-two
    ndk-build
    fix target in project.properties
    android update project --path .
    ant release
  6. build eyes-two
    cd ../eyes-two/
    ndk-build
    fix target in project.properties
    android update project --path .
    ant release
  7. build simple-android ocr
    git clone https://github.com/GautamGupta/Simple-Android-OCR.git Simple-Android-OCR
    go change target version in properties file
    point the project at the tess two install path
  8. I had trouble getting my phone to allow the debug connection. “adb devices” would either return nothing, or show the device serial number as question marks, or show the status as unauthorized. I tried several combinations of restarting adb along with restarting the phone, unplugging and replugging, and disabling / enabling dev mode. I think what finally did it was removing all currently authorised computers from my phone.

    ANDROID_SDK_HOME=~/adt-bundle-linux-x86-20140702/sdk
    export ANDROID_SDK_HOME
    and add to .bashrc
    adb kill-server
    adb start-server
    adb devices

  9. open eclipse, right click on simple ocr project and run as android application.
  10. success!
    dev

tesseract

Tesseract is probably the most accurate open source OCR engine available. It differs from OpenCV in that OpenCV is a general-purpose image library; you could use it to build something like Tesseract.

How well does it work? I downloaded the latest portable version to try it out. The ReadMe is very helpful.

IMG_43111

out of the box

Out of the box just point it at an image.


>tesseract.exe IMG.JPG out
>cat out.txt
*1
*3
'?
i

v

.‘-go.-—: qv..»- . v_—
r : -.; 1

‘LT WT

Lp LMT 223500
62500

_ ' -r~,.\'4-5-p-—-A-. 4.
ts‘ "3' ”'

2 INCH HF COMP SHOES

..,. ...¢...-..

,, ..--4
..r.....

Pretty impressive that it correctly read the small text at the bottom, but pretty sad that it missed the giant text in the middle.

focus on digits

Maybe if I focus on digits I can get the big “352264”.

In Tesseract-OCR\
>cat tessdata\configs\digits
tessedit_char_whitelist 0123456789
>tesseract.exe IMG_4311.JPG out digits
>cat out.txt
21
23
3
0

0

20522024 0101022 2 222
1 2 252 0

17 0011

00 1001 223500
02500

2 0 0554112402 42
2200 11 000

2 10168 512 60009 551053

2402 0242402

202 0224
2222232

So my guess is the letters are too big compared to the size of the picture. Tesseract is really geared towards looking at a page of text so it would make sense to ignore larger patterns and focus on smaller ones.

general approach for best results from tesseract

according to this stack overflow post

  1. fix DPI (if needed) 300 DPI is minimum
  2. fix text size (e.g. 12 pt should be ok)
  3. try to fix text lines (deskew and dewarp text)
  4. try to fix illumination of image (e.g. no dark part of image)
  5. binarize and de-noise image

fix DPI

The original DPI of the image was 72. That is probably a camera setting that could be changed, or the DPI could be changed automatically in pre-processing.

As a quick test I changed the dpi in gimp to the recommended 300.


>tesseract.exe 300dpi.JPG out
>cat out.txt

_`__ ___`___,.. ....-u-9.`-""

._ , ,~...,.--... ....

..- ...-..-.......

..,\.-.,~. -
,, .. ....~\-.»..v
, ., ._..x-. o-

CHTT mi
«~352264

«:4.»

`$7

a
I
.

`a
n

V.`-w-v -2
..
'''-_§_``'.: '. `

QLD LMT 223500
`L1 WT 62500

co.
..
_ . ' ".".."4-1`
.:...--"!CT::~ ts`

2 INCH HF COMP SHOES

Success! I got the string CHTT 352264 I was looking for. But there is still a bunch of junk.

only allow alpha numeric

In Tesseract-OCR\
>cat tessdata\configs\alphanumeric
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
>tesseract.exe IMG_4311.JPG out alphanumeric
>cat out.txt

V1 HFR4 9P

A 4 N NW

H 4 N

V A
H A V
F N

CHTT M
U352264

V4

37

I
1

V

V 4

V1 2

3 LD LMT 223500
LT WT 62500

A
P P 41
A IQ17 M

2 INCH HF COMP SHOES

what is next?

  • use regex to limit the pattern I want
  • pre-process the image more, possibly only look at largest text
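
As a rough idea of the regex step, the sketch below pulls a four-letter prefix followed by a six-digit number out of the raw OCR text. The pattern (four capitals on one line, then six digits on the next line with possible junk in front) is only a guess based on the single “CHTT 352264” sample above, and the function name is made up.

```python
import re

# Assumed label shape, guessed from one sample: four capital letters on one
# line, then a six-digit number on the next line, possibly preceded by junk.
LABEL_PATTERN = re.compile(r"\b([A-Z]{4})\b[^\n]*\n[^\d\n]*(\d{6})(?!\d)")


def find_labels(ocr_text):
    """Return (prefix, number) pairs that look like the label in OCR output."""
    return LABEL_PATTERN.findall(ocr_text)


if __name__ == "__main__":
    # Fragment of the 300 dpi output shown earlier (\u00ab is the « character).
    sample = (
        "CHTT mi\n\u00ab~352264\n\n"
        "QLD LMT 223500\n`L1 WT 62500\n\n"
        "2 INCH HF COMP SHOES\n"
    )
    print(find_labels(sample))
```

Other labels may well be laid out differently, so the pattern would need adjusting once there are more sample images.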

links

ros arduino notes

Most of the groovy packages are broken including rosserial. Short version of building from source:

cd ~/overlay_ws/src
git clone git://github.com/ros-drivers/rosserial.git
cd ~/overlay_ws/
catkin_make
source ~/overlay_ws/devel/setup.bash

quick ros tutorial on using rosserial
http://wiki.ros.org/rosserial_arduino/Tutorials/Hello%20World

start the ur5 simulator

roslaunch ur5_description test.launch
source ~/overlay_ws/devel/setup.bash
rosrun rosserial_python serial_node.py /dev/ttyUSB1
source ~/workspace/devel/setup.bash
rosrun talker talker.py
rosrun rqt_graph rqt_graph
rostopic echo JointState
rostopic pub -1 /shoulder std_msgs/UInt16 195

getting urdf_tutorials to work in groovy with ubuntu 12.04

This should be pretty straightforward but unfortunately the package is broken. Groovy is supposed to use .rviz configuration files for rviz instead of the older .vcg files. Either the urdf_tutorials were never properly tested, or Groovy changed sometime after the urdf_tutorials were released.


dpkg-query -W ros-groovy-robot-model-tutorials
ros-groovy-robot-model-tutorials 0.1.2-s1374436389~precise

dpkg-query -L ros-groovy-robot-model-tutorials
...bunches of files
/opt/ros/groovy/stacks/robot_model_tutorials/urdf_tutorial/urdf.vcg
...bunches of files
...no .rviz file

The best way I have found around this is to grab the latest urdf_tutorials source from git and use everything else stock as installed from the repos. This is called workspace overlay, setup here.


mkdir -p ~/overlay_ws/src
cd ~/overlay_ws/src
git clone git://github.com/ros/ros_tutorials.git
git clone git://github.com/ros/urdf_tutorial.git
cd ~/overlay_ws/
catkin_make
source ~/overlay_ws/devel/setup.bash
echo "source ~/overlay_ws/devel/setup.bash" >> ~/.bashrc
roscd urdf_tutorial
pwd
# should be ~/overlay_ws/src/urdf_tutorial
cd ~/overlay_ws/src/urdf_tutorial/urdf_tutorial
roslaunch urdf_tutorial display.launch model:=06-flexible.urdf gui:=True

Now rviz should pop up. When it used to work out of the box the robot would be visible. For me this did not happen.

norobot

I had to click add in the lower left

add

select the robot model and click ok

select_robot_model

Now I can see the robot, but the gripper is not there and the robot appears to be below the origin, so there are still errors.

robot

you can use the joint state publisher to interact with the model.

quick ros models to play with

roscd ur5_description
roslaunch ur5_description test.launch gui:=True

roscd urdf_tutorial/
roslaunch urdf_tutorial display.launch model:=06-flexible.urdf gui:=True

maximum recursion depth exceeded while calling a Python object

Have you ever run into an error that you remember getting before, but don’t remember how you fixed it or why it happened? This is my attempt to break that loop for this error.

my project was structured as such

/gateway.py
/gatewayTest.py

I had more modules and classes to add so I wanted to make a package.

/serverSupport
__init__.py
gateway.py
gatewayTest.py

When I ran the test I received a great gift! Yippie!

Finding files... done.
Importing test modules ... done.

Exception RuntimeError: ‘maximum recursion depth exceeded while calling a Python object’ in ignored
Exception RuntimeError: ‘maximum recursion depth exceeded while calling a Python object’ in ignored
Exception RuntimeError: ‘maximum recursion depth exceeded while calling a Python object’ in ignored
Exception RuntimeError: ‘maximum recursion depth exceeded while calling a Python object’ in ignored
Exception RuntimeError: ‘maximum recursion depth exceeded while calling a Python object’ in ignored
Exception RuntimeError: ‘maximum recursion depth exceeded while calling a Python object’ in ignored

So it looks like gatewayTest.py is trying to load itself, repeatedly. I’ll move the test out of the package so that when the test tries to load the package it doesn’t try to load itself, which would try to load the package, which would load itself.


/serverSupport
__init__.py
gateway.py

/gatewayTest.py

This works! But I would rather keep the tests in their own directory or with the package.

The following fails with the same recursion error


/serverSupport
__init__.py
gateway.py

/test
gatewayTest.py

I’m annoyed. Fix later.

sparkfun quadstep driver refactor

I bought a QuadStep from SparkFun to run stepper motors.

the supplied arduino driver

  1. did a good job of showing the quadstep’s functionality
  2. is free
  3. was limited to the arduino mega
  4. had hardcoded pins
  5. could use some cleanup

 

I have also been reading Clean Code: A Handbook of Agile Software Craftsmanship by Robert Martin and thought this would be a good opportunity to apply some of the concepts.

When refactoring it is critical to have tests. I want to make incremental code changes and test after each one. I want to improve the code, not break it. The supplied driver only supports the Arduino Mega or Mega 2560 because of hardcoded pins in the driver. I will need to fix the hardcoded pins before I am able to test the driver with my hardware.

soft-coding the step pin

The step pin is hardcoded for all motors. I believe the author did this because the stepping speed was a concern and the author was considering using analogWrite to do pulse width modulation (pwm). PWM was never used so there is no longer a need for hardcoded pins.

In this library the pins for each connection are stored in private variables. An additional private variable is needed for each channel to hold its step pin number. Another argument is added to the motor_pins function so the variable can be initialised with the desired pin.

The cbi / sbi functions can all be replaced with digital writes. The newer Arduino libraries use cbi/sbi in the background when the pin can be determined at compile time. There is A LOT of duplicated code that I really want to take out, but I will resist until I get a test working.

https://github.com/johnjamesmiller/quadstep/commit/699e94e261a5fd950ca38ca5c2ecc28945b91b2c

Repeated code

Repeated code is a place for bugs to hide. If you have to make the same change in multiple places, what are the chances you will do it exactly the same each time? The quadstep.cpp file is 630 lines long now and most of it is repeated code. Let’s start condensing some code.

motor_pins


	if(motnum == 1)
	{
		repeatedPinModeFunctions();
		uniqueVariableInitializations = parameter;
	}
	else if(motnum == 2)
	{
		repeatedPinModeFunctions();
		uniqueVariableInitializations = parameter;
	}
... continued for motors 3 and 4

In the motor_pins code there are 8 lines of pin mode functions repeated for each motor. Extract the duplicated code out of the if/else blocks and execute it once, before them.


       	
	pinMode(motor_enable, OUTPUT);
	pinMode(motor_dir, OUTPUT);
	pinMode(motor_step, OUTPUT);
	pinMode(motor_ms1, OUTPUT);
	pinMode(motor_ms2, OUTPUT);
	pinMode(motor_ms3, OUTPUT);
	digitalWrite(motor_enable, HIGH);
	digitalWrite(motor_dir, LOW);
	
	if(motnum == 1)
	{
		_motor_enable_1 = motor_enable;
		...
	}
	else if(motnum == 2)
	{
		_motor_enable_2 = motor_enable;
		...
	}
... continued for motors 3 and 4

motor_go

Each motor has the same stepping code repeated 5 times for the 5 different stepping values. The hierarchy is:


motor1
 if(step_size == 1)
                //sets speed
                current_control(1);

                //sets step_size
                digitalWrite(_motor_ms_11, LOW);    
                digitalWrite(_motor_ms_12, LOW);    
                digitalWrite(_motor_ms_13, LOW);

		digitalWrite(_motor_enable_1, LOW);
		for(int i=1;i<=number_of_steps;i++)
		{
			//low to high transition moves one step
			digitalWrite(_motor_step_1, HIGH);
			delayMicroseconds(step1);
			digitalWrite(_motor_step_1, LOW);
			delayMicroseconds(step1);
		}
		digitalWrite(_motor_step_1, LOW);
		digitalWrite(_motor_enable_1, HIGH);
 else if(step_size == 2)
                //sets speed
                current_control(2);

                //sets step_size
                digitalWrite(_motor_ms_11, HIGH);    
                digitalWrite(_motor_ms_12, LOW);    
                digitalWrite(_motor_ms_13, LOW);


		digitalWrite(_motor_enable_1, LOW);
		for(int i=1;i<=number_of_steps;i++)
		{
			//low to high transition moves one step
			digitalWrite(_motor_step_1, HIGH);
			delayMicroseconds(step2);
			digitalWrite(_motor_step_1, LOW);
			delayMicroseconds(step2);
		}
		digitalWrite(_motor_step_1, LOW);
		digitalWrite(_motor_enable_1, HIGH);
 step4
 step8
 step16
motor2
 step1
 step2
 step4
 step8
 step16
motor3
 step1
 step2
 step4
 step8
 step16
motor4
 step1
 step2
 step4
 step8
 step16

Each stepN is a block of 18 very similar lines of code. current_control is called at the beginning of each block to set one of the step size variables (step1, step2, step4, step8, step16). I am going to condense the 5 variables to 1. Now the current_control function is a lot smaller, but more importantly the blocks are more alike and do not have to be repeated. Each block is left with 3 unique lines; the other 15 lines only need to appear once per motor.

Wow, I am down from 630 lines of code to 323. What did I lose in those 300 lines of code? Previously you were forced to pass 1, 2, 4, 8 or 16 or it wouldn’t step. Now I can pass any number, which is probably not a good idea.

enums

I will add an enum and pass that instead of an integer. This will force the user of the library to select one of our values. It also adds clarity when reading the code: instead of seeing the counter-intuitive number 16 you see SIXTEENTH, and you know it is a smaller step than FULL.
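
The idea carries over to any language with enums. Here is a minimal sketch in desktop Python rather than Arduino C++, just so it can be run anywhere. FULL and SIXTEENTH are the names mentioned above; the other member names and the go function are placeholders, not the library’s actual API.

```python
from enum import Enum


class StepMode(Enum):
    # The value is the number of microsteps per full step.
    FULL = 1
    HALF = 2        # hypothetical name
    QUARTER = 4     # hypothetical name
    EIGHTH = 8      # hypothetical name
    SIXTEENTH = 16


def go(step_mode, number_of_steps):
    """Stand-in for a motor call that only accepts a StepMode member."""
    if not isinstance(step_mode, StepMode):
        raise TypeError("step_mode must be a StepMode, got %r" % (step_mode,))
    return step_mode.name, abs(number_of_steps)


if __name__ == "__main__":
    print(go(StepMode.SIXTEENTH, -200))
    try:
        go(3, 200)  # a bare integer is rejected instead of silently accepted
    except TypeError as err:
        print("rejected:", err)
```

In C++ the compiler does this check for you, because an int does not convert implicitly to an enum type; Python has to check at run time.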

A reminder that it is important to make small changes and test after each change, ensuring that the code still works.

https://github.com/johnjamesmiller/quadstep/commit/b2ddf7dd29cae197ae17d06dd6b71f21242df0ad

comments are a failure to express yourself in code.

What is worse than no information at all? Wrong information. Most comments duplicate information that is already in the code, and as the code changes the comments do not get updated. There are real-world examples of this in the original code (quadstep.cpp lines 369 and 379, quadstep.pde line 61). This code has only been around for a year and has been maintained by one person. Imagine what happens to large enterprise projects after 10 years and 20 people have roughed up the comments. Eventually maintainers ignore all of the comments. Some comments are good, but before you write one think long and hard about whether it is the best thing to do. Or just be lazy and don’t write any comments.

This is the single easiest take away from clean code. Whenever you see a comment above a block of code turn that comment into a function name and put the code block in the function. If the comment is above a function consider renaming it.


	//sets speed
	current_control(step_size);

In eclipse use alt+shift+r to rename a function and eclipse will automatically update all the places the function is used.


	set_speed(step_size);

There are quite a few more examples in this code but I will let them be for now.

classes

A class holds variables and the functions that act on them. The variables in the quadstep class are the pins for each of the 4 motors and the functions that act on them. The functions are exactly the same except for the variables they act on. The class should be condensed to a unique set of functions. It should handle one motor. Each motor should be an instance of that class.

Now using the code looks like this:


quadstep motor1;
quadstep motor2;

motor1.motor_pins(0,1,2,3,4,9);
motor2.motor_pins(7,6,A0,A1,A2,5);

motor1.motor_go(FULL,-200,2);
motor2.motor_go(FULL,200,2);

motor1.stall();
motor2.stall();

There are many other “smells” that say this should happen. The functions’ first argument is a flag, the motor number. Flags indicate that the function does more than one thing. The “Single Responsibility Principle” says that functions should do only one thing. Switches and if/else chains are another indicator that the function has more than one responsibility.

The cpp file is now down to 115 lines from the original 630 lines.

https://github.com/johnjamesmiller/quadstep/commit/a3fac1927270252f4ef979f75a69117dd2be6199

comments, round 2

Now that I don’t have to make 4 direction functions, 4 enable functions, 4 etc., I can extract each of the commented blocks of code into its own function. The idea here is to separate levels of abstraction. The motor ‘go’ function needs to know that you must set the direction, enable the motor, step, and then disable the motor. The motor ‘go’ function does NOT need to know how you enable the motor; that is a lower level of abstraction.

The motor go function is now much more readable:


void quadstep::go(step_modes_t step_size, int number_of_steps, int torque)
{
	_torque = torque;
	set_direction(number_of_steps);
	set_speed(step_size);
	set_microstep_format(step_size);
	enable();
	for(int i=1;i<=abs(number_of_steps);i++)
	{
		step();
	}
	disable();
}

https://github.com/johnjamesmiller/quadstep/commit/af13236e47700707083e0f2459a844d7a2218ccd

hidden temporal couplings

The _torque member is the only variable directly set here. Interesting, what does it do? It is used by the set_speed function. If I were to call the set_speed function first it would not behave correctly. The torque parameter should be passed to the set_speed function so the temporal coupling is exposed. It turns out that is the only place torque is used so I can delete the member variable.
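
A stripped-down illustration of the same problem, again as a desktop Python sketch rather than the Arduino code. The class names and the delay formula are invented purely for the example; only the shape of the dependency matters.

```python
class MotorBefore:
    """Hidden coupling: set_speed silently relies on _torque being set first."""

    def __init__(self):
        self._torque = None
        self.step_delay_us = None

    def go(self, step_size, torque):
        self._torque = torque      # must happen before set_speed
        self.set_speed(step_size)

    def set_speed(self, step_size):
        # Hypothetical formula; breaks if _torque was never assigned.
        self.step_delay_us = self._torque * 1000 // step_size


class MotorAfter:
    """Coupling exposed: the caller cannot forget to supply torque."""

    def __init__(self):
        self.step_delay_us = None

    def go(self, step_size, torque):
        self.set_speed(step_size, torque)

    def set_speed(self, step_size, torque):
        self.step_delay_us = torque * 1000 // step_size  # same made-up formula


if __name__ == "__main__":
    try:
        MotorBefore().set_speed(1)   # called out of order
    except TypeError as err:
        print("hidden coupling bit us:", err)
    motor = MotorAfter()
    motor.set_speed(2, 2)            # the order no longer matters
    print(motor.step_delay_us)
```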

https://github.com/johnjamesmiller/quadstep/commit/2ae188f3d39a598d189f4be56ae0611912b9a76f

avoid mental mappings

Setting the motor pins is annoying. I have to look back and forth between


  motor_pins(y,z,l,m,n,o)
  y: enable pin assignment
  z: direction pin assignment
  l: MS1 pin assignment
  m: MS2 pin assignment
  n: MS3 pin assignment
  o: step pin

and this


  motor1.motor_pins(0,1,2,3,4,9); //ch 1
  motor2.motor_pins(7,6,A0,A1,A2,5);      //ch 2
  motor3.motor_pins(12,13,A0,A1,A2,10);  //ch 3
  motor4.motor_pins(4,27,A0,A1,A2,11);  //ch 4

several times to make sure I put the correct pin in the correct location. Your brain cycles are too valuable for this. Any time you find yourself mentally mapping between one thing and the other you should consider changing the code.

Setting the pins has now become self explanatory.


  motor1.set_enable_pin(0);
  motor1.set_direction_pin(1);
  motor1.set_step_pin(9);
  motor1.set_microstep_select_pins(2,3,4);

This has expanded the code a bit but I am ok with it because the code is now more readable and self explanatory.

https://github.com/johnjamesmiller/quadstep/commit/04aafea841ba8923951801aa4027c070e9562f42

conclusion

Try to make your intentions as clear as possible. Make small changes. Test after every change. Take pride in your code. That reminds me I should add my name to the headers.

git://github.com/johnjamesmiller/quadstep.git

I am sure there are more ways this code can be improved but I am tired of rambling in this excessively long post.