Wednesday, August 11th, 2010
Daily Archive
Welcome Atomic Fungus to the blogroll, first of all. I started looking at Ed because Dick linked to his livejournal entry about being harassed by the Crete cops; turns out Ed lives a short distance from here. And he’s a grade-A wiseass; I look forward to going to the range with him.
He has a post here that includes a reference to robotic vision and memory. This is a subject with which I have intimate experience, and I commented there. I’d like to expand on it here, because it’s a subject of some interest to me.
Recognizing something is a monumentally difficult task that humans do with remarkable ease. Our brains follow a process that sort of acts like a Pachinko machine. Let me explain.
You see something. It has a shape. Your eyes focus on it, using stereo vision, so you have a good sense of how far away it is. You don’t consciously do this; you just do. Your eyes have a centerline of vision; the centerline of vision changes based on an object’s distance. You basically draw a line from the back center of the eyeball to the front center of the eyeball to the object. When you see the object, both eyes are focused on it, but the intersection of the angles of the eyes is what relays the information about its distance to you. (Well, that and the amount of distortion of the lens that is required to bring the object into optical focus.)
Anyway, using those two bits of feedback, we have a “distance”. Combine distance with shape, and the mind begins to work.
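For the curious, the "intersection of the angles" idea can be sketched in a few lines of Python. This is only a rough geometric illustration, not how the brain (or any particular robot) does it. It assumes the object sits straight ahead, so both eyes turn inward by the same amount, and the function name and the 65 mm eye spacing are my own made-up assumptions.

```python
import math

def distance_from_vergence(baseline_m: float, vergence_rad: float) -> float:
    """Distance to a point fixated straight ahead by two eyes (or cameras).

    baseline_m   -- separation between the two eyes (an assumed value)
    vergence_rad -- angle between the two lines of sight where they cross
    """
    if vergence_rad <= 0:
        return math.inf  # parallel lines of sight never intersect
    # Each eye, half the baseline, and the object form a right triangle.
    return (baseline_m / 2) / math.tan(vergence_rad / 2)

# Round trip: eyes 65 mm apart looking at something half a meter away.
baseline = 0.065
vergence = 2 * math.atan((baseline / 2) / 0.5)
print(round(distance_from_vergence(baseline, vergence), 3))
```

The further away the object is, the smaller the angle gets, which is one reason this kind of distance judgment gets worse at long range.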
Ed uses the example of “Fork”, and it’s a good one.
The Pachinko ball with its two data, “Shape” and “Distance”, falls through a lot of decisions immediately. You know it’s not a couch, or a giraffe, or a car, so the ball speeds right by that stuff. The ball may bounce off “pitchfork” briefly, you may even think “Pitchfork” briefly, but the “distance” data lets you mentally calculate “Size” so you know “Pitchfork” is not it.
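Here is a small Python sketch of how "distance" turns into "size" and knocks out "Pitchfork". It is an illustration under assumptions: the typical lengths in the table are my own rough guesses, and the helper names are hypothetical.

```python
import math

# Hypothetical typical lengths in meters; illustrative guesses only.
TYPICAL_LENGTH_M = {"fork": 0.19, "pitchfork": 1.5}

def physical_size(distance_m: float, angular_size_rad: float) -> float:
    """Real-world length of something that spans a given angle at a given distance."""
    return 2 * distance_m * math.tan(angular_size_rad / 2)

def closest_by_size(candidates, size_m: float) -> str:
    """Pick the candidate whose typical length is nearest the estimated size."""
    return min(candidates, key=lambda name: abs(TYPICAL_LENGTH_M[name] - size_m))

# The same apparent size means different things at different distances.
angle = math.radians(21.5)
print(closest_by_size(["fork", "pitchfork"], physical_size(0.5, angle)))
print(closest_by_size(["fork", "pitchfork"], physical_size(4.0, angle)))
```

Same shape, same apparent size, and the only thing that separates the two answers is the distance.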
The little pachinko ball keeps falling through the decision tree, bouncing momentarily off this pin and that, until it falls down the hole called “Fork” and turns on the “Fork” bit in the CPU. Then the CPU can project the virtual image “fork” on the object so the spatial relationship is established (is the fork on its side, tines to the left, right, etc.). Sometimes this can be messed up by an optical illusion; you might see the fork as lying on its bottom with the tines pointed up, but it is lying on its top with the tines pointing down. Without a perspective view or shadows it can be hard to tell. In fact, that’s what an optical illusion is: your mind overlays the virtual image “Fork” on the fork, but then you find that its actual position disagrees with the virtual image your CPU has overlaid on the object.
This bit of computation happens in a remarkably tiny amount of time. We only even know it sort of happens this way because of the things the little ball bypasses on the way down (remember the “Pitchfork” digression?), little side alleys it almost gets trapped in. Sometimes these little things float to the surface of our awareness and we think, “hmm, a fork is like a pitchfork. They even share some letters in their name.”
We also have some idea of how this works because of the way we recognize complex shapes. When you see someone, your little pachinko ball of observation is automatically redirected into the “Human” funnel, because we know at once that what we’re observing is not a building or a bucket or a baseball or a blimp.
We see the human and recognize its humanness, but then we have a different set of rules to follow. Just as a tennis ball is very similar to a baseball, one human is very similar to another, so we now have to look at features and textures to begin to distinguish them.
Exceptions to the normal “face” rules obviously make a huge difference in our process. Someone lacking a nose or ears would be an immediate flag. But as is, under the “NORMAL FACE” category, we bounce around until the observation falls in its proper hole. The interesting thing about this is that we can adapt as a person ages, taking in the changes in their facial features. We know that the brain overlays our 3D memory of the shape of a person’s face on it, because a dramatic change, say, seeing someone for the first time in years, will be a shock to us: our CPU is trying to overlay the outdated 3D virtual image onto the new actual face. Or it could be something as simple as growing a beard, shaving it off, or a change in hair color or hairline.
It is interesting, to me, when an object’s shape and size and distance are ambiguous and we get tricked. I love to look at people’s reactions when they see outrageously disproportionate images of things, like the 20′ long ants that used to be on the hillsides at the Morton Arboretum, or the things you can buy at “Big!”
I’m also fond of optical illusions because I have a kind of an understanding of how the process of visual recognition works, at least in machines, and it’s the optical illusion that will be the most difficult to defeat in machines because we’re not at all certain about how to thwart them in humans.
I have written some very rudimentary vision hierarchy routines. Most of them have to do with groups of ten or fewer objects, and robot speed being what it once was, it was, for the longest time, adequate to look at this in a very binary way. The code would look something like this:
#observation EQ fork = false
#observation EQ spoon = false
#observation EQ hat = false
#observation EQ boat = false
#observation EQ zebra = false
#observation EQ knife = false
#observation EQ beer bottle = false
#observation EQ can opener = false
#observation EQ rubber duck = false
#observation EQ dumpster = false
#observation EQ hammer = true
The machine sorts through its library until it gets a match. Then it uses shape and distance to get size, finds the object’s orientation, and reports that orientation back to the CPU. The robot can then pick.
As the number of objects the robot needs to interact with increases, the complexity increases. The speed of industrial processes has increased as well, so things have to happen faster.
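In a modern language, that one-at-a-time library search might look something like the Python sketch below. It is not the code that ran on the robot. The `matches` callback and the label-based stand-in matcher are hypothetical placeholders for whatever image comparison the vision system actually does.

```python
from typing import Callable, Optional, Tuple

# Same library, in the same order, as the listing above.
LIBRARY = [
    "fork", "spoon", "hat", "boat", "zebra", "knife",
    "beer bottle", "can opener", "rubber duck", "dumpster", "hammer",
]

def identify(observation: dict,
             matches: Callable[[dict, str], bool]) -> Tuple[Optional[str], int]:
    """Return (name, comparisons made) for the first library entry that matches."""
    for comparisons, name in enumerate(LIBRARY, start=1):
        if matches(observation, name):
            return name, comparisons
    return None, len(LIBRARY)  # nothing in the library matched

# Stand-in matcher (assumption): a real one would compare image features.
def label_matcher(observation: dict, name: str) -> bool:
    return observation["label"] == name

print(identify({"label": "hammer"}, label_matcher))
```

The comparison count is the point: a hammer at the end of an eleven-item library costs eleven checks, which is fine for ten or so objects and gets painful as the library grows.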
The state of the art has changed, so we can do other things. We can determine distance using a simple laser, and use pixel count to determine size. So, we can have a decision tree that looks like this:
While (pixelcountXdistance)>300,000,000 do
//look for object in “large” category
#observation EQ boat = false
#observation EQ zebra = false
#observation EQ dumpster = false
While (pixelcountXdistance)>300,000 do
//look for object in “medium” category
#observation EQ hat = false
#observation EQ beer bottle = false
#observation EQ hammer = false
While (pixelcountXdistance)>300 do
//look for object in “small” category
#observation EQ fork = true
#observation EQ knife = false
#observation EQ spoon = false
So this can be done to kind of mimic the way the brain “seems” to search for recognition of an item.
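A rough Python version of that size-first search is below, again as a sketch and not production robot code. The thresholds and category contents come from the listing above. The units for distance, the `matches` callback, and the label-based stand-in matcher are assumptions of mine. I have also written the three "While" tests as a first-match cascade, which is my reading of the intent.

```python
from typing import Callable, List, Optional, Tuple

# (threshold, category, candidates), checked from largest to smallest.
CATEGORIES = [
    (300_000_000, "large", ["boat", "zebra", "dumpster"]),
    (300_000, "medium", ["hat", "beer bottle", "hammer"]),
    (300, "small", ["fork", "knife", "spoon"]),
]

def candidates_for(pixel_count: float, distance: float) -> Tuple[Optional[str], List[str]]:
    """Pick the size category from pixel count times distance."""
    score = pixel_count * distance
    for threshold, category, names in CATEGORIES:
        if score > threshold:
            return category, names
    return None, []  # too small to fall into any category

def identify(observation: dict,
             matches: Callable[[dict, str], bool]) -> Tuple[Optional[str], Optional[str]]:
    """Search only the library entries in the observation's size category."""
    category, names = candidates_for(observation["pixel_count"], observation["distance"])
    for name in names:
        if matches(observation, name):
            return category, name
    return category, None

# Stand-in matcher (assumption): a real one would compare image features.
def label_matcher(observation: dict, name: str) -> bool:
    return observation["label"] == name

print(identify({"pixel_count": 2000, "distance": 50, "label": "fork"}, label_matcher))
```

Sorting by size first means the fork is never compared against a boat or a dumpster, which is the pachinko ball speeding past the holes it obviously doesn't fit.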
I think it’s a good model. Sometimes, builders and inventors build models of objects so they can better demonstrate their operation or construction to people incapable of comprehending the vision otherwise, and if we can make, using machine vision, a model that approximates human recognition of objects, we can perhaps understand better how the mind works.
Y’all go back to your coffee now. I’m done. Sorry for the digression.