Self driving technologies have been in the public eye for a number of years now. In fact, according to Elon, we were going to have self driving cars several years ago. Well, we do, sort of. What’s the definition of self driving again? It is famously difficult to pin down what we mean when we say self driving. As Supreme Court Justice Potter Stewart so famously said about pornography, “I know it when I see it, and the motion picture involved in this case is not that.” Likewise, what counts as a “self driving car” depends on whom you ask. But, I think most of us would “know it if we saw it” and so far, we’re tantalizingly close yet so far away.
There is a saying in software development that the last 10% of the project takes 90% of the work. It’s true. Well, it’s probably a smidgen overly optimistic. The last 5% could take 95% of the time, and so it is with self driving. Getting a car to drive straight down the expressway is no difficult task (don’t tell GM this, their lane keeping sucks). Conversely, knowing what to do if a litter of kittens darts across the street while a child chases their ball from the other direction and someone in front spills their coffee and slams on the brakes is a whole other matter. We take for granted how amazing the human brain can be. We can adapt to many situations, even ones we’ve never seen before. This can be stressful, and sometimes we make mistakes too. But, we have the benefit of trillions of interconnections between neurons, something AI might never quite match. And so, many extremely talented and very capable people are trying their best and mostly doing a pretty amazing job.
But, it isn’t quite there yet, at least depending on who is answering. Does it count as self driving if it normally can do most of the work? According to Tesla, it would seem so. The current Full Self Driving beta can drive in many places nearly 100% for you.
Despite what it might seem like, FSD Beta from Tesla is officially only a level 2 system on the SAE scale of driving automation, which runs from 0 to 5: the driver has to supervise at all times. The goal, of course, in self driving is to get the coveted level 5 where the car actually does drive itself. To get there the car must always drive itself no matter what. In level 5 you should be able to forgo the steering wheel and play on your phone while the car drives. Tesla just isn’t there yet, despite many promises from a certain leader with the initials EM. So far level 5 has eluded everyone. But, there are level 4 systems, which drive themselves within a limited area or set of conditions, deployed or in development at a variety of companies (Waymo, Volvo, Baidu, etc). So, the future is looking bright for self driving.
A natural question is “what does it take to make a computer drive a car?” Well, ultimately the computer must control acceleration, braking, and steering. It has to know where it is and where everything else is. Some companies rely on maps – the computer essentially drives on premapped roads. Almost everyone has cameras of some sort so that the car can see what is around it. Some companies use lidar, radar, or ultrasonic sensors. These things can be expensive but help the car to know where it is and what is around it. The biggest debates surrounding how to make self driving cars revolve around two issues:
- Should the car use maps and other topographical data or should it figure out where it is just by looking through cameras and interpreting where the road is? Of course all cars will use maps to navigate but this point is more related to how it knows where it is *right now*.
- Should the car rely only on visible light cameras or should the car use a larger array of sensors?
It is no secret that Tesla has mostly doubled down on their bet that vision only self driving can work. They also really don’t use pre-mapping and instead interpret the world around the car in real time to figure out what to do. These two choices combined make for the most challenging way of doing it. The reasoning sounds logical on the surface – you, a (supposed) human being, drive via sight. Sure, you have ears, you have a sense of balance, you have a sense of touch. But, mostly you drive by looking out the windows to see what is happening around you and the car. It is, thus, logical to say that any self driving system could/should take a similar approach. However, not everyone has done this. Quite a number of companies are using radar, lidar, ultrasonics, and other technologies as well. Many of those same companies also try to create maps that the car can reference. So, why are they doing this if vision only could work? Well, it boils down to a difference in philosophy. If vision only can work then it will be reasonably cheap. A camera can be bought for under $100. Even buying several of them still only stacks up a reasonably small chunk of the cost of a car. If vision only works then the sensing hardware could be very cheap. Conversely, lidar is NOT cheap at this point in time. Radar and ultrasonic sensors can be reasonably priced. So, betting on cameras is the toughest option but potentially the cheapest. Everyone is going to use cameras anyway to help see speed limit signs and better interpret the world that was meant for vision. But, as the economy of scale changes for things like lidar it can become cheaper and cheaper until the cost is no longer prohibitive. LCD TVs were very expensive at first. Now you can buy a TV the size of your wall for under $1000. As a technology matures and gains popularity it becomes cheaper too. 
I have no doubt that lidar that costs $30000 today and takes up the whole top of a car will be $300 in a decade and the size of a pack of cards.
But, let’s look at this a different way. Human beings largely move around on two legs. Cars largely move around on four wheels. These are entirely different means of movement yet both are viable strategies. Why do humans have legs when wheels are more efficient from an energy standpoint? Well, one argument a biologist might make is that 360 degree free rotation is very difficult to do with biology. There are some aspects at the cellular level where such things exist. But, I think we all know how rare wheels are on animals. It is simply too difficult to transfer blood and neurological signals through a freely rotating joint. So, are we disadvantaged by the lack of wheels? Somewhat but not entirely. You see, if I wanted to I could climb a mountain. I can ascend ladders. I can kick you if you mock my legs. Legs are good for some things, they aren’t wheels, they don’t purport to be wheels. They can’t do everything a wheel can do. If I had to go 1000 miles I’d rather do it on wheels than on my own legs. But, legs aren’t without their uses. Bringing this back to the real topic at hand, yes, humans use vision to drive but that’s because that’s what we’ve got. We don’t have lidar on the sides of our head (work on this science!) so we use vision. The fact that we must rely nearly entirely on vision does not preclude other solutions from existing. So, I find the vision only arguments to be a bit short sighted (was that a pun?!) Certainly vision will have to be a component as street signs are written to be looked at, not sniffed. But, is vision all there is? Does it have downsides? Well, it does have some downsides but it has many perks as well. Let’s delve down to the differences between the sensing options.
First of all, a very valid thing to consider is the “spatial resolution” of a sensing technology. That is, from 50 feet away, what size of object can a given sensor pick up as a unique, discrete object? Let’s say a baseball is 50 feet away. Can you see a baseball at 50 feet away? If you have your glasses handy, sure. It’s easy to see. Can radar see it at 50 feet? Maybe, maybe not. Can lidar? Almost certainly. Can ultrasonic sensors? NO. Other than ultrasonic, the rest are just forms of light. But, even for ultrasonic, there exists an upper limit to how fine a detail any sensing scheme can show. This relates to the wavelength of the light or sound used. Visible light has a wavelength of between 380 and 700 nanometers. That’s less than 1 millionth of a meter or 1/1000th of a millimeter. This is a very fine amount of precision. Of course, the wavelength merely sets a limit for how small we can see. We still need to know the sensing resolution too. If our sensor is 10mm square and 1000×1000 pixels then each pixel is 0.01mm square. We thus will never be able to distinguish anything smaller than 0.01mm (as it hits the sensor), since two objects smaller than that size could both be seen by one pixel of the sensor and be completely indistinguishable. Of course, camera zoom and distance affect all of this so that, just as with your eyes, getting twice as far away makes it twice as tough to distinguish two objects. But, still, the limits set by the wavelength and the sensor size dictate how small of an object can be seen. Obviously we as humans can see very small objects so long as our eyesight is yet sharp.
The news for radar is much more depressing. Perhaps it might be time to mention that wavelength is related to the speed of light in a vacuum and the frequency of the light. The wavelength is c/f where c is the speed of light and f is the frequency. So, let’s take a very common frequency of radiation (and for some radar): 2.54GHz, which is right next door to the 2.4GHz band used by standard WiFi.
The wavelength is 118 millimeters (that’s 4.65 inches). Yes, your WiFi router is throwing waves well over 4 inches long off from the antenna. This may give you a new appreciation for the frequency of visible light – it is literally over 400THz – 400 trillion cycles per second! So, it may come as no surprise that a 2.54GHz radar would be rather limited by such a large wavelength. It is for this reason that automotive radar often runs at a much higher frequency, around 77GHz. This brings the wavelength to around 0.153 inches or 3.9mm. So, this sets a bit of a limit for how small of an object we can discriminate but 4mm is still plenty small enough that we can distinguish cars even quite a distance away. But, it’s probably not high enough resolution to figure out whether an object is a Mercedes or a Chevy. They’ll look pretty similar to radar.
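If you want to check the c/f arithmetic yourself, here is a minimal Python sketch. The function names are my own, made up for illustration, and the only physics in it is wavelength = c / f.

```python
# Speed of light in a vacuum, in meters per second.
C = 299_792_458.0


def wavelength_from_frequency(frequency_hz: float) -> float:
    """Return the wavelength in meters for a frequency in hertz."""
    return C / frequency_hz


def frequency_from_wavelength(wavelength_m: float) -> float:
    """Return the frequency in hertz for a wavelength in meters."""
    return C / wavelength_m


# The two radio frequencies discussed in the text.
for label, freq in [("2.54 GHz", 2.54e9), ("77 GHz radar", 77e9)]:
    mm = wavelength_from_frequency(freq) * 1000.0
    print(f"{label}: {mm:.1f} mm ({mm / 25.4:.3f} in)")

# Red light at the long end of the visible range (700 nm).
thz = frequency_from_wavelength(700e-9) / 1e12
print(f"700 nm light: {thz:.0f} THz")
```

Running it gives roughly 118 mm for 2.54GHz and 3.9 mm for 77GHz, the same numbers as above, and puts 700 nm red light a bit over 400THz.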
So, how do radar and lidar compare? Well, lidar can be run at 905nm or 1550nm as two options. These are infrared wavelengths, pretty close to visible light. So, the wavelength limitations are similar to visible light. The sensor resolution isn’t going to be as good as a camera but potentially not so terrible. It might then seem strange to talk about a technology that is almost like visible light. But, the situation is a little more complicated. One helpful thing to consider is automotive glass. On a hot summer day while sitting in a car you might be tempted to roll down the window. If you do that while sitting in the sun you may be somewhat discouraged to find that you get a LOT hotter as soon as the window rolls down. In fact, you can feel the sun start to bake your skin the moment the window is no longer in the way. What gives? Well, not all light is created equal in the sight of God or physics.
Some materials are reflective or transmissive to given wavelengths of light while acting differently to other wavelengths. The glass used in cars is this way. You may think it looks almost 100% clear and you’d be half right. It’s clear to visible light but not all light. Light comes in lots of shades, only a couple of which you can see. The rest are bouncing around invisible but not always forgotten. The sun emits a large amount of ultraviolet and infrared light. UV isn’t super great for you and infrared light is very prone to warm you up. And so, we come to the reason why rolling down a window can heat you up – that window was blocking much of the infrared light while letting through almost all of the visible light. From the above graphic, you can see that most infrared is blocked anyway but not all of it. Why is this relevant? Well, both lidar and radar are forms of light, just like the visible light you can see. They act more or less the same as visible light except that (1) they have different wavelengths and (2) they interact with nature somewhat differently. Where an object may be opaque to visible light, it may be transparent to either radar or lidar – the converse may also be true. Thus, some things that might block one will either not block another or may only partially block it (like green glass – it blocks all but green light). It should be noted that supposedly solid objects are not necessarily all that opaque to various forms of light. Radio waves pass right through you. So do X-Rays for the most part. Radar might partially pass through you but does not pass well through metallic car bodies. This makes it sometimes useful to have multiple sensing types as they each may work in different contexts and complement each other. An easy example is rear collision avoidance radar. My Bolt has it. The Bolt will send radar out to both sides of the rear of the vehicle.
This radar is mostly blocked by the metal bodies of vehicles, but it can pass through glass and plastic, can bounce along underneath and around the cars next to you, and it reflects well off of whatever it does hit. So, it will end up bouncing off of nearly everything in a 50 foot radius. This allows the radar to effectively see past the cars beside you and notify you if cars or pedestrians are coming from outside your line of sight. This is QUITE USEFUL. I cannot tell you the number of times I’ve had the car chime and then 2 seconds later someone shoots by in a vehicle, obviously going too fast for a parking lot. Visible light cannot do this. It doesn’t bounce well enough and what a camera can see stops at the first car body in the way.
Both lidar and radar have the advantage over visible light in that they’re active – they both send out light and look for replies. Visible light is usually mostly “ambient” where we do not have control over how lit the scene is. This is both an advantage and a curse. It’s advantageous from an energy standpoint as we don’t have to provide the illumination (but at night we tend to still have to!) but being the source of light has a big advantage – you can time it! That is, with something like radar how it works is that you send out a pulse of radar and then watch for that pulse to bounce off of things and come back. Timing how long this takes allows for knowing how far away things are. There are ways this can go wrong. The radar could bounce off of multiple objects and eventually still make it back. This leads to weird artifacts where it looks like there are objects far away that really aren’t there. So, any radar has to be able to weed out the noise. In reality what radar is mostly looking for are replies that seem to be moving. Anything that doesn’t seem to make sense or doesn’t seem to be moving can probably be ignored. Bringing up the example of rear collision radar, it ignores all the parked cars and focuses only on things that seem to be moving toward your vehicle. Sometimes this could be taken to an extreme. When driving down the road you really want whatever sensors are being used to still pay attention to bridge overpasses. Overpasses are inconsiderate and refuse to budge even if you ask them please at 80MPH. They will not move but you will, quite quickly in fact.
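The timing trick is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the function names are invented, and the closing speed is estimated from two successive range readings, whereas real radars typically measure the Doppler shift of the return instead.

```python
# Speed of light in a vacuum, in meters per second.
C = 299_792_458.0


def range_from_round_trip(seconds: float) -> float:
    """Distance to the target in meters, given the pulse's out-and-back time.

    The pulse covers the distance twice, hence the division by two.
    """
    return C * seconds / 2.0


def round_trip_time(distance_m: float) -> float:
    """Out-and-back travel time in seconds for a target at distance_m."""
    return 2.0 * distance_m / C


def closing_speed(range_then_m: float, range_now_m: float, dt_s: float) -> float:
    """Speed in m/s at which a target approaches (positive) or recedes (negative).

    Simplification: uses the change in range between two pulses dt_s apart.
    """
    return (range_then_m - range_now_m) / dt_s


# An echo that returns after 2 microseconds came from about 300 m away.
print(f"{range_from_round_trip(2e-6):.1f} m")

# A target that went from 40 m to 38 m in 0.1 s is closing at about 20 m/s.
print(f"{closing_speed(40.0, 38.0, 0.1):.1f} m/s")
```

A parked car shows a closing speed of about zero relative to the ground, which is roughly why a rear collision radar can ignore it. The overpass problem is the flip side: something can be stationary and still be very much in your way.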
Obviously we know about how far away things are without being able to time the light. We do this through binocular vision. Self driving cars can use the same idea. If you can get two different pictures of the same object but have the cameras a known distance apart then you can figure out how far away it is based on how much differently each image looks. But, this is somewhat of a guess in a way. There are equations and mathematically an object can be determined to be a given distance accurately from two pictures of a known offset. But, we’ve all misjudged distances on many occasions. The brain isn’t always super great at knowing the distance just from two pictures. Perhaps a self driving car can do a better job than your head. But, the farther away an object is, the harder it is to get a perfect idea of the distance as the difference between the view from each camera becomes very tiny. Close up, binocular vision is very effective at judging distance. From a quarter mile away, it really isn’t. For longer distances lidar can be more accurate. Light travels 299,792,458 meters per second. If an object is 300 meters ahead then light will take around 1 microsecond to get there then 1us to get back as well. 2 millionths of a second is not a long time to wait to find that an object is 300m in front of you. Were it only that lidar could actually give an answer so quickly. Alas, reality conspires to ruin our fun yet again. Lidar tends to be a scanning technology. Rather than firing out laser beams in all directions like drug addled sharks with fricken laser beams attached to their heads, lidar sweeps the scene. Physics, still being here and being consistent, dictates that it still takes about 2uS for the light to bounce off of an object at 300m and return but now if the scene is scanned only 30 times per second you might be waiting 33 milliseconds to find that out. Still, this would be OK as a car driving 100KPH only travels less than one meter in this time interval. 
The self driving system would have plenty of time to react.
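Both bits of arithmetic above can be sketched in Python. Everything specific here is an assumption of mine for illustration: the camera rig (1000 pixel focal length, cameras 0.3 m apart) is made up, the names are invented, and the stereo formula is the textbook pinhole model (depth = focal length × baseline / disparity), not any particular carmaker's method.

```python
def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from the pixel shift (disparity) of an object between two cameras."""
    return focal_px * baseline_m / disparity_px


def expected_disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Pixel shift you would expect for an object at a known depth."""
    return focal_px * baseline_m / depth_m


def distance_travelled_m(speed_kph: float, seconds: float) -> float:
    """How far a car moves in the given time (km/h converted to m/s)."""
    return speed_kph / 3.6 * seconds


# Hypothetical camera rig, chosen only for illustration.
FOCAL_PX = 1000.0
BASELINE_M = 0.3
DISPARITY_ERROR_PX = 0.25  # assume we misjudge the shift by a quarter pixel

for true_depth in (10.0, 100.0, 400.0):
    shift = expected_disparity_px(FOCAL_PX, BASELINE_M, true_depth)
    guess = stereo_depth_m(FOCAL_PX, BASELINE_M, shift - DISPARITY_ERROR_PX)
    print(f"true {true_depth:6.1f} m  shift {shift:5.2f} px  estimate {guess:6.1f} m")

# Lidar sweeping the scene 30 times per second: worst-case wait is one sweep.
sweep_period_s = 1.0 / 30.0
print(f"sweep period: {sweep_period_s * 1000.0:.1f} ms")
print(f"travel at 100 km/h: {distance_travelled_m(100.0, sweep_period_s):.2f} m")
```

With these made-up numbers, a quarter pixel of error barely matters at 10 m, but at 400 m (about a quarter mile) the whole shift is under one pixel and the same error turns 400 m into 600 m. The sweep, meanwhile, costs the car a bit under a meter of travel at 100 km/h.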
Above is what lidar more realistically looks like (as opposed to the crummy mock up I made above). However, one can still see how this view would create quite a lot of useful information. In fact, there is information here that the human eye or vision in general cannot produce. How far away is the nearest tree? Lidar can easily tell you. With vision you have to guess. Those guesses can be pretty accurate but the fact remains that what blinds one form of visualization does not always blind the rest. Cameras cannot see in the dark, lidar can. Cameras cannot see through fog, lidar sometimes can and so can radar. Lidar can be blinded by special attacks… as can a camera. The world is unfortunately not full of definitive and easy answers.
Without a crystal ball it is difficult to know what the future will bring. Will vision only work? Tesla and Comma.ai think so. They’ve both got very promising results but neither is even remotely close to level 5 autonomy. Waymo and Cruise are much closer but will they be able to operate at a reasonable price point? Don’t touch that dial! We’re all about to find out… in about a decade.