The Many Meanings of Artificial Intelligence
As a preface, I am no linguistic prescriptivist (nor a linguist, for that matter). This is not a criticism but an observation: "Artificial Intelligence", or AI, has two different, though not mutually exclusive, definitions.
But first, let's put those definitions to paper
The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions
Preface, Artificial Intelligence: A Modern Approach
This is the "traditional" definition of Artificial Intelligence. Note that it is focused on the result, not how you get there. What makes something AI is an abstract "agent" in an environment: it receives signals from that environment and chooses among the actions available in it (typically it's stated that the agent must also be rational, which usually means trying, though not necessarily succeeding, to do well by some defined metric).
And indeed, most of that book, which is the preeminent text used for undergraduate AI classes, is spent not on neural networks, or machine learning at all, but things like Dijkstra's algorithm for pathfinding, a fact which no doubt disappoints many college students every semester.
Next, the more colloquial, "modern" definition. In lieu of anything official, I'll offer up one
Artificial Intelligence is the use and study of deep neural networks
Oftentimes "deep neural network" is replaced with "machine learning" as well.
Note that the latter is a superset of the former. Now this is about the how: the technique used, not the result gained. A neural network controlled robot is as much "Artificial Intelligence" as a 10 layer neural network to classify the hand-written digits in MNIST, which stands in contrast to the prior definition. There's no mention of agents or environments at all!
And very interestingly, both terms are used with similar vigor. Here's an excerpt about CMU's Libratus, the first poker algorithm to beat top professional players at heads-up no-limit Texas Hold'em.
Libratus, an artificial intelligence that defeated four top professional poker players in no-limit Texas Hold'em earlier this year, uses a three-pronged approach to master a game with more decision points than atoms in the universe, researchers at Carnegie Mellon University report.
It quite proudly bears the label of "artificial intelligence". I'll spare the details, but Libratus involves no neural networks or machine learning of any kind. It's based on Counterfactual Regret Minimization. And there's no doubt Libratus matches the first definition: poker is the environment, and it is a very good agent in that environment.
Then there's this IBM blurb
As cyberattacks grow in volume and complexity, artificial intelligence (AI) is helping under-resourced security operations analysts stay ahead of threats.
IBM is IBM, but you can find similar blurbs for many companies. In fact, almost every instance of "artificial intelligence" offered by a company does not fit the first definition, since many instances of "rational agent in environment" are not particularly, well, useful, and thus not commercializable.
You have "AI Upscaling", or "AI Face Detection". These aren't really about environments and agents. They're functions, albeit very high dimensional and complicated functions, from images to more images, or images to bounding boxes.
A good litmus test for "is this the first definition or is this the second definition" is "would this be AI, if it wasn't using neural networks?" Remember, the prior definition only cares about the result. Is linear or bicubic upscaling AI? Not really.
So it's the latter. And, in fact, I would wager that their "AI security" likely doesn't involve neural networks either. As far as I know, more traditional methods like Isolation Forests and Local Outlier Factor still outperform their "neural" brethren in the very finicky field of anomaly detection, although there have been some papers on autoencoder-based anomaly detection. I don't work at IBM, and never have, so I can't say for certain it doesn't involve neural networks. But I can say with certainty that unless you stretch the definitions of "agent" and "environment" quite far, whatever IBM is doing does not count as AI per the first definition.
Then there's the intersection. Take AlphaGo, for instance. It is both definitely a rational agent in an environment, and uses deep neural networks!
CS as a field is no stranger to weird names. This is the same field that gave us gems like Dynamic Programming, a term quite literally invented to be misleading, and the Trie, pronounced... "tree".
But I think the rise of the second definition of "Artificial Intelligence" really spawned from another weird name: Neural Network. And that's why we have this weird situation: two parallel definitions that grew up side by side.
So we must delve into what neural networks are. Feel free to skip to "tl;dr" if this bores you, because it probably will.
It all started with the Perceptron. I won't bore you any more with the details, but basically the Perceptron algorithm, given a dataset, would generate a linear model to classify it. That linear model just looked like
y = Wx + b
where W is a matrix and b is a vector. But otherwise it looks very similar to the good ol' y = mx + b you learn in elementary school, and if your data is 1-dimensional, it is exactly that.
Here's an example
Imagine green and purple were two classes. Note the straight line that separates the two classes perfectly; this is a special case, and we call data in which this is possible linearly separable. And this is the kind of boundary the Perceptron algorithm could find.
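That boundary-finding procedure is simple enough to sketch in a few lines. Here's a minimal perceptron in Python (my own illustrative implementation, with made-up 1-D data; labels are assumed to be ±1):

```python
import numpy as np

# A minimal perceptron sketch: whenever a point is misclassified,
# nudge the boundary toward it. Terminates only if the data is
# linearly separable (which this toy dataset is).
def perceptron(X, y, epochs=100):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the line)
                w += yi * xi            # rotate/shift the boundary toward xi
                b += yi
                errors += 1
        if errors == 0:                 # perfect separation found
            break
    return w, b

# Two linearly separable "blobs" in 1-D
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = perceptron(X, y)
preds = np.sign(X @ w + b)  # matches y exactly on separable data
```

Since the toy data is separable, the loop stops after a couple of passes with every point classified correctly.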
Frank Rosenblatt created the Perceptron algorithm, and with it came many of the same grand claims that have reemerged today
[The Perceptron] is the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.
-The New York Times
However, Marvin Minsky and Seymour Papert published Perceptrons: an introduction to computational geometry, illustrating many of the flaws of the Perceptron, which killed all interest in it. Of particular note, it apparently shocked everyone that it couldn't even replicate the XOR function, which honestly never made sense to me.
Here's what XOR looks like plotted. Of course a straight line can't model it without a kernel. 🤷
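You can even check the impossibility mechanically. This toy brute-force search (my own sketch; the weight grid is an arbitrary choice, though the classic pen-and-paper proof covers all weights) finds no linear threshold that reproduces XOR:

```python
import itertools

# Does sign(w1*a + w2*b + c) reproduce XOR on all four inputs?
def fits_xor(w1, w2, c):
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        pred = 1 if w1 * a + w2 * b + c > 0 else 0
        if pred != (a ^ b):
            return False
    return True

grid = [x / 2 for x in range(-8, 9)]  # weights from -4.0 to 4.0 in steps of 0.5
found = any(fits_xor(w1, w2, c)
            for w1, w2, c in itertools.product(grid, repeat=3))
print(found)  # False: no linear separator exists for XOR
```

(The full proof is two lines: the constraints for (0,1) and (1,0) sum to w1 + w2 + 2c > 0, which contradicts the constraints for (0,0) and (1,1).)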
Other, perhaps more practical, issues (linear models are great, despite all this!) lay in the fact that the Perceptron only works on linearly separable datasets. In this diagram, which I shamelessly stole from University of California, Berkeley slides, you can see that not only does it fail, it fails catastrophically when the data isn't linearly separable
on the right, the "best" linear solution; on the left, the possible boundaries the perceptron could end up on
This is a pretty big problem, considering most datasets are noisy to the extent that classes are almost certainly not linearly separable. Or, if even just one point is mislabeled, that's likely enough to render the whole dataset inseparable.
Later, this problem would be solved by the soft SVM (which is still commonly used today!), which adds slack variables to the cost function to get approximate classification boundaries. But that's another story; I just wanted to mention it in case anyone wants to Google it.
Perceptrons but on top of each other
Now, even if you haven't studied this before, you can clearly see at least one of the issues: the model is too simple for XOR, and for many other tasks (though simplicity is also a major benefit of linear models! They tend not to overfit).
But... what if you make it non-linear? What if you stick a non-linear function on top of the perceptron, and then feed that into another perceptron? Let's take an example in 1D.
If you had y = 3x + 2, and we applied the ReLU function (defined by ReLU(x) = max(0, x); basically it's a line with the negative part flattened to zero), here's what we get
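In code, that flattening is almost too simple to write down (a trivial sketch of the composition above):

```python
def relu(z):
    # ReLU: pass positives through, clamp negatives to zero
    return max(0.0, z)

def f(x):
    # The line y = 3x + 2 fed through ReLU
    return relu(3 * x + 2)

print(f(-2.0))  # 0.0 -- left of the kink at x = -2/3, flattened
print(f(0.0))   # 2.0 -- right of the kink, still the line 3x + 2
print(f(1.0))   # 5.0
```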
Well, that doesn't look linear, now does it.
To give you some intuition, imagine we had the simple non-linear function y = x^2
Quite clearly this function cannot be represented by a linear model like our perceptron. But what about our new function, tweaked a bit? Let's try layer1(x) = -1x + 0, layer2(x) = 1x + 0, with ReLU in between. That's effectively f(x) = relu(-x).
That's certainly much closer than a single line could ever get. And, as we add more and more layers, we can get a closer and closer approximation. That's the multilayer perceptron, or a feed-forward neural network. And, in fact, not only can it approximate x^2, it has been proven that it is a universal function approximator.
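To make "closer and closer" concrete, here's a one-hidden-layer ReLU network whose weights I set by hand (illustrative, not trained; the knot spacing of 0.5 is an arbitrary choice) that piecewise-linearly interpolates x^2 on [-1, 1]:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Each ReLU unit adds a "kink" at one knot; the output layer sums them.
# The weights below make the network the piecewise-linear interpolant
# of x^2 through the knots -1, -0.5, 0, 0.5 (each ReLU adds +1 to the
# slope, matching the secant slopes -1.5, -0.5, 0.5, 1.5).
def net(x):
    return (1.0 - 1.5 * (x + 1.0)
            + 1.0 * relu(x + 0.5)
            + 1.0 * relu(x)
            + 1.0 * relu(x - 0.5))

xs = np.linspace(-1, 1, 201)
err = np.max(np.abs(net(xs) - xs**2))
print(err)  # 0.0625 -- the interpolation bound max|f''| * h^2 / 8
```

Halving the knot spacing (i.e., adding more hidden units) quarters that worst-case error, which is the universal approximation story in miniature.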
Whenever the activation function is continuous, bounded, and non-constant, then standard multilayer feedforward networks can approximate any continuous function arbitrarily well with respect to the uniform distance, provided that sufficiently many hidden units are available
Hornik Theorem 2
And that is the key to some of a neural network's magic. Image classification, for instance, can be seen as an incredibly complex function, where the input is a 32x32(x3) matrix (or whatever the dimensions) of pixels, and the output is a label. That function can be approximated with arbitrary accuracy by a multilayer perceptron.
(As an aside, if you're wondering how this model is fitted, it's with a relatively simple technique. Gradient descent is like the continuous version of hill climbing: you pick a random point and calculate the slope at that point in each dimension. Then you step in the downhill direction, scaled by a fixed factor called the learning rate, and repeat until satisfied. Gradient descent is guaranteed to find an optimal solution for convex problems, and the neural network cost function is quite clearly not convex; however, if the space is sufficiently high-dimensional... then it hopefully works? And to some extent, it does?)
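The whole procedure fits in a few lines. A sketch on a 1-D convex toy function (the function, starting point, and learning rate are all arbitrary choices of mine):

```python
# Gradient descent on f(x) = (x - 3)^2, whose slope is f'(x) = 2(x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0                # starting point
lr = 0.1               # learning rate: how far each step goes
for _ in range(100):
    x -= lr * grad(x)  # step against the slope, i.e., downhill

print(x)  # converges toward the minimum at x = 3
```

Because this toy function is convex, the iterates contract toward the minimum geometrically; a neural network's loss surface offers no such guarantee.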
So finally, we have something that can solve XOR. And indeed, if you consider the brain in some sense a mapping of our senses and our memory to actions, then hypothetically a sufficiently complex neural network could model it.
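To close the loop on XOR: here's a two-layer ReLU network with hand-picked weights (illustrative, not trained; any number of other weight settings work too) that computes it exactly:

```python
def relu(z):
    return max(0.0, z)

# Hidden layer: h1 = relu(a + b), h2 = relu(a + b - 1)
# Output layer: h1 - 2*h2
# For (1,1): h1 = 2, h2 = 1, so the output is 2 - 2 = 0 -- the
# second unit "folds" the top corner back down, which no single
# linear threshold can do.
def xor_net(a, b):
    h1 = relu(a + b)
    h2 = relu(a + b - 1)
    return h1 - 2 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # output matches a XOR b
```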
But I hope after all that you can see that while some loose inspiration was taken from the brain, the actual model is quite far from it. We truly understand very little about how the brain functions, although we already know that it certainly does not train itself with gradient descent (thank god), that neurons do not need to fire in a strictly feed-forward manner, that neuron timing matters, and more.
Perhaps the neural network is an equally powerful computing engine as our brains, given enough resources, but it's not the same.
Basically, this was an extremely long-winded way to say that neural networks resemble the brain as much as the tree data structure resembles actual trees. There was definitely inspiration, buuttt... don't stretch the analogy too far. The original Perceptron was created with a neuron as inspiration, but it was in a very loose sense.
But, regardless, the name is just... really cool! You have an "artificial neural network", that "learns" from data. That just begs the imagination to fill in the gaps: we're making miniature brains, that spend "millions" of years Matrix-style, to do some specific task extremely well. Just so cool.
And sounding cool is a big benefit when convincing other people to give you money. That's not new either; remember Dynamic Programming?
We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term research in his presence. You can imagine how he felt, then, about the term mathematical... What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word "programming". I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to.
If you take a look at these Google trend lines, there's a rough correlation between the rise of neural networks and the rise of artificial intelligence. I theorize that, through some combination of media miscommunication and an intentional push from researchers to make neural networks sound, well, more artificially intelligent, "artificial intelligence" simply became a synonym for neural networks. Because AI was not particularly popular or useful in its prior form, this became the definition for many people.
I added "chess computer" because, well, it's difficult to find a keyword to track the trend of "traditional artificial intelligence". Don't take it too seriously.
And thus this "artificial intelligence" arose as a natural extension of cool terms like "artificial neural network", as its own term, to refer to neural networks ("deep", by the way, just means the network has many layers), since they (very, very, very, very, very) vaguely resemble an artificial brain.
Basically, one definition of "artificial intelligence" has little to do with attempts to recreate human intelligence, or with creating intelligent agents. Rather, the intelligence is the ostensible resemblance to the human brain. Of course, the fact that many of the tasks neural networks excel at involve high-dimensional data like images or audio, things which have traditionally been thought of as the domain of "humans", no doubt helps this claim.
And there's nothing really "wrong" with it, at least in my mind. Words warp in meaning as time goes on. At this point, the synonymity of "Artificial Intelligence" and "Neural Networks" is pretty much set; even DeepMind categorizes their work as "Artificial Intelligence", even the ones that would not fit the first definition.
What I'm curious about is the fate of the other AI. Will the overpowering hype of deep neural networks suffocate usage? Will minimax chess bots no longer be considered AI? Or will it continue to live side-by-side with the other term? Only time will tell.
A Primary Source
A fun thing you can do with GPT-2, which is no longer state of the art but is widely available as free web applications (some people have either way too many AWS credits or way too much money), is to have it "chat" with you by prompting it with chat-like text. So let's ask an "AI" (by some definition) what "Artificial Intelligence" is
stu2b50 asks, "What is Artificial Intelligence? And what are you, GPT-2?"
GPT2 quickly replies, "I'm the first AI ever invented. The concept of AI has been around for thousands of years, but computers were not invented until about 20 years ago." "I want to hear why," says Dost, who says that computers are not really that intelligent yet. "Because computers are getting more complex. I don't know why that is. We can't explain why. It's just part of our biology and we just don't know why ." He pauses. " If you want to know why computers are getting more complicated, talk about the brain and how its functions are very complex."