Can a chatbot mean what it says?

Correlation vs Meaning
Today, most chatbots are built with machine learning algorithms. The overwhelming tendency in mainstream machine learning is to extract correlations from huge amounts of data. Within that context, the defining idea of building a chatbot is an exclusive focus on correlations.
This approach might work if the information content of a message were nothing other than correlations. But correlation is not meaning.
Intuitively, we know that the meaning of a word changes when how we feel about that word changes. Most of the time, we do not reason about how we feel about something; we just experience it.
The elusive problem of information content
Let’s try a thought experiment. Imagine an alien race sends a message to a receiver on Earth:
Every Xvdgs is a hjudgs. But not every hjudgs is a Xvdgs.
What does this mean to the receiver on Earth? If the receiver has never heard the words Xvdgs and hjudgs before, we can assert that Xvdgs cannot indicate a nonlinguistic reality and cannot acquire meaning by being the expression of something non-linguistic. Xvdgs assumes its meaning through its logical relationship with hjudgs. So, the logical relation expressed by “Every Xvdgs is a hjudgs. But not every hjudgs is a Xvdgs” is the meaning. However, even if the receiver is able to extract the logical relationship, something feels odd. But let’s say the receiver sends a message back to the sender…
Every car is a vehicle, but not every vehicle is a car.
Has the meaning changed? From an analytical point of view, “Every Xvdgs is a hjudgs. But not every hjudgs is a Xvdgs.” expresses the same logical relationship. But does it feel the same?
How about…
Every headache is a pain, but not every pain is a headache.
The relationship is still the same. But does it feel the same? The real question is: is the information content of the message different now?
Headache and pain. They mean something to a human being. If the hypothetical alien race does not have the experience of pain, a big chunk of the information is out of reach for them.
Let’s have another thought experiment. Try to imagine this sentence in your mind’s eye:
- It was a hot and humid day. Under the rain, I walked sweating.
And now, try to imagine this sentence:
- It was a vxchy and jugsdf day. Under the gyhshs, I walked ssaas.
And try to imagine this sentence if you can:
- 0.75, 0.55, 0.40, 0.95, 0.65, 0.70, 0.95
The last one is a hypothetical example of how a machine learning system would construct the sentence “it was a hot and humid day”. Each word is the output of a classification process. For the machine, the sentence would not signify anything other than a list of probabilities. The information content is not an experience. Imagine the machine has a dictionary. It looks up its dictionary and translates the probability 0.55 into the word hot within a certain context of other words. Here hot is a symbol which represents a probability. This is not the way words get their meanings for humans.
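To make this concrete, here is a minimal sketch of the lookup described above, written in Python with an invented toy vocabulary and made-up probabilities. It is not how any real chatbot is implemented; it only illustrates that the output sentence is a mapping from numbers to symbols, with no experience of heat or humidity anywhere in the process.

```python
# A minimal, illustrative sketch: each word is chosen by a classification step
# over a toy vocabulary. The vocabulary and probabilities are invented for
# illustration; real systems use far larger vocabularies and learned models.

vocabulary = ["it", "was", "a", "hot", "and", "humid", "day"]

# One probability distribution over the vocabulary per output position
# (hypothetical numbers, not from any trained model).
position_distributions = [
    [0.75, 0.05, 0.05, 0.05, 0.03, 0.03, 0.04],   # -> "it"
    [0.05, 0.55, 0.10, 0.10, 0.08, 0.06, 0.06],   # -> "was"
    [0.05, 0.10, 0.40, 0.15, 0.10, 0.10, 0.10],   # -> "a"
    [0.01, 0.01, 0.01, 0.95, 0.01, 0.005, 0.005], # -> "hot"
    [0.05, 0.05, 0.05, 0.05, 0.65, 0.10, 0.05],   # -> "and"
    [0.02, 0.02, 0.02, 0.10, 0.04, 0.70, 0.10],   # -> "humid"
    [0.01, 0.01, 0.01, 0.01, 0.005, 0.005, 0.95], # -> "day"
]

def decode(distributions, vocab):
    """Translate each probability distribution into the most likely word."""
    words = []
    for dist in distributions:
        best_index = max(range(len(vocab)), key=lambda i: dist[i])
        words.append(vocab[best_index])
    return " ".join(words)

print(decode(position_distributions, vocabulary))
# -> "it was a hot and humid day" -- a string of symbols standing in for
#    probabilities, not for any experience of heat or humidity.
```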
In cognitive science and semantics, the symbol grounding problem asks how it is that words get their meanings. Could the meaning of a word be a mental exercise, a creation of our cognitive process? Is meaning a concept which cannot be perceived through sensory organs but results from reason? Or is there a way to specify the meaning of a word by connecting it up with some underlying reality?
As for chatbots, the question is: how can we encode more information into a word (or a symbol) than its relationship with other words (or symbols)?
Time to bring the body in
In the field of artificial intelligence, there is hardly any mention of the body. The main reason for the exclusion of the body is that today’s artificial intelligence operates on the premise that intelligence is a capacity for formal operations and functions that is not dependent on any particular form of embodiment.
The prevailing attitude is that a chatbot is a metaphysical entity that exists independently of bodily processes, activities, and engagements with the environment. Its reality emerges only from its training dataset.
However, to have access to meaning beyond correlations, a chatbot should start where all animals start: as a bounded, embodied organism. On top of this, it has to engage with its various environments. Meaning arises when the organism acts purposively in the world it inhabits. Conceiving of meaning in an embodied, experiential manner is to go beyond the confines of language-based meaning. No traditional machine learning understanding of language, in which meaning arises only from statistical regularities and correlations, can capture the richness of body-based meaning as it is experienced.
Umwelt revisited
Jakob von Uexküll, an Estonian biologist who was a professor at the University of Hamburg, introduced the concept of Umwelt in the early twentieth century. The term Umwelt refers to the world as it is experienced by a particular organism. Even though a number of creatures may occupy the same environment, they all have a different Umwelt, because their respective nervous systems have been shaped by evolution to seek out and respond to only certain aspects of that environment. For example, an ant has a different Umwelt than a human even if they share the same environment. The smell of another ant’s pheromone trail forms a large part of the ant’s Umwelt; the smell of a perfume does not. Of course, a chatbot also lives inside its own Umwelt. It sees the world in terms that are relevant to its dataset and underlying algorithm. In fact, its Umwelt is far too narrow to understand the Umwelt of humans. Most of the time, the flexibility of a chatbot is functionally specific and limited to a particular domain.
Making sense of objects through body-part relations
One way to give a chatbot an expanded Umwelt is the use of body-part projections for understanding objects, events, and scenes. Now, let’s imagine that a body-simulation accompanies a hypothetical chatbot. Humans use their own body-part relations to make sense of objects and spatial relations in their surroundings. A good example of this is the way we experience our own bodies as having fronts and backs, and so it seems natural for us to project these front/back relations onto other objects, such as trees, rocks, and houses, none of which have inherent fronts or backs. Using body parts as initiators of meaning, we attribute them to objects and events, as in the “eye of the storm” or the “heart of the problem”. Body-part projections are meaningful because they enact aspects of our fundamental ways of relating to, and acting within, our environment. A chatbot can establish a large number of intrinsically meaningful patterns given the nature of its body-simulation and the general dimensions of its surroundings (stable structures in its environment simulation). Our bodies experience regular recurring patterns as we grow and develop. For example, the fact that humans exist and operate within Earth’s gravitational field generates recurring experiences of up/down. Or, through our numerous daily experiences with containers and contained spaces, we develop a concept of a container: a boundary that defines an interior and an exterior. Regular recurring patterns (such as up/down, left/right, front/back, containment, balance, loss of balance, center/periphery, straight/curved, etc.) can be the basic building blocks of meaning for a chatbot as well as for humans. Words could then be symbols relating to the experiences of its body-simulation.
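As a rough illustration of how such recurring patterns might be represented, here is a hypothetical Python sketch in which a few recurring bodily patterns, recorded as experiences of a simulated body, ground a handful of words. Every class, schema name, and experience in it is invented; it is a sketch of the idea, not a proposal for an actual system.

```python
# A hypothetical sketch: recurring bodily patterns ("image schemas") recorded
# by a simulated body, with words grounded in those recorded experiences.

from dataclasses import dataclass

@dataclass
class BodyExperience:
    """One recurring pattern registered by the simulated body."""
    schema: str        # e.g. "CONTAINMENT", "UP_DOWN", "FRONT_BACK"
    description: str   # what the body-simulation actually registered

# Experiences accumulated by the simulated body while it moves around its
# environment simulation (hypothetical examples).
experiences = [
    BodyExperience("UP_DOWN", "lifting against simulated gravity costs effort"),
    BodyExperience("CONTAINMENT", "entering a bounded region hides the outside"),
    BodyExperience("FRONT_BACK", "objects ahead can be approached, objects behind cannot be seen"),
    BodyExperience("BALANCE", "an uneven load makes the body tip over"),
]

# Words are then symbols that point back at these experiences,
# rather than at probabilities alone.
lexicon = {
    "up":      [e for e in experiences if e.schema == "UP_DOWN"],
    "inside":  [e for e in experiences if e.schema == "CONTAINMENT"],
    "front":   [e for e in experiences if e.schema == "FRONT_BACK"],
    "balance": [e for e in experiences if e.schema == "BALANCE"],
}

def meaning_of(word: str):
    """Return the bodily experiences a word is grounded in, if any."""
    return [e.description for e in lexicon.get(word, [])]

print(meaning_of("inside"))
# -> ['entering a bounded region hides the outside']
```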
Generalizing through metaphors
For our hypothetical chatbot, a limited inventory of primitive body-simulation experiences could be enough to build more complex linguistic structures. “Experientially based” words and phrases can be the basis for the spatial and temporal inferences needed to understand an abstract domain. As an example, consider this sentence:
“water on the stove goes from cold to hot”
This is a metaphor of motion along a path. It gives us information about the change-of-state process for water: change of state is understood metaphorically as change of location. Here, experiential knowledge of source-path-goal movement is the basis for drawing appropriate inferences in a different domain.
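Here is a hypothetical Python sketch of that mapping. The SOURCE-PATH-GOAL schema and its inference methods are invented for illustration; the point is only that spatial inferences (has the mover left the source? has it reached the goal?) carry over to the state domain (is the water still cold? is it hot yet?).

```python
# A hypothetical sketch of the metaphor CHANGE OF STATE IS CHANGE OF LOCATION:
# a spatial schema learned from bodily movement is reused to reason about the
# water's change of state. Names are invented for illustration.

from dataclasses import dataclass

@dataclass
class SourcePathGoal:
    """Spatial schema: a mover travels from a source toward a goal."""
    source: str
    goal: str
    position: str  # where the mover currently is

    def has_left_source(self) -> bool:
        return self.position != self.source

    def has_reached_goal(self) -> bool:
        return self.position == self.goal

# States play the role of locations; the water plays the role of the mover.
def water_on_the_stove(current_state: str) -> SourcePathGoal:
    return SourcePathGoal(source="cold", goal="hot", position=current_state)

water = water_on_the_stove("warm")

# Inferences borrowed from the spatial domain:
print(water.has_left_source())   # True  -> the water is no longer cold
print(water.has_reached_goal())  # False -> the water is not yet hot
```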
The heart of the matter
The key idea here is that the very understanding of meaning is experiential. We understand language by simulating in our minds what it would be like to experience the things that the language describes. For our chatbot, the neural activations involved in sensory, motor, and affective simulations within a specific context would be what it is to grasp meaning. The notion of understanding and conceptualization as on-line neural simulation could serve as a general theory of language understanding. To make chatbots mean what they say, we should give them virtual bodies coupled with a simulated environment. Then we can have a real chat with them.