

For anybody interested in artificial intelligence, I have some news to break in this post:

- Current machine learning practices built around datasets have nothing to do with building intelligent agents.

- What the world needs is not another statistical inference algorithm on steroids in the guise of “artificial intelligence”. We need intelligent beings who can solve life’s challenges for us and with us.

Here, I will propose an alternative approach. What is new about this approach is that it does not use datasets. Instead, it lives inside data and generates data by itself. You do not need to collect data to feed it. No need for 100,000 cat photos.


Figure 1: Building an information rich environment should be the starting point.

Where we start is not the “intelligent agent” itself. Such an agent cannot exist in a vacuum. Intelligent behaviour always exists in an ecological niche where agents can make use of the available information. So, we start by building an ecological niche rich in information that our agents can exploit. You may think of it very much like an open-world computer game environment.


Figure 2: Interaction within the environment should generate feedback

The ecological niche that we are going to create should have some recurrent properties, such as:

- laws of physics

  - gravity

  - friction

  - electromagnetic forces

- properties of the interaction with the environment

Here, the most important property is feedback from the environment. Every interaction with the environment should generate information as feedback. If a rock falls into the water, there should be a splash. If the wind blows, sand should scatter.
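The “every interaction generates feedback” rule can be sketched in a few lines of Python. The action names, feedback fields, and numbers below are illustrative assumptions, not part of any real engine; a real niche would simulate gravity, friction, and the other properties listed above:

```python
import random

class Environment:
    """A toy ecological niche in which every interaction yields feedback."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.events = []  # a log of all feedback ever generated

    def interact(self, action):
        # Every action must emit information back as feedback.
        if action == "drop_rock_in_water":
            feedback = {"sound": "splash", "ripples": self.rng.randint(3, 8)}
        elif action == "blow_wind":
            feedback = {"scattered_sand_grains": self.rng.randint(100, 1000)}
        else:
            # Even unrecognised actions produce some feedback.
            feedback = {"noise": "faint"}
        self.events.append(feedback)
        return feedback

env = Environment()
fb = env.interact("drop_rock_in_water")  # the feedback contains a "splash"
```

The design point is only that `interact` never returns silently: information is always put back into the world for someone to pick up.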


Figure 3: Virtual bodies can start from humble beginnings.

Feedback is useful only as long as there is someone to pick it up. To make use of feedback in the service of intelligence, we can evolve agents that sense the information in the environment and act upon what they sense.

At this stage, our open-world computer simulation will be inhabited by artificially evolved agents with virtual bodies. Agents should be able to learn and adapt through feedback.

To learn from the feedback, our agents will need to have a body. The body is the ultimate interface that interacts with the environment. An agent with a body affects the environment and learns from the feedback.

For now, let us assume that, mysteriously, we managed to evolve agents with sensors to pick up information from the environment and motor abilities to act on it. I will address how to evolve agents later in the article. But let’s keep in mind that virtual bodies can start from humble beginnings. At this point, we do not take a complex multicellular body as our initial state; any complexity in the initial agent would be a stretch of the imagination.
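A minimal sense-act cycle can illustrate what even a humble body buys us: one light sensor on each side, one motor that steps left or right. The light field and the movement rule are assumed toys for illustration, not the article’s actual agent:

```python
def step(agent_pos, light_field):
    """One sense-act cycle: sense the local light gradient, then move
    one step toward the brighter side (a toy phototaxis rule)."""
    left = light_field[max(agent_pos - 1, 0)]
    right = light_field[min(agent_pos + 1, len(light_field) - 1)]
    if right > left:
        return min(agent_pos + 1, len(light_field) - 1)
    if left > right:
        return max(agent_pos - 1, 0)
    return agent_pos  # no gradient sensed, stay put

# A one-dimensional niche: light intensity rises toward the far end.
field = [0, 1, 2, 3, 4, 5]
pos = 0
for _ in range(10):
    pos = step(pos, field)
# The agent ends up at the brightest position.
```

Even this trivial body closes the loop the article describes: sensing drives action, and the action changes what is sensed next.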


Intelligent agents have goals to pursue, and thus a meaningful existence in the environment. Otherwise, they would be indistinguishable from the environment. Having a goal creates a boundary between the agent and the environment.

Where do the goals come from? A surviving agent cannot escape having goals. The ecological niche has properties which constrain the agent in ways from which “a purpose” emerges. Building very simple innate internal goals into the virtual agents would kick-start the survival effort. However, there should be no intervention to increase goal complexity: complex goals should emerge naturally as an outcome of increasing complexity in the environment.
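One way to picture a “very simple innate internal goal” is energy homeostasis: keep energy above zero, nothing more. The numbers below are arbitrary assumptions; the point is that a single built-in goal kick-starts survival without prescribing any complex behaviour:

```python
class SurvivalAgent:
    """An agent with exactly one innate goal: keep energy above zero.

    No complex goals are built in; in the article's scheme, those
    should emerge later from the complexity of the environment itself.
    """

    def __init__(self, energy=10):
        self.energy = energy

    @property
    def alive(self):
        return self.energy > 0

    def tick(self, food_here):
        # Metabolism drains energy every step; eating replenishes it.
        self.energy -= 1
        if food_here:
            self.energy += 3

a = SurvivalAgent(energy=2)
a.tick(food_here=False)  # energy drops to 1
a.tick(food_here=True)   # eating brings it back to 3
```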

The complexity of the goals increases if the environment has more to offer. In such a case, the increased complexity of the agent enables increased exploitation of the environment. Intelligence is an off-shoot of the complex goals derived from the abundance of information in the environment.


Figure 5: Environmental opportunities enable agent complexity.

Depending on the opportunities for exploitation, the complexity of sensory and motor systems will increase to be useful.

However, one system cannot evolve ahead of the others.

Growing a pair of extra limbs is useless if the visual system cannot detect the tree branches to hold. The environmental opportunity demands almost simultaneous evolution of different systems.

Simultaneous evolution increases the complexity of the agent drastically within a short time.

An agent’s morphology, materials, and control systems as a whole reflect the environmental opportunities available to exploit. At this stage, we should use an evolutionary algorithm that bridges the gap between the morphology, materials, and control systems on one side and the environmental opportunities on the other.
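A minimal sketch of such an algorithm, under strong simplifying assumptions: the niche is a fixed list of opportunities, each demanding a sensory level and a motor level, and a hill-climbing step grows both capabilities jointly. Note how an opportunity pays off fully only when sensory and motor complexity both reach it, echoing the extra-limbs example above. The opportunity values and the greedy search are illustrative choices, not the article’s algorithm:

```python
# Each opportunity demands a (sensory, motor) capability level.
# These numbers are illustrative assumptions.
OPPORTUNITIES = [(3, 2), (1, 4), (5, 5)]

def match(agent):
    """How well an agent's capabilities cover the niche's opportunities.

    Each opportunity contributes fully only when BOTH the sensory and
    the motor requirement are met, so the two systems must grow together.
    """
    sensory, motor = agent
    return sum(min(sensory / s, 1.0) * min(motor / m, 1.0)
               for s, m in OPPORTUNITIES)

def evolve(agent=(0, 0), steps=20):
    """Greedy hill climbing over joint (sensory, motor) complexity."""
    for _ in range(steps):
        candidates = [(agent[0] + ds, agent[1] + dm)
                      for ds in (0, 1) for dm in (0, 1)]
        agent = max(candidates, key=match)  # keep the best-matching body
    return agent

best = evolve()  # starting from humble beginnings, (0, 0)
```

Because `match` rewards only joint coverage, the search never grows one system far ahead of the other: it reproduces the “almost simultaneous evolution” the text describes.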


Figure 6: To exploit newly emerged information, agent morphology may change

Changes in the environment generate more information. Fast adaptability to change is the ability to exploit newly emerged information sources in the environment. To exploit information that emerges from environmental change, agents may evolve their morphology. If morphological change cannot keep up with environmental change, agents rely on more computation to solve the challenges of life, or in this case, of the open-world computer simulation.



In reinforcement learning, agents do not evolve to bridge the gap between environmental opportunities and their ability to exploit them. Rather, the algorithm tries to change the behaviour of the agent to maximize the accumulated reward feedback from the environment. The agent’s morphology and control system are given and unchanging. Reward responses are the breadcrumbs the algorithm follows to modify the agent’s behaviour; sometimes, rewards increase as the agent gets closer to its goal. However, complex goals do not emerge when the information in the environment increases. Most of the time, learning occurs in a stable environment, and the amount of information to be exploited stays the same.

In our case, we modify the agent itself.

For our evolutionary algorithm, the input is an agent’s ability to exploit the opportunities in the environment, and the output it is mapped to is the set of opportunities available to that specific agent. If the environment changes and the input-output gap widens, the complexity of the agent increases to catch up with the changes. The algorithm does not attempt to maximise a cumulative reward value. Rather, a match is sought between the agent and the environment. The whole process is totally unsupervised.
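The reward-free, gap-driven loop might be sketched as follows. Both functions are assumptions for illustration: “complexity” is collapsed to a single number, and one unit of complexity is added whenever exploitable information is left on the table:

```python
def exploitable(agent_complexity, environment_complexity):
    """How much of the environment's information this agent can exploit.
    Collapsing everything to one number is an illustrative assumption."""
    return min(agent_complexity, environment_complexity)

def adapt(agent_complexity, environment_complexity):
    """One adaptation step. No cumulative reward is maximised: the agent
    only grows while a gap remains between what the niche offers and
    what the agent can already exploit."""
    gap = environment_complexity - exploitable(
        agent_complexity, environment_complexity)
    return agent_complexity + (1 if gap > 0 else 0)

# The environment grows richer over time; the agent chases the match.
agent, history = 0, []
for env_complexity in [2, 2, 2, 5, 5, 5, 5, 5]:
    agent = adapt(agent, env_complexity)
    history.append(agent)
# history == [1, 2, 2, 3, 4, 5, 5, 5]: growth stops once matched.
```

Notice that nothing is accumulated across steps; the only signal is the current mismatch, and adaptation halts by itself once agent and environment match.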


We don’t need datasets or reward values. The evolving agent is shaped by the gap between its morphology and computing abilities on one side, and the opportunities in the environment and the speed of environmental change on the other.


We are building a killer app to demonstrate the power of this dataset-free, unsupervised, evolutionary agent-development process. Divera AI is working on it. Coming soon…
