Artificial Intelligence – Part II: The Rise of Modern AI

I’m starting to prepare the follow-up to the introductory AI presentation I gave last year, and I realized I never published the second part of the first talk – so here it is!

In the first part we saw the impressive developments already in progress in many fields at the beginning of the 1900s, which exploded with WWII. We stopped at the birth of AI in 1956, and now we will follow the steps of its evolution up to the present day.

I am simply reproducing the content of my slides, which were bullet-based to give me a summary to follow during the talk, but I think they are clear enough.

1956–1970 – The Age of Optimism

This period was marked by strong government funding and very high expectations. Both the USA and the USSR considered Artificial Intelligence a potential strategic asset, which fueled ambitious research programs and widespread optimism.

Key milestones:

  • Logic Theorist (1956)
    One of the first AI programs, capable of proving mathematical theorems using symbolic logic.
  • LISP (1958)
    John McCarthy introduced a programming language specifically designed for AI research, which would dominate the field for decades.
  • ELIZA (1966)
    An early chatbot that simulated a psychotherapist through simple pattern matching, often perceived as more intelligent than it actually was.
  • SHRDLU (1970)
    A system able to understand natural language commands within a restricted virtual world of blocks, showcasing both the promise and the limits of early NLP.

Main limitations:

  • Highly rigid systems, based on the belief that enough logical rules could reproduce human reasoning and language.
  • Little or no handling of ambiguity, context, or learning.
  • Very limited practical results compared to the level of investment and expectations.

1970–1980 – The First AI Winter

During the 1970s, enthusiasm around Artificial Intelligence rapidly faded. Governments began cutting funding as early promises failed to turn into real-world results.

What went wrong:

  • Symbolic AI failed to deliver on its bold claims: programs followed rules, but did not truly “think”.
  • Available hardware and data were insufficient to support more complex or realistic models.
  • AI systems showed very limited practical applicability, outside of tightly controlled environments.

As expectations collapsed, AI research entered a long period of stagnation, later known as the First AI Winter.

1980–1987 – Expert Systems

In the early 1980s, interest in Artificial Intelligence returned, especially in industrial and professional contexts, thanks to the rise of expert systems.
These systems did not aim at general intelligence, but at reproducing human reasoning within very specific domains.

How expert systems worked:

  • Knowledge base: a collection of facts and rules describing a specific domain.
  • Inference engine: applied logical rules to draw conclusions or make decisions.
  • User interface: allowed users to ask questions and receive explanations.

Typical applications included:

  • Medicine: computer-assisted diagnosis.
  • Technology and industry: configuration of complex systems.
  • Geology: mineral and oil exploration.

Example of rule-based reasoning:

  • High fever + skin rash → possible measles
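The knowledge base / inference engine pair described above can be sketched in a few lines of Python. The rules here are hypothetical illustrations (certainly not medical advice), but the mechanism is the one expert systems used:

```python
# Minimal sketch of an expert-system style inference engine.
# The knowledge base is a list of (conditions, conclusion) rules.
RULES = [
    ({"high fever", "skin rash"}, "possible measles"),
    ({"high fever", "stiff neck"}, "possible meningitis"),
]

def infer(facts):
    """Forward chaining: fire every rule whose conditions are all present."""
    return [conclusion for conditions, conclusion in RULES
            if conditions <= facts]

print(infer({"high fever", "skin rash"}))   # ['possible measles']
```

Real systems of the era had thousands of such rules, which is exactly why maintenance became so difficult.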

Hardware context of the time:

  • Commodore 64 (1982): 1 MHz CPU, 64 KB RAM
  • Amiga 1000 (1985): 7 MHz CPU, 512 KB RAM
  • IBM 3090 (mainframe): ~50 MHz CPU, 16–512 MB RAM

Main limitations:

  • Difficult to maintain: adding or modifying rules was complex and error-prone.
  • High costs: required domain experts and specialized engineers to build and update.
  • No learning capability: systems did not improve with experience.

1987–1995 – The Second AI Winter

By the late 1980s, expert systems began to collapse under their own limitations.
Problems that were manageable at small scale became increasingly difficult to control as systems grew.

Key reasons for the second downturn:

  • Structural limitations of expert systems, which did not scale well to larger or more complex problems.
  • Lack of sufficient data and effective machine learning techniques.
  • Drastic reduction in funding, as confidence in AI once again declined.

As a result, AI entered a second period of stagnation, known as the Second AI Winter.

However, something was beginning to change…

The 1990s – A Slow Recovery

During the 1990s, Artificial Intelligence began a quiet and largely spontaneous comeback. There was no new grand theory, but a combination of technological progress and more pragmatic goals slowly brought AI back to life.

What enabled this recovery:

  • Dramatic improvements in hardware performance, both in consumer and high-end machines.
  • A shift toward data-driven approaches, moving away from hand-written rules.
  • A renewed focus on practical applications, rather than the creation of an “artificial mind”.

Hardware evolution of the decade:

  • Intel 386 (1986–1992): 16–40 MHz CPU, 1–16 MB RAM
  • Intel 486 (1989–1995): 25–100 MHz CPU, 4–64 MB RAM
  • Intel Pentium (1993–1996): 60–200 MHz CPU, 8–128 MB RAM
  • IBM “supercomputer” (1990): ~75 MHz CPU, 2 GB RAM
  • Cray supercomputer (1993): ~300 MHz CPU, 8 GB RAM

A symbolic milestone:

  • 1997 – Deep Blue defeats Garry Kasparov at chess, demonstrating the power of specialized AI systems combined with massive computation.

Conceptual shift of the era:

  • The rise of Machine Learning, where systems improve by learning from data.
  • Emergence of new-generation neural networks.
  • Evolution of expert systems, integrating statistical and learning-based techniques.

This period did not produce spectacular breakthroughs, but it set the technical and conceptual foundations for modern AI.

Early 2000s – Deep Learning

In the early 2000s, several independent factors finally converged, making a new approach to AI practically feasible.

What changed:

  • Massive computing power at low cost, thanks to GPUs and distributed clusters.
  • The explosion of the Internet, providing unprecedented amounts of data (social networks, Wikipedia, blogs, and digital archives).
  • These conditions made it possible to apply Deep Learning in practice: machine learning based on neural networks with many layers.

Theoretical foundation:

  • Universal Approximation Theorem
    A neural network with a single hidden layer and a sufficient number of neurons can approximate
    any continuous function with arbitrarily small error.

While this theorem had been known for decades, only in the 2000s did hardware and data availability make it realistic to exploit it at scale. This marked the beginning of modern, data-driven AI.
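A quick numerical illustration of the theorem: a single hidden layer of tanh neurons can fit a smooth function like sin(x) to high accuracy. This sketch takes a shortcut (the hidden weights stay random and only the output weights are fitted, by least squares), so it is not how deep networks are actually trained, but it shows the approximation power:

```python
import numpy as np

# Approximate sin(x) with a single hidden layer, as the
# Universal Approximation Theorem suggests is possible.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

n_hidden = 50
W = rng.normal(size=(1, n_hidden)) * 3.0    # random input-to-hidden weights
b = rng.normal(size=n_hidden)               # random hidden biases
H = np.tanh(x @ W + b)                      # hidden-layer activations

# Fit only the hidden-to-output weights with least squares.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ w_out

print("max abs error:", float(np.abs(y_hat - y).max()))
```

With only 50 hidden neurons the maximum error is already very small; adding neurons shrinks it further, which is the theorem in action.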

Modern neural networks are inspired, at least conceptually, by the biology of the human brain.

Neural Networks

The human brain is an extremely powerful and efficient system whose basic unit is the neuron: a cell with many branches that connect it to nearby neurons and transmit signals. A single neuron can do very little on its own, but billions of them together give us an incredible “computational power”.

Neuron, from Wikipedia

Here are some statistics:

  • The brain contains about 86 billion neurons.
  • Each neuron is connected through 1,000 to 10,000 synapses.
  • An estimated 100 to 1,000 trillion synaptic connections in total.
  • Roughly 2,500 TB of information capacity and up to one quadrillion operations per second.
  • Weighs about 1.3–1.4 kg, yet consumes only 20% of the body’s energy (around 20 watts).

This massive parallelism is what artificial systems attempt to approximate.

Artificial neural networks are composed of artificial neurons that are far simpler than natural ones. They are usually represented as nodes of a graph: a node has input branches, each carrying a value that the neuron weights, and output branches whose values are computed by the neuron’s internal function.

Artificial neuron, from Wikipedia
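As a toy illustration of the weighted-sum-plus-activation behaviour just described (the sigmoid and the numbers are my own choices, picked only for the example):

```python
import math

# One artificial neuron: weight each input, sum with a bias, then apply an
# activation function (a sigmoid here; the choice is illustrative).
def neuron(inputs, weights, bias):
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))    # squashes the sum into (0, 1)

print(round(neuron([0.5, 0.8], [0.4, -0.2], 0.1), 3))   # 0.535
```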

An artificial network is organized into multiple layers:

  • Input layer: receives raw data.
  • Hidden layers: transform and extract relevant features.
  • Output layer: produces the final prediction or decision.

Learning happens through the adjustment of weights and activation functions, allowing the network to gradually improve its predictions based on data. Here is a representation of a simple network (each node stands for the more complex structure presented in the previous image).

Artificial Neuron
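The input/hidden/output flow listed above can be sketched as a single forward pass. The sizes and weights here are arbitrary illustrations (learning would consist of adjusting these weights, which this sketch does not do):

```python
import numpy as np

# Minimal sketch of the input -> hidden -> output flow of a layered network.
rng = np.random.default_rng(1)

x = rng.random(4)                  # input layer: 4 raw values
W1 = rng.normal(size=(4, 5))       # connections into a 5-neuron hidden layer
W2 = rng.normal(size=(5, 3))       # connections into a 3-neuron output layer

hidden = np.tanh(x @ W1)           # hidden layer transforms the input
logits = hidden @ W2
output = np.exp(logits) / np.exp(logits).sum()   # softmax: values sum to 1

print(output.shape, round(float(output.sum()), 6))   # (3,) 1.0
```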

Example: character recognition (OCR)

Let’s use character recognition to show in practice how a neural network really works. Its goal is to read an image and recognize the letter it contains. Consider the following setup:

  • Input:
    • Images of size 28×28 pixels, corresponding to 784 input neurons; each pixel is mapped to one input neuron.
  • Processing:
    • Around 4 hidden layers to progressively filter and reduce image features
    • Approximately 10,000 neurons involved in feature extraction.
  • Decision:
    • One dense connection layer (~3,000 neurons)
    • One decision layer (~128 neurons)
    • One output layer with 26 neurons, one for each character.
  • Scale:
    • Roughly 1 million connections among about 15,000 neurons.
    • The resulting network resembles the simple one shown earlier, only much more complex.
  • Training:
    • Trained on hundreds of thousands of labeled images.
    • The network automatically determines the optimal weights.
  • Result:
    • Around 98% accuracy in character recognition.

Rather than being explicitly programmed, the network learns by example, discovering patterns that are too complex to be written as rules.
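To make the shapes concrete, the pipeline above can be sketched with random weights. The layer sizes here are my own assumptions, not the exact ones from the slide, and the weights are random, whereas a real network would learn them from hundreds of thousands of labeled images:

```python
import numpy as np

# Back-of-the-envelope sketch of an OCR network: 784 input pixels flow
# through fully connected hidden layers down to 26 letter outputs.
layer_sizes = [784, 512, 256, 128, 64, 26]   # input, hidden layers, output

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(a, b))
           for a, b in zip(layer_sizes, layer_sizes[1:])]

def forward(pixels):
    """Run one 784-pixel image through the (untrained) network."""
    a = pixels
    for W in weights[:-1]:
        a = np.maximum(a @ W, 0.0)           # ReLU in the hidden layers
    logits = a @ weights[-1]
    return int(np.argmax(logits))            # index of the predicted letter

image = rng.random(784)                      # stand-in for a 28x28 image
print("predicted letter index:", forward(image))
```

Training would adjust every entry of those weight matrices so that the right output neuron wins for each labeled image.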

Modern neural networks

This example was the typical case study at university 15–20 years ago, but modern neural networks for generative AI are much more powerful and BIGGER!

Nowadays networks can generate text or images. To achieve this, language and images must be represented in a numerical form that captures meaning, context, and structure.

How modern models handle this complexity:

  • Words are represented as high-dimensional vectors, typically ranging from 768 to over 2,000 dimensions.
  • Models are composed of many stacked layers:
    • around 96 layers in models such as GPT-3.
  • The scale is enormous:
    • approximately 175 billion parameters (the weights of the connections) in GPT-3,
    • reportedly around ten times more in GPT-4.
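To give a feeling for what “words as vectors” means, here is a toy example with invented 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions and are learned, not hand-made:

```python
import numpy as np

# Hypothetical word vectors: similar words point in similar directions.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))   # high: related words
print(cosine(vectors["king"], vectors["apple"]))   # much lower: unrelated
```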

Training data:

  • These models are trained on vast portions of the Internet: Wikipedia, books, articles, and other publicly available texts.
  • Rather than memorizing content, they learn statistical patterns in language and images.

These techniques can obtain incredible results in several fields:

  • Computer vision: Used in facial recognition, medical diagnostics, and self-driving cars.
  • Natural Language Processing: Powering chatbots, translation systems, and voice assistants such as Siri and ChatGPT.
  • Creativity and generative AI: Enabling the creation of images, videos, text, and music with models like DALL·E and Stable Diffusion.

These systems do not “understand” in a human sense, but they are extremely effective at generating plausible, coherent, and context-aware outputs across many tasks.

Conclusion

In the original talk I also presented applications, tools, risks, and problems related to AI (deep fakes, unemployment, environmental impact, fraud, the regression of human abilities, the possible birth of a dangerous super-AI, etc.), but I’ll write about these in future articles (related to the follow-up of this presentation).

I leave here my conclusions (or better, my current opinion: at the speed these technologies change, we cannot really have final conclusions…)

We are living in an extraordinary historical period, comparable to only a few moments in the last hundred years. However, technological progress does not automatically come with an equivalent growth in social awareness or education. There is a real risk that society is not yet mature enough to handle such powerful tools, and that many uses will be careless, misguided, or outright harmful. At the same time, AI does not fundamentally enable things that were impossible before; rather, it simplifies and accelerates almost everything. Those who understand and use it effectively can gain significant advantages.

As with any complex topic, the most important aspect is not blind enthusiasm or fear, but understanding the subject and forming one’s own critical conclusions, instead of relying solely on hearsay or hype.

Actually, AI is an incredible accelerator, but it still needs humans to provide direction and purpose. For at least the next few years, we are not obsolete. However, we cannot afford to stand still, looking around and “longing for the world of the past”; otherwise, we risk missing the train and being unable to keep up with the pace of today’s evolution.
