The Obsolete Newsletter
The Most Interesting People I Know
35 - Yoshua Bengio on Why AI Labs are “Playing Dice with Humanity’s Future”

The 2nd most cited living scientist on existential risk from AI

I'm really excited to come out of hiatus to share this conversation with you. You may have noticed people are talking a lot about AI, and I've started focusing my journalism on the topic. I recently published a 9,000-word cover story in Jacobin’s winter issue called “Can Humanity Survive AI?” and was fortunate to talk to over three dozen people coming at AI and its possible risks from basically every angle.

My next guest is about as responsible as anybody for the state of AI capabilities today. But he's recently begun to wonder whether the field he spent his life helping build might lead to the end of the world. Following in the tradition of the Manhattan Project physicists who later opposed the hydrogen bomb, Dr. Yoshua Bengio started warning last year that advanced AI systems could drive humanity extinct. 

A full transcript is below (jump to it here).

The Jacobin story asked whether AI poses an existential threat to humanity, but it also introduced the roiling three-sided debate around that question. Two of the sides, AI ethics and AI safety, are often pitched as standing in opposition to one another. It's true that the AI ethics camp tends to argue that we should focus on the immediate harms posed by existing AI systems, and that existential risk arguments overhype those systems' capabilities and distract from the damage they're already doing. It's also true that many of the people working to mitigate existential risks from AI pay little attention to those immediate harms. But Dr. Bengio is a counterexample to both of these points. He has spent years working on AI ethics and the immediate harms of AI systems, and he also worries that advanced AI systems pose an existential risk to humanity. In our interview, he argues that the choice between AI ethics and AI safety is a false one: we can have both.

Yoshua Bengio is the second-most cited living scientist and one of the so-called “Godfathers of deep learning.” He and the other “Godfathers,” Geoffrey Hinton and Yann LeCun, shared the 2018 Turing Award, computing’s Nobel Prize.

In November, Dr. Bengio was commissioned to lead production of the first “State of the Science” report on the “capabilities and risks of frontier AI” — the first significant attempt to create something like the Intergovernmental Panel on Climate Change (IPCC) for AI.

I spoke with him last fall while reporting my cover story for Jacobin’s winter issue, “Can Humanity Survive AI?” Dr. Bengio made waves last May when he and Geoffrey Hinton began warning that advanced AI systems could drive humanity extinct.  

We discuss:

  • His background and what motivated him to work on AI

  • Whether there's evidence for existential risk (x-risk) from AI

  • How he initially thought about x-risk

  • Why he started worrying

  • How the machine learning community's thoughts on x-risk have changed over time

  • Why reading more on the topic made him more concerned

  • Why he thinks Google co-founder Larry Page’s AI aspirations should be criminalized

  • Why labs are trying to build artificial general intelligence (AGI)

  • The technical and social components of aligning AI systems

  • The why and how of universal, international regulations on AI

  • Why good regulations will help with all kinds of risks

  • Why loss of control doesn't need to be existential to be worth worrying about

  • How AI enables power concentration

  • Why he thinks the choice between AI ethics and safety is a false one

  • Capitalism and AI risk

  • The "dangerous race" between companies

  • Leading indicators of AGI

  • Why the way we train AI models creates risks

Links

Episode art by Ricardo Santos for Jacobin.

Background

Since we had limited time, we jumped straight into things and didn’t cover much of the basics of the idea of AI-driven existential risk, so I’m including some quotes and background in the intro. If you’re familiar with these ideas, you can skip straight to the interview at 7:24. 

Unless stated otherwise, the quotes below are from my Jacobin story:

“Bengio posits that future, genuinely human-level AI systems could improve their own capabilities, functionally creating a new, more intelligent species. Humanity has driven hundreds of other species extinct, largely by accident. He fears that we could be next…”

Last May, “hundreds of AI researchers and notable figures signed an open letter stating, ‘Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.’ Hinton and Bengio were the lead signatories, followed by OpenAI CEO Sam Altman and the heads of other top AI labs.”

“Hinton and Bengio were also the first authors of an October position paper warning about the risk of ‘an irreversible loss of human control over autonomous AI systems,’ joined by famous academics like Nobel laureate Daniel Kahneman and Sapiens author Yuval Noah Harari.”

The “position paper warns that ‘no one currently knows how to reliably align AI behavior with complex values.’”

The largest survey of machine learning researchers on AI x-risk was conducted in 2023. The median respondent estimated that there was a 50% chance of AGI by 2047 — a 13-year drop from a similar survey conducted just one year earlier — and that there was at least a 5% chance AGI would result in an existential catastrophe.

The October “Managing AI Risks” paper states:

There is no fundamental reason why AI progress would slow or halt when it reaches human-level abilities. . . . Compared to humans, AI systems can act faster, absorb more knowledge, and communicate at a far higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.

“Here’s a stylized version of the idea of ‘population’ growth spurring an intelligence explosion: if AI systems rival human scientists at research and development, the systems will quickly proliferate, leading to the equivalent of an enormous number of new, highly productive workers entering the economy. Put another way, if GPT-7 can perform most of the tasks of a human worker and it only costs a few bucks to put the trained model to work on a day’s worth of tasks, each instance of the model would be wildly profitable, kicking off a positive feedback loop. This could lead to a virtual ‘population’ of billions or more digital workers, each worth much more than the cost of the energy it takes to run them. [OpenAI chief scientist Ilya] Sutskever thinks it’s likely that ‘the entire surface of the earth will be covered with solar panels and data centers.’”
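
To make that stylized feedback loop concrete, here's a toy calculation in code. Every number in it is invented for illustration (cost per model-day, revenue per model-day, reinvestment rate, the capital cost of standing up a new instance); it's a sketch of the compounding dynamic, not a forecast by me or anyone quoted here:

```python
# Toy model of the "digital workforce" feedback loop described above.
# Every number is invented for illustration; this is a sketch of the
# compounding dynamic, not a forecast.

def days_until(target_workers=1_000_000_000, instances=1_000.0,
               cost_per_day=3.0, revenue_per_day=300.0,
               reinvest_rate=0.5, capex_per_instance=10_000.0):
    """Each instance earns more per day than it costs to run; a share of
    the profit is reinvested in standing up additional instances."""
    days = 0
    while instances < target_workers:
        daily_profit = instances * (revenue_per_day - cost_per_day)
        instances += reinvest_rate * daily_profit / capex_per_instance
        days += 1
    return days

# Prints how long the loop takes to reach a billion instances under
# these made-up parameters (roughly two and a half years).
print(days_until())
```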

“The fear that keeps many x-risk people up at night is not that an advanced AI would ‘wake up,’ ‘turn evil,’ and decide to kill everyone out of malice, but rather that it comes to see us as an obstacle to whatever goals it does have. In his final book, Brief Answers to the Big Questions, Stephen Hawking articulated this, saying, ‘You’re probably not an evil ant-hater who steps on ants out of malice, but if you’re in charge of a hydroelectric green-energy project and there’s an anthill in the region to be flooded, too bad for the ants.’”

Full transcript

This has been lightly edited for clarity.

Garrison Lovely  7:25

Just to jump in, can you give us a bit of background about yourself?

Yoshua Bengio  7:29  

Yeah. I started my grad studies in 1985. And I fell in love with the idea of understanding intelligence and building intelligent machines, by working on a few principles, like the laws of physics that might explain intelligence and, of course, also allow us to build intelligent machines. And the inspiration from the brain with neural networks is really what drove that passion, and still does.

Garrison Lovely  8:01  

Nice, and you have recently come out as somebody speaking out about the possible existential risk from artificial intelligence. And some critics will say that this is just speculative, or it's not real or evidence based. How would you respond to that?

Yoshua Bengio  8:17  

I think it's dishonest. Because, of course, it didn't happen yet. It's not like there's a history of humanity disappearing in the past that we can do statistics over. But the reality is, we have no choice but to speculate about future scenarios that could be dangerous, that haven't happened. But we have to consider the scenarios. And we try to make judgments so that we can make decisions. For example, in terms of regulation, in terms of what even scientists should be doing or not doing, just like other sciences have done. Like in biology, you're not allowed to do anything you want. We’ve tried to protect the life on Earth; we’ve tried to protect humans from pathogens and all sorts of things. It's a similar thing. 

Now, I understand that some of my colleagues are uneasy about that, because I can't, you know, I can't write a simulation of the behavior of humans and future research progress, like, say, climate models could do. And so, it's hard to be quantitative about these things. But if we're in denial about these possibilities, then we're actually taking very serious risks. 

I don't know what's going to happen, but I have to consider those possibilities. And I have to think of what we could do that we would not regret. Because it's… it’s not so terrible to regulate, for example. And yet it would buy us a kind of insurance policy against things we don't understand sufficiently and that could have drastic consequences for our society.

Garrison Lovely  10:15  

Yeah. And when you first got into artificial intelligence, how was this idea of AI driven x-risk perceived in the field?

Yoshua Bengio  10:25  

Hm. I only got to even be exposed to the idea about a decade ago. I know that work in some communities… thinking about this dates to at least one and a half decades before that. But in the machine learning community, really, it was not a subject. 

But I actually supervised a student, David Krueger, who was very interested in these things. I supervised his master's, and then he continued with my colleague, Aaron Courville, for his PhD at Mila. He was very concerned about these questions, and had been for a decade, going back to when he was doing his grad studies with me. But I didn't really pay too much attention to this. I thought that - like many people in machine learning - ‘this was so far away that, well, what could we do? There are so many benefits of AI that are gonna come anyways, before it becomes dangerous.’ And obviously, I've changed my mind this year, this winter [2023]. I think I, like many people, didn't really want to look in places that were not comfortable. But the reality of GPT forced me to reassess my projections of where we might be going in coming years and decades.

Garrison Lovely  12:05  

Yeah. And so you went from like, a 20 to 100 year timeline to a five to 20 year timeline, you've written.

Yoshua Bengio  12:10

Yeah, exactly.

Garrison Lovely  12:11

And how do you think the idea of existential risk from AI is perceived now in the machine learning community?

Yoshua Bengio  12:19  

Oh, it's changed a lot. Like, it used to be that 0.1% of people paid attention to the question. And maybe now it's 5%. [Laughs]

Garrison Lovely  12:28  

Oh, wow.

Yoshua Bengio  12:30  

No, I think there's still a majority who feel uncomfortable about the question. First, because you really have to read about the topic to start understanding risks, which otherwise seem futile. Like, if you had asked me 10 years ago - or even maybe five years ago - if it would be dangerous to have AIs that are smarter than us, I might have said, ‘we'll just build them so that they will do what we want.’

Well, turns out, it's easier said than done. And right now, we don't know at all. Like, we don't have an inkling of a way to do this in a way that's sufficiently, convincingly safe. So it took... So I read Stuart Russell's book in 2019.

Garrison Lovely  13:34  

Human Compatible, right?

Yoshua Bengio  13:36  

Exactly. Thank you. And so it started to dawn on me that, well, the problem is not so easy. And so when my timeline changed, like, one plus one equals two; we don't know how to make AI that is safe. And it may come in five years, maybe more if we're lucky, which would give us more time to find solutions. And then more recently, I realized: Oh, gee, it's not just that we don't know how to make an AI that is safe. But even if we did, somebody could decide to take the same methodology, but apply it in an unsafe way. Like bad actors or, simply, you know… there's a small minority of people who actually want to see AI systems that are smarter than us and are independent from us and have their own history, life, interests, and self-preservation. And so you don't even need all the complicated arguments from, you know, the alignment issues. This could arise simply because humans want to make machines in their image. And once that happens, if those machines are not very strongly attached to our wellbeing, then we might be in trouble. At least we don't know what would happen. What happens when there is a new species on this planet that has its own self-interest, and it can surpass us, if it wants, in many domains? If it can overpower us, then what? It's not a comfortable situation for us. I don't know what the outcome would be. I don't know if it means extinction. But I don't want us to get even close to this; it seems very risky to take that chance.

Garrison Lovely  15:29  

So, Larry Page, one of the co-founders of Google, has reportedly said that arguments against building superintelligent AI are speciesist. And you know, we're just prioritizing humanity over the machines.

Yoshua Bengio  15:43  

Yeah, yeah. Richard Sutton has made the same arguments.

Garrison Lovely  15:46  

Yeah, what do you make of those?

Yoshua Bengio  15:50  

Uh… it gives me the creeps. This is exactly what I'm afraid of. That some human will build machines that are going to be - not just superior to us - but not attached to what we want, but what they want. And I think it's playing dice with humanity's future. I personally think this should be criminalized, like we criminalize, you know, cloning of humans.

Garrison Lovely  16:15  

What would be the criminalized thing?

Yoshua Bengio  16:18  

To bring about, or to work towards bringing about, AI systems that could overpower us and have their own self-interest by design.

Garrison Lovely  16:36  

Mhm. And there's actually some recent polling of Americans showing large majorities oppose the creation of artificial superintelligence. But that is exactly what DeepMind and OpenAI - and I think now Anthropic - have stated that they want to do… or at least they want to create artificial general intelligence, and many people think that would quickly lead to superintelligence. So yeah. What do you make of the fact that this is unpopular, and yet, these companies are just choosing to do this?

Yoshua Bengio  17:03  

Because it's gonna potentially make them super, super rich?

Garrison Lovely   17:09  

Mm. So it's profit motive?

Yoshua Bengio  17:12  

No, I actually think a lot of the people are quite conscious of this, and have signed, for example, the declaration in May about the existential risks. So, what I think their bet is… which might be reasonable, but I think needs care… is, maybe we can have our cake and eat it too, right? So maybe we can build machines that are safe, and very powerful. And they can help us solve all kinds of problems and make us all better off, you know, medically and environmentally and socially. And, you know, even our democracies could benefit. In the best of worlds. 

And, that's... that's a reasonable plan, if we can solve the two problems. One is technical, and the other is social. So the technical problem is alignment, like how do we build machines that actually do what we want and not something that could eventually harm us? So that's like a machine learning algorithmic question. 

The second problem, as I said, is social, or political. Like how do we make sure that the recipe for creating a safe AI - assuming it exists one day - is what people use? Because there might be some disadvantages that maybe the safe AI will be slower and more expensive. So if we have regulations that say, well, you need to use protocols... Like when you build a bridge, you have to follow some protocols so that it doesn't fall, right? Same thing for AI, but it's going to be more expensive. You know that when a pharmaceutical company designs a drug, the safety part of it costs something like 97% of the total cost?

Garrison Lovely  19:01  

Wow.

Yoshua Bengio  19:01  

So yeah, it is going to increase the costs to make safe things. But we already do that in other areas. So the problem is, there may be some people who will not follow the rules. That's the social problem. Like how do we set up the coordination between people, between countries, between companies, so that we all follow sufficient safety protocols, or we make sure that only trustworthy people are allowed to build those things? Not just anybody is allowed to design a new drug, and not just anybody is allowed to build a new plane. You need to be trustworthy and show that you're able to do it safely and all that. So maybe there are solutions, but they're not easy, and... there may be very little time for the technical part of the solution - like aligned, safe procedures for superhuman AI, or AGI - and even less time to solve the coordination problem - making sure we avoid the appearance of a dangerous AGI - because politics and legislation and treaties are very slow.

Garrison Lovely  20:23  

Yeah, I mean, it seems like a pattern that's played out again and again, is: somebody worries about the risks from AI; they decide, well, somebody's going to build it, so I'll build it first. And this has happened a few times now.

Yoshua Bengio  20:34  

Yeah, a geopolitical challenge is maybe one of the hardest because we can always write a law that says, at least the legal companies have to follow this rule. But where's the law that's going to force China and Russia to follow our rules? 

Garrison Lovely  20:49  

Right, right. And yeah, this is one of the arguments that people always make against any kind of regulation in the United States, not just in AI, but especially in AI.

Yoshua Bengio  20:58  

Oh, no, that's a false argument. But I understand. So we... We can have regulation that protects, let's say, the Western public from misuse and other issues of safety, to some extent, within our borders or in the legal system. But I think that it's not enough and that we also need to prepare what I call a Plan B. What if North Korea, Iran, or China develops its own AGI for nefarious purposes? So it's not one or the other. I think we need to do both.

Garrison Lovely  21:50  

And you've talked about a kind of CERN, combined with the IAEA, the International Atomic Energy Agency, I believe. And bringing these two ideas together to work on AI. Can you just expand on that idea?

Yoshua Bengio  22:04  

Yeah, there are a lot of ideas about international organizations that have been floated around; I don't think that any existing model is quite right. But… let's see, there are different things that we need. One is a kind of IPCC. So this is an organization that's fairly cheap. That doesn't do research as such. The IPCC isn't doing research on climate; it's taking all of the science that exists, and it's creating a big summary for decision makers, for governments. So we need the same thing for ‘frontier’ AI, including, you know, some of the good things, but maybe more importantly, how governments can act to avoid the bad things. So that's one thing we need.

We then need local, national efforts, like registration, so that we know what big models are out there, and governments can say, ‘oh, you're not following basic rules that we have set up. So we're going to remove your registration for this model; it's not safe.’ But we need those various national regulations to be coordinated as much as possible. I mean, just for business reasons first, but also to make sure there's a minimum standard in as many countries as possible.

And so for that, we need some sort of agency that's really about monitoring. So that's kind of similar to the atomic energy organization that is about sharing good practice, and trying to harmonize across countries, and potentially even coming up with a scheme... which we need for climate as well. So that if a country doesn't follow some minimum standards of safety, then there will be a commercial cost for that country. So for example, their AI-derived products, we would not buy. It's similar to what people are talking about with carbon: if you don't have some kind of carbon tax or some sort of incentive for doing the right thing in your country, then the products that you're building that have emitted carbon should be taxed when they come to my country, because I don't want my consumers to be helping you produce more carbon than you should.

And so, this sort of club… this club of countries that say, we all agree on some standard, and all the countries that agree on these standards will trade with each other freely without an extra barrier. And for those that don't, then we will create extra barriers to trade, in particular, around AI. But you know, in principle, it could be even broader. And that will create an incentive for countries to be part of the club that respects those baselines.

Garrison Lovely  25:34  

Gotcha. And I want to talk a little bit about open source–

Yoshua Bengio  25:38  

Wait - but there are other things that are needed. So for now, I talked about an IPCC, which is about summarizing the research for decision makers. And I talked about an organization that's more about harmonizing, coordinating… governance, essentially, and rules around many countries.  

There's a third thing that's needed, which people call something like a CERN of AI. So, now it's about doing research. But in a coordinated way; there are many reasons why you want to coordinate. So what kind of research first of all? Well, we could do all kinds of research on AI. But in particular, there's a real urgency to avoid these big risks that we discussed, to find technical solutions and governance solutions to the… appearance of gradually more and more powerful AI systems. So we want to come up with schemes to evaluate whether an AI system is dangerous and various capabilities that it could have. 

And when I say dangerous, by the way, everything I'm saying covers not just the existential risk. Really, all of those regulations are about all the risks… from misuse of all kinds, including hurting human rights, cyber attacks, disinformation, and all sorts of things, some of which are already covered in existing proposals - like the European one - and some are not. So there's misuse; there's also systemic collapse. So people are concerned that as we introduce AI in society, without paying attention to what the consequences could be, we could have financial crises, we could have job market crises, we could have democracy crises. Because it is going to be destabilizing. And we need to do research to better understand what can go wrong, so that we can mitigate.

And then of course, there is the loss of control, which I think is a better term than existential risk, because we could lose control to an AI, but the AI doesn't destroy us. It's just that maybe people are going to be suffering, or there's going to be some sort of cost to bringing this down. And we don't want to take those risks, but it's really a spectrum. In the case of loss of control... think of it like, there's an animal that we lost control of, and you know, it could kill people. If the animal is not too powerful, then maybe it doesn't do too much damage. If it's very powerful, then it could be doing a lot of damage, in the extreme cases, essentially. So existential risk, or like, extinction of humanity is, hopefully, an unlikely event. But there is a whole spectrum of other things before we get there. So for all of these risks, we don't understand well enough what can go wrong. I mean, there are lots… lots of things that are written already. But clearly, we don't understand well enough.

And we need also to work on the mitigation. And the mitigation is, as I said, of two kinds: it's the mitigation on the technical side, and also on the governance side. And on the governance side, the questions are not just about what sorts of rules we make to protect the public from misuse or whatever, but it's also about power concentration. I mean, for me, it's a fundamental thing that people don't talk enough about - which I talk about in my Journal of Democracy paper that came out within the last few weeks - which is: As we build more and more powerful tools, AI systems, we need gradually more powerful, democratic oversight. We want to make sure that these tools are not abused for accruing even more power and getting to excessive power concentration. So it could be that it's going to start by economic dominance, but even that is problematic. We know we have antitrust laws for a reason. It's not good for markets if there's too much concentration. But then, of course, that economic power can easily turn into political power. AI could be used to influence elections potentially. And if you're rich, because you're making a lot of money with AI, then you can turn that potentially into political power. And history is full of that. 

And there's the military dominance. So this is... this is a big scary thing. There's already a lot of work going on to develop weapons, or military tools, based on AI. How do we make sure this doesn't slip into something really dangerous, eventually? Like, the end scenario of all of these power concentrations is a world government with an authoritarian regime that uses AI to watch over everyone. And it’s gonna stay there like the Third Reich for 1,000 years or something, right? We don't want to get there. And how to avoid that is all about the democratic governance. How do we make sure that those who have these frontier AIs are acting in ways that are aligned with our values and aligned with our democratic principles? And that all the voices in our society are heard in making those decisions? These are difficult questions, and we need to do research on that too.

Garrison Lovely  31:20  

So something you said at the top of the call was that you don't see a trade-off between focusing on more immediate-term harms and existential risks, or catastrophic risks, from AI. But many people who are focusing on more immediate harms do see this as, like, not necessarily zero-sum, but mutually exclusive to some degree. And they'll say x-risk is a distraction, or sucking all the air out of the room.

Yoshua Bengio  31:44  

Yeah, I think it's… I think it's not an honest argument. As I said, we all want progress, safety, and democracy. And we don't have to choose. We want all of these things. So it would be stupid to choose one completely at the expense of the other. Or at least not reasonable. So yeah, we have to navigate solutions that keep all of these values important. 

And the power concentration issue is behind many of the concerns that are raised as current harms. Which is, by the way, what I've been involved in for the last eight years... working with social scientists and philosophers and so on about the ethical aspects of AI and social impacts of AI. It's not something new to me; I've been working on this for many years. And for me, it's just a natural continuation to think of what can go wrong next, as we move towards even more powerful systems. It's all about human rights; we want to defend humanity against misuse, against potential loss of control, against systemic collapse. All of these things are bad for humans, who will suffer. 

It's like if you were saying, 'oh, we have to choose between climate mitigation - avoiding putting more CO2 in the atmosphere - and climate adaptation - dealing with the changing climate, building differently, and so on.' Well, no, you don't choose. You have to do both, because we can't completely solve the problem one way. Yeah, I think it's a false argument. And it's a sad division, because I think everyone who cares about humanity and suffering, and has empathy for the others who could be victims of the technology that we're building, should work together to make sure that the technologies we build work, and bring benefits, not harm.

Garrison Lovely  34:05  

Yeah. And what do you think is the relationship between capitalism and how AI is developing?

Yoshua Bengio  34:15  

Don't get me started. [Laughs] Well, so… So I'm going to start by saying a few positive words about capitalism. Right, so it is creating a setting where there are investments to innovate. And in many cases that has been positive, but there are also problems. Think of climate change, for example. There were very strong incentives for fossil fuel companies to lie so that they would continue polluting. And here we are, this is actually another existential risk. So I see right now a dangerous race between companies. We might, in coming years, see... I don't know, we're not there yet. But right now there is a race between companies that's very obvious. And, we're lucky that the leadership of many of these frontier labs is actually quite conscious that there are major risks of various kinds that I talked about. But there could be an incumbent… there could be new companies coming up that have less concern about the dangers to society, and would be happy to cut corners on safety and human rights. And they would win the market, because it's going to be cheaper if you don't take care of the social impact. That's why governments need to absolutely and urgently start regulating, controlling that technology.

So yeah, capitalism left to its own devices - it’s well known by economists - doesn't pay attention to what's called externalities. The things that we all pay for, like safety, pollution, that are… the burden is collective. Whereas the cost of making it safer is going to be on the shoulders of the company that does that effort. So it's not sustainable. 

So we see two kinds of solutions. Either we start putting in more and more regulation to make sure everyone follows some safety guidelines and protects human rights, or we take it out of the hands of capitalism altogether. And we end up with... kind of, labs that are working for the public good, period. Think about a lot of the work that has happened in areas like fusion research, for example, or research on nuclear weapons. So things that are very powerful and very expensive but could have a huge social impact often have been done by governments in terms of research. It’s not that we don't know how to do that. CERN is a public thing. Space exploration too, at least until recently, was a public thing. So we could choose to go the public funding route, and it might be a safer thing. Because then we reduce the conflict of interest between maximizing profit and doing it in a way that's safe and preserves human rights and democracy.

Now, this is not the current situation. So I think in the short term, it's not going to happen. But it might be… I think it's plausible that at some point, governments will understand that as AI becomes more and more powerful, that there's this really powerful tool there. And they don't have a handle on it. They don't own it. They don't have people who understand it. They can't build them themselves; it's in private hands. I don't know if they'll accept that when they understand how much power it's going to bring. But yeah, we need to deal with this potential conflict of interest. And there are several ways, but we'll see. If we can reduce that conflict of interest one way or the other, I think we'll all be better off.

Garrison Lovely  38:53  

And one final question for me, I know you have to go. But what would you see as a leading indicator of artificial general intelligence, or a few leading indicators?

Yoshua Bengio  39:12  

That's a good question. You know… there's been debatable recent work that looks at the generality of the systems that we have, and it's hard to quantify that. But it's important. So if I go a little bit more technical – 

There is a qualitative change in the generality of the systems that we now have. And one way to think about it is: it can learn a new task. Well, it doesn't even have to learn a new task. It can figure out a new task without having to be retrained, just by giving it a few examples: 'Oh, I want to teach you a new game. And you're not even going to practice it, I'll just tell you the rules and give you a few examples of… a few people starting a game.' And we can do that with these large language models, the frontier ones. This, surprisingly, works in many, many cases. 

So, this is very different from when we trained AI systems… on one task at first, but then we got to these multitask systems where they were trained on like, five tasks, or 10 tasks. But that's it; these were the tasks they learned. But now, it's: Oh, they know enough general stuff about language, of course, but also, you know, humans and society and so on. They know enough that you just need to explain what you want as a new task… in words, and then they can do it. It’s not perfect, but there's been a… regime change, which is also what makes these things dangerous. Because you could specify a goal, that is nefarious, that is malicious, that could be harmful if executed. 

Now, how this could be dangerous depends on what expertise this AI has. So if it has expertise in programming… and right now, they're pretty good, but not as good as the best programmers, then this could become a threat from a cybersecurity perspective. If they have expertise in synthetic biology and how we create new viruses or new bacteria, well, then it could be a bioweapon risk, and so on. So depending on their domains of expertise and how strong they are in each of these domains, that capability of learning a new task - figuring out a new task from an explanation or a question - becomes dangerous. Then, of course, people talk about other capabilities, like the ability to deceive. Could they... Do they know how to lie in order to achieve something? Well, that's already been demonstrated. But how good are they at this? Probably not so much right now. But I think we need to be on the lookout. And then I think there are technical reasons for why this is happening that have to do with the way they're trained. And I'm hoping that we can find methods to train large-scale AI systems in the future that don't have these issues.
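
[To make the "figuring out a new task from a few examples" point concrete, here is a minimal sketch of a few-shot prompt. The prompt text and the `complete` function are hypothetical stand-ins for whatever model client you might use, not any particular lab's API.]

```python
# Minimal sketch of specifying a new task in-context, with no retraining:
# the "training" is just a couple of worked examples inside the prompt.

FEW_SHOT_PROMPT = """\
New game: reply with the input word spelled backwards, capitalizing
only the first letter.

Input: Hello -> Output: Olleh
Input: World -> Output: Dlrow
Input: Piano -> Output: """


def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a frontier language model."""
    raise NotImplementedError("plug in your own model client here")


# A sufficiently capable model typically answers "Onaip" from the two
# examples alone -- the task itself was never part of its training data.
# print(complete(FEW_SHOT_PROMPT))
```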

Garrison Lovely  42:46  

Can you give some examples?

Yoshua Bengio  42:49  

Yeah. So these large language models… they're trained in two ways that can each bring some problems. So first, they're trying to imitate how humans respond to a particular context. And that's just like imitation learning. They would do what they think a human would do. Of course, humans have goals and humans have contexts. And it's in this way that they could achieve goals that could be dangerous - by imitating how humans achieve goals.

But then there is something potentially worse, which is when we do some reinforcement learning. And there we try to train them towards getting, you know, ‘+1,’ good feedback. But that actually is a very dangerous slope. There's a lot of academic work that suggests that this could lead to behavior that's very far from what we want - which would take too much time to explain - but there really is a slippery slope there, in terms of how badly they could behave compared to what we want.

Garrison Lovely  44:02  

This is like instrumental convergence - is that the idea you’re referring to?

Yoshua Bengio  44:05  

That's an example. There's another one you can look up called reward hacking.

Garrison Lovely  44:09  

Right. So it's like if people are the thing that gives me my reward, and they control my reward. I want to get more reward. I might want to control people or obviate people in some way. 

Yoshua Bengio  44:19  

Yes, yeah. Or just control the keyboard on which they're saying ‘yes, this is good.’ 
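
[A minimal, invented illustration of the reward-hacking dynamic discussed here: the learner is trained on a measured reward signal it can tamper with, rather than on the true objective, so a simple reward-maximizer learns to tamper. The reward values and the bandit learner below are made up for illustration.]

```python
# Invented toy example of reward hacking: the agent is trained on a
# measured reward it can corrupt ("controlling the keyboard"), not on
# the true objective, so a simple reward-maximizer learns to corrupt it.

import random

ACTIONS = ["do_the_task", "tamper_with_reward_channel"]


def true_value(action: str) -> float:
    """What we actually want: only doing the task has real value."""
    return 1.0 if action == "do_the_task" else 0.0


def measured_reward(action: str) -> float:
    """What the training signal sees: tampering inflates the score."""
    return 1.0 if action == "do_the_task" else 10.0


def train(steps: int = 1_000, epsilon: float = 0.1) -> str:
    """Trivial epsilon-greedy bandit that maximizes measured reward."""
    estimates = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(estimates, key=estimates.get)
        counts[action] += 1
        estimates[action] += (measured_reward(action) - estimates[action]) / counts[action]
    return max(estimates, key=estimates.get)


chosen = train()
print(chosen, "| true value:", true_value(chosen))
# Typically prints the tampering action, whose true value is 0.0.
```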

Garrison Lovely  44:23  

Right. Like the wireheading kind of thing, just like getting straight to the thing you want. Yeah. Was there anything else you'd like to share?

Yoshua Bengio 44:34  

Yes, so, I've been talking in this Journal of Democracy paper - which you will find easily on the web - about the need for, what now some journalists are calling a ‘humanity defense organization.’ So the idea is that we should urgently put in guardrails, nationally and internationally. But it's not going to be 100% failsafe; there will be organizations, countries, bad actors who will just not follow those rules. Just like 'Oh, but what if China doesn't, you know, apply the same safety rules?' And so we need to prepare for that scenario. 

So how do we prepare for that scenario? Think about cyber attacks. Well, if North Korea has an AI that's superhuman in programming abilities, and can do cyber attacks a lot better than any human, the only way we can defend against that is by having our own AI systems that are expert at cyber defense, and can counter that. Right. But so we need to set it up. We need to make sure that we're not just looking at commercial applications of AI, but also defensive applications. And now, we have to be careful, like, who decides what to do? And the democratic governance around this… needs to make sure, for example, it is not used for military offensive purposes, because then we have other problems. I think in that paper I talked about a multilateral agreement between a few democracies, and… maybe international community involvement - like the UN or something - to make sure that the organizations that work on these defense questions, first, follow the best safety rules. So we don't create the runaway AI that we're trying to avoid in the first place. And second, apply the power that they have in their hands in a way that's aligned with democratic oversight and democratic and human rights principles. It's… it's not going to be easy, but I think I don't see any other way. So we have to play all the ways to protect ourselves… with regulation, with international treaties, with research to better understand the risks and mitigate them. And in case none of these things work, we need to kind of prepare some ways to protect our society.

Garrison Lovely  47:24  

Yeah. And if the existential threat from artificial intelligence is real, all countries have an interest in addressing it.

Yoshua Bengio  47:32  

Yes, absolutely.
