Peter Wildeford, Chief Advisory Executive at the Institute for AI Policy and Strategy, joined the podcast to discuss forecasting 101, the U.S. government's forecasting track record, integrating forecasters into government, AI’s societal impacts and opportunities, AI’s improving software skills, AI-powered forecasting systems, future AI trajectories, and more.
Available on YouTube, Apple Podcasts, Spotify, or any other podcast platform.
Our music is by Micah Rubin (Producer) and John Lisi (Composer).
Relevant Links
Can Policymakers Trust Forecasters? (Gavin Leech and Misha Yagudin, Institute for Progress)
What Were You Thinking? Biases and Rational Decision Making (Ted Thomas and Robert J. Rielly, InterAgency Journal)
Vague Verbiage in Forecasting (Good Judgment)
Assessing and Predicting Technology Outcomes program (U.S. National Science Foundation)
Generating Evidence Using the Delphi Method (Dmitry Khodyakov, RAND)
A practical guide to structured expert elicitation using the IDEA protocol (Victoria Hemming et al., Methods in Ecology and Evolution)
AI 2027 (Daniel Kokotajlo et al., AI Futures Project)
Response to OSTP RFI on AI Action Plan (Institute for AI Policy and Strategy)
What is the Government Doing to Prevent Wildfires? (Western Fire Chiefs Association)
Incorporating AI impacts in BLS employment projections: occupational case studies (Christine Machovec et al., U.S. Bureau of Labor Statistics)
Translator employment share of total U.S. employment grew from 2023 to 2024 (Basil Halperin, X)
Lost in Translation: Artificial Intelligence and the Demand for Foreign Language Skills (Pedro Llanos-Paredes and Carl Benedikt Frey, Oxford Martin School)
Why the AI world is suddenly obsessed with Jevons paradox (Greg Rosalsky, NPR’s Planet Money)
What the Story of ATMs and Bank Tellers Reveals About the ‘rise of the Robots’ and Jobs (James Pethokoukis, American Enterprise Institute)
For Some Recent Graduates, the A.I. Job Apocalypse May Already Be Here (Kevin Roose, The New York Times)
Forecaster reacts: METR’s bombshell paper about AI acceleration (Peter Wildeford, Substack)
Details about METR’s preliminary evaluation of o3 and o4-mini (METR)
Predicting Empirical AI Research Outcomes with Language Models (Jiaxin Wen et al.)
Approaching Human-Level Forecasting with Language Models (Danny Halawi et al.)
Q4 AI Benchmarking: Bots Are Closing The Gap (Tom Liptay et al., Metaculus)
When will an AI program be better than humans at making Metaculus forecasts? (Peter Wildeford, Metaculus)
Timestamps
00:01:19 - Why explicit probabilities beat vague predictions
00:05:34 - How forecasting skill develops through practice
00:11:39 - Forecasting failures in 9/11 and Iraq War
00:15:17 - Why no one can predict AI's exact future
00:21:06 - Building AI threat assessment capabilities
00:26:27 - Seven key AI risks facing government
00:31:35 - AI’s opportunities and benefits
00:37:20 - Why employment projections may miss AI’s impact
00:44:50 - AI software capabilities double every 7 months
00:52:58 - When AI forecasters will beat humans
00:58:19 - 2025 as a pivotal year for AI development
01:03:13 - AI will transform society more than the internet
Transcript
This transcript was generated by AI with human oversight. It may contain errors.
(Cold Open) Peter (00:00):
AI is probably better than a typical person at forecasting just because a typical person doesn't have much experience.
Jakub (00:16):
Welcome to the Center for AI Policy Podcast where we zoom into the strategic landscape of AI and unpack its implications for US policy. I'm your host, Jakub Kraus, and today's guest is Peter Wildeford. Peter is Chief Advisory Executive and Co-founder at the Institute for AI Policy and Strategy, or IAPS, a think tank focused on enhancing national competitiveness in AI and mitigating emerging risks. We discuss forecasting 101, the US government's forecasting track record, integrating forecasters into government, AI's societal impacts and opportunities, AI's improving software skills, AI-powered forecasting systems, future AI trajectories, and more. I hope you enjoy.
(01:08)
Peter, thanks for coming on the podcast.
Peter (01:15):
Yeah, thanks for having me. Looking forward to chatting.
Why explicit probabilities beat vague predictions
Jakub (01:19):
Yeah. You are a board member at this forecasting platform called Metaculus, and you have a really good track record there. I want to just go over for the audience the definition Metaculus gives of forecasting, which is "the practice of putting explicit probabilities, dates and numbers on future events - calculating the odds [with] both models and human judgment." So what do you think more policy makers should know about forecasting?
Peter (01:59):
Yeah, I think there's a few important things to know about forecasting. Firstly, we're actually always forecasting, all the time, we're just doing it implicitly. All policymakers are making kind of intuitive judgements about the future. For example, what will particular policies accomplish? Will they backfire in unexpected ways? Will they be able to be enforced correctly? Will they be future-proofed as technology changes? Will they be fit for purpose for what the policies are trying to do? So we're all implicitly forecasting, but what I think Metaculus seeks to do and what the study of forecasting seeks to do - say from Philip Tetlock's 20 years of work on studying forecasting - is try to be very explicit about this, taking a more CIA analysis-style approach to all sorts of different things in life.
(03:04)
And so I think there's a few problems with forecasting as it's practiced implicitly. The first is that Philip Tetlock, who studied forecasting in political and geopolitical domains, found that a lot of political experts made predictions that were no better than chance. Basically you could perform the same just by flipping a coin and going with whatever the coin says. That's not really good if you're making really high stakes decisions based on your forecasting.
(03:39)
I think another issue is what Tetlock refers to as vague verbiage, such as using "it may happen," "it could happen," "it might happen." Those are three different things, but they all kind of sound the same but also sound a little bit different and it's a bit confusing. For example, when Kennedy was launching the Bay of Pigs Invasion, he was basing it off of a Joint Chiefs' assessment of a "fair chance" of success, where apparently by fair chance the Joint Chiefs meant less than 50-50 likely, but Kennedy interpreted that as a higher likelihood of success. So avoiding these fiascos with very explicit statements about levels of confidence I think could just lead to better decision making. So I think the kind of main idea behind forecasting is better decision making.
(04:40)
Lastly, when you're assessing forecasting, you can actually develop a track record, like an actual quantitative track record, by keeping track of the predictions you've made and how they've turned out. And then you find that people who have had a good track record in the past also tend to have a good track record in the future. And so rather than listening to various TV pundits who kind of spout off without actually having a good record for their statements, I would recommend listening to, I guess, forecasters like myself - to be a bit grandiose - or websites like Metaculus that have developed these track records and speak in very specific, easy to understand ways, ultimately leading to better decision making.
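(A minimal illustrative sketch of what such a quantitative track record can look like, using the Brier score, one standard scoring rule; the predictions below are hypothetical, not Peter's.)

```python
# Brier score: mean squared error between stated probabilities and outcomes.
# 0.0 is perfect, 0.25 is what always answering 50-50 earns, 1.0 is maximally wrong.

def brier_score(forecasts):
    """forecasts: list of (probability, outcome) pairs, outcome 1 if the event happened, else 0."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical track record: four resolved predictions.
my_record = [(0.9, 1), (0.7, 1), (0.2, 0), (0.6, 0)]
print(f"Brier score: {brier_score(my_record):.3f}")  # lower is better
```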
How forecasting skill develops through practice
Jakub (05:34):
Yeah, it's really interesting that some experts are doing no better than chance. I think we assume that they're going to do a lot better than chance. So people's next question might be, can you do much better than chance? What's so special about the people who spend time on Metaculus? How much of an improvement could they really get?
Peter (06:04):
Yeah, honestly, I don't think there's really anything magic about it. Basically, like many other skills, it improves with practice. So if you try actually making explicit forecasts where you put specific probabilities on specific events like you can do on the Metaculus platform and you keep track of your performance over time, then you can kind of see what you get right and what you get wrong, and if you kind of do post-game analysis, you can then kind of reflect on your mistakes and improve. Honestly, it's kind of like playing any sort of sport. You're not going to expect to do amazing at football on your very first game, but if you practice and then you film the games and you roll the tape and you look back at what went right, what went wrong, and you do specific drills, you can improve your football skill. I think the same is true of forecasting as well.
Jakub (07:04):
Yeah, I think for some of these domains you were talking about how policymakers are making lots of forecasts about whether a policy will work out, whether a tax policy will do something good, whether tariffs will have a certain effect, whether to do something with the minimum wage, I'm kind of listing a bunch of economic policies, but you could do it for AI policy, you could do it for healthcare. So when people hear the forecasters are doing so much better, I think there still might be some hesitancy because these forecasters might be predicting unrelated things. So do you think it's important still to have the subject matter expertise? Maybe we just have the existing healthcare policy wonks take a few practice rounds or courses on forecasting, practice it, get better at it for a year until they start having some diminishing returns, and then they'll be doing much better at their job. Is it just a matter of training people up in a little forecasting?
Peter (08:10):
Yeah, I mean, I think for high-stakes forecasting where you're making a go/no-go decision based on an expectation of the future, I think having some forecasting training where that level of precision matters I think would be pretty beneficial. I think one thing that Philip Tetlock's research has pointed to that's something I'm quite interested in is just this idea of generalist forecasting skill, which basically is if you're kind of well practiced at the art and science of putting quantified numbers on things that are very hard to quantify, that this actually can work in a lot of different domains - even domains that you don't have much prior exposure to, you can still kind of outperform domain experts just because the domain experts know an awful lot about their domain, but just are not very skilled or practiced at putting numbers on those hard to quantify things.
(09:15)
And usually the best way to thread the needle on this according to Tetlock's work is what's called forecaster-subject expert teaming, which is basically you form a team where you have domain experts sort of analyze the question, break it down into reasons for and against, and try to put a probability on it, and often they're very skilled at breaking down the problem and very skilled at figuring out what matters and what doesn't, but less skilled at actually putting the final number on the probability. And that's again, not because forecasters are better than everyone else, but just due to it being a separate skill that needs separate practice. And then generally the forecaster on the team can then come in and read the breakdown, and they're generally much less good at creating that sort of breakdown, but much better at reading the breakdown and then figuring out actually what number that implies. And then usually, with their powers combined, the domain expertise plus the forecasting skill at putting numbers on things, you end up with a stronger result than the domain experts alone or the forecasters alone.
Jakub (10:37):
So there might be some benefit to combining domain experts with the generalist forecasters.
Peter (10:44):
Yeah, I think that's generally the area I'm most excited about. In my professional work, when I am trying to generate higher stakes forecasting where the precision, the accuracy really does matter, generally the thing I try to do is recruit a small panel of domain experts and forecasters and kind of work in sort of an iterative loop where each person attempts to forecast individually and then brings this together and discusses as a group and then goes back individually and refines their forecast based on everything that has been said and then kind of creating a weighted average across those forecasts for the final answer. Yeah, so I think the Delphi method is one such example of this as well as the IDEA method.
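(A rough sketch of the final aggregation step in a Delphi or IDEA-style panel like the one Peter describes, assuming a simple weighted average of final-round probabilities; the panel values and weights below are hypothetical.)

```python
# After discussion, each panelist submits a revised probability, and the
# facilitator combines them. One simple option is a weighted average, with
# weights reflecting, say, each panelist's past track record (hypothetical here).

def weighted_average_forecast(estimates):
    """estimates: list of (probability, weight) pairs from the final round."""
    total_weight = sum(weight for _, weight in estimates)
    return sum(prob * weight for prob, weight in estimates) / total_weight

# Hypothetical final-round estimates from two domain experts and two forecasters.
panel = [(0.30, 1.0), (0.45, 1.0), (0.35, 1.5), (0.40, 1.5)]
print(f"Aggregated forecast: {weighted_average_forecast(panel):.2f}")
```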
Forecasting failures in 9/11 and Iraq War
Jakub (11:39):
Yeah, the Delphi method I think is originally from RAND. How much forecasting has already been going on in policymaking?
Peter (11:52):
Yeah, I think it kind of really depends on the area. I think government intelligence communities obviously forecast quite frequently. A lot of Tetlock's original studies were kind of in that sort of intelligence context, and there's certainly a lot of forecasting expertise within certain parts of the US government and within RAND. I don't think that necessarily permeates across the entire policymaking world, which is a very large world. And of course there have been very high profile intelligence failures that have honestly really just been forecasting failures, and sometimes it's not even putting the wrong number on things, but I think failing to ask the right question in the first place and not even anticipating that something needs to be forecasted.
Jakub (12:43):
Do you have specific examples that you're thinking of there?
Peter (12:47):
Yeah, I guess I think two really famous intelligence failures are 9/11 and the Iraq War. I think with 9/11, obviously a key question would just be what is the likelihood of a terrorist attack on US soil leading to a certain number of deaths? What is the likelihood that airplane hijacking would be the key threat model that leads to this terrorist attack versus other alternative threat models that could have happened, but didn't? And then based on that, the US government could make a decision, like how much resourcing should go to airline security enforcement? Obviously, airline security enforcement went up dramatically after 9/11. I think ideally there would've been more airline security before 9/11, and hypothetically if we had known to ask the right questions and make the right assessments, we could have oriented policy around that. And I think also a lot of US intelligence apparatuses separately had individual pieces of the puzzle that could have been used to make an accurate intelligence assessment, but it was more of a communication and coordination failure. And this is also what led to the creation of the Director of National Intelligence, a role currently held by Tulsi Gabbard, whose main job is to coordinate all the intelligence agencies and make sure they are sharing the information needed to make accurate assessments.
(14:28)
And then of course, with the Iraq War, obviously the key question was does Saddam Hussein in fact have weapons of mass destruction? With the idea that if Saddam Hussein was about to make a nuclear weapon or a biological or a chemical weapon, then the US would indeed be quite justified in invading Iraq. But of course, there were no actual weapons of mass destruction found with Hussein. So that was kind of like another failure of forecasting where we were indeed asking the right question - we were asking whether there were such weapons - but it led to a very overconfident assessment that ended up being kind of devastatingly wrong, of course.
Why no one can predict AI's exact future
Jakub (15:17):
It sounds from these examples - things like 9/11, the Iraq War - that doing forecasting can solve enormous problems. I think sometimes people hear about forecasting, especially the probabilities and math side, and they feel a bit squeamish because it (1) might give a false sense of confidence - I mean, maybe with the best forecasters it's a bit better than with others - and (2) it might encourage people to focus on questions that are easier to forecast accurately, where you can do better than just a 50-50 guess. So, shifting a bit to AI, what are the really useful things forecasting can do, and what can be done quantitatively?
Peter (16:14):
Yeah, I mean, I think kind of on forecasting, I think it can be a bit daunting, but again, it's something we do every day. Every morning we decide what to wear, we decide whether to take an umbrella with us or not, and we're doing this based on either implicit or explicit weather forecasting. And as smart as the weatherman is, and as good as weather models are today, we don't know whether it will actually rain with a hundred percent certainty. And there have been many cases where we don't think it will rain, don't take an umbrella and then get rained on, and it's really unpleasant. We don't have that crystal ball. So when we're saying 80% chance of rain, we're basically just saying 80 out of a hundred possible futures we might face will have rain, but there's still the 20 where it doesn't, or I guess vice versa, in your case, if there's only a 10% chance of rain, there's still 10 days out of a hundred where it's going to rain even though you don't think it will. And I mean some people do blame the weatherman, but it's kind of just that the weatherman isn't a perfect forecaster of the future and is instead just trying to present a range of possibilities.
(17:33)
I think this can be really confusing with AI as well, where I think there's a lot of hype but also a lot of uncertainty. And anyone who is coming on your podcast and telling you exactly what the future holds year by year and exactly what's going to happen with AI, they don't actually know what they're talking about. That's not a good forecast because there's just way too much uncertainty with AI. There's a lot of worlds where AI moves much faster than we expect and catches us all off guard. Also, a lot of futures where AI ends up being maybe a slower deal or less of a big deal than we think, and it's just kind of important for the forecasting community, the technological analysis community, to be kind of presenting sort of a range of possibilities and figuring out how to communicate that well, similar to how a weatherman communicates the weather.
Jakub (18:36):
Yeah. The first thing that came to mind on predicting the future is the popular AI 2027 report that came out. It had two different endings, so it was showing that there's different ways things can go. But how from a forecasting perspective do you view the value of that scenario storytelling forecast document?
Peter (19:08):
Yeah, I mean I think AI 2027 is a really excellent meticulous piece of work. I think some people have derided it as just being science fiction, and honestly, they are correct. It is a piece of science fiction. It's depicting a particular concrete story that hasn't happened and may well not happen. But it's kind of more like hard science fiction where they really try the best they can to represent a vivid scenario based on what technical analysis and forecasting suggests is quite plausible. And so it's still a story I think worth taking seriously, but it is just one possible view of the future. And also when they're making many, many different claims about what will happen based on what else will happen based on a third thing that will happen, all those cascading interactions just magnify to make it just incredibly unlikely that any one particular story will be what happens.
(20:18)
So I think the way to interact with it is basically as sort of a science fiction story where it's very vivid in terms of what may happen and just trying to concretely imagine that there is some chance that things could play out broadly like this, and it then behooves us to be prepared for that and to anticipate that. But then also there's a lot of worlds where the story is just frankly incorrect or that is correct in some ways, but wrong in others, or that is mostly but not entirely correct. Basically no professional forecaster, no matter how good they are, is going to make a hundred different claims and get all one hundred right.
Building AI threat assessment capabilities
Jakub (21:06):
That makes sense. For your response to the AI action plan through the Institute for AI Policy and Strategy or IAPS, you had this recommendation to establish a Rapid Emerging Assessment Council for Threats - the acronym is REACT. And it could "rapidly convene cross disciplinary subject matter experts to assess sudden emerging or novel AI related threats to critical infrastructure or national security." And that's in the name "React" - giving the government capacity to react and respond to changing AI developments. And still sticking with the forecasting trend, I wonder if there's a way to develop proactive capacity around AI. I think one recommendation in your action plan response is related to this: directing the Office of the Director of National Intelligence to assess AI capabilities in places like China or Russia or other strategic adversaries, and there's also some recommendations to strengthen AI evaluations, strengthen measurement tools around AI. So I think all of that stuff could be used for forecasting. I think maybe there's other info you might want to gather for forecasting. But what would be your ideal vision if you were running a relevant government agency like Commerce or the ODNI or the White House Office of Science and Tech Policy? What kind of AI forecasting questions would you be going for?
Peter (23:00):
Yeah, I wish I was able to be in charge of all that. That sounds quite interesting. I think the biggest challenge for the US government is just that AI is a very new technology, a very critical technology, and a very poorly understood technology. But given its importance that we can see and also feel, it's just incredibly clear that the government needs to know what's going on with AI, and that we need all sorts of different capabilities. We need to be able to react. We also need to be able to forecast and anticipate. And the government just needs more technical talent throughout the entire government, people who can understand AI and anticipate it and be able to forecast and orient accordingly.
(23:54)
I guess if I could make an analogy, potentially you might think of it as fire. I guess AI's sometimes been compared to a lot of different things. I think like AI, fire can be used for good, it can be used for evil, it's an important power source, but also can burn down buildings. And when you think about how the government reacts to the threat of fire, the government does not want all the buildings to burn down. So we have a fire department, so that when fire is detected, you dispatch the fire trucks and you go and you put the fire out. So that's kind of what we're thinking about when it comes to REACT.
(24:33)
But there's other layers of defense against buildings burning down too. Ideally, we don't wait for the buildings to burn down and then put out the fires, but we anticipate and forecast: what is it that might cause a building to catch on fire? How can we prevent that from happening in the first place? How can we mitigate that? And so we do have parts of the government that are dedicated to assessing fire risk and figuring out what ways things should be built, what ways things should not be built to reduce the risk of a building catching fire in the first place, reduce fire hazards. And that involves a lot of forecasting skill. It also involves particular building codes. You could have rules of how you build your buildings. This doesn't harm building innovation or mean that China's going to outcompete us at building apartments. Instead what it means is that we're just building buildings that we don't want to burn down and that we need specific kinds of guidelines to tell us how to manage fire safety within buildings. And I think the AI approach could be broadly similar too, where we combine the Rapid Emerging Assessment Council, which is kind of like the fire truck, where if there are areas of AI we didn't anticipate catching fire, we can go and put the fire out. But also we might have the building codes equivalent for AI, or the forecasting equivalent for AI, where we can understand where AI might catch fire and how we can avoid those fires without needing to send the fire trucks in the first place. Because no one likes it when their building's burning down.
Seven key AI risks facing government
Jakub (26:27):
It's a good analogy to fire, I think it's illustrative. If you were to continue on this thread, what would that look like? What specifically would be these fire building codes or fire risk prediction mechanisms in the context of AI? Is it just something like forecasting when AI will get above X percent on benchmarks? Are there other things that can be done?
Peter (27:04):
Yeah, I mean I think AI is a very multifaceted problem. There's a lot more than just what can AI do, but there's a lot of different ways AI could help the world tremendously, but also a lot of ways where it could catch fire, so to speak. And I kind of think of there maybe being seven key challenges when it comes to AI. I think first there's kind of various forms of misuse, what happens when the bad guys can set fire to things. Basically the arson equivalent of AI, which would be using AI to launch massive cyber attacks against critical infrastructure, using AI to help terrorists build chemical weapons, biological weapons. The same kind of AI that can tutor us in science and advance medical cures also can tutor us in virology and advance the creation of novel diseases. So I think that's an important area to watch out for.
(28:10)
I think also as AI becomes more widely integrated across the economy, across government, we know that AI itself is beginning to be able to take independent actions. Those actions right now are not really that impressive. To be honest, I haven't even had AI successfully be able to book me a hotel room, let alone take over the world, but we can already see that AI is increasing in sophistication and that eventually the actions it's going to take are going to be increasingly sophisticated as well. And so we may have a genuine concern about losing control to a rogue AI system.
(28:55)
Likewise, back on the misuse side, we may have concerns about losing control of our AI systems to an enemy nation like China or Russia or North Korea. These are adversaries that seek to harm us and would be very interested in developing their own AI tools to do so, very interested in stealing American AI tools, and in data poisoning or otherwise manipulating AI tools to wreak havoc.
(29:26)
Additionally, it's possible there could be wars fought over AI at some point. There's also potential for AI to cause a lot of undue concentration of power or corrupt society. And then, as kind of a catchall for all of the above, there are ways AI might strategically surprise us or destabilize society by developing technology, or developing itself, in ways that we don't anticipate.
(29:57)
And so we need to... I think when rolling AI out for all of its positive benefits, just like we use fire to power all our homes and to keep us warm. We're not trying to ban fire by any means, but we do need to be aware of all the various ways that AI or fire could go wrong - the AI equivalent of arson, the AI equivalent of wildfires, the AI equivalent of accidentally catching fire in the microwave while you're cooking. Various ways fire can go wrong, various ways AI can go wrong. I hope I'm not overusing the analogy, but just kind of pointing to a lot of different things the government's going to need to keep an eye on.
(30:48)
And I didn't even actually talk about labor impacts. Even if AI is going well, there's also going to be the whole question of, as AI takes over more and more jobs and does those jobs better than humans typically do, what happens to those humans that are displaced? How is society going to adapt to that? So there's just a lot of adaptation that's going to need to take place, a lot of ways we're going to have to carefully keep an eye on this technology to make sure that the good uses are emphasized and the bad uses are mitigated or prevented. And that's going to be a huge forecasting challenge.
AI’s opportunities and benefits
Jakub (31:35):
That's a really comprehensive look at, well, I'd say that's a lot of the risks side. I guess first stopping there, do you have an equivalent framework for looking at the opportunities of AI and the benefits?
Peter (31:56):
Yeah, definitely. Yeah, I don't want to be a negative Nancy about AI by any means because I think there is a tremendous AI opportunity for America, for the whole world. I've been trying to think more about stories where AI goes just really well for us as well. And I think basically we're already seeing the very beginnings of AI being able to assist scientists with innovating in medicine, innovating in other scientific domains, innovating in material science, physical science, industry, logistics, and in all these ways it really just can complement our existing innovators and entrepreneurs and actually usher in a golden age of American innovation. And I'm cautiously optimistic that this transformation, if done with a fundamental commitment to human freedom, could really just kind of empower all of us to live better lives. Each individual human, they could have unbounded creativity. Right now when we want to do things, we're kind of limited by our own resources and our own abilities, but with AI powered abundance, AI powered creativity, there really wouldn't be any potential limits to what we can do.
(33:26)
But it is going to be very important... I'm not trying to create some sort of paternalistic utopia where AI systems dictate our choices for our own good, where AI systems know better than we know how to live our own lives. But where AI systems kind of enhance and uplift existing human creativity and give us tremendous economic prosperity, but still where kind of human freedoms are really central to that. And each individual human still gets their values and preferences respected and preserved and retains the ability to make decisions that are best for them, even if those decisions differ from what an optimization algorithm might recommend.
Jakub (34:16):
Then taking the frameworks of the risks, the benefits, let's say you have a 50 person government team or maybe for the sake of brainstorming fuel a 500 person team in the government at your disposal, and you can direct them in different buckets of different things to actually forecast. What would the buckets be? What specifically are the main areas to forecast on? Or if that's not a helpful way to cut things, what are the questions they should be asking?
Peter (35:01):
Yeah, I mean, I think honestly if we've gotten to the point where there's a 50 person to 500 person team within the US government that's thinking very thoughtfully about the potential risks and benefits and opportunities of AI, I mean, I think that's already kind of the victory that I've wanted to achieve in the first place. So I haven't really thought that far in terms of what specifically they do, but I think honestly, if they're the right people and they are in government and that kind of quantity and with that kind of technical skill and they are listened to, that sounds like a pretty ideal place to be already.
(35:35)
I think in terms of what they should orient around, I think thinking about AI as a very multifaceted issue, I outlined several challenges. I outlined several risks and several opportunities. I think government really does need to sort of walk and chew gum at the same time. The government can't afford to laser focus on only one issue and then completely ignore five others because I think as good as we are at forecasting and as skilled as we may be, there's still going to be an awful lot of unknown unknowns and still an awful lot of uncertainty when AI unfolds.
(36:16)
Again, kind of like the weather, the government needs to be prepared for it to rain and it needs to be prepared for it to be sunny regardless of what the weatherman says. Just because even a 5% risk, a 1% risk, those still happen all the time. And we prepare for all sorts of things that are really unlikely. We buckle our seat belts in the car even though a car crash is unlikely. We prepare for war with countries even if we hope that that will never come to pass.
Jakub (36:49)
Some people don't eat cookie dough.
Peter (36:52)
To be honest, I eat cookie dough. So again, sometimes it's a risk benefit analysis too, and sometimes we can accept risks and yeah, we can secure the benefits of cookie dough while mitigating the risks of salmonella. And I think we can do the same thing with AI as well.
Why employment projections may miss AI’s impact
Jakub (37:20):
I think I'd like to zoom into one particular area here, the employment side. You talked about it a little bit. So the Bureau of Labor Statistics had some employment projections recently that incorporated AI, but I noticed they wrote their projection methods are "not designed to capture extremely rapid technological change and therefore assume that the overall pace of technological change will be consistent with past experience." That's the full quote. So I think that overall assumption might be tenuous if, in the future, AI models are automating lots of cognitive labor and tasks and jobs that are currently done remotely. And included in that is research on making better AI software and better robots as well. So do you have any recommendations for this area of employment? How could BLS or other offices and entities forecast employment when accounting for potential rapid technological change through AI doing science and development?
Peter (38:42):
Yeah, I think you've kind of really captured a really key point, which is that AI is not going to be a gradual incremental technology. I would say first of all, I'm not an economist. I'm not experienced with labor statistics, and I would definitely commend the Bureau of Labor Statistics for doing these incorporations of AI impacts in the first place. I think it's a good starting place, even if their methods are not necessarily designed to fit the way I anticipate AI will actually unfold. I guess it's a really good start that they're considering this at all. But I think I would, if I were to make some inexperienced recommendations, some armchair forecasting for BLS, I think I would recommend they consider other models for how AI might permeate the labor force.
(39:39)
Because I think you have multiple things happening at any given time. Basically each month, more and more companies adopt AI that already exists. And more and more companies, I think generally, maybe they don't fire workers and replace them with AI, though I imagine that does happen some of the time, but maybe they also just hire less than they otherwise would because AI systems can pick up the slack, or maybe when people quit naturally, their roles don't get backfilled and instead get kind of partially automated. And also maybe people's jobs are being reassigned based on what AI can do or cannot do. And then certainly there's probably a lot of jobs that are just kind of getting replaced dramatically. There's certainly much less need for translators now that AI is very experienced at translating. Maybe it can't do everything a human can do in the translation domain, but it certainly can do like 90% plus of all the translating. Likewise, AI...
Jakub (40:48):
To jump in here, I saw a post on Twitter, so that's not the strongest evidence, but it was from an economist who is looking at some of the data on employment statistics. Translator employment seems to be going up, which is a little bit puzzling. But yeah, I take your point overall that it seems like jobs are being affected. It seems like entry level jobs, people are having some trouble there. Maybe that might be from AI. So yeah, keep going.
Peter (41:17):
Yeah, no, I think that also gets to these labor impacts being really hard to predict, which is I think why it's just good to be doing tracking in the first place. But I would kind of keep an eye on the front line of change, such as just job openings in the first place. Are those going down? That may happen before you see widespread unemployment. But kind of to your point on the translators, I'm unsure if that's true or not, but if it is true, it could also speak to what's called the Jevons paradox, which is kind of this idea that as some things get cheaper - like translation skill or automation of translation - you may actually want to buy more of it, because a single translator can now do so much more translating than they would do previously if they can be mainly supervising and administering and checking the work of AI systems.
(42:17)
But also with each increase in adoption of AI that we already have there also ends up being new developments in AI systems, new capabilities. And so there's multiple changing dynamics unfolding at the same time, and those really kind of stack on top of each other, which kind of makes a more gradual wave seem just kind of less likely if it's increasing adoption on top of increasing capabilities on top of increasing compounding labor effects.
(42:50)
But yeah, definitely want to maintain some humility about not really knowing how this will all unfold. I think another example was when ATMs became widespread, a lot of human tellers used to be doing those functions, but then their functions kind of changed, and I think bank employment overall wasn't affected. And it may have even gone up again due to Jevons paradox type things of just business to consumer banking becoming a lot easier, becoming cheaper to open up more branches and just technological connection making a lot more people want to bank in the first place.
Jakub (43:31):
Yeah there's self-checkout at supermarkets too.
Peter (43:35):
Yeah. Yeah, that's definitely another area where you kind of see a form of automated employment, even though that doesn't actually really have much to do with AI at all. But does that mean that more stores can open up and then there's still more cashier jobs overall, or does that mean that cashier jobs are plummeting? I think these are good statistics that BLS should be keeping an eye on.
Jakub (44:01):
Are there any questions on Metaculus or other forecasting platforms that you've been interested in watching how the forecast is evolving over time related to AI?
Peter (44:15):
Yeah, I think there are some good employment questions on Metaculus that can keep track of some of the labor impacts. And also just questions in general about how AI might unfold, or also what the longer term implications of AI might be. I think AI questions end up being some of the most popular on Metaculus. There's a whole section for AI questions. I can give you kind of a link to include in your show notes if that's a thing that you do.
AI software capabilities double every 7 months
Jakub (44:50):
Yeah, I do. Yeah, I could put that in there. So on this, you were talking about BLS projections not necessarily accounting for how you see AI unfolding. I think you have a good Substack post that unpacks some of that. It's on the Model Evaluation and Threat Research, or METR, team's latest research paper. They're studying AI autonomy and software engineering - specifically, I believe, trying to figure out how good AI is at doing AI research. The paper has a finding that the length of tasks - I'm reading from the paper - measured by how long they take human professionals... Oh, just for background, I think I had mentioned it already, but these are software engineering tasks and they're kind of specific and maybe you'll talk about that. But the length of tasks measured by how long they take human professionals, that generalist autonomous frontier AI model agents can complete with 50% reliability, has been doubling approximately every seven months. And that's held over the last six years. Then this paper came out and then a little bit later, maybe a month or two, OpenAI gave METR early access to their latest reasoning models o3 and o4-mini, and METR found those models might be going even faster than that original seven month doubling time.
(46:30)
So as a statement alone, without going into the nuance, it sounds like wow, AI is really quickly doing complex tasks, tasks on its own related to AI research, and it's doing them for longer and longer stretches of time. And not only is it doing that and doubling every seven months, maybe it's doubling every seven months now, then doubling every six months a little later, then every five months after that. So you had the Substack post going into that. What is the nuance or the caveats you think policymakers should know when they're interpreting this paper's findings?
Peter (47:12):
Yeah, thanks. I was very excited to put together that post and I think really appreciated the paper from METR. And I think a lot of hard work has just been going into getting that frontline data to track the trends, similar to our BLS conversation about how just getting the data is a really important part. But then the even harder part is how to interpret this. And I think the most important thing to keep in mind when interpreting this finding is that it is very much specifically about software engineering tasks. But moreover, all the tasks are very clearly defined and solitary and static and contained and simple, and basically a lot of real work doesn't necessarily look like that. Like, in the real world, we need to coordinate with each other, we need to figure out what it is that we mean in the first place, and we need to do messy things with really poor feedback. And I think one of the interesting things to see in the paper was that AIs don't do very well in really low feedback environments. I think that's kind of an area where humans tend to excel.
Jakub (48:37):
What's a low feedback environment?
Peter (48:40):
So what I mean by low feedback is basically I think with a lot of these cases, the AI is trying to solve a problem with a known answer. And so it attempts a solution and then it can observe whether the solution succeeds or fails. And if the solution fails, it can observe an error message or specific information about why it failed, and then it can kind of craft the second iteration around that feedback. But in the real world, we don't actually necessarily get that level of instant feedback where we can see if we failed or why we failed. For example, I'm doing this podcast with you, maybe I'm doing a really great job and everyone loves this podcast. Or maybe actually I'm really terrible and this is going to be your least favorite podcast of all time. But I'm not getting live feedback right now. You're not being like, "hey, you're really not good at this podcasting thing." So I'm just winging it, talking to you right now. Very low feedback environment, and I'm not going to see whether the podcast does well or does poorly until it's really too late to shape it. And that's the kind of thing that AI tends to be pretty poor at.
Jakub (50:01):
Lucky for you.
Peter (50:03):
Yeah, if you're doing an AI interviewing an AI on your podcast, maybe it'll be terrible, but it won't really get a chance to adjust without that feedback. I feel like, yeah, maybe podcast guests are going to be some of the last professions on earth. So looking forward to starting my career here in podcast guesting. But yeah, so kind of very difficult to get this data in the first place for the METR paper, even more difficult to translate that to real world performance.
(50:40)
But I do want to also caution your listeners. Maybe a lot of you are listening to me and being like, "oh, that METR paper must be like a pile of junk and AI will never amount to anything." I think that it's also important to kind of take the other side as well, which is basically we are seeing models succeed at tasks that they used to not succeed at. This level of success is indeed increasing over time. And we've kind of just begun the process of training AI agents to be able to take specific job relevant actions. We've just started training models to be able to experiment and iterate, and a lot of these skills can speed up. So I think literally three years ago there wasn't an AI on the planet that could code software. And now, I mean, I say this actually having been a software engineer for several years professionally, AI honestly can do a lot of software relevant tasks better than I can. And that's just really truly phenomenal. And just thinking where AI might be in the software domain, let alone other domains, in just a few years' time is pretty tremendous. And I think also if AI ended up becoming really good at programming software, that may actually be one of the key skills where AI could then figure out how to develop even stronger AI systems and you could even get a runaway feedback loop. So all this to be said, it's again a very large range of futures, some futures where AI doesn't really amount to much at all for a long time. And some futures where AI just takes all our jobs in just a matter of years. And I tried to meticulously map out what I was viewing as the range of futures based on the data in the METR paper and to translate that into a more general prediction for labor automation and AI impacts in labor.
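(A back-of-the-envelope sketch of the doubling arithmetic discussed above, assuming a fixed seven-month doubling time and an illustrative one-hour starting task horizon; these are not METR's exact figures.)

```python
# If the task length AI agents can complete at 50% reliability doubles every
# 7 months, the horizon grows by roughly 3x per year. Starting point is illustrative.

doubling_time_months = 7
start_horizon_hours = 1.0  # hypothetical task length an agent can handle today

for years_ahead in range(1, 6):
    months = 12 * years_ahead
    horizon = start_horizon_hours * 2 ** (months / doubling_time_months)
    print(f"+{years_ahead} yr: ~{horizon:,.0f} hours")
```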
When AI forecasters will beat humans
Jakub (52:58):
Speaking of taking jobs, I want to talk a little bit about AI's role in forecasting, whether the forecasting jobs are safe. I think this seems like a separate, just a random profession, but it does actually tie in a bit to potential speed ups. There's some interesting science speed ups you might be able to do if instead of doing every single research experiment, you just predict whether it will succeed or fail. And then for the ones where you're unsure, then you do the full experiment. There have been some papers on AI forecasting, sometimes using language models as a cog in a machine, as a step in a broader pipeline, and they found some great successes. There was also, on Metaculus, an ongoing, I think, series of quarterly prize contests, and it's looking at who can build the best AI forecasting system. They recently analyzed the results from quarter four of last year, 2024, and the post was called Bots are Closing the Gap. And it found - reading from the post - "Metaculus Pro Forecasters were better than the top bot 'team.'" So humans still in the lead, "but not with statistical significance." And there's even a Metaculus question called "When will an AI program be better than humans at making Metaculus forecasts?" That might be one of the last questions to resolve, I guess. The aggregated community prediction is 50% by 2031. So how good is AI at forecasting today?
Peter (54:53):
Yeah, I think actually it's funny that you mentioned that question because that's a question I actually personally contributed to the Metaculus platform, and I contributed that back in 2021. So kind of tracking how good AI is at taking over my favorite hobby of forecasting is definitely something I've had my eye on for a while.
(55:15)
I would say that I think AI right now is increasingly skilled at some areas of forecasting, especially if you think that a really key aspect of forecasting is rapidly getting up to speed in a particular new domain and just really trying to learn the key factors that make a probability higher or lower. AI is actually exceptional at that. AI can search the internet, read all the relevant documents and research and news articles, and analyze them and digest them into key factors. And that's actually something AI is extremely skilled at. So even as a forecaster myself, even doing this as a human, I still rely significantly on AI tools at minimum to try to digest all the relevant information into key factors for or against and then be able to present that to myself.
(56:25)
And then I think, of course, the final part is taking all those key factors and putting a number on it. And I think this is also an area where AI is developing emerging skill, but where it's still not quite at the level of a human forecaster, at least a skilled human forecaster. I think AI is probably better than a typical person at forecasting just because a typical person doesn't have much experience. But if you're competing at the highest levels of forecasting, I think there's still kind of a lot of ways where AI can get things wrong or embarrassingly wrong.
(57:07)
I think generally what I've seen is AI is pretty good at figuring out the revealed wisdom of what the news is saying, what people are saying. So if there are people quoted in the media saying something seems 50-50, the forecast will pick it up as being 50-50. But I think what the true forecasting skill is right now is kind of figuring out something that's counterintuitive, figuring out something that the conventional wisdom has missed, and then actually being, not being a conspiracy theorist, but actually being right about what was missed. And I think that's usually what powers some of my best forecasts. And I haven't really seen AI be able to do that yet, but it's certainly getting a lot closer every year. And so I think I would probably agree with Metaculus that maybe around 2031 or so, I won't have much to add to AI forecasts anymore. Kind of similar to how I can't beat AI at chess even if I was really well practiced at chess, which I'm actually not.
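(A very simplified sketch of the retrieve-digest-estimate workflow Peter describes, and that papers like Halawi et al. explore; every function body here is a hypothetical placeholder, not a real library API.)

```python
# Pipeline: gather relevant material, have a language model digest it into key
# factors for and against, then produce a probability (or hand off to a human).

def retrieve_sources(question: str) -> list[str]:
    """Placeholder: search the web and news for documents relevant to the question."""
    return ["article text 1...", "report text 2..."]

def summarize_key_factors(question: str, sources: list[str]) -> str:
    """Placeholder: prompt an LLM to list key factors for and against."""
    return "Factors for: ... Factors against: ..."

def estimate_probability(question: str, factors: str) -> float:
    """Placeholder: prompt an LLM, or give the digest to a human forecaster."""
    return 0.55

def forecast(question: str) -> float:
    sources = retrieve_sources(question)
    factors = summarize_key_factors(question, sources)
    return estimate_probability(question, factors)

print(forecast("Will X happen by 2031?"))
```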
2025 as a pivotal year for AI development
Jakub (58:19):
Wow, okay. Maybe this is... there's certainly benefits there if you put this in the US government. Circling back to the start of the interview, we were talking about the need for forecasting expertise. If it gets to the point where you can have forecasters on tap by just hooking up a GPU and running an AI model on it, then that could be good. I don't know what the net effects would be if everyone had access to that. But we're coming to the close of the interview. So how do you predict the next roughly two, three years of AI will go? What do you think are the main trends audience members should be watching and seeing unfold?
Peter (59:11):
Yeah, first I guess I want to comment on the net effects of AI forecasting skill. I mean, I do think that would just be extremely beneficial, especially if all of us had quite wise, informed advisors that could really help us accurately anticipate the effects of our actions. I think just kind of informing everybody would just sort of lead to a better society, especially if we were still free to make mistakes or do things the way we want, at least as long as we're informed. I think that's pretty key to improving society while kind of maintaining human freedom like I was mentioning. And so I think that's one of the areas where I'm really optimistic about AI.
(59:58)
I think in terms of what to keep an eye on, I think what I've been trying to communicate is that I certainly have ideas and opinions about how the next two years will unfold. But I think kind of like a weatherman, I sort of want to give you more of a range of possibilities and just suggest to you different paths the future may go. So I think I probably won't just tell you one specific story that I think is likely or not likely, but instead suggest things to keep an eye on. I think this year, 2025, so as of today, AI agents don't seem particularly helpful. I haven't really benefited from these ideas of AIs that could book flights or book hotels or help you shop for groceries. It's just not really working or coming together for me right now. But this is an area of extremely heavy investment, extremely heavy effort. I would think that it's plausible that by the end of the year, there actually will be pretty good AI tools that can just really help automate a lot of simple things in your life. I think if we're getting to the point at the end of the year where these tools are starting to be quite helpful, I think that suggests a pretty rapid AI future where AI is going to be able to go from being useful at fairly mundane quick things to useful at much more complex things. Kind of doubling quickly like the METR paper suggests, in just a few years' time. I think if we come to the end of the year and AI agents are still kind of crap like they are now, that's definitely a bearish signal that suggests maybe AI is actually going to take much longer to have widespread labor impacts. I think it will still happen, but maybe it will take several decades as opposed to several years. And I think we'll be able to get some early signs on that in 2025.
(01:02:01)
I think similarly, we've seen a lot of increases in the skills of reasoning agents being able to do some complex math and science tasks. I think we don't yet really know how well those reasoning skills will scale. And I think, again, by the end of the year, if we haven't seen continued strong increases in the science and math skills of AI, that suggests again that this is going to be a much slower road or maybe new innovations are needed that we don't have. But if we are continuing to see rapid increases, I would expect that would mean we would see even more rapid increases over the next few years. So actually I think 2025 is really a pretty key make-or-break year where we'll be seeing whether we'll be getting super fast scaling or whether scaling is actually going to just be more slow and steady. But I guess I think we probably know enough right now to at least know that, whether it's decades from now or maybe years from now, one way or another, AI is going to be a really big part of our lives.
AI will transform society more than the internet
Jakub (01:03:13):
And before we close, any final thoughts to add to that? Anything on the interview as a whole, forecasting AI?
Peter (01:03:24):
Yeah, I mean, I think the main point I've been just trying to communicate is that there's a range of possibilities with AI. I don't think any one person knows exactly what's going to happen, but what we do know is that this is going to be a tremendously important technology. I think some people say AI is transformative, but the internet was transformative. But what I'm trying to say is I think AI will be much more transformative than the internet. And I'm not saying this as some hypester. I don't have a course to sell you. I don't work for a tech company. I'm saying this extremely literally, with all the precision of a forecaster, that whether it takes years or decades, one way or another, AI is going to be literally a very transformative technology. And this means it's basically all of our responsibilities, not just policymakers' but really every single person's, every single citizen's, to be quite informed about what AI can do in our lives and what the benefits and opportunities are, but also what the risks are and how we can be really thoughtful about how this is being rolled out into society.
Jakub (01:04:41):
Great. That's a good place to close. And if people want to follow up on your work, where should they go on the internet?
Peter (01:04:52):
So I'm quite active on Twitter, or I guess they call it X now, whatever, but you could go to x.com/peterwildeford, W-I-L-D-E-F-O-R-D, and follow my tweets. I tweet a lot about AI.
(01:05:13)
Likewise, if you're interested in the analysis, the forecasting about AI and other things, I have a Substack as well at peterwildeford.substack.com. Yeah, definitely very excited to have everyone sort of check out my work and interested in providing more forecasts that people find valuable.
Jakub (01:05:36):
I have, in preparation for the interview, read a lot of the Substack posts, and I think people would really benefit from reading them. I think they're really thoughtful and fun and engaging to read, so I recommend that.
Peter (01:05:49):
Thanks. And I'm trying to just help people, like I said, keep track of trends in AI, understand what's going on, trying to break it down in a way that I think people would appreciate. And it's not just AI too. I also just wrote about the India Pakistan conflict. I've been following Ukraine and other geopolitical trends. So yeah, just trying to help people be more informed.
Jakub (01:06:13):
Awesome. Okay. Well, Peter, thank you so much for joining the podcast.
Peter (01:06:18):
Yeah, no problem. Thanks for having me. It was really great.