
AI in Sports – Wharton Professors Adi Wyner and Cade Massey | AI in Focus Series

November 10, 2023 / 27:27

This episode of the Analytics at Wharton podcast discusses artificial intelligence and its applications in sports analytics with guests Cade Massey and Adi Wyner. Topics include the differences between AI and statistics, the integration of data science in sports, and the future of player evaluation and injury prediction.

Cade Massey, a practice professor in operations, information, and decisions, discusses the challenges organizations face in adopting AI and machine learning models for decision-making in sports. He emphasizes the need for collaboration between data scientists and traditional decision-makers.

Adi Wyner, a professor of statistics and data science, explains the importance of understanding the differences between AI and statistics, particularly in the context of noisy data in sports. He highlights the significance of integrating various data sources for better decision-making.
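
In the episode, Wyner notes that terabytes of tracking data from a single game are "highly correlated", so they amount to far less data than the raw count suggests. One rough way to see this is the standard design-effect formula for clustered observations. The formula is not mentioned in the episode and is swapped in here purely as an illustration; the function name, the plays-per-game figure, and the correlation value are all hypothetical.

```python
def effective_sample_size(n_obs, cluster_size, rho):
    """Approximate number of independent observations.

    Uses the design-effect formula n_eff = n / (1 + (m - 1) * rho),
    where m is the number of observations per cluster (for example,
    plays per game) and rho is the within-cluster correlation.
    """
    return n_obs / (1 + (cluster_size - 1) * rho)


# Hypothetical numbers: 10 games of 200 plays each, with an assumed
# within-game correlation of 0.2 between plays.
n_eff = effective_sample_size(n_obs=2000, cluster_size=200, rho=0.2)
print(f"2000 correlated plays behave like about {n_eff:.0f} independent ones")
```

With no within-game correlation the formula returns the full 2,000 plays; with perfect correlation it collapses to the number of games, which matches the remark that a game's worth of tracking data is "still one game".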

The conversation also touches on the concept of algorithm aversion, where decision-makers are hesitant to trust models, and the role of uncertainty in model outputs. Both guests agree that the future of sports analytics will involve better biomechanics and improved player evaluation methods.
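
On uncertainty in model outputs, Wyner describes rebuilding the model on bootstrap resamples of the historical data and checking how often the same decision comes out. If it is always "go for it", the decision is firm; if only about 55% of resamples agree, the honest answer is "we don't really know". The sketch below illustrates that idea under strong simplifying assumptions: the "model" is just an observed win rate, and the win/loss records are made-up data, not anything from the episode.

```python
import random


def win_rate(outcomes):
    """Point estimate: share of games won (1 = win, 0 = loss)."""
    return sum(outcomes) / len(outcomes)


def bootstrap_go_share(go_outcomes, kick_outcomes, n_boot=2000, seed=0):
    """Share of bootstrap resamples in which 'go for it' has the
    higher estimated win rate than 'kick'."""
    rng = random.Random(seed)
    go_better = 0
    for _ in range(n_boot):
        # Resample each history with replacement, same size as the original.
        go_sample = rng.choices(go_outcomes, k=len(go_outcomes))
        kick_sample = rng.choices(kick_outcomes, k=len(kick_outcomes))
        if win_rate(go_sample) > win_rate(kick_sample):
            go_better += 1
    return go_better / n_boot


# Hypothetical histories (1 = team went on to win the game).
close_go = [1] * 52 + [0] * 48    # point estimates nearly tied
close_kick = [1] * 50 + [0] * 50
clear_go = [1] * 80 + [0] * 20    # point estimates far apart
clear_kick = [1] * 50 + [0] * 50

close_share = bootstrap_go_share(close_go, close_kick)
clear_share = bootstrap_go_share(clear_go, clear_kick)
print(f"close call: 'go' wins in {close_share:.0%} of resamples")
print(f"clear call: 'go' wins in {clear_share:.0%} of resamples")
```

In both hypothetical cases the point estimate favors going for it, but only the second recommendation survives resampling almost every time. A real fourth-down model would be far richer than a raw win rate; the resampling logic is the part being illustrated.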

Overall, the episode provides valuable insights into the current state and future potential of AI in sports analytics, emphasizing the need for a blend of human expertise and advanced statistical methods.

TL;DR

Cade Massey and Adi Wyner discuss AI's role in sports analytics, focusing on statistics, decision-making, and future advancements in player evaluation and injury prediction.

Episode

27:27
00:00:00
welcome welcome to the next episode of
00:00:03
the analytics at Wharton AI at Wharton
00:00:06
podcast series on artificial
00:00:08
intelligence while I've enjoyed all the
00:00:10
episodes today's really a special one
00:00:12
for me not just because of my love of
00:00:13
sports but of course cuz I'm
00:00:15
interviewing two of my colleagues and
00:00:17
co-hosts of another show here on
00:00:18
SiriusXM Wharton Moneyball so it's my
00:00:21
honor to introduce my colleague Cade
00:00:23
Massey Cade is a practice professor in
00:00:25
our operations information and decisions
00:00:27
department he teaches researches and
00:00:29
consults on how to improve
00:00:30
decision-making in organizations and
00:00:32
probably the part you could not have
00:00:34
written it better than this thank you
00:00:35
Cade especially by blending models and
00:00:37
experts he's also the faculty director
00:00:40
of Wharton people lab faculty
00:00:41
co-director of the Wharton Sports
00:00:43
analytics and business initiative and of
00:00:45
course as I mentioned co-host of Wharton
00:00:47
Moneyball so Cade welcome to our podcast
00:00:50
thanks Eric delighted to be here I'm
00:00:52
also joined by my longtime friend and
00:00:54
colleague in the statistics and data
00:00:55
science department Adi Wyner I really
00:00:58
think it's important to read Adi's uh bio
00:01:00
here a short bio because of all the
00:01:02
different things he's doing around data
00:01:04
science statistics artificial
00:01:05
intelligence at the school Adi's a
00:01:08
professor of statistics and data science
00:01:09
whose research has spanned many areas
00:01:11
including applied probability modeling
00:01:13
information Theory and machine learning
00:01:15
he's written many articles in
00:01:16
statistical methodology and also
00:01:18
applications including other episodes
00:01:20
like we've had just now Neuroscience
00:01:22
medicine climate science and of course
00:01:25
extensively in sports analytics where
00:01:27
along with Cade he's the co-director of
00:01:29
our Sports Analytics and Business Initiative
00:01:30
he also runs the Penn Sports research
00:01:32
seminar the summertime Moneyball Academy
00:01:35
and he's also one of my co-hosts of
00:01:37
Wharton Moneyball and he's the director
00:01:38
of the undergraduate program in
00:01:40
statistics and data science so Adi
00:01:41
welcome to the podcast yes thank you
00:01:43
Eric great to be here and that completes
00:01:44
our 28 minutes of talking today but but
00:01:47
let's get serious so Adi why don't we
00:01:49
start with the following and this is a
00:01:50
question I get all the time and I
00:01:52
imagine you do too what is AI and how
00:01:56
does it differ if at all from what you and
00:01:59
I might call statistics well it is
00:02:02
actually a great great blending of the
00:02:04
two I mean statistics has been around a
00:02:05
long time um if we go back historically
00:02:08
AI was designed to try to figure out how
00:02:10
people think and build models that do
00:02:12
that and then sometime in the 90s people
00:02:15
realized that the smart way to do
00:02:17
artificial intelligence is not try to
00:02:19
reconstruct the thought process but just
00:02:21
get tons of data and this was a huge
00:02:24
breakthrough that I think it really
00:02:25
happened in the '90s in machine
00:02:26
translation and in machine speech
00:02:28
recognition and then eventually vision
00:02:30
and so what
00:02:32
happened was instead of trying to
00:02:34
actually build machine learning and
00:02:37
artificial intelligence people
00:02:38
discovered that the right way to do that
00:02:40
was statistics now fast forward to today
00:02:42
we still have statistics and we have
00:02:43
machine learning and AI and they are
00:02:45
still somewhat similar and also
00:02:47
different so I have a rule of thumb
00:02:49
about what kinds of problems are machine
00:02:51
learning and what kinds of problems are
00:02:53
statistics although you have to remember
00:02:55
there's an enormous um feedback and and
00:02:58
and intersection between the two and we
00:02:59
all use each other's methods and the way
00:03:01
I like to think about it is this what is
00:03:04
the difficulty in solving the problem
00:03:07
and if the difficulty is complexity so
00:03:09
for example Vision self-driving cars
00:03:12
image recognition um trying to take the
00:03:15
entirety of the web and use that to
00:03:17
answer a question like you think of
00:03:18
large language models that's AI it's
00:03:21
something that a human can do and now
00:03:23
you're trying to get the machine to do it
00:03:25
statistics is all about what I would
00:03:27
think of as noisy problems we have a
00:03:30
colleague in in criminology who tries to
00:03:32
predict whether or not someone commits a
00:03:33
crime after they've been in jail for a
00:03:35
while recidivism well you're never going
00:03:38
to get a perfect prediction Sports is a
00:03:40
classic place where a lot of our
00:03:42
problems are statistics for example
00:03:45
trying to figure out you know what your
00:03:47
batting average is going to be next year
00:03:48
or things like that but also um but
00:03:51
other problems come up in both areas
00:03:53
which are common to both where where
00:03:55
there is a lot of signal um but also um
00:03:58
and and but you can handle them with
00:04:00
statistical methods so the way I
00:04:02
generally describe it is what we call
00:04:04
signal versus the noise and the Machine
00:04:06
learning problems have high signal low
00:04:08
noise and the statistics problems have
00:04:10
high noise and lower signal so Cade let
00:04:12
me ask you related to that so um whether
00:04:15
it's let's maybe one could argue that uh
00:04:18
statistics has gotten its footing in
00:04:21
companies and decision-making maybe you
00:04:22
could argue machine learning has because
00:04:24
people like predict things that can help
00:04:26
predict the future uh AI is kind of new
00:04:30
how do you see companies reacting to
00:04:32
these different methods and are we now
00:04:35
in such I'll call it an enlightened
00:04:37
explosion world like everyone's just
00:04:39
saying of course I have to adopt it or
00:04:41
is that like not the reality of
00:04:43
today well certainly organizations are
00:04:46
are looking at opportunities for it and
00:04:47
they're excited about it and and
00:04:49
certainly vendors who think they have a
00:04:52
a new application for it are selling the
00:04:55
potential of that but ultimately it
00:04:57
always comes back to is somebody going
00:04:59
to to rely on that algorithm or model
00:05:03
for a decision they have to use the darn
00:05:05
thing it's it's one thing for it to spin
00:05:07
and do pretty things it's another for a
00:05:09
human to depend on it and what we've
00:05:12
seen reliably is that that's a pretty
00:05:15
steep hill to climb that people are
00:05:18
loath to trust decisions to
00:05:21
models when models aren't perfect and in
00:05:24
many many applications models are
00:05:25
inevitably imperfect as soon as we see
00:05:27
them be imperfect we're reluctant to lean on
00:05:30
them even if we know humans are
00:05:31
imperfect there's an asymmetry between
00:05:34
the penalties humans apply to models
00:05:37
that are imperfect versus humans that
00:05:39
are imperfect as soon as they see that
00:05:40
imperfect performance they're reluctant
00:05:43
they're they'd rather lean on a human
00:05:46
for you know if I'm going to use an
00:05:47
algorithm and kind of offload some of it
00:05:50
that has to be much more accurate than
00:05:53
in some sense we as fallible
00:05:55
humans that's right that's what that's
00:05:57
what we've observed we we along with
00:05:59
some colleagues at Penn we've
00:06:01
called that algorithm aversion and again
00:06:04
it's not to say that people just
00:06:06
innately don't trust models we're happy
00:06:08
to to to play with models to lean on
00:06:10
models but if we see them performing
00:06:12
imperfectly we're much harsher in our
00:06:14
treatment of them than we are humans so
00:06:16
Adi I know you wanted to jump in here and
00:06:18
we'll let's get to you before we jump
00:06:19
into the main topic of today which is of
00:06:21
course all our passion which is AI
00:06:23
and sports but please you had a followup
00:06:25
so I guess my followup to you Cade is
00:06:27
that I think um we might want to
00:06:28
privilege a human versus a model when they
00:06:31
both make mistakes at approximately the
00:06:32
same rate but how much is that
00:06:34
disadvantage do we really trust we
00:06:36
really kill models even if they're
00:06:37
better than humans just because they
00:06:39
make
00:06:40
mistakes yeah so I can't speak to the
00:06:42
exact threshold where you might go back
00:06:44
to the model but where we've run
00:06:46
experiments we've manipulated exactly
00:06:48
that and and and people can observe that
00:06:50
the model outperforms but
00:06:53
they hold models to a higher standard
00:06:54
there seems to be a standard
00:06:56
of perfection for models that they don't
00:06:57
use when it comes to humans well Adi
00:07:00
let's jump in first with you with kind
00:07:02
of the application of AI in sports
00:07:05
analytics so what do you see as you
00:07:07
mentioned kind of Statistics being the
00:07:10
field of I let me just see if I got this
00:07:12
right clearly AI is you're going to have
00:07:14
high signal low noise massive data statistics
00:07:18
are going to have typically smaller data
00:07:20
sets High noise which means you have to
00:07:21
rely more on you know mathematical
00:07:23
models for decision-making how do you see
00:07:25
Sports analytics today how much of it do
00:07:27
you see the use of statistics how much
00:07:29
do you see the use of AI is there some
00:07:31
combination what do you see happening
00:07:33
right now okay so certainly there's an
00:07:35
incredible amount of statistics and the
00:07:37
reason why statistics is being used so
00:07:39
much
00:07:40
today is primarily that people see the value
00:07:42
in it I think that's been a giant sea
00:07:44
change over the years brought about by
00:07:46
the Moneyball revolution Oakland A's etc
00:07:49
etc people realized that using data to
00:07:52
make decisions is a great thing to do
00:07:54
and that data wasn't Advanced I mean
00:07:55
Bill James really revolutionized Sports
00:07:57
analytics with just counting and
00:07:59
percentages and clever ways of adding
00:08:01
and subtracting things not fancy so
00:08:03
statistics has certainly um become
00:08:06
essential to the operation of successful
00:08:08
sports teams and in lots of ways player
00:08:10
evaluations none of this is particularly
00:08:12
complex um there are of course machine
00:08:15
learning advances that have been made
00:08:17
that have been useful certainly the
00:08:18
tracking data and that's for decision
00:08:20
making on something like we argue about
00:08:23
on our radio show all the time should we
00:08:24
have a robotic umpire we certainly have
00:08:27
the Hawk-Eye data in baseball we have all
00:08:28
this tracking data from Sportvision and
00:08:30
SportVU and that tracks where the
00:08:32
ball is and where the players are and
00:08:34
that leads to kind of mixture models um
00:08:37
statistics mixed with with sort of
00:08:39
machine learning which is these
00:08:41
massively giant tracking data terabytes
00:08:43
of of information that need to be
00:08:45
processed so you can still build
00:08:46
statistical models so there's still high
00:08:49
noise in the sense that you can't
00:08:51
predict whether or not we're not able
00:08:53
yet to predict whether or not say a
00:08:54
running back is going to break through
00:08:56
and get a large gain um that's still
00:08:59
high noise but we have so much
00:09:01
information so we have to use kind of
00:09:02
machine-learning-like models to fit them so
00:09:04
that's what's happening I'd still
00:09:05
say that we're probably 90% in the
00:09:08
statistics domain although a lot of the
00:09:09
flashy stuff is coming out of the the
00:09:11
machine learning and the AI the things
00:09:13
on TV where everything is
00:09:15
labeled and you see all these great the
00:09:17
these great uh predictions they'll tell
00:09:19
you things like what's the probability
00:09:20
of a catch um that's kind of a mixture
00:09:22
of a statistics and an ML or
00:09:24
a machine learning model so Cade one of
00:09:26
the things you even talked about in your
00:09:27
own bio was the blend of models and
00:09:30
experts how do you see that blend
00:09:32
happening in the field let's talk about
00:09:34
sports since I know you do work with a
00:09:35
lot of sports teams how do you see that
00:09:37
blending of machine learning AI and
00:09:40
human experts today what are some good
00:09:43
examples well it's sadly it's largely
00:09:46
uncomfortable that that that
00:09:48
historically these come from very
00:09:49
different communities and they don't
00:09:52
necessarily play well together and so
00:09:55
the organizations that actually are
00:09:56
blending them most successfully are are
00:10:00
are in some way forcing the groups to
00:10:02
work together because you know
00:10:03
a traditional decision maker in the NFL say
00:10:06
isn't super interested in dialing in the
00:10:08
computer science guy who just graduated
00:10:10
from MIT that's not a that's not the way
00:10:12
they usually make decisions the
00:10:13
organizations that get it well have
00:10:15
leadership to say we are going to do
00:10:16
this they don't make it one person's
00:10:19
model or another it's more the team's
00:10:21
model the sports that are furthest along
00:10:23
especially on the Personnel side are
00:10:26
baseball and in baseball they do have
00:10:28
models that are like the team's model
00:10:30
they they lean on them heavily when they
00:10:32
evaluate Personnel they've built those
00:10:34
things up over years these things are
00:10:36
not something someone writes down one
00:10:39
time and they're off and running they
00:10:40
tend to be highly iterative the modelers
00:10:43
get lots of input from the traditional
00:10:45
decision makers they get hypotheses from
00:10:47
the decision makers the
00:10:48
decision makers point out things the
00:10:50
models are missing these things are they
00:10:52
should be iterative and a dialogue
00:10:55
between the two sides that's tough to
00:10:57
pull off and so it's relatively rare so
00:11:00
Cade let me just follow up with a
00:11:01
question on that you mentioned the idea
00:11:03
of kind of using whether it's humans or
00:11:05
Theory to come up with hypotheses that
00:11:07
are tested just you as a scientist how
00:11:10
important is that or can't you know
00:11:12
can't I just use AI and machine learning
00:11:16
algorithms to just find patterns in the
00:11:18
data and that's what it is like why do I
00:11:20
I'm saying this in a facetious way
00:11:21
because I have an answer but this is
00:11:22
about you guys not me why not why need
00:11:24
Theory why not just explore the data and
00:11:26
see what we find so Adi's going to have a
00:11:29
deeper answer than I am for this because
00:11:31
it's very much in line with what he's
00:11:32
been talking about but but but sadly
00:11:35
sadly for some of us in many domains of
00:11:37
sports we just don't have the data to
00:11:39
support that kind of exploration so we
00:11:41
have to bring more structure to the
00:11:43
conversation we can't just set the
00:11:45
algorithm loose and see what it tells us
00:11:48
there are certain places where that can
00:11:49
happen but by and large we just don't
00:11:51
have enough data
00:11:54
for an algorithm to learn reliably so
00:11:57
for example I'm often worrying
00:11:59
about personnel and we just don't see
00:12:01
enough people there's a lot of variables
00:12:04
that you could lean on we don't have
00:12:05
that many observations we need to bring
00:12:07
a little structure we need that human
00:12:09
hypothesis in order to make traction
00:12:11
that's the way I would go about it I'm
00:12:12
curious how Adi's gonna talk about it
00:12:14
well I'll start with just simply
00:12:15
agreeing I mean we think we have a lot
00:12:17
of data because we have lots and lots
00:12:19
and lots of observations but it's not
00:12:20
nearly as many as you think you'd want
00:12:22
because you have so many variables as
00:12:24
well and a lot of times the data is not
00:12:26
new data it's just more and more data of
00:12:29
more or less the same thing and what you
00:12:30
need in statistics to do good
00:12:32
forecasting and good model building is
00:12:34
lots of independent data we really don't
00:12:36
have that so you think about the
00:12:37
incredible amounts of observations you
00:12:39
can take using tracking data it's
00:12:41
terabytes in a game that's but that's
00:12:43
still one game it's still only a couple
00:12:45
hundred plays and only one Victory right
00:12:47
so all of this is is highly correlated
00:12:50
which means we just don't have as much
00:12:51
data as you think so I'll start with
00:12:53
that the second thing is when you're
00:12:55
dealing with noisy models and these are
00:12:57
player performances are fundamentally
00:12:59
noisy meaning people you cannot look at
00:13:02
a player in the minor leagues or or
00:13:03
watch them swing or pitch or or run and
00:13:06
predict what they're going to do on the
00:13:07
field it's just not possible they're
00:13:08
human beings they're not they're not
00:13:10
robots which means that it the and this
00:13:13
is a technical word overfitting is easy
00:13:16
to do and if you throw a machine
00:13:17
learning AI model at a high noise problem with not so
00:13:20
many independent observations you are
00:13:23
going to get it wrong this is a
00:13:26
fundamental area of research and this is
00:13:28
what we're doing and what I'm working on
00:13:29
with my grad students and my students to
00:13:31
figure out how to use AI and statistics
00:13:34
in this exact setting to build good
00:13:36
models and I'm just it's just not
00:13:38
trivial you just can't throw it in and
00:13:40
crank it as if it's uh uh just
00:13:42
automatically run for you so we're here
00:13:44
on the AI at Wharton and analytics at
00:13:46
Wharton podcast series we're on our
00:13:48
episode on AI and sports this is Eric
00:13:50
Bradlow professor of marketing statistics
00:13:51
and data science here at the Wharton
00:13:53
school vice dean of analytics and I'm
00:13:55
here with my friends and colleagues Cade
00:13:56
Massey practice professor in the OID
00:13:58
department and Adi Wyner who's professor
00:14:01
of statistics and data science so Cade
00:14:03
let me ask you um I would think that a
00:14:07
lot of the work that maybe firms are
00:14:10
doing today is about collecting these
00:14:13
novel data sets whether it's you know we
00:14:15
have an episode on Neuroscience so maybe
00:14:17
it's brain data or maybe it's motion
00:14:20
tracking data that Adi said or maybe it's
00:14:22
eye tracking data how much do firms
00:14:24
recognize that in some sense we don't
00:14:27
have the data we need and that we need
00:14:29
to kind of do you know whether it's
00:14:30
putting sensors on players or measuring
00:14:33
sleep or rest how much do firms realize
00:14:35
that in some sense it's better data
00:14:37
that's going to solve these problems not
00:14:39
necessarily more correlated data I think
00:14:42
there's pretty healthy appreciation of
00:14:43
that now of course there's variation the
00:14:46
teams teams vary in their quality of
00:14:48
ownership quality of management and not
00:14:50
everyone sees the opportunity there but
00:14:52
increasingly teams do see the
00:14:53
opportunity and of course some sports
00:14:55
are again farther ahead on this than
00:14:57
others I mean a sport like baseball they
00:15:00
are way down the path and they because
00:15:02
they're so far down the path they are
00:15:04
getting quite inventive in the in the
00:15:06
kinds of data they're looking for in
00:15:07
search of an edge sports like football
00:15:10
and hockey there's much more variance
00:15:13
but the sophisticated teams in those
00:15:15
sports are in fact being creative and
00:15:18
one way to think about it we I think we
00:15:20
tend to think that there's one big model
00:15:22
sitting inside the firm somewhere that's
00:15:24
that's spitting out answers whether it's
00:15:25
play calls or who to draft but in fact
00:15:28
these organizations have a bunch of
00:15:30
little models and they're they're mostly
00:15:32
going out and solving one problem
00:15:33
getting one Insight adding one you know
00:15:36
measure to a player's evaluation using
00:15:39
different little models whatever they
00:15:41
can get their hands on it's much more
00:15:43
the the the propagation of these smaller
00:15:46
models and eventually they'll they'll be
00:15:47
integrated but right now it's the
00:15:49
propagation of smaller models All In
00:15:51
Search of small edges yeah the one thing
00:15:52
we hope at least I don't know maybe you
00:15:54
guys would disagree with this I'd love
00:15:55
your thoughts um there can be different
00:15:57
models for different purposes as a matter
00:15:59
of fact there should be probably um but
00:16:01
there should be a common data set so
00:16:03
could you talk about I know you work
00:16:05
with a lot of students how do you guys
00:16:07
let's say you want to solve a problem
00:16:08
like for example I know we're going to
00:16:09
be doing a high school sports
00:16:11
competition and you're going to be doing
00:16:12
something around soccer how like how do
00:16:15
you collect a data set like suppose you
00:16:17
have like one data set which is motion
00:16:19
tracking and you have another data set
00:16:21
which might be training and another data
00:16:22
set like how do how do you even think
00:16:24
about integrating these disparate data
00:16:27
sets together when you're trying to
00:16:28
solve a problem so interesting uh I
00:16:31
actually would call that data science so
00:16:33
the idea that a now wait we've got
00:16:35
statistics but and now we've got machine
00:16:37
learning we've got AI now you want to
00:16:39
add data science yeah so we our
00:16:40
department has been the statistics
00:16:42
department for many years that's that
00:16:43
was my appointment and all of a sudden I
00:16:44
got an extra an extra title on
00:16:46
statistics and data science and uh
00:16:49
people still wonder what what is it and
00:16:51
what exactly does how is it different
00:16:53
from statistics and obviously these
00:16:55
things overlap and I would say that the
00:16:56
new um the the new direction is that
00:16:59
because we have so much data it's so
00:17:01
large and it needs to be integrated and
00:17:03
managed and um curated if you will uh
00:17:06
wrangled sometimes is the word people
00:17:08
use that task is data
00:17:10
science and that pushes Us in the
00:17:12
direction of CS computer science
00:17:14
engineering and that's a hard task um I
00:17:18
personally don't work in it my
00:17:20
colleagues in engineering and CS and
00:17:21
some of my colleagues in statistics do
00:17:23
more of that it's a challenge and I
00:17:25
would say that teams invest heavily
00:17:28
in the kinds of personnel who can do
00:17:31
those things and it's a big it's a big
00:17:33
Direction and it's expensive because
00:17:35
they're highly in demand in lots of
00:17:37
areas so having people who can who can
00:17:40
collect data integrate them make
00:17:42
dashboards software this is not modeling
00:17:45
this is not statistics it's not even
00:17:47
machine learning or AI it's just the
00:17:49
support structure expensive and and
00:17:51
needs to be done we don't even have a
00:17:53
lot of the basic data sets so one of
00:17:55
the things that we in the in the
00:17:56
University land we have to deal with
00:17:59
only what's public and sometimes that's
00:18:01
hard and there's often very good data
00:18:03
with the individual teams and some
00:18:06
consulting firms have it and you can of
00:18:08
course buy it but a lot of that data
00:18:09
cost is outside of our
00:18:12
budgets um and so this is always a
00:18:14
challenge and we're we're working on it
00:18:16
so Cade I know you have some thoughts on
00:18:17
this as
00:18:18
well I just wanted to emphasize Adi got
00:18:21
there eventually but I wanted to
00:18:22
emphasize that there's the backend side
00:18:25
of that and there is the front end side
00:18:26
as well and as analysts we tend to
00:18:29
lose sight of both of those things so
00:18:31
the computer science part of integrating
00:18:32
these data sets building the the
00:18:34
plumbing and the infrastructure is vital
00:18:36
you can't do anything and often the
00:18:37
first hire in this space for an
00:18:39
organization going down this road is on
00:18:41
the CS side and then Adi eventually
00:18:43
talked about like the dashboard the
00:18:45
front end these organizations all have
00:18:47
these they do have these database
00:18:49
systems that people drop reports into
00:18:52
and people pull reports out of and it is
00:18:54
the single primary way that the non
00:18:58
analysts interact with the data so it's
00:19:00
that front end that dashboard and
00:19:02
related issues on the front end is vital
00:19:05
and that's development I mean this is
00:19:07
again this is not this is not the
00:19:09
traditional data science that we think
00:19:10
about and yet it's vital to the way data
00:19:13
are used in the organizations so Adi let
00:19:15
me start with you um when you think
00:19:17
about the most sophisticated use or
00:19:20
interesting use for you of
00:19:23
AI now I guess I have to say AI machine
00:19:25
learning statistics or data science
00:19:27
today
00:19:28
what are what is it for you what's the
00:19:30
one that you find the most interesting
00:19:33
and if you want you could rely on what's
00:19:35
interesting to you is what you and your
00:19:36
students are doing research on what's
00:19:38
most interesting to you that's going on
00:19:39
today and then Kate I'd like to ask you
00:19:41
if you know after that you know if we're
00:19:43
sitting here in 10 years what do you see
00:19:45
us talking about well we'll still be on
00:19:47
our Wharton Moneyball show in 10 years
00:19:49
and we'll just talk about it there but
00:19:50
Adi what's the most sophisticated
00:19:52
application you see today well there's a
00:19:54
lot of really sophisticated ones that
00:19:55
I'm not sure have really borne fruit yet
00:19:58
so so there have been a bunch of
00:19:59
competitions in football that have
00:20:01
released tracking data and some of that
00:20:04
information has been used and
00:20:05
Incorporated by teams by by um ESPN or
00:20:09
or the NFL in what they call Next
00:20:11
Generation stats to provide you the kind
00:20:13
of information you couldn't have dreamed
00:20:14
of you know years ago so one example
00:20:17
would be um you see a running back is
00:20:19
handed a ball and you want to know well
00:20:21
on average what would an average running
00:20:23
back do in that situation and then you
00:20:24
can compare what the actual running back
00:20:26
in that situation did and you and you
00:20:28
post that that Delta that differential
00:20:30
and that's an example of either poor
00:20:32
performance or great performance but you
00:20:34
wonder whether or not that model was
00:20:35
really high quality and and and some of
00:20:38
our students have had real have looked
00:20:40
at this and and and and it's been
00:20:41
interesting but I don't think we're
00:20:42
there yet to see that these things have
00:20:44
really provided value because the data
00:20:46
is is hard to make good sense out of it
00:20:49
so that's one kind of problem um the
00:20:50
problem that I've been working with with
00:20:52
our our students is actually on
00:20:53
uncertainty so a model produces a a
00:20:56
forecast all right well let me get right
00:20:57
to it then I was going we we'll probably
00:20:59
talk about this on our radio show
00:21:00
tomorrow but let's talk about
00:21:01
uncertainty for a second the New York
00:21:03
Giants are fourth and one from the 17
00:21:06
yard line in the game yesterday and the
00:21:08
question is should they kick the field
00:21:09
goal forget that they missed it there
00:21:11
has to be a lot of people like you
00:21:13
always go on fourth and one and you've shown
00:21:15
that's not true so could you talk about
00:21:17
briefly the role of uncertainty in these
00:21:20
models and that most people think it's a
00:21:22
yes no decision when it's really not
00:21:25
okay so there are a couple of issues here
00:21:27
so when you build a model the model will
00:21:29
give you what we call a point estimate
00:21:31
and the point estimate will tell you for
00:21:33
example the probability of winning the
00:21:35
game if you go for it and the
00:21:36
probability of winning the game if you
00:21:38
don't go for it and then you can easily
00:21:40
just pick the one that's higher now the
00:21:43
problem with that is we don't really
00:21:44
know the probability we've had to
00:21:45
estimate it using a model and there is a
00:21:48
whole bunch of actual probabilities that
00:21:51
are all kind of equally supported by the
00:21:53
data and what we want to do is take all
00:21:55
that information and say wait a minute
00:21:57
if I had gotten a different set of games
00:21:59
different Universe of of of information
00:22:02
and tried to build the same model with
00:22:04
that would I have come up with the same
00:22:06
decision and if every single time I do
00:22:08
that it's always go for it then you can
00:22:10
be clear that that's a a firm decision
00:22:12
if on the other hand only 55% of the
00:22:15
historical somewhat what we call
00:22:17
simulations the word actually is called
00:22:19
bootstraps of the historical
00:22:21
data produce one result then you have to
00:22:23
throw up your hands and say I don't
00:22:25
really know and you have to communicate
00:22:26
that to the coaches on the field
00:22:28
to say you know we don't really know you
00:22:31
decide so Cade let me ask you since I
00:22:33
believe in some of your research you've
00:22:35
covered the topic and then we'll get to
00:22:36
the future in 10 years you've
00:22:38
covered the topic or you've done some
00:22:39
research on you talked about algorithm
00:22:41
aversion but even how about risk
00:22:43
aversion so how does uncertainty play
00:22:46
into you know you're asking a human
00:22:48
decision maker to trust a model when
00:22:51
there's massive potential uncertainty he
00:22:54
she they see the situation in the game
00:22:57
and then decide hey you know what my
00:22:59
read of this is the uncertainty does not
00:23:02
swamp out what I've historically done
00:23:04
how do decision makers think about risk
00:23:06
aversion in using models well I think
00:23:10
the primary role of risk aversion there
00:23:13
is is to there's an asymmetry in the way
00:23:17
you you get it wrong if you get it wrong
00:23:20
doing the conventional thing the
00:23:22
punishment is less harsh than if you get
00:23:24
it wrong doing the unconventional thing
00:23:25
that's the strongest dynamic
00:23:27
there
00:23:28
that affects especially coaches using
00:23:30
these models or not in decision-making
00:23:32
on the field but what I love about what
00:23:35
um Adi and Ryan are doing in this
00:23:37
research is they're introducing this
00:23:39
idea to model output that we've been
00:23:42
talking about for humans for a long time
00:23:44
people need to know what they know and
00:23:46
what they don't know they need to know
00:23:48
when they're sure and when they're less
00:23:49
sure and they need to be able to say and
00:23:52
own and explain as part of my
00:23:54
recommendation this is one with high
00:23:56
confidence this is one moderate
00:23:57
confidence or this is one with
00:23:58
light confidence we've talked about that
00:23:59
with humans for decades and now Adi and
00:24:03
Ryan are saying hey we should do the
00:24:05
same thing with our models and it's
00:24:07
vital because those guys on the field
00:24:08
are going to be incorporating whatever
00:24:10
comes out of the model with other
00:24:12
factors and we as analysts have to
00:24:14
recognize there's always other factors
00:24:15
the models rarely if ever have every
00:24:18
possible consideration so there's always
00:24:20
other factors and so it does depend on
00:24:22
how sure the model is if the model is
00:24:24
absolutely certain then it's going to
00:24:26
swamp other factors but it's not always
00:24:27
going to be absolutely certain and we
00:24:29
haven't talked about models in these
00:24:30
terms in the past it's fantastic
00:24:33
fantastic new development from these
00:24:35
guys so in the last minute or so Adi
00:24:38
tell me uh if we were sitting here 10
00:24:40
years from now what have you been
00:24:42
working on and what do you think the
00:24:43
field of AI and machine learning in
00:24:46
sports has done over the next 10 years
00:24:48
or if we're sitting here 10 years from
00:24:49
now over the past 10 years what are the
00:24:51
big new advances well I'm fairly certain
00:24:53
that we're going to have unbelievable
00:24:54
Graphics we're going to have great
00:24:56
statistics that are going to be computed
00:24:57
on the fly I think when our broadcast
00:25:00
experience will be really different um
00:25:02
I'm hoping that we've made progress on
00:25:04
some of the really big questions which
00:25:07
are still amazingly unanswered which is
00:25:09
injuries we have really no clue how to
00:25:11
predict an injury although there are
00:25:12
lots of startups trying to offer that
00:25:14
product if you will even in
00:25:16
its infant stage to teams because we
00:25:19
just really have a hard time with that
00:25:20
um I think player evaluation in the
00:25:22
complex sports will become much better
00:25:25
really good at it in baseball there's
00:25:27
still some things to work on I'm working
00:25:28
on them can't keep my hands
00:25:30
off of it but I think in football and in
00:25:32
basketball and in soccer um you'll see
00:25:35
much better ways of evaluating players
00:25:38
building rosters and this is going to
00:25:40
become necessary every team's going to
00:25:42
have to have it it'll be baseline right
00:25:44
now in some sports you don't have to
00:25:45
have anyone and you're not automatically
00:25:47
behind except in baseball I think
00:25:49
it's going to be everywhere every team
00:25:51
will have a substantial uh staff doing
00:25:53
the routine things that just need to get
00:25:55
done um that's the extent of my
00:25:57
imagination I'm curious to know what
00:25:59
Cade has to say me too well I'm on the
00:26:02
same page with you I think the newest
00:26:04
frontier is biomechanics and the
00:26:06
greatest opportunity there is in injury
00:26:07
reductions that's the biggest
00:26:09
development we're gonna see the steepest
00:26:11
development will be in biomechanics
00:26:13
over the next five years um the one that
00:26:16
I hope I agree with Adi that player
00:26:18
evaluation is going to get better I
00:26:19
think in particular the reason
00:26:21
baseball's always been ahead is that we
00:26:23
can kind of just add up the players in a
00:26:26
linear model and get a reasonable output
00:26:28
for the team we know that we
00:26:30
suspect there are interactions on
00:26:33
basketball courts football fields soccer
00:26:35
pitches and we don't have a great way of
00:26:36
modeling those interactions right now
00:26:39
that means there are players that we
00:26:41
think are doing more than they actually
00:26:42
are to advance the team's success and
00:26:45
importantly there are players who are
00:26:46
making very important contributions that
00:26:47
are being underappreciated right now and
00:26:49
I'm hopeful I believe that the models
00:26:51
will get better at identifying those in
00:26:53
the next five maybe 10
00:26:55
years well I'd like to thank my
00:26:58
colleagues friends Wharton Moneyball
00:27:00
co-hosts uh Cade Massey practice
00:27:02
professor here at OID also the faculty
00:27:05
director of Wharton People Lab faculty
00:27:07
co-director of WSABI and co-host of
00:27:09
Wharton Moneyball and my colleague Adi
00:27:11
Wyner professor of statistics and data
00:27:13
science and while he's running
00:27:14
everything we're doing here in sports
00:27:16
and statistics I'd like to thank both Adi
00:27:18
and Cade for joining me for this episode
00:27:19
on AI and
00:27:25
sports
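
The bootstrap check described in the uncertainty discussion (roughly 21:25 to 22:31) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the play data below is made up, the "model" is a naive comparison of raw win rates rather than a fitted win-probability model, and the names win_rate and bootstrap_go_share are hypothetical. It is not the method used in the research mentioned in the episode. The idea it shows is the one from the conversation: resample the historical data many times, redo the comparison each time, and report how often "go for it" comes out ahead.

```python
import random

def win_rate(plays, decision):
    """Share of games won among plays where the given decision was made."""
    outcomes = [won for d, won in plays if d == decision]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def bootstrap_go_share(plays, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples in which 'go' has the higher win rate."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    go_better = 0
    for _ in range(n_boot):
        # Resample the historical plays with replacement (one "alternate universe").
        resample = rng.choices(plays, k=len(plays))
        if win_rate(resample, "go") > win_rate(resample, "kick"):
            go_better += 1
    return go_better / n_boot

# Hypothetical, made-up data: (decision, 1 if the team went on to win else 0).
plays = (
    [("go", 1)] * 33 + [("go", 0)] * 27
    + [("kick", 1)] * 30 + [("kick", 0)] * 30
)

point_go = win_rate(plays, "go")      # point estimate: 33 / 60 = 0.55
point_kick = win_rate(plays, "kick")  # point estimate: 30 / 60 = 0.50
share = bootstrap_go_share(plays)

print(f"point estimates: go={point_go:.2f}, kick={point_kick:.2f}")
print(f"share of resamples favoring 'go': {share:.2f}")
```

If the printed share is close to 1, the recommendation is firm; if it sits near one half, the honest message to the coach is "we don't really know, you decide." A more faithful version would refit the full win-probability model on every resample instead of comparing raw win rates.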

Episode Highlights

  • Introduction of Cade Massey
    Cade Massey, a practice professor, joins the podcast to discuss AI and sports analytics.
    “It's my honor to introduce my colleague Cade Massey.”
    @ 00m 21s
    November 10, 2023
  • AI vs Statistics
    A deep dive into how AI and statistics differ and intersect in decision-making.
    “We still have statistics and we have machine learning and AI.”
    @ 02m 42s
    November 10, 2023
  • Algorithm Aversion
    Exploring why humans are reluctant to trust imperfect models over their own judgments.
    “There's an asymmetry between the penalties humans apply to models versus humans.”
    @ 05m 34s
    November 10, 2023
  • The Importance of Data Integration
    Adi Wyner emphasizes the critical role of backend data integration in AI applications.
    “You can't do anything without the computer science part.”
    @ 18m 31s
    November 10, 2023
  • Understanding Uncertainty in Models
    A discussion on how uncertainty affects decision-making in sports analytics.
    “We don't really know the probability; we estimate it using a model.”
    @ 21m 44s
    November 10, 2023
  • Future of AI in Sports
    Experts predict significant advancements in biomechanics and player evaluation in the next decade.
    “Injury reductions will be the biggest development we’re gonna see.”
    @ 26m 07s
    November 10, 2023

Episode Quotes

  • AI was designed to try to figure out how people think.
  • There's an asymmetry between the penalties humans apply to models versus humans.
  • We have to use kind of machine learning models to fit them.
  • AI got there eventually, but there's the backend side too.
  • We don't really know the probability; we estimate it using a model.
  • Injury reductions will be the biggest development we’re gonna see.

Key Moments

  • AI and Statistics (02:02)
  • Algorithm Aversion (05:34)
  • Sports Analytics Discussion (09:02)
  • Data Integration (18:21)
  • Model Uncertainty (21:44)
  • Risk Aversion (23:20)
  • Future Predictions (26:07)
