StatsBomb: Advanced Football Analytics Through An Interactive Visualisation Platform

STATSBOMB is a UK-based football analytics and data visualisation company introducing common data analytics practices seen in business and tech to the world of football analytics. Through their recently launched (February 2019) STATSBOMB IQ data visualisation platform they offer immediate accessibility to valuable football insights from all major leagues and players across the globe.

The company was founded in January 2017, after self-described data geek Ted Knutson - now CEO and co-founder of STATSBOMB - traded a decade in the sports betting industry to partner with Charlotte Randall - Chief Operating Officer - and “produce the best possible analytic toolset for football clubs to use in player recruitment, team analysis, and opposition scouting”. What started as a blog sharing ideas about applied statistics in football turned into a reputable business that collects vast amounts of football data and offers an interactive visualisation platform, enabling the company to establish a global customer base including major clubs, federations, media, broadcasters and gambling organisations. In their ambition to establish themselves as an industry leader, STATSBOMB has recently acquired Egypt-based sports data collection company ArqamFC, gathering over 5,000 data points per match. Ted Knutson claimed that this move will allow them to offer double the number of data points of any other provider.

STATSBOMB’s new data visualisation platform STATSBOMB IQ is the latest pioneering move by the company. Their dashboards, charts and graphs follow a similar aesthetic, clarity and data blending to those displayed by Tableau, possibly the largest data visualisation package in tech. While most, if not all, charts come already built out-of-the-box, their interactivity and filtering tools allow for sufficient customization to answer a wide range of analytical questions.

Salah’s 2018/19 STATSBOMB Profile

Messi’s 2018/19 STATSBOMB Profile

The platform offers outstanding processing performance when switching between its various sections and quickly displays vast amounts of data on the screen. From player radars to shot maps, shot distributions, defensive activity, xG trendlines, corner maps and even player comparisons showing similar or complementary skill sets, STATSBOMB IQ is a reliable and robust tool offering immediate access to a complete picture of the latest football data at the click of a button.

Barcelona - Liverpool, 2019-05-01
Liverpool - Barcelona, 2019-05-07

The company also offers consultancy services to ease users into their data tools and provide them with the right assets to navigate their platform. This assistance in interpreting their large dataset - they collect more than twice as many events per match as their competitors - is key to making their service digestible. However, easy navigation through clearly defined themes makes the platform quick to grasp. Some of these themes include:

  • Pressure: analysing how players and teams press and how they perform under pressure.

  • Shooting: including the location of attacking and defending players to provide both attacking and shot-defending insights.

  • Goalkeeping: detailed actions down to goalkeeper positioning and movements that can be tied to the insights gathered from the quality of the shot.

Cristiano Ronaldo, Serie A 2018/19
Lionel Messi, La Liga 2018/19

While the company does not intend to replace video analysis, it does emphasise how its data visualisation features can reduce the time analysts and coaches spend reviewing player and team footage during performance evaluations. By spotting the right patterns and trends in the data, a more focused approach to video analysis can be adopted that narrows down the areas to investigate further. One thing is certain: their stunning data visualisations bring a refreshing approach to football analytics, providing invaluable insights and introducing tools to the field of applied sports analytics that are more closely aligned with today’s available technologies.

StatsBomb IQ Platform

Performance Indicators in Football

Michael Hughes et al. discussed in their 2012 article "Moneyball and soccer - an analysis of the key performance indicators of elite male soccer players by position" how team sports like football offer an ideal scope for analysis thanks to the numerous factors and combinations, from individuals to teams, that can be used to identify performance influencers.


The article suggests that, in a sport like football, in order for a team to be successful, each player must effectively undertake a specific role and a set of functions based on the position they play in on the field. Through a study carried out with 12 experts and 51 sport science students, the authors aimed to identify the most common performance indicators that should be evaluated in a player's performance based on their playing profile. They started by defining the following playing positions in football:

  • Goalkeeper
  • Full Back
  • Centre Back
  • Holding Midfielder
  • Attacking Midfielder
  • Wide Midfielder
  • Striker

Each performance indicator identified by position was then classified into one of the following five categories:

  • Physiological
  • Tactical
  • Technical - Defensive
  • Technical - Attacking
  • Psychological

Through group discussions between the experts and the level 3 sport science students, they came up with the following traits required for each of the above positions.

Source: Moneyball and soccer by Michael Hughes et al (2012)

The study identified that most performance indicators of outfield players were the same across positions, with only the order of priority of each PI varying by position. Only goalkeepers had a different set of PIs from every other position. While these classifications of skills by position were produced through a subjective method (i.e. group discussion), they are a good first step towards the creation of techno-tactical profiles based on a player's position and functions on the field, as pointed out by Dufour in 1993 in his book 'Computer-assisted scouting in soccer'. The above table provides a framework with which coaches and analysts can further evaluate the performance of players in relation to their position. However, tactics and coaching styles or preferences may cause the order of priority of each PI within each category to vary by team. The article also suggests that a qualitative way of measuring the level of each performance indicator should be used to evaluate a particular player.

The above suggests that positions may play a key role when assessing performance in football. From a quantitative perspective, when analysing performance indicators to determine success or failure, or even to establish a benchmark to aim for, there are several metrics an analyst will look to gather through notational analysis:


  • Shooting game
    • Total number of goals
    • Total number of shots
    • Total number of shots on target
    • Total shot to goal scoring rate (%)
    • Total shot on target to goal scoring rate (%)
    • Shots to goal ratio
    • Shots on target to goal ratio
    • Total number of shots by shooting position (e.g. inside the box)
    • Total number of shots by shot type (e.g. header, set piece, right foot, etc.)
    • xG (Expected Goals)
  • Passing game
    • Total number of passes
    • Total pass completion rate (%)
    • Total number of short passes (under X metres away)
    • Total short pass completion rate (%)
    • Total number of long passes (over X metres away)
    • Total long pass completion rate (%)
    • Total number of passes above the ground
    • Total chip/cross pass completion rate (%)
    • Total number of passes into a particular zone (e.g. 6 yard box)
    • Total zone pass completion rate (%)
    • Pass to Goal ratio
    • Total number of unsuccessful passes leading to turnovers (e.g. interceptions)
    • Total pass turnover rate (%)
  • Defensive game
    • Total number of tackles
    • Total number of tackles won
    • Total tackle success rate (%)
    • Total number of tackles in the defensive third zone
    • Total number of tackles won in the defensive third zone
    • Total number of fouls conceded
    • Total number of fouls conceded leading to goals conceded (after X minutes of play without possession)
    • Total number of pass interceptions won
    • Total number of possession turnovers won


  • Attacking
    • Total number of set pieces
    • Total number of attacking corners
    • Total number of free-kicks (on the attacking third zone)
    • Total number of counterattacks (i.e. based on X number of passes between possession start in own half and shot)
    • Average duration of attacking play (from possession start to shot)
    • Average number of passes per goal
  • Possession
    • Total percentage of match possession (%)
    • Total percentage of match possession in opposition's half
    • Total percentage of match possession in own half
    • Total number of possessions
    • Total number of non-shooting turnovers
    • Ratio of possessions to goals
    • Total number of passes per possession
    • Total number of long passes per possession
    • Total number of short passes per possession
  • Defensive
    • Total number of clearances
    • Total number of offsides by opponent team
    • Total number of corners conceded
    • Total number of shots conceded
    • Total number of opposition's passes in defensive third zone
    • Total number of opposition's possessions entering the defensive third zone
    • Average duration of opposition's possession
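As a rough illustration of how a few of the indicators above could be derived from notated events, the following Python sketch counts shots, passes and tackles and turns them into rates. The event labels, the `summarise` function and the sample data are all hypothetical and chosen only for this example; a real notation system would define its own event schema.

```python
from collections import Counter

# Hypothetical notated events for one team in one match: (event_type, outcome).
events = [
    ("shot", "goal"),
    ("shot", "on_target"),
    ("shot", "off_target"),
    ("shot", "on_target"),
    ("pass", "complete"),
    ("pass", "complete"),
    ("pass", "incomplete"),
    ("pass", "complete"),
    ("tackle", "won"),
    ("tackle", "lost"),
]


def rate(numerator, denominator):
    """Return a percentage, or None when there is nothing to divide by."""
    return None if denominator == 0 else 100.0 * numerator / denominator


def summarise(events):
    """Turn raw notated events into a handful of performance indicators."""
    counts = Counter(events)
    shots = sum(n for (kind, _), n in counts.items() if kind == "shot")
    goals = counts[("shot", "goal")]
    # Assumption: a goal also counts as a shot on target.
    on_target = goals + counts[("shot", "on_target")]
    passes = sum(n for (kind, _), n in counts.items() if kind == "pass")
    tackles = sum(n for (kind, _), n in counts.items() if kind == "tackle")
    return {
        "shots": shots,
        "goals": goals,
        "shots_on_target": on_target,
        "shot_conversion_pct": rate(goals, shots),
        "on_target_conversion_pct": rate(goals, on_target),
        "passes": passes,
        "pass_completion_pct": rate(counts[("pass", "complete")], passes),
        "tackles": tackles,
        "tackle_success_pct": rate(counts[("tackle", "won")], tackles),
    }


summary = summarise(events)
for name, value in summary.items():
    print(name, value)
```

Returning None for an empty denominator keeps a match with no tackles, for instance, from being reported as a 0% success rate.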

It is important to note that teams may adapt both their tactics and style of play based on the various circumstances they face in a game. For example, a team scoring a winning goal in the last 10 minutes may choose to give up possession in order to sit back in their defensive third for the remainder of the game. When using quantitative analysis to determine success or failure against the performance indicators, it is important to take context into consideration for a more complete and accurate analysis.

Notational analysis: a synonym of today's performance analysis

While motion analysis and biomechanics constitute important areas in performance analysis, one of the most popular and fundamental pieces of performance analysis in sport is the use of notational analysis. Notational analysis is the identification and analysis of critical patterns and events in a performance that lead to a successful outcome. Hughes (2004) defined notational analysis as "a procedure that could be used in any discipline that requires assessment and analysis of performance". The information used for notational analysis is usually gathered by observing a team's performance in a competitive environment. By notating numerous events that take place on the pitch, such as striker positioning, defenders' tackle success rate or midfielders' pass completion rate, an analyst can identify strengths and weaknesses and provide these results to coaches, who then use them to adapt training sessions or share accurate feedback with players and the entire team.

The importance of notational analysis comes from the limited ability that coaches, as human beings, have to recall the specifics of their teams' performance, and from how their recollection can be biased by their beliefs and other motives. As Hughes and Franks described, by receiving objective data of what happened during a game, a coach can make more informed decisions, enhancing their ability to accurately assess the events of a game and improving the quality of the feedback they are able to provide to the players. A big miss by a striker might be recalled by coaches and other players more vividly than the same striker's effective positioning or successful dribbling in the same game. At a professional level, we often hear pundits and fans rate a player's performance in a game based on a small number of noticeable actions, such as a missed penalty or a defender's mistake that led to a one-on-one chance for the opposition. However, through notational analysis, a more complete view of that player's performance may provide a more accurate perspective on the player's contribution in the game and inform any future decisions about that player, such as training structure or upcoming match presence.

Different teams in different sports will define their own frameworks of performance indicators that allow them to identify the areas of the game they are most interested in evaluating. This means that there is a wide range of information captured today in notational analysis, depending on the environment the analyst is working in. This is in part due to the lack of a common set of performance indicators identified as the key to sporting success, particularly in team sports, where it is practically impossible to account for every single event that could lead to winning a match. A football team may consider percentage of shots on target, possession percentage and pass completion rate to be the performance indicators to benchmark themselves against for a game, while a different team in the same sport may want to consider possession percentage in the opposition's last third, defensive tackles won and total number of shots. As Hughes stated in 2011, while all of these may be considered valid information to collect, the lack of a common framework across sports may be slowing down the research and analysis needed to develop notational analysis further.

There are certain challenges in notational analysis, particularly when it comes to live events. A single analyst notating events and patterns in real time may be subject to human error or miss certain actions. This is why most sport statistics companies and elite sporting organisations employ several analysts to collect the same performance indicators on a live game, allowing the notated statistics to be compared between analysts in order to improve the accuracy of the data collected. Another challenge of the notational analysis process is subjectivity, where events with a certain degree of ambiguity may be captured differently by different analysts. While notational analysis aims to add objectivity when evaluating a team's performance by quantifying the events, the definition of such events may change depending on how the analyst capturing the event interprets the action.
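One way to put a number on how closely two analysts agree is sketched below in Python. It computes simple percentage agreement and Cohen's kappa, a standard agreement statistic that corrects for agreement expected by chance. Kappa is a technique brought in here for illustration and is not named in the text above; the two analysts' codings are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of events that both analysts coded with the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("both analysts must code the same non-empty set of events")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both analysts used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of the same eight defensive actions by two analysts.
analyst_1 = ["tackle", "tackle", "foul", "interception",
             "tackle", "foul", "tackle", "interception"]
analyst_2 = ["tackle", "foul", "foul", "interception",
             "tackle", "foul", "tackle", "tackle"]

agreement = percent_agreement(analyst_1, analyst_2)
kappa = cohens_kappa(analyst_1, analyst_2)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

The events on which the two codings differ are the natural candidates for a review against video footage, and recurring disagreements on one event type may point to an ambiguous definition.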

During the last two decades, a large number of new technologies have advanced the methods and effectiveness of notational analysis in sport. While traditional analysts often used a pen and a notepad to notate the events they considered relevant, technologies like Opta, Dartfish or Sportscode have become a central asset for notational analysts in the industry. A video camera and video analysis software can now provide analysts with a wide range of features and tools to collect as much information as they require to assess performance against specific performance indicators.


Performance Indicators in Rugby Union

In 2012, Michael T Hughes, Michael M Hughes, Jason Williams, Nic James, Goran Vuckovic and Duncan Locke published an insightful journal article discussing performance indicators in rugby union during the 2011 World Cup. They gathered various materials from professional analysts working for coaches and players at the World Cup, and verified the reliability and accuracy of their data against video footage from different matches.



This research study analyses the influence of the following key performance indicators on the final outcome of a game:

Scoring Indicators: 

  • Points scored

    • Total points scored in RWC 2011
    • Points scored per game
    • Points scored against Tier A teams
    • Points scored per game against Tier A teams
  • Tries scored

    • Total tries scored in RWC 2011
    • Tries scored per game
    • Tries scored from set pieces
    • Percentage of tries scored from set piece
    • Tries scored from set pieces per game
    • Tries scored from broken play
    • Percentage of tries scored from broken play
    • Tries scored from broken play per game

Quality Indicators:

  • Total Possession - Times and Productivity

    • Minutes that ball is in play in the match
    • Rest minutes in the match
    • Minutes with possession in the match
    • Percentage of time with possession
    • Number of possessions in the match
    • Minutes per possession
    • Minutes of possession per point scored
    • Number of possessions per point scored
    • Minutes of possession per try scored
    • Number of possessions per try scored
    • Total number of line breaks
    • Total number of line breaks per game
    • Minutes of possession per line break
    • Number of possessions per line break
    • Total number of set piece line breaks
    • Total number of set piece line breaks per game
    • Percentage of set piece line breaks
    • Total number of broken play line breaks
    • Total number of broken play line breaks per game
    • Percentage of broken play line breaks
    • Number of phases in the match
    • Percentage of phases per possession
    • Attacking penalties won
  • Attacking Possession

    • Number of possessions in opposition's 22 line
    • Number of converted possessions in opposition's 22 line
    • Percentage of converted possessions in opposition's 22 line
    • Number of points from opposition's 22 line
    • Number of points from opposition's 22 line per game
    • Number of points per possession in opposition's 22 line
  • Kicking game

    • Total number of kicks at goal
    • Total number of kicks converted
    • Percentage of kicks converted
    • Penalties conceded
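Many of the indicators above are ratios of a few raw totals. The Python sketch below shows how some of them could be derived once the totals have been notated. The `TeamTotals` fields and the tournament figures are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Raw tournament totals for one team (hypothetical field names)."""
    games: int
    points: int
    tries: int
    tries_from_set_piece: int
    possession_minutes: float
    possessions: int
    kicks_at_goal: int
    kicks_converted: int


def per(numerator, denominator):
    """Ratio of two totals, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def pct(numerator, denominator):
    """Percentage of two totals, or None when the denominator is zero."""
    ratio = per(numerator, denominator)
    return None if ratio is None else 100.0 * ratio


def derived_indicators(t):
    return {
        "points_per_game": per(t.points, t.games),
        "tries_per_game": per(t.tries, t.games),
        "pct_tries_from_set_piece": pct(t.tries_from_set_piece, t.tries),
        "minutes_per_possession": per(t.possession_minutes, t.possessions),
        "possession_minutes_per_point": per(t.possession_minutes, t.points),
        "possessions_per_try": per(t.possessions, t.tries),
        "pct_kicks_converted": pct(t.kicks_converted, t.kicks_at_goal),
    }


# Hypothetical team, for illustration only.
team = TeamTotals(games=5, points=150, tries=20, tries_from_set_piece=8,
                  possession_minutes=90.0, possessions=300,
                  kicks_at_goal=25, kicks_converted=20)
indicators = derived_indicators(team)
for name, value in indicators.items():
    print(name, value)
```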

While these key performance indicators of a rugby union game or tournament can be useful to summarise some elements of a team's performance, what M. Hughes et al (2012) found was that there was little correlation between each individual metric, or set of metrics, and the final outcome of the 2011 World Cup. For example, France was identified as one of the worst teams in most of these metrics, though they were the runners-up of the tournament.


The paper also touches on the challenges of individual player performance analysis in rugby union. Due to the nature of the sport, each position on the field requires its own set of performance indicators. The study suggests analysing an individual's performance against common key performance indicators and using that individual's performance profile to run intra-position comparisons (Hughes et al, 2012). This also leads to the creation of position profiles, where the strengths and weaknesses of players in each position can be identified. It is also suggested that individual player profiles should be placed in the context of the team's profile as well as the opposition team's strengths and weaknesses, as these elements will impact a player's performance profile.

As in most team sports, randomness and luck can play a big part in the final outcome of a rugby union match. Therefore, a few data points might not be enough to predict the final performance achieved by a team. There are many complex interactions between teammates and opponents during a rugby union game that are difficult to account for through today's available statistics. However, studies like the one carried out by Hughes et al (2012) are another step towards narrowing down the best procedures for successfully applying analytics to performance prediction in rugby and in team sports in general.

The effect of GDPR in sports performance analysis

On 25th May 2018, the new General Data Protection Regulation (GDPR) came into force in the EU, significantly improving the control European citizens have over their personal data collected by third parties. While GDPR covers many complex areas around the collection, storage and transfer of personal data by third parties, the key points normally highlighted when discussing the regulation are that an individual must now provide consent before a third party collects data about them, and that the individual also has the right to request that the collected data be deleted at any point in time, as well as to revoke any consent previously given to collect personal data.

How does the new regulation affect sport organisations?

Like any other company in any industry, sport clubs and organisations also need to reassess the data they collect from their fans, volunteers, employees and any other member of the club. No organisation that collects and stores personal data of an EU citizen, even in sports, is exempt from fines of up to €20 million or 4% of yearly turnover if it is found noncompliant.

One of the biggest changes a club now needs to manage concerns the data collected from fans, often used to increase fan engagement and deliver marketing campaigns to grow the club's fan base. Like marketing departments in many other organisations, clubs collect a wide variety of information about their customers, such as interests, personally identifiable information (PII), purchase history and any actions individuals take on websites and at physical events they attend, such as a football match. Clubs need to reevaluate the level of consent they receive to continue to collect and store all these data points about their fans and prospective supporters. Similarly, GDPR applies to the employer-employee relationship and data sharing. This means that clubs will also require consent from players, coaches and members of staff.

Aside from evaluating their data management and applying new procedures, clubs will also need to demonstrate compliance by updating and making public their data privacy policies and the new processes they put in place for GDPR. This includes clearly informing individuals how they can request access to their stored data, update it or remove it altogether, as well as the steps to follow to revoke consent if they wish to do so.

And how does it affect Sport Performance Analysis?

Player profiling is one of the various key tasks of a performance analyst. It can involve either evaluating your own player's performance or assessing the players from the rival club the team will be facing on their next fixture. An analyst would gather data on the player's recent performances, strengths, weaknesses and playing styles to compile detailed reports to present to the coaching team.

In Article 22, GDPR tackles profiling directly, referring to it as building up a picture of the type of person someone is by evaluating certain personal aspects relating to a natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements. However, the legislation also specifies that consent is only necessary if automated decision-making is applied based on this profiling, and only if such automated decision-making produces legal effects or significantly affects the individual in question. This means that the simple task of profiling should not require the consent of the individual unless sensitive personal data, such as health or race, is collected during profiling.

This suggests that player profiling in sports can be interpreted as not requiring the player's consent. Firstly, decision-making based on profiling in this scenario is not automated. Even though player profiles are compiled to make decisions on tactics, training session preparation or recruitment, there is always an element of human review of such profiles, usually by the coaching team, which could rule out classifying these processes as "automated decision-making", as required for the GDPR guidelines to apply. Secondly, the profiling carried out by analysts should not have any legal effects on, or significantly affect, the individual being profiled. The human intervention in reviewing these profiles also backs up this argument, as no "automatic" effects are generated by this activity.

There is, however, a counter-argument worth considering, and that is around the sensitive nature of the data used in profiling. Player profiling can include sensitive information about the player in question, particularly around their health. Injuries are bound to appear in a majority of player profiles generated by analysts, particularly if the goal is to optimize injury prevention. In such cases, the player's consent is required, as the profiling now contains sensitive data about that natural person. It is also worth considering the application of GDPR to the scouting of youth talent, where profiling is carried out by gathering data on minors and parental consent should be obtained. Data collection from minors cannot be considered as having a legitimate reason without prior consent.

Navigating the complex world of GDPR is undoubtedly challenging for many teams and analysts. However, it is important to know the scenarios in which consent is required to produce a piece of analysis involving player data and when, as Article 6(1)(f) states, there is a "legitimate interest" in collecting data without consent. Nevertheless, while consent might not always be required, it is always important to evaluate the scope, transparency and long-term purpose of the profiling process before assuming that none is needed. This can include areas such as the player's right to object to their data being collected and to request the deletion of any previously collected data. One way or another, a performance analysis team now needs to consider implementing new data management processes in their day-to-day roles.

What are Expected Goals (xG)?

Expected Goals, or xG, are the number of goals a player or team should have scored given the number and type of chances they had in a match. It is a way of using statistics to bring an objective view to common comments such as: "He shouldn't miss that!", "He's got to score those chances!" or "He should have had a hat-trick!"

Goals in football are rare events, with just over 2.5 goals scored on average per game. Therefore, the historical number of goals does not provide a large enough sample to predict the outcome of a match. This means that shots on target and total number of shots are now being used as the next closest stats to predict number of goals. However, not all shots have the same likelihood of ending up in the back of the net.

This is where xG comes into play. Expected Goals uses various characteristics of the shots being taken together with historical data of such types of shots to predict the likelihood of a specific shot being scored. Since xG is simply an averaged probability of a shot being scored, a team or player may outperform or underperform their xG value. This means that they could be scoring chances that the average player would miss or that they could be missing chances that are often scored.

xG is often used to analyse various scenarios:

  • Predict the score of an upcoming match using historical data of the teams involved.
  • Assess a team’s or player’s “true” performance in a match or season, regardless of their short-term form or one-off actions on the pitch. It provides a data point on the number and quality of chances being created regardless of the final result.
  • Identify performing players in underperforming teams, or those who receive fewer playing minutes, by assessing which ones are more effective than the quality of the chances they receive would suggest.
  • Understand the defensive performance of a team by assessing how effectively they are preventing the opposing team from scoring their chances.

Origin of the Expected Goals Model

In April 2012, Advanced Data Analyst Sam Green from sport statistics company Opta first explained his innovative approach to assessing the performance of Premier League goalscorers, inspired by similar models being used in American sports. However, it was not until the beginning of the 2017/18 season, when BBC’s Match of the Day debuted the use of xG by its popular football pundits, that xG became a focal topic of conversation among football fans.

Over the years, Opta has collected numerous data points of in-game actions in all of the top football leagues. When creating the xG model, Sam Green and the Opta team analysed more than 300,000 shots and a number of different variables using Opta’s on-ball event data, such as angle of the shot, assist type, shot location, the in-game situation, the proximity of opposition defenders and distance from goal. They were then able to assign an xG value, usually as a percentage, to every goal attempt and determine how good a particular type of chance is. As new matches are played new data is collected to continuously refine the xG model.
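To make the idea of learning chance quality from historical shots concrete, here is a deliberately simplified Python sketch. It is not Opta's model, which has not been published and uses many more variables; it only estimates xG as the historical scoring frequency for each combination of shot location and shot type. The categories and shot counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical shots: (location, shot_type, scored).
historical_shots = (
    [("six_yard_box", "foot", True)] * 4
    + [("six_yard_box", "foot", False)] * 6
    + [("six_yard_box", "header", True)] * 2
    + [("six_yard_box", "header", False)] * 8
    + [("outside_box", "foot", True)] * 1
    + [("outside_box", "foot", False)] * 19
)


def fit_xg_table(shots):
    """Estimate xG per (location, shot_type) as goals divided by shots."""
    tally = defaultdict(lambda: [0, 0])  # key -> [goals, shots]
    for location, shot_type, scored in shots:
        entry = tally[(location, shot_type)]
        entry[0] += int(scored)
        entry[1] += 1
    return {key: goals / total for key, (goals, total) in tally.items()}


xg_table = fit_xg_table(historical_shots)


def shot_xg(location, shot_type, table=xg_table):
    """Look up the xG of a new shot; None if that kind was never observed."""
    return table.get((location, shot_type))


print(shot_xg("six_yard_box", "foot"))
print(shot_xg("six_yard_box", "header"))
print(shot_xg("outside_box", "foot"))
```

A frequency table like this breaks down quickly as more variables are added, because each combination has too few shots behind it. That is one reason published xG models typically fit a statistical model over continuous features such as distance and angle instead.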

There is no one specific model to calculate xG. An xG value depends on the factors that the analyst creating the model chooses to incorporate in the calculations. Since its release to the public, the xG concept has attracted considerable attention in the analytics community, with many enthusiasts adjusting the model in their own ways in an attempt to perfect it. This means there are now several different xG models out there, each considering different factors. Some consider whether the shot was taken with the foot or the head, others consider the situation that led to the shot, and so on, but the final predictions of the different models have been shown to vary only slightly.

How is xG calculated?

Opta’s xG model is based on the fact that the most basic requirement to score goals is to take shots. However, not all strikers score goals from the same number of shots. As Sam Green identified, in the 2011/12 season Van Persie only needed 5.4 shots to score a goal, while Luis Suarez took 13.8 shots for each goal he scored. However, they both shot the same number of times per game they played.
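The shots-per-goal figures quoted above can be read as conversion rates by taking their reciprocal. The short Python sketch below does this for the two players; the function name is chosen for this example only.

```python
def conversion_rate(shots_per_goal):
    """Convert 'shots needed per goal' into the share of shots scored."""
    if shots_per_goal <= 0:
        raise ValueError("shots_per_goal must be positive")
    return 1.0 / shots_per_goal


van_persie = conversion_rate(5.4)   # roughly 18.5% of shots scored
suarez = conversion_rate(13.8)      # roughly 7.2% of shots scored

print(f"Van Persie: {van_persie:.1%}, Suarez: {suarez:.1%}")
print(f"Ratio: {van_persie / suarez:.2f}x")
```

With the same number of shots per game, the difference in goals comes entirely from this conversion rate, which is what prompts the question of how good each player's chances were.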


This is why Opta decided to look deeper into the quality of chances each striker received by adding the average location from which each shot was taken. However, they soon realised that location on its own was not enough. A chance from the penalty spot could come from a penalty kick, a header from a corner or a one-on-one against the goalkeeper, each with a very different likelihood of ending up as a goal. That is why Opta decided to incorporate additional data points into the model. Unfortunately, the exact model with all the factors considered by Opta has not been made public, but a number of analysts have attempted to replicate or improve the model since its first release.

The xG model was designed to return an xG value for each player, team or chance depending on the dimension in which the data is being analysed: a full season, a particular match, a specific half in a game or a group of goal attempts. Let’s say a player like Harry Kane takes 100 shots from chances that, based on historical Premier League data, have an average probability of being scored of 0.202 (or 20.2%). Kane's xG value would be roughly 20 expected goals (100 shots x 0.202 = 20.2). This xG number would be made up of some ‘big scoring chances’ Kane took, such as penalties with 0.783xG, other non-penalty shots inside the box with varying xG values such as 0.387xG and maybe even shots outside the box with a 0.036xG value. The model attempts to balance the number of shots a player takes with the quality of these chances. For example, a player may get himself into very dangerous attacking positions inside the box on 23 occasions with a high xG value and score the same number of goals as a player who continuously tries his luck from outside the box with 81 shot attempts that have a lower xG value.

Once an xG value has been calculated, a player or team’s performance can be evaluated on whether they are over- or under-performing that value. In the above example, Harry Kane may actually score 25 goals during the full season, 5 goals above his 20 xG value, suggesting that his ability to convert chances is above average and that he can find the net in difficult scoring situations. Similarly, a player with a 20 xG value who has scored 15 goals is probably missing chances that he should have scored.
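As a minimal sketch of this comparison, the snippet below subtracts xG from actual goals for the two cases described. The function names `xg_difference` and `describe` are hypothetical and chosen only for this example.

```python
def xg_difference(goals, xg):
    """Goals scored minus expected goals; positive means over-performance."""
    return goals - xg


def describe(goals, xg):
    """Turn the goals-minus-xG difference into a short verdict."""
    diff = xg_difference(goals, xg)
    if diff > 0:
        return f"over-performing by {diff:g}"
    if diff < 0:
        return f"under-performing by {-diff:g}"
    return "matching xG"


print(describe(25, 20))  # over-performing by 5
print(describe(15, 20))  # under-performing by 5
```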


Opta took xG a step further and assessed the impact a player had on a specific chance through the quality of his shot. They did so by factoring into the xG calculation how likely the player's shots were to hit the target, and then comparing the original xG(Overall) value against this new xG(On Target) one. Their analysis showed that, at the time, Van der Vaart’s shooting raised his figure from 6.9xG to 10.3xG(On Target), suggesting that the shots he actually struck were of higher quality than the average chance looked before he took it. When compared to actual goals, xG(OT) may also indicate how much a player was affected by the quality of goalkeeping he had to face. In the same season, Mikel Arteta scored 7 goals from just 3.5xG(OT), suggesting he got ‘luckier’ in front of goal, as his shooting quality should have given him only just over 3 goals.

xG(OT) can be used to assess goalkeeping quality when used in reverse. Since it only takes into consideration shots on target, a keeper’s involvement in this sort of chance is crucial to the final outcome of the play. De Gea conceding 22 goals against a 27xG(OT) suggests that he has kept out shots in situations where goals are normally conceded.
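Using the figures quoted for De Gea, the reverse calculation might look like the sketch below. The name `goals_prevented` is an assumption introduced for this example and is not a term used in the text.

```python
def goals_prevented(xg_on_target_faced, goals_conceded):
    """xG(OT) faced minus goals conceded; positive means more saves than expected."""
    return xg_on_target_faced - goals_conceded


# Figures quoted in the text: 27 xG(OT) faced, 22 goals conceded.
print(goals_prevented(27, 22))  # 5
```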

Why are Expected Goals important in today's football?

Luck and randomness influence results in football more often than in any other sport. We have all seen teams that are dominated throughout a match manage to score a last-minute winning goal despite having fewer chances than their opposition. But how sustainable is that? We have also seen world-class strikers go out of form and spend a few games without seeing the back of the net. Is the player failing to take advantage of the chances provided by his teammates? xG allows us to assess the process rather than the results of a match, or the performance of a player or team, by rating the quality of chances instead of the actual outcome.


The most commonly used example to explain xG’s usefulness is Juventus’ 2015/16 season. Juventus won only 3 of their first 10 games, but the difference between their actual goals and their xG was considerable. This meant that they had the chances but were not converting them, suggesting that their negative run of results might not last if they got a bit luckier in front of goal. Sacking manager Massimiliano Allegri could have been a mistake, since after match day 12 their luck changed and they ended up winning the league title with games to spare.

xG gives us a more accurate way of predicting match outcomes than simple individual stats such as shot counts. In the Premier League, only 71.6% of teams that took the most shots won the fixture, while close to 81% of teams with the higher xG won. It challenges assumptions that football tradition has built up over time and provides a statistically grounded basis for arguing whether the performance of a player or team is above or below average, given a body of historical data.


By using expected goals to see which players score more or less often than the numbers suggest they should, teams can scout promising goalscorers who consistently score more goals than the quality of their chances would predict. On the other hand, if a player surpasses his expected goals for a few games but has no history of doing so in the past, it might come down to form and luck rather than goalscoring talent, and he might struggle to sustain it over a long period of time.

Limitations of the Expected Goals model

The xG model is only as good as the factors fed into its calculations. These inputs are limited by the data available today from companies such as Opta. Other factors, such as shot power, curl or dip on the shot, or whether the goalkeeper is unsighted or off balance, might not be considered in most xG models out there. Because the model is based on averages, the random nature of a football match and the rarity of goals in the sport make it almost impossible to account, with enough statistical significance, for every factor that can cause a goal to be scored. xG should be used as indicative and supportive information for making decisions and forming opinions rather than as a definitive answer on the performance of a team or player.

As the model’s creator Sam Green puts it: “a system like this will also fail to predict a high scoring game. Since it is based on averages and with around half of matches featuring fewer than 2.5 goals, this is to be expected”. We also need to consider that a shot taken by a Manchester United striker arguably deserves a higher xG than the same shot taken by a Stoke City player. If xG is calculated using averages from all English teams' shot history, Man Utd would on average outperform their xG on a chance-by-chance basis, while Stoke City would underperform it.

Criticism and the Future of xG models

The recent misuse of Expected Goals as an analysis metric in pundit commentary has attracted considerable criticism. A team may score one or two difficult chances early in a game and sit back for the remainder of the 90 minutes, allowing their opponents to take many shots from different positions and thus increasing the opponents’ xG. One could then claim that the losing team achieved a higher xG and therefore deserved the win. This is why xG should always be read alongside the wider context of the game before reaching a verdict. Statistics can tell us what happened in a game, but a wider view is necessary to show how it happened and to give a clearer idea of what is yet to come. Certain in-game actions cannot be measured with a statistical model today, such as a defender's ability to get in front of a shot attempt without ever touching the ball.

There is also strong resistance from the football community to the use of data. Football is a traditional and emotional sport by nature, with experience and accepted wisdom dominating people’s opinions. Many fans see the use of statistics as intrusive and as a challenge to their popular and historic knowledge of “the beautiful game”. After watching their team lose, most of them are not interested in listening to television pundits discuss how their team performed against their expected goals. Although analytics has plenty to offer football performance analysis, there are still doubters. xG’s debut on Match of the Day shook social media, with instant mentions of “stat nerds” and claims that the numbers in football are “pointless” and “bollocks”. However, Opta has made clear that xG is never intended to replace scouts and pundits, but simply to aid them in their analysis of a game.

Despite the reluctance of some pundits and football fans to accept this new era of football analysis, Opta and various sports analysts continue to evolve the use of statistics to analyse performance in numerous areas of football. Models such as xG are the first round of statistical systems and will soon be followed by others such as Defensive Coverage, which will assess tackles, blocks, interceptions, man-marking and clearances. Football’s data revolution has started and will continue to see developments every season.