
Feature Engineering

The workflow below demonstrates how 30 features are extracted from male and female soccer match data before they are used to build a classifier.


------------------------------------------------------------------------------------------------------------------------------------------------------------

Details of the 30 extracted features are as follows:

------------------------------------------------------------------------------
Fouls and Passes Features Group
------------------------------------------------------------------------------
1. foul_committed_count: The number of events categorized as a foul committed, i.e. any infringement penalized as foul play by the referee. Offsides are not tagged as a foul committed.

2. offside_events: Counts offside infringements resulting from a shot or clearance (non-pass). Passes resulting in an offside are recorded under the pass outcomes instead.

3. angle_mean: Mean of the pass angles over all pass events of a match. The angle of a pass is given in radians, with 0 pointing straight ahead, positive values between 0 and π indicating an angle clockwise, and negative values between 0 and -π representing an angle anti-clockwise.

4. angle_mode: Mode of all pass angles in all events of a match.

5. pass_length_sum: Sum of all pass lengths in a match. Filter out all pass events, then retrieve the length of each pass. Sum them up for each match.
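
The pass features above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual implementation: it assumes each event is a dict shaped like the StatsBomb open-data JSON, with a type name of "Pass" and a nested "pass" object holding "angle" and "length". The function name pass_features and the sample events are hypothetical.

```python
import statistics

def pass_features(events):
    """Compute angle_mean, angle_mode and pass_length_sum for one match."""
    # Assumption: pass events have type name "Pass" and a nested "pass" dict.
    passes = [
        e["pass"] for e in events
        if e.get("type", {}).get("name") == "Pass" and "pass" in e
    ]
    angles = [p["angle"] for p in passes if "angle" in p]
    lengths = [p["length"] for p in passes if "length" in p]
    return {
        "angle_mean": statistics.mean(angles) if angles else None,
        "angle_mode": statistics.mode(angles) if angles else None,
        "pass_length_sum": sum(lengths),
    }

# Hypothetical events for a single match.
sample_pass_events = [
    {"type": {"name": "Pass"}, "pass": {"angle": 0.5, "length": 10.0}},
    {"type": {"name": "Pass"}, "pass": {"angle": 0.5, "length": 20.0}},
    {"type": {"name": "Pass"}, "pass": {"angle": -1.0, "length": 5.0}},
    {"type": {"name": "Shot"}},
]
pass_feats = pass_features(sample_pass_events)
print(pass_feats)
```

Because raw angles are continuous, a mode over unrounded values is often dominated by ties; rounding the angles before taking the mode may be worth considering.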

------------------------------------------------------------------------------
Tactical Shifts and Substitutions Features Group
------------------------------------------------------------------------------

6. tactics_used_per_match: Counts the number of different tactics used in a match. The tactics of a match represent the strategy the team employs, e.g., "343" means three defenders, four midfielders, and three strikers.

7. tactical_shift_count: Counts the number of events marked as Tactical Shift, indicating a tactical shift made by the team. These events typically show the players' new positions and the team's new formation, e.g. 343.

8. late_substitution_60: Counts the player substitutions made from the 15th minute of the second period (the 60th minute of the match) to the end of the second period.

9. substitution: Measures the number of player substitutions in the entire match.

10. no_goal_keepers_events: Measures the number of events with actions performed by the goalkeeper.
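
As a rough sketch of the tactical and substitution counts described above, the following assumes StatsBomb-style event dicts with a type name, a "period", a running match "minute", and an optional "tactics" object carrying a "formation". These field names, the helper tactical_features, and the pooling of both teams' formations are assumptions made for illustration.

```python
def tactical_features(events):
    """Count substitutions, late substitutions, tactical shifts and formations."""
    def type_name(event):
        return event.get("type", {}).get("name")

    subs = [e for e in events if type_name(e) == "Substitution"]
    # Assumption: "minute" is the running match minute, so the 15th minute
    # of the second period corresponds to minute >= 60 with period == 2.
    late_subs = [
        e for e in subs
        if e.get("period") == 2 and e.get("minute", 0) >= 60
    ]
    shifts = [e for e in events if type_name(e) == "Tactical Shift"]
    # Assumption: distinct formations are pooled over both teams.
    formations = {
        e["tactics"]["formation"]
        for e in events
        if "formation" in e.get("tactics", {})
    }
    return {
        "tactics_used_per_match": len(formations),
        "tactical_shift_count": len(shifts),
        "late_substitution_60": len(late_subs),
        "substitution": len(subs),
    }

# Hypothetical events for a single match.
sample_tactical_events = [
    {"type": {"name": "Starting XI"}, "period": 1, "minute": 0,
     "tactics": {"formation": 442}},
    {"type": {"name": "Starting XI"}, "period": 1, "minute": 0,
     "tactics": {"formation": 433}},
    {"type": {"name": "Substitution"}, "period": 1, "minute": 30},
    {"type": {"name": "Substitution"}, "period": 2, "minute": 58},
    {"type": {"name": "Tactical Shift"}, "period": 2, "minute": 70,
     "tactics": {"formation": 343}},
    {"type": {"name": "Substitution"}, "period": 2, "minute": 75},
]
tactical_feats = tactical_features(sample_tactical_events)
print(tactical_feats)
```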

------------------------------------------------------------------------------
Overall Intensity Features Group
------------------------------------------------------------------------------

11. total_duration: Total duration (seconds) of all events per match.

12. total_no_events: Count of the total events in each match.

13. no_irregular_playpattern_events: Counts the number of events whose play pattern differs from the regular pattern, e.g. from corner, from free kick, from throw-in, or from counter. The value is extracted from the play_pattern element of each event.
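
The three intensity features can be illustrated with a short sketch. It assumes each event dict may carry a numeric "duration" in seconds and a "play_pattern" object whose name is "Regular Play" for the regular pattern; that label and the helper name intensity_features are assumptions, not taken from the workflow.

```python
def intensity_features(events):
    """Compute total duration, event count and irregular play-pattern count."""
    # Events without a duration contribute 0 seconds.
    total_duration = sum(e.get("duration", 0.0) for e in events)
    # Assumption: the regular pattern is labeled "Regular Play".
    irregular = [
        e for e in events
        if e.get("play_pattern", {}).get("name") not in (None, "Regular Play")
    ]
    return {
        "total_duration": total_duration,
        "total_no_events": len(events),
        "no_irregular_playpattern_events": len(irregular),
    }

# Hypothetical events for a single match.
sample_intensity_events = [
    {"duration": 1.5, "play_pattern": {"name": "Regular Play"}},
    {"duration": 2.0, "play_pattern": {"name": "From Corner"}},
    {"play_pattern": {"name": "From Throw In"}},
    {"duration": 0.5, "play_pattern": {"name": "Regular Play"}},
]
intensity_feats = intensity_features(sample_intensity_events)
print(intensity_feats)
```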

------------------------------------------------------------------------------
Pressing Features Group
------------------------------------------------------------------------------

14. no_ball_recovery: Counts the number of events classified as Ball Recovery, which is an attempt to recover a loose ball. Extracted from the type elements and filtered only for events labeled as 'Ball Recovery'.

15. Count(under_pressure): Counts the total number of events marked as under pressure. This designation indicates that the action was performed while being pressured by an opponent.

16. no_counterpress_events: Counts the number of events annotated as counter-press events, which are pressing actions within 5 seconds of an open play turnover.

17. no_defensive_events: Aggregates the total number of defensive events, including events with types such as pressure, dribbled past, 50/50, duel, block, interception, and foul committed.
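
A minimal sketch of the pressing counts follows. It assumes that "under_pressure" and "counterpress" appear as boolean flags on an event only when true, and that the defensive event types are named as in the list above with StatsBomb-style capitalization. The set DEFENSIVE_TYPES and the helper pressing_features are hypothetical names.

```python
# Assumed type names for the defensive events listed above.
DEFENSIVE_TYPES = {
    "Pressure", "Dribbled Past", "50/50", "Duel",
    "Block", "Interception", "Foul Committed",
}

def pressing_features(events):
    """Count ball recoveries, pressured, counter-press and defensive events."""
    def type_name(event):
        return event.get("type", {}).get("name")

    return {
        "no_ball_recovery": sum(
            1 for e in events if type_name(e) == "Ball Recovery"
        ),
        # Missing flags are treated as False.
        "Count(under_pressure)": sum(
            1 for e in events if e.get("under_pressure", False)
        ),
        "no_counterpress_events": sum(
            1 for e in events if e.get("counterpress", False)
        ),
        "no_defensive_events": sum(
            1 for e in events if type_name(e) in DEFENSIVE_TYPES
        ),
    }

# Hypothetical events for a single match.
sample_pressing_events = [
    {"type": {"name": "Ball Recovery"}, "under_pressure": True},
    {"type": {"name": "Pressure"}, "counterpress": True},
    {"type": {"name": "Duel"}, "under_pressure": True, "counterpress": True},
    {"type": {"name": "Pass"}},
    {"type": {"name": "Foul Committed"}},
]
pressing_feats = pressing_features(sample_pressing_events)
print(pressing_feats)
```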

------------------------------------------------------------------------------
Shots Features Group
------------------------------------------------------------------------------

------------------------------------
General shot information
------------------------------------

18. shoot_total_count: The total number of shots that happened in a match.

19. no_shot_per_match: Counts the total number of penalty or corner shots that happened in a match. (f19: This label needs clarification or modification as it seems incomplete or possibly incorrect.)

------------------------------------
Technique shooting information
------------------------------------

20. no_types_of_shot: This feature shows the number of different technique types of the shots taken in that match.

21. shot_backheel: Counts the number of shots using the backheel technique.

22. shot_half_volley: Counts the number of shots using the half-volley technique.

23. shot_diving_header: Counts the number of shots using the diving header technique.

24. shot_lob: Counts the number of shots using the lob technique.

25. shot_volley: Counts the number of shots using the volley technique.

26. shot_normal: Counts the number of shots using the normal technique.

27. shot_overhead_kick: Counts the number of shots using the overhead kick technique.
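
The technique counts can be sketched with a Counter. The mapping from technique label to feature name below is an assumption: it presumes shot events carry a nested "shot" object with a "technique" name such as "Half Volley" or "Overhead Kick". The names TECHNIQUE_FEATURES and shot_technique_features are hypothetical.

```python
from collections import Counter

# Assumed technique labels, mapped to the feature names used above.
TECHNIQUE_FEATURES = {
    "Backheel": "shot_backheel",
    "Half Volley": "shot_half_volley",
    "Diving Header": "shot_diving_header",
    "Lob": "shot_lob",
    "Volley": "shot_volley",
    "Normal": "shot_normal",
    "Overhead Kick": "shot_overhead_kick",
}

def shot_technique_features(events):
    """Count shots per technique and the number of distinct techniques."""
    shots = [e for e in events if e.get("type", {}).get("name") == "Shot"]
    techniques = Counter(
        e["shot"]["technique"]["name"]
        for e in shots
        if "technique" in e.get("shot", {})
    )
    features = {
        "shoot_total_count": len(shots),
        "no_types_of_shot": len(techniques),
    }
    # Techniques that never occur get an explicit 0.
    for label, feature_name in TECHNIQUE_FEATURES.items():
        features[feature_name] = techniques.get(label, 0)
    return features

# Hypothetical events for a single match.
sample_shot_events = [
    {"type": {"name": "Shot"}, "shot": {"technique": {"name": "Normal"}}},
    {"type": {"name": "Shot"}, "shot": {"technique": {"name": "Normal"}}},
    {"type": {"name": "Shot"}, "shot": {"technique": {"name": "Volley"}}},
    {"type": {"name": "Shot"}, "shot": {"technique": {"name": "Backheel"}}},
    {"type": {"name": "Pass"}},
]
technique_feats = shot_technique_features(sample_shot_events)
print(technique_feats)
```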

------------------------------------
Distance from the Initial Shot Point to the Goal
------------------------------------

28. shot_length_2goal: Measures the 2-dimensional distance from the shot location to the goal location for all shots in a match.

29. shot_length_start_end: Average 3D distance from the shot location to the shot's ending point. Some ending points have no value in the 3rd dimension, in which case it is set to 0. The starting shot location carries no 3rd-dimension information, so its 3rd coordinate is also set to 0.

30. shot_speed: Average speed of the ball during the shot actions of a match. It is computed as shot_length_start_end divided by the shot duration. The unit of this feature is meters per second (m/s).
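
The distance and speed features can be sketched as follows. Several points are assumptions rather than facts from the workflow: the goal is placed at (120, 40) in pitch coordinates, shot_length_2goal is aggregated as a per-match mean, and shot_speed is averaged over per-shot speeds, skipping shots with no positive duration. The constant GOAL_LOCATION and the helper shot_distance_features are hypothetical names.

```python
import math
from statistics import mean

# Assumption: center of the goal in pitch coordinates.
GOAL_LOCATION = (120.0, 40.0)

def shot_distance_features(events):
    """Average shot distances and shot speed for one match."""
    to_goal, start_end, speeds = [], [], []
    for e in events:
        if e.get("type", {}).get("name") != "Shot":
            continue
        x, y = e["location"][:2]
        to_goal.append(math.dist((x, y), GOAL_LOCATION))
        end = list(e["shot"]["end_location"])
        if len(end) < 3:
            end.append(0.0)  # missing 3rd dimension is set to 0
        # The start location has no height information, so z is 0.
        length = math.dist((x, y, 0.0), tuple(end[:3]))
        start_end.append(length)
        duration = e.get("duration", 0.0)
        if duration > 0:
            speeds.append(length / duration)
    return {
        "shot_length_2goal": mean(to_goal) if to_goal else None,
        "shot_length_start_end": mean(start_end) if start_end else None,
        "shot_speed": mean(speeds) if speeds else None,
    }

# Hypothetical shot events for a single match.
sample_distance_events = [
    {"type": {"name": "Shot"}, "location": [117.0, 36.0], "duration": 0.5,
     "shot": {"end_location": [120.0, 40.0]}},
    {"type": {"name": "Shot"}, "location": [119.0, 38.0], "duration": 0.5,
     "shot": {"end_location": [120.0, 40.0, 2.0]}},
]
distance_feats = shot_distance_features(sample_distance_events)
print(distance_feats)
```

Averaging per-shot speeds and dividing the mean length by the mean duration generally give different numbers, so the aggregation choice should be matched to the workflow before comparing values.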
------------------------------------------------------------------------------------------------------------------------------------------------------------

Data source and further details on the original annotated event documentation from StatsBomb:
https://github.com/statsbomb/open-data
