The configuration of OpenFootie consists of the input/output data sources used by the application. A typical configuration is:
database = jdbc:mysql://localhost:3306/match_engine?user=root&password=123456
probmodel = data/match report.data
matchreport = data/match report.txt
playerstats = data/player stats.txt
statssummary = data/stats summary.txt
The configuration file is called config.txt and it is placed in the classpath. In this blog entry, we are going to take a look at the structure of each data source.
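As a rough illustration of the file format, a "key = value" file like the one above can be read with a few lines of code. The sketch below is written in Python, independently of the engine's own code, and the helper name parse_config is hypothetical. It assumes one entry per line and splits on the first "=" only, so that the JDBC URL, which contains further "=" characters, stays intact.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict (hypothetical helper, not OpenFootie code)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only: the JDBC URL contains more of them.
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


sample = """\
database = jdbc:mysql://localhost:3306/match_engine?user=root&password=123456
probmodel = data/match report.data
matchreport = data/match report.txt
"""

config = parse_config(sample)
print(config["probmodel"])
```

Note that the values may contain spaces (as in the file names above), which is why only the surrounding whitespace is stripped.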
The database supported is MySQL. The database is optional when running the application on your desktop: you can skip it if you just want to give the application a quick try, or if you don’t mind changing hardcoded input to get some variety between match simulation runs.
It wouldn’t be too difficult to add support for other databases as well, but I am more inclined towards providing a custom binary format for game input, which would be more appropriate. However, as I mentioned elsewhere, the database is a quick and dirty solution, and it was needed anyway for the host application on the web.
So, what’s in the database of OpenFootie? The input is kept as small as possible, since OpenFootie is initially intended as a library and not a full application. The minimum entities required are teams and players. To add meaning to the input, the players must be associated with certain abilities or weaknesses and the team must have a formation on the pitch.
For the team, we only need to store a name as minimal input. The data provided cover most of the teams participating in the 2010 World Cup. (As I am writing this, I apologize for not updating to the newer World Cup, which has already passed. Although no further updates are envisioned for the software itself, I could still update the data. From a broader point of view, the project is not as neglected as it seems.)
Next are the players. The player names are not real. Each team is associated with 11 players, as no substitutions or tactical changes are supported by the command-line desktop application. Each player has a shirt number, unique within his team, which the application code also uses for identification during the match. Finally, each player is associated with a number between 1 and 4 to denote his position.
The position field also implies the team’s tactics. Intuitively, 1 means goalkeeper, 2 defender, 3 midfielder and 4 forward. These semantics differ from the ones used by the web host application, and both are supported by the OpenFootie library. This is, of course, the simpler of the two, which suits the minimalistic nature of the command-line application. When the application reads the players of the selected team to form the input (lineup + formation), it infers the tactics from the positions of the team’s players and places the players in the corresponding positions according to the tactics and the order of the players in the database (that is, by their id). So the order of the midfielders (position = 3) of a selected team defines which players play on the left, on the right or in the centre. You don’t actually need to worry about that, as the players’ skills match their implied positions. It only gets a little tricky if you want to tweak the input from the database. In that case, you could alternatively try the test.HardcodedMatch class (currently available from the trunk) or the online demo application.
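To make the inference step more concrete, here is a minimal Python sketch of how a formation could be derived from the position codes. It is not the library's code: the tuple layout (player id, shirt number, position) and the function name infer_lineup are assumptions made for the example, and the exact rule the engine uses to map id order to left, right or centre is not reproduced here.

```python
from collections import defaultdict

# Position codes as described in the text.
POSITION_NAMES = {1: "goalkeeper", 2: "defender", 3: "midfielder", 4: "forward"}


def infer_lineup(players):
    """Group players by position code, keeping database id order within each line.

    `players` is a list of (player_id, shirt_number, position) tuples; this
    tuple layout is an assumption made for the example.
    """
    lines = defaultdict(list)
    for player_id, shirt, position in sorted(players):  # sorted by id
        lines[position].append(shirt)
    # Formation as defenders-midfielders-forwards, e.g. "4-4-2".
    formation = "-".join(str(len(lines[code])) for code in (2, 3, 4))
    return formation, dict(lines)


players = [
    (1, 1, 1),
    (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 2),
    (6, 6, 3), (7, 8, 3), (8, 10, 3), (9, 7, 3),
    (10, 9, 4), (11, 11, 4),
]

formation, lines = infer_lineup(players)
print(formation)   # 4-4-2
print(lines[3])    # [6, 8, 10, 7]
```

The midfield line comes out in id order, not shirt order, which is the property the paragraph above relies on.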
In a separate table, the skills of each player are stored. The table’s structure is self-explanatory, and the attribute ratings range from zero to 6+. Generally, a player with an attribute rating of 6 or more is considered to have an “excellent” value for that attribute; ratings above 6 exist because I wanted some extra variety among “excellent” players.
The probability model (the probmodel entry in the configuration) is the file that represents a real football match in a custom language. The file used is in binary format, but it is converted from a human-readable XML file. The file is not included in the 0.6 distribution, but it can be downloaded from the repository, which is now on GitHub. What follows is a short specification of this XML file.
The concept OpenFootie uses for representing and eventually reproducing a football match is to see the match as a sequence of states with cause-and-effect relationships. Not all events can be connected, and those that can are not connected with the same frequency. For instance, a team awarded a penalty kick cannot concede a corner kick in the next “moment”. Similarly, it is rare for a goalkeeper to score a goal by kicking the ball from his own area. On the other hand, a cross will probably be followed by an aerial challenge, and a shot may lead to a goal-scoring opportunity.
The file contains a definition of the events that may happen in a football match (the list is not comprehensive; it is based on the first half of a CL game played some time ago). Next comes the relationship of each event with other events, expressed in a way that reflects the probability of a particular event happening after another. Since the sample is small, there is no special representation for the probability; it is implied by the sequence itself. This would not be efficient for a really big sample (“search for all the results of a particular event and count them by type”), but it doesn’t matter for now.
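The "search and count by type" idea can be sketched in a few lines of Python. The event names and the list of observations below are made up for illustration, and outcome_probabilities is a hypothetical helper, not a function of the library. The point is only that relative frequencies fall out of a plain scan over the recorded sequence.

```python
from collections import Counter

# Hypothetical observations: (event, following event) pairs in match order.
observations = [
    ("Cross", "Aerial Challenge"),
    ("Cross", "Aerial Challenge"),
    ("Cross", "Goalkeeper Interception"),
    ("Shot", "Goal Opportunity"),
]


def outcome_probabilities(observations, event):
    """Scan all records for `event` and turn outcome counts into relative frequencies."""
    counts = Counter(outcome for cause, outcome in observations if cause == event)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


probs = outcome_probabilities(observations, "Cross")
print(probs)
```

Each lookup walks the whole list, which is the inefficiency mentioned above; with a larger sample one would presumably precompute the counts once.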
The structure of the XML file will be covered in the next sections. The representation is rather simplistic, aiming for a statistical reproduction of the match, instead of a more user-friendly approach.
The coordinates are measured on an ordinal scale rather than a ratio scale. For the length coordinates, the various positions are characterized as being in “Defence”, “Centre” or “Attack”, relative to the team (defined as) having possession of the ball. I would also like to apologize if the terminology seems a bit strange (e.g. “Midfield” could be used instead of “Centre”). This is partly due to the casual way I named things as I needed them while developing the engine, and partly due to the influence of my mother tongue.
For the width coordinates, only two areas are defined: “Axis” and “Flank”. Whether the flank is the right or the left one does not matter for the representation of the match. It should be noted that, since the semantics of the match representation language could not be comprehensive enough, especially in its first edition, the match engine implementation partly plays that role. Therefore, while the match engine is really an “interpreter” or “processor” of the language defined in our XML file, it also adds semantics dynamically. This was only a side note, and we can see examples of this in subsequent documentation. So, the exact side of the flank is only defined at “run-time” of the match, according to the actual match engine implementation.
A key factor of a specific instant of a football match is the pressure on the team having possession of the ball. We specify pressure with three values: “Clear”, “Avoid” and “Under”. These are more or less self-explanatory. The “Avoid” value for pressure represents the case of the team having the ball making an effort to move forward or move quickly to avoid the pressure.
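Taken together, the two coordinates and the pressure describe a state. A minimal Python sketch of that idea follows. The class names are my own for this example and do not come from the library; the enum values are the ones listed in the text, and the "Y" and "X" labels are the attribute names used in the XML file for length and width.

```python
from dataclasses import dataclass
from enum import Enum


class Length(Enum):      # the "Y" attribute
    DEFENCE = "Defence"
    CENTRE = "Centre"
    ATTACK = "Attack"


class Width(Enum):       # the "X" attribute
    AXIS = "Axis"
    FLANK = "Flank"


class Pressure(Enum):
    CLEAR = "Clear"
    AVOID = "Avoid"
    UNDER = "Under"


@dataclass(frozen=True)
class State:
    y: Length
    x: Width
    pressure: Pressure


# Enum lookup by value turns the text labels into typed members.
state = State(Length("Attack"), Width("Flank"), Pressure("Avoid"))
all_states = [State(y, x, p) for y in Length for x in Width for p in Pressure]
print(len(all_states))  # 18
```

With three length values, two width values and three pressure levels, the whole state space has only 18 combinations, which shows how coarse the ordinal scale is.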
The action is what connects one state to the next. It corresponds, of course, to the real-world meaning of the word, specifically the action of the player having the ball. The current state and the action “chosen” together determine the transition to a new state. A number of different actions are supported, each with a self-explanatory name. The possible actions in each state correspond to the mapping of real-world actions to the state model.
Another attribute you may notice is “ModuloRowId”. Please ignore it, as it is used for debugging purposes.
Each state is named “condition” in the XML file, in the sense that each state is the “if” part of an “if…then…” statement. Each condition has either a “result” (another state) or a “challenge” followed by a “result”. The challenge is an intermediary state which does not involve an action. For instance, a player makes a long pass which results in an aerial challenge before another player gains possession of the ball in a state where he can choose his next action. You may remark that the concept of a challenge is not strictly necessary and could be omitted to keep things simple. However, the sample and the corresponding language evolved naturally during the representation of a real match, and I was trying to depict abstractions of exactly what I was seeing. In general, some things might not have needed to be included in the first version of the probability model, but they do no harm either (and they could make for more precise statistics). We will see some examples of challenges in a following section.
Result tags are essentially the description of a state resulting from a “condition” state, along with some attributes which describe how the transition takes place. We saw in the previous sections that the only attributes needed to describe a state are the coordinates of the ball position and the degree of pressure from the opponents. The two additional attributes for result states are the “Team” attribute, which denotes whether the ball goes to an opponent or a teammate, and sometimes an attribute denoting the way the ball has changed possession, which is used for statistics. A side note here is that the player having possession of the ball does not “know” the possible outcomes of his actions; the choice of action depends only on how frequently each action is encountered in the probability model for a specific state. Even if the outcome were 100% negative, the player would still “choose” the action as directed by the probability model file, without employing any kind of artificial intelligence.
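The frequency-driven choice can be illustrated with a short Python sketch. It is not the engine's code: the list of recorded actions is a made-up sample for a single state, and choose_action is a hypothetical helper. The draw is weighted purely by how often each action was recorded, with no look-ahead at the outcome.

```python
import random
from collections import Counter

# Hypothetical sample: actions recorded for one state in the model file.
recorded_actions = ["Forward Pass", "Forward Pass", "Back Pass", "Ball Control"]


def choose_action(recorded_actions, rng):
    """Pick an action with probability proportional to how often it was recorded."""
    counts = Counter(recorded_actions)
    actions = list(counts)
    weights = [counts[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
picks = Counter(choose_action(recorded_actions, rng) for _ in range(10_000))
print(picks.most_common())
```

Over many draws, “Forward Pass” should come up roughly twice as often as either of the other two actions, mirroring its share of the sample.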
Apart from the “canonical” results described above, there are also “special” results. If, for instance, the result is a foul from a tackle, there is no need to duplicate the next state, which is implied; instead, a “special” tag is included denoting that a foul took place.
The challenge tag may describe a variety of real-world situations, anything from a simple aerial challenge to a sequence of “states” where the ball hasn’t touched the ground for a minute. This implies that the time cost would vary if absolute measures of time were taken into account. Time itself is implied by allocating a specific number of states to each match. Based on the number of states used to describe the first half of the sample match, the magic number of “510” states is allocated to each match. This, of course, may change in the future as more samples are taken.
The “Y” attribute indicates where on the pitch the challenge takes place. The “X” coordinate is omitted for challenge representations. The “Team” attribute denotes which team starts the challenge (it does not always describe the conclusion of the challenge, as there may be a result tag which denotes that). Finally, an “Ending” attribute may be included to describe how the challenge concluded.
Analyzing each and every state transition rule in the data file is out of scope of this article. However, some examples would be really useful.
<condition Y='Attack' X='Flank' Pressure='Avoid' Action='Ball Control' ModuloRowId='87'>
The initial state is that the team having possession of the ball is attacking and has the ball in the “attack” area (let’s say around or inside the penalty box). Since the X coordinate has the value ‘Flank’, the ball should be somewhere to the left or right of the penalty box. The player having the ball is under pressure and tries to avoid the defender(s) by holding the ball (action = ‘Ball Control’). The result in this case is winning a free kick (result special=’Foul’). Notice that there is no need to define the coordinates of the result state (they should be the same as those of the “current” state).
<condition Y='Defence' X='Axis' Pressure='Clear' Action='Forward Pass' ModuloRowId='48'>
<result Y='Centre' X='Axis' Pressure='Avoid' Team='Own'/>
The ball is in the half of the team which has possession, in the ‘axis’ or the centre of the field, without any pressure. The player having the ball decides to make a ‘forward pass’, which results in a teammate (Team=’Own’ in the result) having the ball in the opponent’s half (Y = ‘Centre’), trying to avoid the pressure of the opponents.
<condition Y='Centre' X='Axis' Pressure='Avoid' Action='Back Pass' ModuloRowId='49'>
<challenge Y='Centre' Team='Opp' Ending='Loose Ball'/>
<result Y='Centre' X='Axis' Pressure='Avoid' Team='Opp'/>
This transition has a challenge as part of the result; in other words, it reaches the result state through a challenge. The team having the ball has it in the opponents’ half, in the centre of the X axis, and is under pressure which it is trying to avoid. The action chosen is a ‘Back Pass’, which results in a challenge. This example demonstrates that a challenge is an intermediate state in which no team clearly has possession of the ball. You can imagine that the opposing team intercepted the pass (Team = ‘Opp’ in the ‘challenge’ element), which results in the ball being ‘loose’ (see the ball possession change section). Finally, the opposing team has the ball in the half of the team which originally had possession, trying to avoid the defenders’ pressure.
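For readers who want to experiment with the XML, the last example can be read with Python's standard xml.etree.ElementTree module, as in the sketch below. The enclosing <model> root element and the closing </condition> tag are assumptions added so that the fragment parses; the real file's root element may be named differently. The function name read_transitions is also hypothetical.

```python
import xml.etree.ElementTree as ET

# The last example, wrapped in a root element and closed so it parses.
# The <model> root and the closing </condition> tag are assumptions.
SAMPLE = """
<model>
  <condition Y='Centre' X='Axis' Pressure='Avoid' Action='Back Pass' ModuloRowId='49'>
    <challenge Y='Centre' Team='Opp' Ending='Loose Ball'/>
    <result Y='Centre' X='Axis' Pressure='Avoid' Team='Opp'/>
  </condition>
</model>
"""


def read_transitions(xml_text):
    """Return one dict per <condition>, with its optional challenge and its result."""
    transitions = []
    for cond in ET.fromstring(xml_text).iter("condition"):
        challenge = cond.find("challenge")
        result = cond.find("result")
        transitions.append({
            "state": (cond.get("Y"), cond.get("X"), cond.get("Pressure")),
            "action": cond.get("Action"),
            "challenge": dict(challenge.attrib) if challenge is not None else None,
            "result": dict(result.attrib) if result is not None else None,
        })
    return transitions


transitions = read_transitions(SAMPLE)
print(transitions[0]["action"], "->", transitions[0]["result"])
```

A condition without a challenge simply yields None for that key, which matches the “result, or challenge and result” shape described earlier.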
Ball possession change
For statistics reasons, the information of how possession of the ball changed is also recorded in the data file. The following modes are taken into account:
- Normal: Nothing special about possession change
- Pass interception: A defending player is credited with an interception
- Goalkeeper interception: The goalkeeper wins the ball, e.g. from a cross
- Man challenge lost: The defending player wins the ball from someone trying to dribble past him
- Loose ball: Normally a result of a challenge where no team had clear possession of the ball immediately after the challenge
- Lost ball control: The ball is won from someone attempting to hold or control the ball
- Bouncing off: Ball is won from bouncing off a shot
Based on actual examples, this is quite a detailed list for this stage. In the data file a ball possession change type is represented with the “PossessionChange” attribute of the “result” tag, or the “Ending” attribute of the “challenge” tag mentioned above.
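As a sketch of how such statistics could be gathered from the XML, the Python snippet below tallies the “PossessionChange” attribute of result tags and the “Ending” attribute of challenge tags. The sample fragment is hypothetical: the <model> root, the second condition and the value spelling 'Pass Interception' are assumptions (only 'Loose Ball' appears in the examples above), and possession_change_counts is not a library function.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment; value spellings other than 'Loose Ball' are assumed.
SAMPLE = """
<model>
  <condition Y='Centre' X='Axis' Pressure='Avoid' Action='Back Pass'>
    <challenge Y='Centre' Team='Opp' Ending='Loose Ball'/>
    <result Y='Centre' X='Axis' Pressure='Avoid' Team='Opp'/>
  </condition>
  <condition Y='Centre' X='Flank' Pressure='Under' Action='Forward Pass'>
    <result Y='Defence' X='Flank' Pressure='Clear' Team='Opp' PossessionChange='Pass Interception'/>
  </condition>
</model>
"""


def possession_change_counts(xml_text):
    """Count possession change modes found on <result> and <challenge> elements."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        if element.tag == "result":
            mode = element.get("PossessionChange")
        elif element.tag == "challenge":
            mode = element.get("Ending")
        else:
            continue
        if mode is not None:  # the attribute is optional
            counts[mode] += 1
    return counts


counts = possession_change_counts(SAMPLE)
print(counts)
```

Results without a “PossessionChange” attribute are skipped here; whether the engine treats them as the “Normal” mode is not something this sketch assumes.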
I hope this section clarifies the structure of the data file and provides an overview of the workings of the match engine. You might have noticed some strange terminology in the values of some of the attributes, while others might be self-explanatory. I won’t go into detail about their semantics, which right now can only be traced in the source code. This data file representation, after all, was meant to be used as a binary file, and its XML version serves only as a convenient representation. However, semantics aside, the attribute values you see function as qualifiers for determining the chaining of states (i.e. which state should follow) or for producing stats (as with the “ball possession change” attribute mentioned above).