In this tutorial, we show how to gather data about the esports game Dota 2 from a dedicated site that collects and serves such data. The data is requested through the website’s API, documented here: The Open Dota API Documentation. First, we cover the basics of accessing an API using the R programming language.
An API, short for Application Programming Interface, allows programmers to request data directly from a website. When a website sets up an API, it is essentially setting up a computer that waits for data requests. Once this computer receives a request, it does its own processing and sends the data back to the computer that asked for it. From our perspective as the requester, we need to write R code that creates the request and tells the computer running the API what we need. That computer then reads our request, processes it, and returns structured data that can be easily parsed by existing R libraries.
Making API requests in R
To work with APIs in R, we need to bring in some libraries. These libraries take all of the complexities of an API request and wrap them up in functions that we can use in single lines of code. The R libraries that we’ll be using are `httr` and `jsonlite`. If you don’t have either of these libraries in your R console or RStudio, you’ll need to download them first. Use the `install.packages()` function to bring in these packages.
install.packages(c("httr", "jsonlite"))
After downloading the libraries, we’ll be able to use them in our R scripts or RMarkdown files.
library(httr)
library(jsonlite)
For our purposes, we’ll just be asking for data, which corresponds to a `GET` request. To create a `GET` request, we use the `GET()` function from the `httr` library. The `GET()` function requires a URL, which specifies the address of the server that the request is sent to. The full list of URLs (endpoints) supported by Open Dota is given in their documentation. Let us make a request for professional Dota 2 matches.
pro_matches_raw = GET("https://api.opendota.com/api/proMatches")
Printing the `pro_matches_raw` variable gives us a summary of the resulting response. The first thing to notice is that it contains the URL that the GET request was sent to. We can also see the date and time that the request was made, as well as the size of the response. The content type gives us an idea of what form the data takes. This particular response says that the data is in JSON format, which hints at why we need the `jsonlite` library.
pro_matches_raw
## Response [https://api.opendota.com/api/proMatches]
## Date: 2021-07-26 09:36
## Status: 200
## Content-Type: application/json; charset=utf-8
## Size: 32 kB
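Before parsing a response, it can be worth confirming that the request actually succeeded and that the body is JSON. The following is a minimal sketch rather than part of the original workflow: `check_response()` is a hypothetical helper name, while `http_error()`, `status_code()` and `http_type()` are functions provided by `httr`.

```r
library(httr)

# Hypothetical helper: stop early if the request failed or the body is not JSON.
check_response <- function(resp) {
  if (http_error(resp)) {
    stop("Request failed with HTTP status ", status_code(resp))
  }
  if (http_type(resp) != "application/json") {
    stop("Expected JSON but got ", http_type(resp))
  }
  invisible(resp)  # return the response unchanged so calls can be chained
}

# Usage (requires network access):
# pro_matches_raw <- check_response(GET("https://api.opendota.com/api/proMatches"))
```

A status of 200, as in the printed summary above, passes the first check; any status of 400 or higher raises an error.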
Handling JSON Data
JSON stands for JavaScript Object Notation. While JavaScript is another programming language, our focus on JSON is its structure. JSON is useful because it is easily readable by a computer, and for this reason, it has become the primary way that data is transported through APIs. Most APIs will send their responses in JSON format.
`rawToChar()` is a base R function that converts the raw bytes of the response body (`pro_matches_raw$content`) into a character string holding the JSON text. We then use the `fromJSON()` function from `jsonlite` to convert that JSON text into a `data.frame` structure.
library(kableExtra)  # provides kbl(), kable_paper(), scroll_box() and re-exports %>%
pro_matches = fromJSON(rawToChar(pro_matches_raw$content))
pro_matches %>% head %>% kbl %>% kable_paper() %>% scroll_box(width = "100%", height = "250px")
match_id | duration | start_time | radiant_team_id | radiant_name | dire_team_id | dire_name | leagueid | league_name | series_id | series_type | radiant_score | dire_score | radiant_win |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6106012716 | 1163 | 1627287926 | 8449479 | Team GL | 5184391 | EYE gaming | 13268 | Ultras Dota Pro | 581335 | 1 | 15 | 34 | FALSE |
6106012486 | 1459 | 1627287956 | 8488438 | NA | 8488435 | NA | 13395 | East Power Cup | 581344 | 0 | 7 | 34 | FALSE |
6106009678 | 3137 | 1627287781 | 8488435 | NA | 8488432 | NA | 13395 | East Power Cup | 581343 | 0 | 0 | 0 | FALSE |
6106005670 | 3129 | 1627287539 | 8488432 | NA | 8488435 | NA | 13395 | East Power Cup | 581341 | 2 | 0 | 0 | TRUE |
6106004860 | 1634 | 1627287492 | 8400307 | its a PRANK | 8336189 | Dota Geniuses | 13286 | Efusion Dota 2 League | 581340 | 1 | 40 | 18 | TRUE |
6105994354 | 2293 | 1627286827 | 8254145 | Execration | 8482620 | NA | 13335 | Perfect Land Gaming | 581334 | 1 | 25 | 49 | FALSE |
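To see the two conversion steps in isolation, here is a small offline sketch. The JSON text is made up for illustration and only mimics the shape of the real response; `charToRaw()` stands in for the raw bytes that `httr` stores in `$content`.

```r
library(jsonlite)

# Made-up JSON text with the same shape as the API response: an array of objects
json_text <- '[
  {"match_id": 1, "radiant_name": "Team A", "radiant_win": true},
  {"match_id": 2, "radiant_name": null,     "radiant_win": false}
]'

# Simulate the raw byte vector found in a response's $content field
raw_body <- charToRaw(json_text)

# Step 1: raw bytes -> character string; step 2: JSON text -> data.frame
toy_matches <- fromJSON(rawToChar(raw_body))
str(toy_matches)
```

Each JSON object becomes one row, and a JSON `null` becomes `NA`, which is why some team names in the table above show up as `NA`.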
Understanding the Dota 2 variables
The `match_id`, `radiant_team_id` and `dire_team_id` columns identify the match and the Radiant and Dire teams, respectively. The `start_time` column gives the date and time of the match stored as Unix time:
pro_matches$start_time[1] %>% as.POSIXct(origin = "1970-01-01")
## [1] "2021-07-26 11:25:26 EEST"
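The printed time above depends on the time zone of the machine running the code. A minimal sketch of a zone-independent conversion follows; it uses the first `start_time` and `duration` values from the table, and it assumes that `duration` is measured in seconds.

```r
# Values copied from the first row of the table above
start_time <- 1627287926
duration   <- 1163

# Convert Unix time to a date-time pinned to UTC, so the result
# does not depend on the local time zone
start_utc <- as.POSIXct(start_time, origin = "1970-01-01", tz = "UTC")
start_label <- format(start_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
start_label
# "2021-07-26 08:25:26"

# Assumption: duration is in seconds; split it into minutes and seconds
duration_label <- sprintf("%d min %d s", as.integer(duration %/% 60), as.integer(duration %% 60))
duration_label
# "19 min 23 s"
```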
Professional Dota 2 games are held in series of tournaments organized by Dota 2 leagues. The `leagueid` and `league_name` columns specify a Dota 2 league. The `radiant_score` and `dire_score` columns give the number of kills scored by the Radiant and Dire teams, respectively, that is, the number of opposing heroes each team killed.
Getting match details
To get more details about a Dota 2 match, such as the names of the 10 players and in-game statistics, we use the API call `https://api.opendota.com/api/matches/{match_id}`, replacing the `{match_id}` field with the identifier of the match we are interested in. Let us, for example, take the match id of the first result.
match_id = pro_matches$match_id[1]
base_url = "https://api.opendota.com/api/matches/"
api_call = paste0(base_url,match_id)
api_call
## [1] "https://api.opendota.com/api/matches/6106012716"
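Match ids are too large for R's integer type, so they are stored as doubles, and a double can be rendered in scientific notation when it is pasted into a string. The sketch below wraps the URL construction in a hypothetical helper, `match_url()`, that formats the id explicitly to avoid that.

```r
# Hypothetical helper: build the match-details URL for a given match id.
# format(..., scientific = FALSE) keeps large ids in plain digits.
match_url <- function(match_id,
                      base_url = "https://api.opendota.com/api/matches/") {
  id_text <- format(match_id, scientific = FALSE, trim = TRUE)
  paste0(base_url, id_text)
}

match_url(6106012716)
# "https://api.opendota.com/api/matches/6106012716"
```

The helper is vectorised, so passing a vector of ids returns one URL per id.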
The following request
match_json = rawToChar(GET(api_call)$content)
m = fromJSON(match_json)
returns a long list of nested lists that occupies close to 0.9 MB of memory:
object.size(m)
## 882040 bytes
Let us see the names of the top-level list elements.
names(m)
## [1] "match_id" "barracks_status_dire" "barracks_status_radiant"
## [4] "chat" "cluster" "cosmetics"
## [7] "dire_score" "dire_team_id" "draft_timings"
## [10] "duration" "engine" "first_blood_time"
## [13] "game_mode" "human_players" "leagueid"
## [16] "lobby_type" "match_seq_num" "negative_votes"
## [19] "objectives" "picks_bans" "positive_votes"
## [22] "radiant_gold_adv" "radiant_score" "radiant_team_id"
## [25] "radiant_win" "radiant_xp_adv" "skill"
## [28] "start_time" "teamfights" "tower_status_dire"
## [31] "tower_status_radiant" "version" "replay_salt"
## [34] "series_id" "series_type" "league"
## [37] "radiant_team" "dire_team" "players"
## [40] "patch" "region" "all_word_counts"
## [43] "my_word_counts" "comeback" "stomp"
## [46] "replay_url"
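With this many elements, a compact overview of what each one holds can help before digging in. The sketch below runs on a made-up stand-in for `m` so that it works offline; the same calls should apply to the real object.

```r
# Toy stand-in for the parsed match object `m` (values are made up)
toy_match <- list(
  match_id = 1,
  radiant_win = FALSE,
  radiant_gold_adv = c(0, 120, -340),
  players = data.frame(account_id = 1:10, kills = 0L)
)

# One row per top-level element: its class and its length
overview <- data.frame(
  element = names(toy_match),
  class = vapply(toy_match, function(x) class(x)[1], character(1)),
  length = lengths(toy_match),
  row.names = NULL
)
overview

# Print only the first level of nesting
str(toy_match, max.level = 1)
```

For a data frame element such as `players`, the reported length is its number of columns.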
The game can be summarized by the following statistics:
library(data.table)
data.table(
  # Unix time converted to a calendar date in UTC
  date = as.Date(as.POSIXct(m$start_time, origin = "1970-01-01", tz = "UTC")),
  league = m$league$name,
  radiant = m$radiant_team$name,
  dire = m$dire_team$name,
  radiant_score = m$radiant_score,
  dire_score = m$dire_score,
  radiant_win = m$radiant_win
) %>% kbl %>% kable_paper
date | league | radiant | dire | radiant_score | dire_score | radiant_win |
---|---|---|---|---|---|---|
2021-07-26 | Ultras Dota Pro | Team GL | EYE gaming | 15 | 34 | FALSE |
A more dynamic picture of the match can be gleaned from the list elements `radiant_gold_adv` and `radiant_xp_adv`. These are time series giving the difference in gold and experience between the two teams for each minute of the game.
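library(data.table)
library(ggplot2)
# One row per minute, then reshape to long format for plotting
d = data.table(minute = seq_along(m$radiant_gold_adv), gold = m$radiant_gold_adv, xp = m$radiant_xp_adv)
d = melt(d, id.vars = "minute")
# qplot() is deprecated in recent ggplot2 releases, so the plot is built with ggplot()
ggplot(d, aes(minute, value, color = variable)) + geom_point() + geom_line() + ggtitle(paste("Match id =", match_id))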
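The same series can also be summarized numerically. The sketch below uses a made-up gold series and assumes, as the name `radiant_gold_adv` suggests, that positive values mean Radiant is ahead and negative values mean Dire is ahead.

```r
# Made-up gold advantage series, one value per minute
radiant_gold_adv <- c(0, 150, 400, -200, -900, -2500)
minute <- seq_along(radiant_gold_adv)

# Which side is ahead at each minute (assumption: positive = Radiant ahead)
leader <- ifelse(radiant_gold_adv > 0, "radiant",
                 ifelse(radiant_gold_adv < 0, "dire", "even"))

# Minutes at which the sign of the advantage changed
lead_sign <- sign(radiant_gold_adv)
switches <- minute[-1][diff(lead_sign) != 0]

# Largest lead held by each side
max_radiant_lead <- max(radiant_gold_adv)
max_dire_lead <- -min(radiant_gold_adv)
```

In this toy series the sign changes at minutes 2 and 4, the largest Radiant lead is 400 gold and the largest Dire lead is 2500 gold.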
The `players` list element is a large list with details about the 10 human players.
names(m$players)
## [1] "match_id" "player_slot" "ability_targets"
## [4] "ability_upgrades_arr" "ability_uses" "account_id"
## [7] "actions" "additional_units" "assists"
## [10] "backpack_0" "backpack_1" "backpack_2"
## [13] "backpack_3" "buyback_log" "camps_stacked"
## [16] "connection_log" "creeps_stacked" "damage"
## [19] "damage_inflictor" "damage_inflictor_received" "damage_taken"
## [22] "damage_targets" "deaths" "denies"
## [25] "dn_t" "firstblood_claimed" "gold"
## [28] "gold_per_min" "gold_reasons" "gold_spent"
## [31] "gold_t" "hero_damage" "hero_healing"
## [34] "hero_hits" "hero_id" "item_0"
## [37] "item_1" "item_2" "item_3"
## [40] "item_4" "item_5" "item_neutral"
## [43] "item_uses" "kill_streaks" "killed"
## [46] "killed_by" "kills" "kills_log"
## [49] "lane_pos" "last_hits" "leaver_status"
## [52] "level" "lh_t" "life_state"
## [55] "max_hero_hit" "multi_kills" "net_worth"
## [58] "obs" "obs_left_log" "obs_log"
## [61] "obs_placed" "party_id" "party_size"
## [64] "performance_others" "permanent_buffs" "pings"
## [67] "pred_vict" "purchase" "purchase_log"
## [70] "randomed" "repicked" "roshans_killed"
## [73] "rune_pickups" "runes" "runes_log"
## [76] "sen" "sen_left_log" "sen_log"
## [79] "sen_placed" "stuns" "teamfight_participation"
## [82] "times" "tower_damage" "towers_killed"
## [85] "xp_per_min" "xp_reasons" "xp_t"
## [88] "personaname" "name" "last_login"
## [91] "radiant_win" "start_time" "duration"
## [94] "cluster" "lobby_type" "game_mode"
## [97] "is_contributor" "patch" "region"
## [100] "isRadiant" "win" "lose"
## [103] "total_gold" "total_xp" "kills_per_min"
## [106] "kda" "abandons" "neutral_kills"
## [109] "tower_kills" "courier_kills" "lane_kills"
## [112] "hero_kills" "observer_kills" "sentry_kills"
## [115] "roshan_kills" "necronomicon_kills" "ancient_kills"
## [118] "buyback_count" "observer_uses" "sentry_uses"
## [121] "lane_efficiency" "lane_efficiency_pct" "lane"
## [124] "lane_role" "is_roaming" "purchase_time"
## [127] "first_purchase_time" "item_win" "item_usage"
## [130] "purchase_ward_observer" "actions_per_min" "life_state_dead"
## [133] "rank_tier" "cosmetics" "benchmarks"
## [136] "purchase_ward_sentry" "purchase_tpscroll"
Here are some performance measures of the individual players.
l = m$players
performance = data.table(
  account_id = l$account_id,
  rank_tier = l$rank_tier,
  kills = l$kills,
  deaths = l$deaths,
  damage = rowSums(l$damage, na.rm = TRUE),  # row-wise total of the nested damage table
  total_gold = l$total_gold
)
performance %>% kbl %>% kable_paper %>% scroll_box(width = "100%", height = "250px")
account_id | rank_tier | kills | deaths | damage | total_gold |
---|---|---|---|---|---|
1139215778 | NA | 3 | 5 | 43215 | 7113 |
173851224 | 80 | 2 | 8 | 34597 | 4089 |
364568110 | 51 | 1 | 7 | 16948 | 3566 |
250358373 | 80 | 7 | 9 | 58852 | 8102 |
329706313 | 65 | 2 | 5 | 76585 | 7501 |
1029972951 | 80 | 2 | 4 | 44329 | 7985 |
160725934 | 80 | 7 | 4 | 11683 | 7229 |
119942696 | 80 | 14 | 0 | 99440 | 14440 |
299806942 | 80 | 1 | 3 | 53823 | 8199 |
130433320 | 80 | 9 | 4 | 60769 | 10970 |
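Player-level figures can be rolled up to team level. The sketch below copies the kills, deaths and gold values from the table above so that it runs offline. It assumes that the first five rows are the Radiant players; with the real data, the `isRadiant` element of `m$players` could be used instead.

```r
library(data.table)

# Values copied from the table above
toy_performance <- data.table(
  isRadiant  = rep(c(TRUE, FALSE), each = 5),  # assumption: rows 1-5 are Radiant
  kills      = c(3, 2, 1, 7, 2, 2, 7, 14, 1, 9),
  deaths     = c(5, 8, 7, 9, 5, 4, 4, 0, 3, 4),
  total_gold = c(7113, 4089, 3566, 8102, 7501, 7985, 7229, 14440, 8199, 10970)
)

# Aggregate the player rows into one row per team
team_totals <- toy_performance[, .(kills = sum(kills),
                                   deaths = sum(deaths),
                                   total_gold = sum(total_gold)),
                               by = isRadiant]
team_totals
```

Under that assumption the Radiant players total 15 kills, which matches the `radiant_score` of 15 reported for this match.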
These are some basics of the Open Dota API. For more information, read the documentation or visit the Dotabuff site.