In this tutorial, we show how to gather data about the esports game Dota 2 from Open Dota, a site that collects and provides such data. We request the data through the site's API, which is documented here: The Open Dota API Documentation. First, we cover the basics of accessing an API with the R programming language.

An Application Programming Interface (API) lets programmers request data from a website directly instead of scraping its pages. When a website sets up an API, it is essentially running a server that waits for data requests. When the server receives a request, it processes it and sends the requested data back to the computer that asked for it. As the requester, we write R code that builds the request and tells the API server what we need. The server then reads the request, processes it, and returns nicely formatted data that existing R libraries can parse easily.

Making API requests in R

To work with APIs in R, we need a couple of libraries. These libraries wrap the complexities of an API request in functions that we can call in a single line of code. We'll use httr and jsonlite. If either library is not yet installed in your R console or RStudio, install it first with the install.packages() function:

install.packages(c("httr", "jsonlite"))

After installing the libraries, we can load them in our R scripts or R Markdown files.

library(httr)
library(jsonlite)
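Calling install.packages() unconditionally reinstalls the packages every time a script runs. A minimal sketch, using only base R, that installs just the packages that are missing:

```r
pkgs <- c("httr", "jsonlite")
# requireNamespace() returns FALSE, without an error, when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
# Install only what is missing; nothing happens if both are already available
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```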

For our purposes, we only need to ask for data, which corresponds to an HTTP GET request. To create a GET request, we use the GET() function from the httr library. GET() requires a URL, which specifies the address of the server and the resource the request is sent to. The full list of URLs (endpoints) supported by Open Dota is given in their documentation. Let us request professional Dota 2 matches.

pro_matches_raw = GET("https://api.opendota.com/api/proMatches")

Printing the pro_matches_raw variable gives a summary of the response. First, it shows the URL that the GET request was sent to. It also shows the date and time of the request and the size of the response. Status: 200 is the HTTP status code for a successful request. The content type tells us what form the data takes: this response is in JSON format, which is why we need the jsonlite library.

pro_matches_raw
## Response [https://api.opendota.com/api/proMatches]
##   Date: 2021-07-26 09:36
##   Status: 200
##   Content-Type: application/json; charset=utf-8
##   Size: 32 kB
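Here the status is 200, but a request can also fail (for example with a 4xx or 5xx status), and parsing an error body as if it were data leads to confusing errors later. A minimal sketch of a wrapper that checks the status before parsing; `get_opendota` and its `base_url` argument are hypothetical names, not part of httr or the Open Dota API, while GET(), stop_for_status() and content() are standard httr functions:

```r
library(httr)
library(jsonlite)

# Hypothetical helper: fetch one Open Dota endpoint and return the parsed JSON.
# `path` is appended to the base URL, e.g. "proMatches" or "matches/6106012716".
get_opendota <- function(path, base_url = "https://api.opendota.com/api/") {
  resp <- GET(paste0(base_url, path))
  # Turn 4xx/5xx responses into an R error instead of parsing an error body
  stop_for_status(resp, task = paste("fetch", path))
  # Decode the raw body as UTF-8 text, then parse the JSON into R structures
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}
```

With this helper, `pro_matches <- get_opendota("proMatches")` would replace the separate GET and parsing steps used in this tutorial.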

Handling JSON Data

JSON stands for JavaScript Object Notation. Although JavaScript is a programming language in its own right, what matters here is JSON's structure: objects of named fields and arrays of values, which can be nested. JSON is easy for computers to parse and still readable by humans, which has made it one of the most common formats for transporting data through APIs. Most APIs send their responses in JSON format.

The response body is stored in pro_matches_raw$content as a vector of raw bytes. The base R function rawToChar() converts these bytes into a character string containing the JSON text. We then use the fromJSON() function from jsonlite to convert the JSON text into a data.frame.

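library(magrittr)    # provides the %>% pipe
library(kableExtra)  # provides kbl(), kable_paper() and scroll_box() for HTML tables
pro_matches = fromJSON(rawToChar(pro_matches_raw$content))
pro_matches %>% head %>% kbl %>% kable_paper() %>% scroll_box(width = "100%", height = "250px")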
| match_id | duration | start_time | radiant_team_id | radiant_name | dire_team_id | dire_name | leagueid | league_name | series_id | series_type | radiant_score | dire_score | radiant_win |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6106012716 | 1163 | 1627287926 | 8449479 | Team GL | 5184391 | EYE gaming | 13268 | Ultras Dota Pro | 581335 | 1 | 15 | 34 | FALSE |
| 6106012486 | 1459 | 1627287956 | 8488438 | NA | 8488435 | NA | 13395 | East Power Cup | 581344 | 0 | 7 | 34 | FALSE |
| 6106009678 | 3137 | 1627287781 | 8488435 | NA | 8488432 | NA | 13395 | East Power Cup | 581343 | 0 | 0 | 0 | FALSE |
| 6106005670 | 3129 | 1627287539 | 8488432 | NA | 8488435 | NA | 13395 | East Power Cup | 581341 | 2 | 0 | 0 | TRUE |
| 6106004860 | 1634 | 1627287492 | 8400307 | its a PRANK | 8336189 | Dota Geniuses | 13286 | Efusion Dota 2 League | 581340 | 1 | 40 | 18 | TRUE |
| 6105994354 | 2293 | 1627286827 | 8254145 | Execration | 8482620 | NA | 13335 | Perfect Land Gaming | 581334 | 1 | 25 | 49 | FALSE |
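To see what rawToChar() and fromJSON() each contribute, the sketch below runs both steps on a small, made-up JSON string (not real Open Dota data) shaped like the proMatches response. charToRaw() is used only to simulate the raw bytes that httr stores in pro_matches_raw$content:

```r
library(jsonlite)

# Made-up two-row JSON array with a proMatches-like shape (illustrative values only)
json_text <- '[{"match_id": 1, "radiant_name": "Team A", "radiant_win": true},
               {"match_id": 2, "radiant_name": null, "radiant_win": false}]'

# Simulate the response body: httr keeps it as a raw (byte) vector
raw_body <- charToRaw(json_text)
class(raw_body)

# rawToChar() turns the bytes back into one string of JSON text
json_string <- rawToChar(raw_body)

# fromJSON() simplifies an array of objects into a data.frame; JSON null becomes NA
df <- fromJSON(json_string)
str(df)
```

This is also why team names that Open Dota does not know show up as NA in the table above. httr's `content(pro_matches_raw, as = "text", encoding = "UTF-8")` is an alternative to rawToChar() that makes the text encoding explicit.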

Understanding the Dota 2 variables

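The match_id, radiant_team_id and dire_team_id columns identify the match and the Radiant and Dire teams, respectively. The duration column gives the length of the match in seconds. The start_time column gives the date and time when the match started, stored as Unix time (seconds since 1970-01-01 00:00:00 UTC); as.POSIXct() converts it to a date-time in the local time zone: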

pro_matches$start_time[1] %>%  as.POSIXct(origin = "1970-01-01")
## [1] "2021-07-26 11:25:26 EEST"
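The printed time depends on the machine's time zone (EEST above). A small sketch that pins the conversion to UTC so the result is the same everywhere, and that also splits a duration in seconds into minutes and seconds; the numbers are copied from the first rows of the table above:

```r
# start_time values of the first two matches above (seconds since 1970-01-01 UTC)
start_time <- c(1627287926, 1627287956)
# tz = "UTC" makes the output independent of the local time zone
start_utc <- as.POSIXct(start_time, origin = "1970-01-01", tz = "UTC")
stamps <- format(start_utc, "%Y-%m-%d %H:%M:%S")
stamps  # "2021-07-26 08:25:26" "2021-07-26 08:25:56"

# duration of the first match (1163 s) as minutes and seconds
duration <- 1163
dur_text <- sprintf("%d min %02d s", duration %/% 60, duration %% 60)
dur_text  # "19 min 23 s"
```

08:25:26 UTC is the same instant as the 11:25:26 EEST shown above, since EEST is three hours ahead of UTC.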

Professional Dota 2 games are played in tournaments organized by Dota 2 leagues. The leagueid and league_name columns identify the league. The radiant_score and dire_score columns give the number of enemy heroes killed by the Radiant and Dire teams, respectively, and radiant_win is TRUE when the Radiant team won the match.
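As a quick illustration of these columns, the sketch below computes each match's kill margin and counts matches per league. To keep it runnable offline, it rebuilds a small data frame from the six rows printed above; with the live data you would use `pro_matches$radiant_score - pro_matches$dire_score` and `table(pro_matches$league_name)` directly:

```r
# Columns copied from the six rows of pro_matches printed above
pm <- data.frame(
  league_name   = c("Ultras Dota Pro", "East Power Cup", "East Power Cup",
                    "East Power Cup", "Efusion Dota 2 League", "Perfect Land Gaming"),
  radiant_score = c(15, 7, 0, 0, 40, 25),
  dire_score    = c(34, 34, 0, 0, 18, 49)
)

# Kill margin from Radiant's side: positive means Radiant killed more enemy heroes
pm$kill_margin <- pm$radiant_score - pm$dire_score

# Number of listed matches per league, most frequent first
league_counts <- sort(table(pm$league_name), decreasing = TRUE)
league_counts
```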

Getting match details

To get more details about a Dota 2 match, such as the names of the 10 players and in-game statistics, we use the API endpoint

https://api.opendota.com/api/matches/{match_id}.

We substitute the identifier of the match we are interested in for {match_id}. For example, let us use the match id of the first result.

match_id = pro_matches$match_id[1]
base_url = "https://api.opendota.com/api/matches/"
api_call = paste0(base_url,match_id)
api_call
## [1] "https://api.opendota.com/api/matches/6106012716"
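Match ids are larger than R's integer range, so jsonlite returns them as doubles, and paste0() converts doubles with as.character(), which can switch to scientific notation for round numbers. A sketch that builds URLs for several ids at once with sprintf(), which always writes the whole number; the second id below is hypothetical and chosen only because it is round:

```r
base_url <- "https://api.opendota.com/api/matches/"
# The first id from the table above and a hypothetical round id
match_ids <- c(6106012716, 6100000000)

# paste0(base_url, 6100000000) may produce ".../matches/6.1e+09";
# "%.0f" formats each double as a plain integer, and sprintf() is vectorized
api_calls <- sprintf("%s%.0f", base_url, match_ids)
api_calls
```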

The following request

match_json = rawToChar(GET(api_call)$content)
m = fromJSON(match_json)

returns a deeply nested list that takes up a lot of memory:

object.size(m)
## 882040 bytes

Let us see the names of the top-level list elements.

names(m)
##  [1] "match_id"                "barracks_status_dire"    "barracks_status_radiant"
##  [4] "chat"                    "cluster"                 "cosmetics"              
##  [7] "dire_score"              "dire_team_id"            "draft_timings"          
## [10] "duration"                "engine"                  "first_blood_time"       
## [13] "game_mode"               "human_players"           "leagueid"               
## [16] "lobby_type"              "match_seq_num"           "negative_votes"         
## [19] "objectives"              "picks_bans"              "positive_votes"         
## [22] "radiant_gold_adv"        "radiant_score"           "radiant_team_id"        
## [25] "radiant_win"             "radiant_xp_adv"          "skill"                  
## [28] "start_time"              "teamfights"              "tower_status_dire"      
## [31] "tower_status_radiant"    "version"                 "replay_salt"            
## [34] "series_id"               "series_type"             "league"                 
## [37] "radiant_team"            "dire_team"               "players"                
## [40] "patch"                   "region"                  "all_word_counts"        
## [43] "my_word_counts"          "comeback"                "stomp"                  
## [46] "replay_url"
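The nesting follows a simple pattern: a JSON object inside the match becomes a named list, an array of numbers becomes a vector, and an array of objects becomes a data frame. The sketch below shows this on a made-up, heavily trimmed record that borrows a few field names from the list above (the values are illustrative, not real data); `str(m, max.level = 1)` gives the same kind of one-level overview of the real object:

```r
library(jsonlite)

# Made-up, trimmed match record using a few field names from the real response
match_text <- '{
  "match_id": 1,
  "league": {"leagueid": 10, "name": "Example League"},
  "radiant_gold_adv": [0, 120, -35],
  "players": [
    {"account_id": 11, "kills": 3},
    {"account_id": 22, "kills": 5}
  ]
}'
toy_match <- fromJSON(match_text)

str(toy_match, max.level = 1)  # top-level overview only
toy_match$league$name          # JSON object -> named list, reached with $ twice
toy_match$radiant_gold_adv     # JSON array of numbers -> numeric vector
toy_match$players              # JSON array of objects -> data.frame, one row per player
```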

The game can be summarized by the following statistics:

library(data.table)
data.table(date = as.Date(as.POSIXct(m$start_time, origin = "1970-01-01", tz = "UTC")),
           league = m$league$name,
           radiant = m$radiant_team$name,
           dire = m$dire_team$name,
           radiant_score = m$radiant_score,
           dire_score = m$dire_score,
           radiant_win = m$radiant_win
) %>% kbl %>% kable_paper
| date | league | radiant | dire | radiant_score | dire_score | radiant_win |
|---|---|---|---|---|---|---|
| 2021-07-26 | Ultras Dota Pro | Team GL | EYE gaming | 15 | 34 | FALSE |
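To summarize many matches, the same statistics can be wrapped in a function that returns one row per match. A sketch under two assumptions: `summarize_match` and `or_na` are hypothetical helpers, and a missing team or league (which Open Dota reports for some matches, as the NA names above show) arrives as a NULL field that should become NA. The toy input reuses the first match's values but leaves out dire_team on purpose:

```r
library(data.table)

# Hypothetical helper: replace a missing (NULL) field with NA so every row has all columns
or_na <- function(x) if (is.null(x)) NA else x

# Hypothetical helper: one summary row from a parsed match list such as m
summarize_match <- function(m) {
  data.table(
    match_id      = m$match_id,
    date          = as.Date(as.POSIXct(m$start_time, origin = "1970-01-01", tz = "UTC")),
    league        = or_na(m$league$name),
    radiant       = or_na(m$radiant_team$name),
    dire          = or_na(m$dire_team$name),
    radiant_score = m$radiant_score,
    dire_score    = m$dire_score,
    radiant_win   = m$radiant_win
  )
}

# Toy input with the same field names as m; dire_team is deliberately absent
toy_m <- list(match_id = 6106012716, start_time = 1627287926,
              league = list(name = "Ultras Dota Pro"),
              radiant_team = list(name = "Team GL"),
              radiant_score = 15, dire_score = 34, radiant_win = FALSE)
summary_row <- summarize_match(toy_m)
summary_row
```

For several matches, the rows can be stacked with `rbindlist(lapply(...))`, applying summarize_match() to each parsed response. Since the public API limits how fast you may call it, pausing between requests (for example with `Sys.sleep(1)`) is a sensible precaution.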

A more dynamic picture of the match comes from the list elements radiant_gold_adv and radiant_xp_adv. These are time series giving the Radiant team's advantage in gold and experience over the Dire team at each minute of the game; negative values mean that Dire is ahead.

library(data.table)
library(ggplot2)
# Radiant advantage per minute; the series starts at minute 0
d = data.table(minute = seq_along(m$radiant_gold_adv) - 1, gold = m$radiant_gold_adv, xp = m$radiant_xp_adv)
d = melt(d, id.vars = "minute")  # long format: one row per (minute, variable)
ggplot(d, aes(minute, value, color = variable)) + geom_point() + geom_line() +
  labs(title = paste("Match id =", match_id))
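The plot can be complemented with a few numbers: each side's largest gold lead and the minutes at which the lead changed hands. The sketch below uses a short made-up series so it runs offline and assumes, as the plot code does, one value per minute starting at minute 0; with real data you would set `gold <- m$radiant_gold_adv`:

```r
# Made-up gold advantage series (Radiant minus Dire), one value per minute from minute 0
gold <- c(0, 150, -200, -900, 400, 1200)
minute <- seq_along(gold) - 1

# Largest lead for each side and the minute it occurred
radiant_peak <- c(lead = max(gold), minute = minute[which.max(gold)])
dire_peak <- c(lead = -min(gold), minute = minute[which.min(gold)])

# Lead changes: a sign flip between consecutive non-zero values
s <- sign(gold)
idx <- which(s != 0)
lead_changes <- minute[idx[-1][diff(s[idx]) != 0]]
lead_changes  # minutes at which the other team took the lead
```

Zeros are skipped when comparing signs so that an exactly even minute is not counted as a change on its own.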

The players element is a large data frame with one row for each of the 10 human players and one column for each detail recorded about them.

names(m$players)
##   [1] "match_id"                  "player_slot"               "ability_targets"          
##   [4] "ability_upgrades_arr"      "ability_uses"              "account_id"               
##   [7] "actions"                   "additional_units"          "assists"                  
##  [10] "backpack_0"                "backpack_1"                "backpack_2"               
##  [13] "backpack_3"                "buyback_log"               "camps_stacked"            
##  [16] "connection_log"            "creeps_stacked"            "damage"                   
##  [19] "damage_inflictor"          "damage_inflictor_received" "damage_taken"             
##  [22] "damage_targets"            "deaths"                    "denies"                   
##  [25] "dn_t"                      "firstblood_claimed"        "gold"                     
##  [28] "gold_per_min"              "gold_reasons"              "gold_spent"               
##  [31] "gold_t"                    "hero_damage"               "hero_healing"             
##  [34] "hero_hits"                 "hero_id"                   "item_0"                   
##  [37] "item_1"                    "item_2"                    "item_3"                   
##  [40] "item_4"                    "item_5"                    "item_neutral"             
##  [43] "item_uses"                 "kill_streaks"              "killed"                   
##  [46] "killed_by"                 "kills"                     "kills_log"                
##  [49] "lane_pos"                  "last_hits"                 "leaver_status"            
##  [52] "level"                     "lh_t"                      "life_state"               
##  [55] "max_hero_hit"              "multi_kills"               "net_worth"                
##  [58] "obs"                       "obs_left_log"              "obs_log"                  
##  [61] "obs_placed"                "party_id"                  "party_size"               
##  [64] "performance_others"        "permanent_buffs"           "pings"                    
##  [67] "pred_vict"                 "purchase"                  "purchase_log"             
##  [70] "randomed"                  "repicked"                  "roshans_killed"           
##  [73] "rune_pickups"              "runes"                     "runes_log"                
##  [76] "sen"                       "sen_left_log"              "sen_log"                  
##  [79] "sen_placed"                "stuns"                     "teamfight_participation"  
##  [82] "times"                     "tower_damage"              "towers_killed"            
##  [85] "xp_per_min"                "xp_reasons"                "xp_t"                     
##  [88] "personaname"               "name"                      "last_login"               
##  [91] "radiant_win"               "start_time"                "duration"                 
##  [94] "cluster"                   "lobby_type"                "game_mode"                
##  [97] "is_contributor"            "patch"                     "region"                   
## [100] "isRadiant"                 "win"                       "lose"                     
## [103] "total_gold"                "total_xp"                  "kills_per_min"            
## [106] "kda"                       "abandons"                  "neutral_kills"            
## [109] "tower_kills"               "courier_kills"             "lane_kills"               
## [112] "hero_kills"                "observer_kills"            "sentry_kills"             
## [115] "roshan_kills"              "necronomicon_kills"        "ancient_kills"            
## [118] "buyback_count"             "observer_uses"             "sentry_uses"              
## [121] "lane_efficiency"           "lane_efficiency_pct"       "lane"                     
## [124] "lane_role"                 "is_roaming"                "purchase_time"            
## [127] "first_purchase_time"       "item_win"                  "item_usage"               
## [130] "purchase_ward_observer"    "actions_per_min"           "life_state_dead"          
## [133] "rank_tier"                 "cosmetics"                 "benchmarks"               
## [136] "purchase_ward_sentry"      "purchase_tpscroll"

Here are some performance measures for the individual players.

l = m$players
# l$damage is a data frame of damage dealt to each target; rowSums() gives each player's total
performance = data.table(account_id = l$account_id, rank_tier = l$rank_tier, kills = l$kills, deaths = l$deaths, damage = rowSums(l$damage, na.rm = TRUE), total_gold = l$total_gold)
performance %>% kbl %>% kable_paper %>% scroll_box(width = "100%", height = "250px")
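| account_id | rank_tier | kills | deaths | damage | total_gold |
|---|---|---|---|---|---|
| 1139215778 | NA | 3 | 5 | 43215 | 7113 |
| 173851224 | 80 | 2 | 8 | 34597 | 4089 |
| 364568110 | 51 | 1 | 7 | 16948 | 3566 |
| 250358373 | 80 | 7 | 9 | 58852 | 8102 |
| 329706313 | 65 | 2 | 5 | 76585 | 7501 |
| 1029972951 | 80 | 2 | 4 | 44329 | 7985 |
| 160725934 | 80 | 7 | 4 | 11683 | 7229 |
| 119942696 | 80 | 14 | 0 | 99440 | 14440 |
| 299806942 | 80 | 1 | 3 | 53823 | 8199 |
| 130433320 | 80 | 9 | 4 | 60769 | 10970 |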
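The per-player measures can also be aggregated by team. The sketch below copies the kills and total_gold columns from the table above and assumes that the first five rows are the Radiant players; with live data, take each player's side from the isRadiant column (`l$isRadiant`) rather than relying on row order:

```r
library(data.table)

# kills and total_gold copied from the table above;
# is_radiant ASSUMES the first five rows are Radiant (use l$isRadiant with live data)
perf <- data.table(
  kills      = c(3, 2, 1, 7, 2, 2, 7, 14, 1, 9),
  total_gold = c(7113, 4089, 3566, 8102, 7501, 7985, 7229, 14440, 8199, 10970),
  is_radiant = rep(c(TRUE, FALSE), each = 5)
)

# Team totals: one row per side, in order of first appearance
team_totals <- perf[, .(kills = sum(kills), total_gold = sum(total_gold)), by = is_radiant]
team_totals

# Each player's share of their own team's kills (added as a new column in place)
perf[, kill_share := kills / sum(kills), by = is_radiant]
```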

These are some of the basics of the Open Dota API. For more information, read the Open Dota API documentation or visit the Dotabuff site.