Finding the Data in Data Science

A Python API Walkthrough using Requests and Sportradar

Scott Okamura
7 min read · Aug 26, 2021

API (Application Programming Interface)

If you are just beginning your journey into data science, you may have learned that sometimes the hardest part of the process is simply finding the data. Whether that means scraping, forking, or requesting from a database, obtaining real, quality data can be challenging. Websites like Kaggle offer free public datasets for anyone to use. However, those datasets usually come pre-cleaned, with minimal null values and organized columns and features. As a data science beginner, where’s the fun in working with a dataset that someone has already cleaned for you? If you want to improve your data scrubbing and cleaning techniques, there are two common ways to obtain your own datasets from the web:

  1. Web scraping
  2. API requests

The end goal of both methods is the same: to extract content and data from a website. However, web scraping can be a little more intimidating for those who are new to programming, as it requires some knowledge of HTML. APIs, on the other hand, are simpler to use because they give you access to the data directly through an application or operating system. This blog will cover using the Python Requests module to retrieve sports data from the Sportradar developer website. Specifically, we will be retrieving play-by-play data for Super Bowl LV (February 2021).

Requests

The Requests module makes requesting data from websites in Python easy for human beings to understand. First off, if you have never used Requests before, you will need to install all the required packages. Run the following code in your preferred terminal or IDE:

$ python -m pip install requests

Once all the necessary packages have been installed, import the Requests module into your IDE using:

import requests

Requests supports several request types, such as post and delete; however, we will only need the get method. In order to “get” the data we are looking for, Requests first needs to know where to look, so we pass the URL as a string to get.

url = 'http://api.sportradar.us/nfl/official/trial/v6/en/games/0e00303b-ee60-4cf4-ad68-48efbe53901d/pbp.json?api_key={YOUR_API_KEY}'
r = requests.get(url)

In this case, our URL points to the official Sportradar developer portal for NFL games. Sportradar offers access to data not only for the four major North American sports, but also for esports, tennis, and even collegiate-level games. As you may have noticed in the URL, {YOUR_API_KEY} will need to be replaced with your personal, private API key. To generate your own API key, we will need to segue into the next section.

Sportradar

Sportradar collects and provides data for over 80 different sports. By registering for a free trial account, you get access to your own developer portal and, most importantly, your own private API key to use at your discretion. Although there are different account tiers, the free trial account is all we need and doesn’t require any credit card or payment information to register. Once you register for an account, navigate to your account page and find your API keys. Pick your sport of choice (NFL if you are following along with this blog) and enter any name you’d like in the “Name of your application” box. Make sure “Issue a new key for NFL Trial”, or the sport of your choice, is selected and register for your brand new API key.

Now, under your API keys, you should see your private key listed as “NFL Trial: NFL Official Trial”. Keep this key private! If someone got their hands on your API key, your account could be compromised. Treat it just like you would a password.
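One common way to keep the key out of your code (this is general Python practice, not a Sportradar requirement) is to read it from an environment variable. The variable name below is just a convention I chose:

```python
import os

# Read the key from an environment variable so it never gets committed
# to version control; fall back to a placeholder if it isn't set.
api_key = os.environ.get("SPORTRADAR_API_KEY", "YOUR_PRIVATE_API_KEY")
```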

Before we move on to the API request call, it’s also important to note the Key Rate Limits section of your API key. The free trial tier allows up to 1 API call per second and up to 1,000 calls per month. In other words, when you use the Requests package, you cannot request data from the API more than once per second (which shouldn’t be a problem for most) or 1,000 times per month. The monthly limit is typically a non-issue but can be used up quickly if you are not monitoring your call volume. If you are unsure, the View Report link will take you to a breakdown of all the calls made with that API key, including dates and your total monthly call volume.
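If you ever loop over several endpoints, a simple sleep between calls is enough to stay under the one-call-per-second limit. This helper is a sketch of my own, not part of Requests or Sportradar; the function name and the 1.1-second delay are arbitrary choices:

```python
import time

def throttled_get(session, urls, delay=1.1):
    """Fetch each URL in turn, sleeping between calls so we never
    exceed the trial tier's one-request-per-second limit."""
    responses = []
    for url in urls:
        responses.append(session.get(url))
        time.sleep(delay)  # a little over 1 second, to be safe
    return responses
```

You can pass either the requests module itself or a requests.Session() as session, since both expose a get method.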

API Sandbox

To find our request URL, Sportradar provides an incredibly handy interactive API sandbox that lets users generate the desired URL. Find the Play-By-Play section and you’ll find that the basic URL structure is ‘nfl/official/trial/v6/:language_code/games/:game_id/pbp:format’. The URL requires three parameters: language_code, game_id, and format. language_code is pretty straightforward and can be kept as is. The output format can be either .xml (Extensible Markup Language) or .json (JavaScript Object Notation), depending on your personal preference; for this example, we will format our output as .json. The tricky part is finding the desired game_id. To find the game_id for the Super Bowl, scroll further down to the Schedule section. Set the season parameter to PST for post-season and press Try it!. This generates a URL, along with the status code, headers, and body. Since the Super Bowl is the last game of the season, the game_id we are looking for should be at or near the end of the RESPONSE BODY.

The very last game in the response body is titled “Pro Bowl”. That won’t help us here, so we can disregard that entry and keep searching. The game before it has no identifiable “title” like the Pro Bowl, but it does tell us that it was played in Tampa between the Kansas City Chiefs and the Tampa Bay Buccaneers. Bingo! All we need from this entry is the game_id, found under “games” and then “id”. Copy and paste the game ID into the :game_id parameter of the Play-By-Play section and click Try it! once more to generate your URL. Alternatively, now that you have the game ID, you can construct your own URL by following the basic play-by-play URL structure mentioned earlier. Once we have the request URL, we can go back to our notebook and start requesting some data!
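If you'd rather skip the manual scrolling, you could fetch the postseason schedule yourself and search it in code. Treat this as a sketch only: the 2020/PST schedule path and the weeks-then-games nesting are assumptions based on the sandbox response described above, and the team aliases ("TB", "KC") are hypothetical field values:

```python
api_key = "YOUR_PRIVATE_API_KEY"
# Assumed endpoint shape, mirroring the sandbox's Schedule section.
schedule_url = (
    "http://api.sportradar.us/nfl/official/trial/v6/en/games/"
    f"2020/PST/schedule.json?api_key={api_key}"
)

def find_game_id(schedule, home_alias, away_alias):
    """Scan each week's games in a parsed schedule dict for a matchup."""
    for week in schedule.get("weeks", []):
        for game in week.get("games", []):
            if (game.get("home", {}).get("alias") == home_alias
                    and game.get("away", {}).get("alias") == away_alias):
                return game["id"]
    return None

# With a valid key you would then run:
# schedule = requests.get(schedule_url).json()
# game_id = find_game_id(schedule, home_alias="TB", away_alias="KC")
```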

Requests Again

Going back to our code snippet from earlier, we can now enter the private API key. If you constructed your own URL, or if you’re interested in other games, you can define the variables separately and use a Python f-string to build the URL. You can check whether your request succeeded with the status_code attribute on your r response variable.

api_key = 'YOUR_PRIVATE_API_KEY'
game_id = '68d9ebf9-7005-435f-8eaf-90b1b6c6170f'
format_ = 'json'
url = f'http://api.sportradar.us/nfl/official/trial/v6/en/games/{game_id}/pbp.{format_}?api_key={api_key}'
r = requests.get(url)
r.status_code
# Output: 200 if successful
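A status code other than 200 usually tells you what went wrong. The mapping below reflects standard HTTP semantics rather than Sportradar-specific documentation, and check_response is a hypothetical helper of my own:

```python
def check_response(status_code):
    """Translate a few common HTTP status codes into a quick hint."""
    hints = {
        200: "OK - data returned",
        403: "Forbidden - check your API key",
        404: "Not found - check the game_id and URL path",
        429: "Too many requests - you hit the rate limit",
    }
    return hints.get(status_code, "Unexpected status - see the HTTP spec")

print(check_response(403))  # Forbidden - check your API key
```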

The data you requested can now be accessed as text using r.text or as a dictionary using r.json(). We can load it into a pandas data frame using json_normalize.

import pandas as pd
pd.set_option('display.max_columns', None) # to see all 50 columns
json_object = r.json()
df = pd.json_normalize(json_object)
Resulting Data Frame

We are almost done obtaining our NFL play-by-play dataset. The actual data that we are interested in is now all contained within the periods column in our data frame. To grab just the data we want, we could simply use df['periods'] and save that series (and its further nested lists and dictionaries) as another data frame. json_normalize also takes a parameter record_path that specifies the path the function should follow to find the list of desired records.

df = pd.json_normalize(json_object, record_path=['periods'])

This data frame now shows the four quarters as entries. To access individual plays, you would need to dig further into the pbp column and the events column nested within it. But that is a story for next time!

Wrap-Up

Congratulations! You just completed your first API request call. Now all that’s left is scrubbing, cleaning, and exploring the data, followed by modeling, iterating, and deployment…and your project is done!

If you have any comments on this quick API tutorial, please leave one below. I’d love to know where I messed up, or made things unnecessarily difficult, and learn from more experienced API-ers!

Part II of this blog will cover cleaning and exploring the dataset we just obtained. See you then!
