Accessing Remote Resources

Web pages and data

I have mentioned before how to access data files on your hard drive, but Python also lets you access remote data, for example on the internet. The easiest way to do this is with the requests module. To start off, you can simply get a URL:

In [ ]:
import requests

response = requests.get('http://xkcd.com/353/')

The variable response now holds the server's response. You can access its content as text via the text attribute:

In [ ]:
print(response.text[:300])  # only print the first 300 characters
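In practice it helps to check that the request actually succeeded before using the text. The following is a minimal sketch, not part of the original course material: the helper name fetch_text and the 10-second timeout are illustrative choices, while requests.get, its timeout argument and raise_for_status are standard requests calls.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    `fetch_text` and its default timeout are illustrative, not part of requests.
    """
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.text
```

Calling fetch_text('http://xkcd.com/353/')[:300] should then give the same kind of snippet as the cell above, but fail loudly if the page cannot be retrieved.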

You can either use this information directly or, in some cases, write it to a file. Let's download one of the full-resolution files for the ice coverage data from Problem Set 2:

In [ ]:
r2 = requests.get('http://wwwstaff.ari.uni-heidelberg.de/rschmidt/pycourse/ice_data/20060315.npy')
In [ ]:
r2.headers
In [ ]:
r2.text[:200]

However, this doesn't seem to be actual text. Instead, it's a binary format. The binary data of the response can be accessed via the content attribute:

In [ ]:
r2.content[:200]

Note the little b at the beginning indicating a binary byte-string.
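To make the difference between bytes and text concrete, here is a small sketch that needs no download. The hard-coded byte string is only a stand-in for the first bytes of r2.content; it uses the 6-byte signature with which .npy files begin.

```python
# Stand-in for the first bytes of r2.content (hard-coded, no download needed).
payload = b"\x93NUMPY\x01\x00"

print(type(payload))   # a bytes object, displayed with a leading b
print(payload[:6])

# .npy files start with the 6-byte signature \x93NUMPY
looks_like_npy = payload.startswith(b"\x93NUMPY")

# Bytes must be decoded to obtain text. latin-1 maps every byte to a character,
# so this never fails, but the result is not meaningful text for binary data.
as_text = payload.decode("latin-1")
print(looks_like_npy, len(as_text))
```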

Now we can open a new (binary) file and download the data to the file.

In [ ]:
with open('20060315.npy', 'wb') as f:
    f.write(r2.content)

Let's now load and plot the data:

In [ ]:
import numpy as np
data = np.load('20060315.npy')
In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(12,12))
plt.imshow(data, origin='lower')
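If the file is not needed on disk, the bytes can also be loaded directly from memory, since np.load accepts file-like objects. The sketch below assumes nothing about the real file: a small locally generated array stands in for r2.content, and the name data_in_memory is illustrative.

```python
import io

import numpy as np

# Build stand-in bytes in .npy format (in the notebook this would be r2.content).
buffer = io.BytesIO()
np.save(buffer, np.arange(6).reshape(2, 3))
content = buffer.getvalue()

# Wrap the bytes in a file-like object and load them without touching the disk.
data_in_memory = np.load(io.BytesIO(content))
print(data_in_memory.shape)
```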

APIs

Imagine that you want to access some data online. A number of websites now offer an "Application Programming Interface" (or API), which is basically a way of accessing data in a machine-readable way. An example for weather data is http://openweathermap.org/API

Let's take an example from theoretical astrophysics instead: the Illustris simulation. The routines we use below are presented and described on their web page.

Illustris is a cosmological simulation. It traces the positions and the formation of components such as dark matter, gas and stars across cosmic time, from high redshift to today. By extracting data at various cosmic times (= redshifts $z$), one can study, for instance, the gravitational collapse and merging history of galaxies and clusters of galaxies.

Here we show how to extract the dark matter particles inside haloes, which you can pick using the Illustris Explorer tool.

Once we have picked a halo ID, we can download the data. For the access, we need an access key. This one was generated for use in this course (normally, every user has their own key).

In [ ]:
headers={"api-key":"2566d8dd2bcf9aefbb3d8b01080a877b"} # 2566... = key for Uni HD python course
In [ ]:
import requests
r = requests.get('http://www.illustris-project.org/api/', headers=headers)
print(r.text[:400])

This is much shorter, but still not ideal for reading into Python in this form. The output details the available simulations and the number of snapshots in a format called JSON ("JavaScript Object Notation"). Python includes a library to easily read in this data:

In [ ]:
import json
data = json.loads(r.text)
In [ ]:
data
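As a small illustration of what json.loads does, the sketch below parses a made-up JSON string. The field names and values are assumptions chosen for illustration, not the actual API response.

```python
import json

# Hypothetical JSON text; the field names and values are illustrative only.
sample_text = (
    '{"simulations": ['
    '{"name": "Sim-A", "num_snapshots": 3}, '
    '{"name": "Sim-B", "num_snapshots": 5}]}'
)

parsed = json.loads(sample_text)

# JSON objects become dicts and JSON arrays become lists.
names = [sim["name"] for sim in parsed["simulations"]]
print(type(parsed).__name__, names)
```

The resulting object can be navigated with ordinary dictionary and list indexing, which is why the parsed response is more convenient than the raw text.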

In fact, such a dictionary can be obtained at every level in the data tree (try, e.g., the next level, http://www.illustris-project.org/api/Illustris-1/).

Now we pick a halo to study. Try finding halo 75 or 1030 in the Explorer for a first visualization.

Specifications of data fields can be found here.

In [30]:
id = 75                # choose your ID of the halo    75 or 1030 are defaults
redshift = 0.0         # choose a redshift, will be converted to snapshot automatically.
params = {'dm':'Coordinates'}

url = "http://www.illustris-project.org/api/Illustris-1/snapshots/z=" + str(redshift) + "/subhalos/" + str(id)

# read the parameters of the subhalo in the json format
subhalo = requests.get(url, headers=headers).json()

# read the cutout of the subhalo into memory
cutout  = requests.get(url+"/cutout.hdf5", headers=headers, params=params)

# extract the filename from the header information and build the output name
filename= cutout.headers['content-disposition'].split("filename=")[1]
outname=filename.split(".hdf5")[0]+"_z="+str(redshift)+".hdf5"

# save the cutout to disk, use 'outname'
with open(outname, 'wb') as f:
    f.write(cutout.content)
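The params dictionary passed to requests.get is sent as a query string appended to the URL. The sketch below uses the standard library to show roughly what the resulting URL looks like for the default values used above; requests does this encoding itself, so this is only meant to make the request visible.

```python
from urllib.parse import urlencode

base = "http://www.illustris-project.org/api/Illustris-1/snapshots/z=0.0/subhalos/75"
params = {"dm": "Coordinates"}

# Roughly what requests builds from url + params (shown here with the standard library).
full_url = base + "/cutout.hdf5?" + urlencode(params)
print(full_url)
```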

The variable subhalo is a dictionary:

In [ ]:
for key in sorted(subhalo):
    print(key, subhalo[key])
In [ ]:
subhalo['len']

The Illustris data are in the HDF5 format. We need to write them to disk (done above) and reload them. For this, we import the h5py package and read the file using h5py.File.

The result again behaves like a dictionary. Here we are interested in the PartType1 key (dark matter), in particular its Coordinates.

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
print(outname)
In [ ]:
import h5py

with h5py.File(outname, 'r') as f:   # open read-only; older h5py versions default to append mode
    # we subtract the centre of the halo (extracted from the dictionary information in subhalo)
    # from the particle coordinates
    dx = f['PartType1']['Coordinates'][:,0] - subhalo['pos_x']
    dy = f['PartType1']['Coordinates'][:,1] - subhalo['pos_y']
    dz = f['PartType1']['Coordinates'][:,2] - subhalo['pos_z']
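From offsets like these, the distance of each particle from the halo centre follows as $r = \sqrt{dx^2 + dy^2 + dz^2}$. A minimal sketch with synthetic offsets (the *_demo arrays are made up and merely stand in for dx, dy and dz from the cutout):

```python
import numpy as np

# Synthetic offsets standing in for dx, dy, dz from the cutout.
dx_demo = np.array([3.0, 0.0, 1.0])
dy_demo = np.array([4.0, 0.0, 2.0])
dz_demo = np.array([0.0, 5.0, 2.0])

# Euclidean distance of each particle from the halo centre.
r_demo = np.sqrt(dx_demo**2 + dy_demo**2 + dz_demo**2)

# Number of particles inside a chosen radius (here 4, in the units of the offsets).
n_inside = np.count_nonzero(r_demo < 4.0)
print(r_demo, n_inside)
```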

Exercise

Consider subhalo 75 or 1030 (choose) at the three redshifts z=0 (snapshot 135), z=0.5 (snapshot 103) and z=1 (snapshot 85). Note that the mass of an individual particle is $6.3 \times 10^6 M_\odot$ ($M_\odot$ = solar mass).

  1. Look at the distribution of the mass particles. Determine the total dark matter mass inside R=10 kpc (ignore here the redshift and Hubble constant scaling of these units).

  2. Determine the density profiles $\rho = \frac{d N}{dV}$ for the halo at these three redshifts. $dN$ is the number of objects in the volume $dV$.

    Can you find a redshift dependence?

    Remember: plt.hist can be used to build histograms.

  3. (bonus, if time permits) Fit the functional form

    $\rho(r)=\frac{A}{\frac{r}{r_s} (1+\frac{r}{r_s})^n}$

    with n=3 or n=4 to the density profiles. Choose "sensible" fitting intervals.
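As a hint for the density profile, one possible approach is to count particles in spherical shells and divide by the shell volumes. The sketch below is only an illustration on synthetic data: the radii are drawn uniformly inside a ball of radius 10 (so the true density is flat), and the linear bin edges are an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic radii: points uniform in a ball of radius 10, so the true density is flat.
n_points = 20000
r_synth = 10.0 * rng.random(n_points) ** (1.0 / 3.0)

# Count particles in spherical shells between consecutive bin edges.
edges = np.linspace(0.0, 10.0, 11)
counts, _ = np.histogram(r_synth, bins=edges)

# Volume of each shell: 4/3 * pi * (r_outer^3 - r_inner^3).
shell_volumes = 4.0 / 3.0 * np.pi * np.diff(edges**3)

# Number density dN/dV per shell.
density = counts / shell_volumes
print(density)
```

For a real halo, logarithmically spaced edges (for example via np.logspace) may resolve the inner region better than linear ones.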

Note: To simplify things, the following function was provided by the Illustris project.

It takes a URL as above and returns either the parsed JSON response or, for binary data, saves the download to disk and returns the file name.

(from the Illustris documentation here)

In [ ]:
def get(url, params=None):
    # make HTTP GET request to path
    headers = {"api-key":"2566d8dd2bcf9aefbb3d8b01080a877b"}
    r = requests.get(url, params=params, headers=headers)

    # raise exception if response code is not HTTP SUCCESS (200)
    r.raise_for_status()

    if r.headers['content-type'] == 'application/json':
        return r.json() # parse json responses automatically

    if 'content-disposition' in r.headers:
        filename = r.headers['content-disposition'].split("filename=")[1]
        with open(filename, 'wb') as f:
            f.write(r.content)
        return filename # return the filename string

    return r
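Splitting the header on "filename=" works when the value is unquoted. As a hedged alternative, the standard library can parse the header and also strips surrounding quotes. The helper name filename_from_header and the example header values below are made up for this sketch.

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Extract the filename parameter from a Content-Disposition value.

    Hypothetical helper for illustration; returns None if no filename is given.
    """
    msg = Message()
    msg["content-disposition"] = content_disposition
    return msg.get_filename()


# Example header values (made up for this sketch).
print(filename_from_header('attachment; filename="cutout_75.hdf5"'))
print(filename_from_header("attachment; filename=cutout_75.hdf5"))
```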

Here is an example of how the function get(url) can be used:

In [ ]:
id = 75
snapshot = 135
params = {'dm':'Coordinates'}

url = "http://www.illustris-project.org/api/Illustris-1/snapshots/" + str(snapshot) + "/subhalos/" + str(id)
sub = get(url)                                    # get json response of subhalo properties
saved_filename = get(url + "/cutout.hdf5", params) # get and save HDF5 cutout file

with h5py.File(saved_filename, 'r') as f:         # open read-only and read coordinates
    dx = f['PartType1']['Coordinates'][:,0] - sub['pos_x']
    dy = f['PartType1']['Coordinates'][:,1] - sub['pos_y']
    dz = f['PartType1']['Coordinates'][:,2] - sub['pos_z']