Accessing Remote Resources

Web pages and data

I have mentioned before how you can access data files on your hard drive, but Python also allows you to access remote data, for example on the internet. The easiest way to do this is to use the requests module. To start off, you can just get the URL:

In [ ]:
import requests
In [ ]:
response = requests.get('http://xkcd.com/353/')

response now holds the server's response. You can access the content as text via the text attribute:

In [ ]:
print(response.text[:300])  # only print the first 300 characters

You can either use this information directly or, in some cases, write it to a file. Let's download one of the full-resolution files for the ice coverage data from Problem Set 2:

In [ ]:
r2 = requests.get('http://wwwstaff.ari.uni-heidelberg.de/rschmidt/pycourse/ice_data/20060315.npy')
In [ ]:
r2.headers
In [ ]:
r2.text[:200]

However, this doesn't seem to be actual text. Instead, it's a binary format. The binary data of the response can be accessed via the content attribute:

In [ ]:
r2.content[:200]

Note the little b at the beginning indicating a binary byte-string.
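
The difference between text and bytes can be illustrated without any network access. The following is a minimal sketch; the sample string and byte values are made up for illustration and are not taken from the server's response.

```python
# A str holds Unicode text; a bytes object holds raw byte values.
text = "Eis"                      # hypothetical sample text
raw = text.encode("utf-8")        # str -> bytes

print(raw)                        # the repr starts with the b prefix
print(type(raw).__name__)

# Decoding only works if the bytes really are text in the given encoding.
round_trip = raw.decode("utf-8")  # bytes -> str

# Arbitrary binary data is usually not valid UTF-8.
binary = bytes([0x93, 0xFF, 0x00])
try:
    binary.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print(decodable)
```

This is why the text attribute gives unreadable output for a binary file, while the content attribute returns the bytes unchanged.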

Now we can open a new file in binary mode and write the downloaded data to it.

In [ ]:
with open('20060315.npy', 'wb') as f:
    f.write(r2.content)
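
Writing to disk is not strictly required if the array is only needed in the current session. As a minimal sketch, the bytes of a .npy file can be wrapped in an in-memory buffer and passed to np.load. To keep the example runnable offline, the bytes below are generated locally as a stand-in for r2.content, and the array contents are made up.

```python
import io
import numpy as np

# Stand-in for r2.content: serialise a small made-up array to .npy bytes.
buffer = io.BytesIO()
np.save(buffer, np.arange(6).reshape(2, 3))
content = buffer.getvalue()

# Every .npy file starts with the same six magic bytes.
print(content[:6])

# Wrap the bytes in a file-like object and load them as an array.
data = np.load(io.BytesIO(content))
print(data.shape)
```

With the real download, io.BytesIO(r2.content) would take the place of the stand-in, while np.load('20060315.npy') reads the copy saved to disk.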

APIs

Imagine that you want to access some data online. A number of websites now offer an "Application Programming Interface" (or API), which is basically a way of accessing data in a machine-readable way. An example for weather data is http://openweathermap.org/API

Let's instead take an example from theoretical astrophysics: the Illustris simulation. The routines we use below are presented and described on their web page.

Illustris is a cosmological simulation. It traces the positions and the formation of components such as dark matter, gas and stars across cosmic time from high-redshift to today. Extracting data at various cosmic times (=redshifts $z$), one can study, for instance, the gravitational collapse and merging history of clusters of galaxies and galaxies.

Here we show how to extract the dark matter particles inside haloes, which you can pick using the Illustris Explorer tool.

Once we have picked a halo ID, we can download the data. To access the data, we need an API key. The key below was generated for use in this course (normally, every user has their own key).

In [ ]:
headers={"api-key":"2566d8dd2bcf9aefbb3d8b01080a877b"} # 2566... = key for Uni HD python course
In [ ]:
import requests
r = requests.get('http://www.illustris-project.org/api/', headers=headers)
print(r.text[:400])

This is much shorter, but still not ideal for reading into Python in this form. The output details the available simulations and the number of snapshots in a format called JSON ("JavaScript Object Notation"). The response object includes a method to read in this data:

In [ ]:
r.json()

In fact, such a dictionary can be obtained at every level in the data tree (try, for example, the next level: http://www.illustris-project.org/api/Illustris-1/).
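
What r.json() does can be sketched with the standard json module: it parses JSON text into nested Python dictionaries and lists. The payload below is a made-up stand-in, not the actual API response; its field names and values are assumptions for illustration only.

```python
import json

# Hypothetical JSON text standing in for r.text (field names are assumed).
payload = """
{
  "simulations": [
    {"name": "Sim-A", "num_snapshots": 3},
    {"name": "Sim-B", "num_snapshots": 5}
  ]
}
"""

parsed = json.loads(payload)   # same kind of result as r.json()

# JSON objects become dicts, JSON arrays become lists.
names = [sim["name"] for sim in parsed["simulations"]]
total = sum(sim["num_snapshots"] for sim in parsed["simulations"])
print(names, total)
```

Once parsed, the data can be navigated with ordinary indexing and loops, which is what makes the JSON form convenient compared with the raw text.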

Now we pick a halo to study. Try finding halo 75 or 1030 in the Explorer for a first visualization.

Specifications of the data fields can be found on the Illustris project website.

In [ ]:
id   = 75          # ID of the halo to study (75 and 1030 are suggested)
snap = 135         # snapshot: snap=135 corresponds to z=0; snap=103 to z=0.5; snap=85 to z=1

params = {'dm':'Coordinates'}  # this downloads only dark matter particles
url = "http://www.illustris-project.org/api/Illustris-1/snapshots/"+str(snap)+"/subhalos/" + str(id)

# download the cutout of the subhalo into memory
cutout  = requests.get(url+"/cutout.hdf5", headers=headers, params=params)

# define the output filename and save data locally
outname='halo'+str(id)+'_snap'+str(snap)+".hdf5"
with open(outname, 'wb') as f:
    f.write(cutout.content)
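
The URL construction and the role of the params argument can be examined offline. The sketch below uses only the standard library: build_subhalo_url is a hypothetical helper introduced here, and urlencode serves as a stand-in to show the kind of query string that requests appends to the URL for params.

```python
from urllib.parse import urlencode

BASE = "http://www.illustris-project.org/api/Illustris-1"

def build_subhalo_url(snap, subhalo_id, base=BASE):
    """Return the subhalo URL for a snapshot number and a subhalo ID."""
    return f"{base}/snapshots/{snap}/subhalos/{subhalo_id}"

url = build_subhalo_url(135, 75)

# A params dictionary is turned into key=value pairs joined by "&".
query = urlencode({"dm": "Coordinates"})
full = f"{url}/cutout.hdf5?{query}"
print(full)
```

Passing params to requests.get is usually preferable to building the query string by hand, since special characters are escaped automatically.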

The Illustris data are in the HDF5 format. We have written them to disk and now reload them. For this, we import the h5py package and read the file using h5py.File.

The resulting file object behaves much like a dictionary. Here we are interested in the PartType1 key (dark matter), in particular its Coordinates.

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
In [ ]:
print(outname)
In [ ]:
import h5py

with h5py.File(outname, 'r') as f:
    # read the x, y and z coordinates of the dark matter particles
    x = f['PartType1']['Coordinates'][:,0]
    y = f['PartType1']['Coordinates'][:,1]
    z = f['PartType1']['Coordinates'][:,2]

    dx = x - np.mean(x)   # coordinates with respect to mean coordinate
    dy = y - np.mean(y)
    dz = z - np.mean(z)
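
The three coordinate columns can also be handled as a single array. The following sketch uses a small made-up array in place of f['PartType1']['Coordinates'][:] and computes each particle's distance from the mean position; the variable names are illustrative.

```python
import numpy as np

# Made-up stand-in for the (N, 3) coordinate array read from the file.
coords = np.array([[0.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [2.0, 2.0, 0.0]])

centre = coords.mean(axis=0)               # mean x, y and z
offsets = coords - centre                  # broadcasting subtracts per column
radius = np.linalg.norm(offsets, axis=1)   # distance of each particle
print(centre, radius)
```

The columns offsets[:,0], offsets[:,1] and offsets[:,2] correspond to dx, dy and dz above.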

Exercise

Consider subhalo 75 or 1030 (your choice) at three cosmic times: today (snapshot 135), snapshot 103 (universe 2/3 the size of today) and snapshot 85 (universe 1/2 the size of today). Note that the mass of an individual particle is $6.3 \times 10^6 M_\odot$ ($M_\odot$ = solar mass).

  1. Plot the distribution of the dark matter particles. Determine the total dark matter mass inside R=10 kpc.

  2. Determine the density profiles $\rho = \frac{d N}{dV}$ for the halo at these three redshifts. $dN$ is the number of objects in the volume $dV$.

    Can you find a dependence on the size of the universe / cosmic time?

    Remember: plt.hist (and also np.histogram) can be used to build histograms.

  3. (bonus, if time permits) Fit the functional form

    $\rho(r)=\frac{A}{\frac{r}{r_s} (1+\frac{r}{r_s})^n}$

    with n=3 or n=4 to the density profiles. Choose "sensible" fitting intervals.
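
The histogram hint from the exercise can be sketched as follows: count the particles in radial bins with np.histogram and divide by the volume of each spherical shell. The radii here are random numbers, not simulation data, and the bin edges are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
radius = rng.uniform(0.0, 10.0, size=1000)   # made-up radii, not halo data

edges = np.linspace(0.0, 10.0, 6)            # 5 radial bins (arbitrary)
counts, _ = np.histogram(radius, bins=edges)

# Volume of each spherical shell between consecutive bin edges.
shell_volume = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)

density = counts / shell_volume              # number density dN/dV
centres = 0.5 * (edges[1:] + edges[:-1])     # bin centres for plotting
print(centres, density)
```

Multiplying the number density by the particle mass would turn it into a mass density. For real haloes, logarithmically spaced bins (for example from np.logspace) may resolve the inner profile better than linear ones.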

Note: To simplify things, the following function was provided by the Illustris project.

It takes a URL as above and returns either the parsed JSON response or, for binary data, downloads the file and returns its name.

(from the Illustris documentation)

In [ ]:
def get(url, params=None):
    # make HTTP GET request to path
    headers = {"api-key":"2566d8dd2bcf9aefbb3d8b01080a877b"}
    r = requests.get(url, params=params, headers=headers)

    # raise an exception if the response is an HTTP error (4xx or 5xx status code)
    r.raise_for_status()

    if r.headers['content-type'] == 'application/json':
        return r.json() # parse json responses automatically

    if 'content-disposition' in r.headers:
        filename = r.headers['content-disposition'].split("filename=")[1]
        with open(filename, 'wb') as f:
            f.write(r.content)
        return filename # return the filename string

    return r
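
The filename extraction in get relies on the form of the content-disposition header. A minimal sketch of that step on made-up header values follows; filename_from_disposition is a hypothetical helper that additionally strips surrounding quotes, which some servers add.

```python
def filename_from_disposition(value):
    """Extract the filename from a content-disposition header value."""
    # Same split as in get(), then drop optional whitespace and quotes.
    name = value.split("filename=")[1]
    return name.strip().strip('"')

# Hypothetical header values for illustration.
plain = filename_from_disposition("attachment; filename=cutout_75.hdf5")
quoted = filename_from_disposition('attachment; filename="cutout_75.hdf5"')
print(plain, quoted)
```

Like the original split, this sketch assumes that "filename=" is present in the header, which get checks only indirectly by testing for the header itself.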

Here is an example of how the function get(url) can be used:

In [ ]:
id = 75
snapshot = 135
params = {'dm':'Coordinates'}

url = "http://www.illustris-project.org/api/Illustris-1/snapshots/" + str(snapshot) + "/subhalos/" + str(id)
sub = get(url)                                    # get json response of subhalo properties
saved_filename = get(url + "/cutout.hdf5", params) # get and save HDF5 cutout file

with h5py.File(saved_filename, 'r') as f:         # read coordinates
    dx = f['PartType1']['Coordinates'][:,0] - sub['pos_x']
    dy = f['PartType1']['Coordinates'][:,1] - sub['pos_y']
    dz = f['PartType1']['Coordinates'][:,2] - sub['pos_z']