{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing Remote Resources\n", "==========================\n", "\n", "Web pages and data\n", "------------------\n", "\n", "I have mentioned before how one can access data files on your hard drive, but Python also allows you to access remote data, for example on the internet. The easiest way to do this is to use the [requests](https://pypi.python.org/pypi/requests) module. To start off, you just can get the URL:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import requests\n", "\n", "response = requests.get('http://xkcd.com/353/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "``response`` holds the response now. You can access the content as text via the text-property:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(response.text[:300]) # only print the first 300 characters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can either just use this information directly, or in some cases you might want to write it to a file. Let's download one of the full resolution files for the Ice coverage data from Problem Set 2:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "r2 = requests.get('http://wwwstaff.ari.uni-heidelberg.de/rschmidt/pycourse/ice_data/20060315.npy')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "r2.headers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "r2.text[:200]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, this doesn't seem to be actual text. Instead, its a binary format. The binary data of the response can be accessed via" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "r2.content[:200]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the little ``b`` at the beginning indicating a binary byte-string." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can open a new (binary) file and download the data to the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open('20060315.npy', 'wb') as f:\n", " f.write(r2.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now load and plot the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "data = np.load('20060315.npy')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "plt.figure(figsize=(12,12))\n", "plt.imshow(data, origin='lower')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "APIs\n", "----\n", "\n", "Imagine that you want to access some data online. \n", "A number of websites now offer an \"Application programming interface\" (or API) which is basically a way of accessing data is a machine-readable way. \n", "An example for weather data is http://openweathermap.org/API " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Let's however take an example from theoretical Astrophysics, which is the\n", "[Illustris simulation](http://www.illustris-project.org). *The routines we use below are presented and described on their web page.*\n", "\n", "Illustris is a cosmological simulation. It traces the positions and the formation of components such as dark matter, gas and stars across cosmic time from high-redshift to today. Extracting data at various cosmic times (=redshifts $z$), one can study, for instance, the gravitational collapse and merging history of clusters of galaxies and galaxies.\n", "\n", "We show here how to extract the dark matter particles inside haloes you can pick from the\n", "[illustris explorer](http://www.illustris-project.org/explorer) tool.\n", "\n", "Once we have picked a halo ID, we can download the data. For the access, we need the access key.\n", "This was generated for the use by this course (normally, every user has their own key.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "headers={\"api-key\":\"2566d8dd2bcf9aefbb3d8b01080a877b\"} # 2566... = key for Uni HD python course" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import requests\n", "r = requests.get('http://www.illustris-project.org/api/', headers=headers)\n", "print (r.text[:400])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is much shorter, but still not ideal for reading into Python in this form. The output details the available simulations and the number of snapshots in a format called **JSON (\"JavaScript Object Notation\")**. Python includes a library to easily read in this data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import json\n", "data = json.loads(r.text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, such a dictionary can be obtained at every level in the data tree (try, e.g., the next one http://www.illustris-project.org/api/Illustris-1/ )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we pick a halo to study. Try finding halo 75 or 1030 in the Explorer for a first visualization.\n", "\n", "Specifications of data fields can be found [here](http://www.illustris-project.org/data/docs/specifications/)." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [], "source": [ "id = 75 # choose your ID of the halo 75 or 1030 are defaults\n", "redshift = 0.0 # choose a redshift, will be converted to snapshot automatically.\n", "params = {'dm':'Coordinates'}\n", "\n", "url = \"http://www.illustris-project.org/api/Illustris-1/snapshots/z=\" + str(redshift) + \"/subhalos/\" + str(id)\n", "\n", "# read the parameters of the subhalo in the json format\n", "subhalo = requests.get(url, headers=headers).json()\n", "\n", "# read the cutout of the subhalo into memory\n", "cutout = requests.get(url+\"/cutout.hdf5\", headers=headers, params=params)\n", "\n", "# extract the filename from the header information and build the output name\n", "filename= cutout.headers['content-disposition'].split(\"filename=\")[1]\n", "outname=filename.split(\".hdf5\")[0]+\"_z=\"+str(redshift)+\".hdf5\"\n", "\n", "# save the cutout to disk, use 'outname'\n", "with open(outname, 'wb') as f:\n", " f.write(cutout.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable subhalo is a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for key in sorted(subhalo):\n", " print (key,subhalo[key])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "subhalo['len']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The illustris data are in the [hdf5](https://support.hdfgroup.org/HDF5/) format. We need to write them to disk and\n", "reload them. For this, we need to import the h5py package and read the file using *h5py.File*\n", "\n", "The result is again a dictionary. We are here interested in the PartType1 key (dark matter), in particular the Coordinates." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import numpy as np\n", "print (outname)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import h5py\n", "\n", "with h5py.File(outname) as f:\n", " # we subtract the centre of the halo (extracted from the dictionary information in subhalo)\n", " # from the particle coordinates\n", " dx = f['PartType1']['Coordinates'][:,0] - subhalo['pos_x']\n", " dy = f['PartType1']['Coordinates'][:,1] - subhalo['pos_y']\n", " dz = f['PartType1']['Coordinates'][:,2] - subhalo['pos_z']\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise\n", "\n", "Consider subhalo 75 or 1030 (choose) at the three redshifts z=0 (snapshot 135), z=0.5 (snapshot 103) and z=1 (snapshot 85). Note the mass of individual particl is $6.3 \\times 10^6 M_\\odot$ ($M_\\odot$ = solar mass).\n", "\n", "1. Look at the distribution of the mass particles.\n", " Determine the total dark matter mass inside R=10 kpc (ignore here the redshift and Hubble constant scaling of these units).\n", "\n", "2. Determine the \n", " density profiles $\\rho = \\frac{d N}{dV}$ for the halo at these three redshifts.\n", " $dN$ is the number of objects in the Volume $dV$.\n", "\n", " Can you find a redshift dependence?\n", "\n", " Remember: *plt.hist* can be used to build histograms.\n", "\n", " \n", "3. (bonus, with time) Fit the functional form\n", "\n", " $\\rho(r)=\\frac{A}{\\frac{r}{r_s} (1+\\frac{r}{r_s})^n}$\n", "\n", " with n=3 or n=4 to the density profiles. Choose \"sensible\" fitting intervals.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** To simplify things, the following function was provided by the illustris project.\n", "\n", "It takes a url as above and returns either the header information, or downloads binary data and returns the file name.\n", "\n", "(from the illustris documentation [here](http://www.illustris-project.org/data/docs/api/) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def get(url, params=None):\n", " # make HTTP GET request to path\n", " headers = {\"api-key\":\"2566d8dd2bcf9aefbb3d8b01080a877b\"}\n", " r = requests.get(url, params=params, headers=headers)\n", "\n", " # raise exception if response code is not HTTP SUCCESS (200)\n", " r.raise_for_status()\n", "\n", " if r.headers['content-type'] == 'application/json':\n", " return r.json() # parse json responses automatically\n", "\n", " if 'content-disposition' in r.headers:\n", " filename = r.headers['content-disposition'].split(\"filename=\")[1]\n", " with open(filename, 'wb') as f:\n", " f.write(r.content)\n", " return filename # return the filename string\n", "\n", " return r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an **example** how the function **get(path)** can be used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "id = 75\n", "snapshot = 135\n", "params = {'dm':'Coordinates'}\n", "\n", "url = \"http://www.illustris-project.org/api/Illustris-1/snapshots/\" + str(snapshot) + \"/subhalos/\" + str(id)\n", "sub = get(url) # get json response of subhalo properties\n", "saved_filename = get(url + \"/cutout.hdf5\",params) # get and save HDF5 cutout file\n", "\n", "with h5py.File(saved_filename) as f: # read coordinates\n", " dx = f['PartType1']['Coordinates'][:,0] - sub['pos_x']\n", " dy = f['PartType1']['Coordinates'][:,1] - sub['pos_y']\n", " dz = f['PartType1']['Coordinates'][:,2] - sub['pos_z']" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }