Accessing Remote Resources

Web pages and data

Earlier we saw how to access data files on your hard drive, but Python also lets you access remote data, for example on the internet. The easiest way to do this is with the requests module. To start, simply fetch a URL:

In [ ]:
import requests
response = requests.get('http://xkcd.com/353/')

The variable response now holds the server's response. You can access its content as text via the text attribute:

In [ ]:
print(response.text[:300])  # only print the first 300 characters
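
The request above assumes that everything went well. Below is a minimal sketch of a more defensive variant. The helper name fetch_text and the 10-second timeout are assumptions made for illustration and are not part of the original example. It uses the timeout argument of requests.get so the call cannot hang indefinitely, and raise_for_status() so that an error page is not mistaken for valid content.

```python
import requests


def fetch_text(url, timeout=10.0):
    """Return the body of `url` as text (hypothetical helper, for illustration).

    Raises a requests.exceptions.RequestException subclass if the server
    cannot be reached, the request times out, or the status code signals
    an error (4xx or 5xx).
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx status codes into exceptions
    return response.text
```

It could be called as fetch_text('http://xkcd.com/353/')[:300], wrapped in try/except requests.exceptions.RequestException if a failure should be handled gracefully.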

You can either just use this information directly, or in some cases you might want to write it to a file. Let's download just the image from the comic above:

In [ ]:
r2 = requests.get('https://imgs.xkcd.com/comics/python.png')
In [ ]:
r2.headers # You can see that 'Content-Type' is 'image/png'
In [ ]:
r2.text[:100] # The first 100 characters

However, this doesn't look like actual text. Instead, it's a binary format. The binary data of the response can be accessed via the content attribute:

In [ ]:
r2.content[:100]

Note the \x89PNG at the beginning, which identifies the byte string as PNG data. Many binary file formats start with a short signature that identifies the format and is at least partially human-readable.
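
Such a signature can also be checked programmatically before the data is written to disk. The following is a small sketch. The function name looks_like_png is made up for illustration, and the byte strings are stand-ins so that the code runs without a network connection. The eight signature bytes are the ones defined by the PNG format.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # the 8 bytes every PNG file starts with


def looks_like_png(data):
    """Return True if the byte string `data` starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# Stand-in byte strings; with the response from above one would call
# looks_like_png(r2.content) instead.
print(looks_like_png(PNG_SIGNATURE + b'...rest of the file...'))  # True
print(looks_like_png(b'<!DOCTYPE html>'))                         # False
```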

Now we can open a new (binary) file and download the data to the file.

In [ ]:
with open('downloaded_image.png', 'wb') as f:
    f.write(r2.content)
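
The 'b' in the file mode matters: a file opened in text mode only accepts str objects, so writing a bytes object to it raises a TypeError. As a quick sanity check, the sketch below writes a stand-in byte string (not the real image) to a temporary file in binary mode and reads it back unchanged. The file name roundtrip.bin is arbitrary.

```python
import os
import tempfile

# Stand-in for r2.content: the PNG signature followed by all 256 byte values.
payload = b'\x89PNG\r\n\x1a\n' + bytes(range(256))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'roundtrip.bin')
    with open(path, 'wb') as f:   # 'wb': write raw bytes, no text encoding
        n_written = f.write(payload)
    with open(path, 'rb') as f:   # 'rb': read the raw bytes back
        restored = f.read()

print(n_written, restored == payload)
```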

Let's now load and display the image. One way is to use matplotlib's image module.

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.image as mpimg
img = mpimg.imread('downloaded_image.png')
fig1 = plt.figure(figsize=(18, 16), dpi=80, facecolor='w', edgecolor='k')
plt.imshow(img, cmap='gist_gray')

Another option is the Python Imaging Library (PIL), nowadays provided by the Pillow package, which offers several standard procedures for image processing (e.g. blurring, sharpening, resizing):

In [ ]:
from PIL import Image
img2 = Image.open('downloaded_image.png')
fig2 = plt.figure(figsize=(18, 16), dpi=80, facecolor='w', edgecolor='k')
plt.imshow(img2)
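
The processing steps mentioned above can be sketched as follows. To keep the example independent of the downloaded file, it uses a synthetic gradient image generated by PIL itself. The target size and the blur radius are arbitrary choices made for illustration.

```python
from PIL import Image, ImageFilter

# Synthetic 256x256 grayscale gradient as a stand-in for the downloaded image.
img = Image.linear_gradient('L')

small = img.resize((128, 128))                             # resizing
blurred = img.filter(ImageFilter.GaussianBlur(radius=2))   # blurring
sharpened = img.filter(ImageFilter.SHARPEN)                # sharpening

print(img.size, small.size, blurred.size, sharpened.size)
```

With the comic from above, one would start from Image.open('downloaded_image.png') instead and display the results with plt.imshow as before.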

APIs

Imagine that you want to access some data online. A number of websites now offer an "application programming interface" (API), which is basically a way of accessing data in a machine-readable way. An example for weather data is http://openweathermap.org/API

Access often requires an access key (API key). This is usually generated for you when you register with the service, as is also the case for the cloud services of the well-known providers. The following example shows how the key is added to a request for this particular API.

We will use the API provided by Openweathermap.org to get the current weather data for Heidelberg. Instructions on how to use this API are provided here.

In [ ]:
# We will import the weather report for Heidelberg (latitude = 49.407681, longitude = 8.69079 decimal degrees)

import requests
#r1=requests.get('http://samples.openweathermap.org/data/2.5/weather?lat=51.51,lon=-0.13&APPID=329bded0f436c203622bd75ca56dc93f')
r1 = requests.get('http://api.openweathermap.org/data/2.5/weather?lat=49.408&lon=8.691&APPID=329bded0f436c203622bd75ca56dc93f')
print(r1.json())
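
Instead of pasting the query string and the key into the URL by hand, the parameters can be passed as a dictionary via the params argument of requests, which takes care of the URL encoding. The sketch below only builds the request and prints the resulting URL without sending anything. The environment variable name OWM_API_KEY and the placeholder YOUR_API_KEY are assumptions made for illustration; reading the key from the environment keeps it out of the notebook.

```python
import os

import requests

# Hypothetical variable name; the key itself comes from your own account.
api_key = os.environ.get('OWM_API_KEY', 'YOUR_API_KEY')

params = {'lat': 49.408, 'lon': 8.691, 'APPID': api_key}
request = requests.Request(
    'GET', 'http://api.openweathermap.org/data/2.5/weather', params=params
)
prepared = request.prepare()  # builds the request without sending it
print(prepared.url)
```

To actually send the request, the same dictionary can be handed to requests.get(url, params=params).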

We can also search by city name. Let's try London.

In [ ]:
#r2 = requests.get('http://samples.openweathermap.org/data/2.5/weather?q=London,UK&APPID=329bded0f436c203622bd75ca56dc93f')
r2 = requests.get('http://api.openweathermap.org/data/2.5/weather?q=London,UK&APPID=329bded0f436c203622bd75ca56dc93f')
print(r2.json())
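
The json() method returns ordinary Python dictionaries and lists, so individual values can be picked out by key. The sketch below works on a made-up payload rather than a live response. The field names and the assumption that the temperature is reported in Kelvin reflect my understanding of the current-weather response and should be checked against the API instructions. The function name summarize is hypothetical.

```python
import json

# Made-up payload for illustration only, not real weather data.
sample = json.loads('''
{
  "name": "London",
  "main": {"temp": 283.15, "humidity": 80},
  "weather": [{"description": "light rain"}]
}
''')


def summarize(report):
    """Return (city, temperature in deg C, description) from a report dict."""
    temp_celsius = report['main']['temp'] - 273.15  # assumes Kelvin input
    return report['name'], temp_celsius, report['weather'][0]['description']


city, temp_c, description = summarize(sample)
print(f'{city}: {temp_c:.1f} deg C, {description}')
```

With a live response, summarize(r2.json()) would be used in the same way.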

Another example of an organization that provides access to its (cloud-based) archives via an API is the Las Cumbres Observatory (LCO), which makes millions of astronomical images available. Searching the archive and downloading files automatically requires Python scripts. Scientific users are provided with instructions on how to use the interface:

https://developers.lco.global/#data-format-definition

Saving and restoring data efficiently

As we have seen before, there are multiple ways of opening and processing data. You can, of course, always resort to writing data to disk line by line. In practice, there are several alternatives for writing Python data to disk, and some of them are more efficient than others.

First of all, when you are working with numpy arrays and structures, you might want to consider using built-in functions such as np.savetxt:

In [ ]:
# example from https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
import numpy as np
x = y = z = np.arange(0.0,5.0,1.0)
np.savetxt('test.out', x, delimiter=',')   # x is a 1D array
np.savetxt('test.out', (x,y,z))   # x,y,z equal sized 1D arrays (overwrites the file written above)
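
The counterpart for reading such a text file back is np.loadtxt. The following sketch writes the three arrays to a temporary directory (so it does not touch test.out) and loads them again. A tuple of equally sized 1D arrays is stored as one row per array.

```python
import os
import tempfile

import numpy as np

x = y = z = np.arange(0.0, 5.0, 1.0)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.out')
    np.savetxt(path, (x, y, z), delimiter=',')  # three rows, five columns
    loaded = np.loadtxt(path, delimiter=',')    # same delimiter when reading

print(loaded.shape)
print(np.array_equal(loaded, np.array([x, y, z])))
```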

Sometimes one wants to restore the exact current state of a numpy array without writing all digits to disk in human-readable form. For this purpose, numpy provides the dedicated numpy.save function. It stores the array's raw binary representation on disk, which is more compact than spelling out each number as text.

In [ ]:
x = np.arange(10)
with open('test.npy','wb') as fp:
    np.save(fp,x)
print(x)

with open('test.npy','rb') as fp2:
    y = np.load(fp2)
print(y)
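
To get a feeling for the difference, the sketch below stores the same array once as text and once in the binary .npy format and compares the file sizes. The array length of 1000, the random seed and the file names are arbitrary choices made for illustration; the exact sizes depend on the text format used by np.savetxt.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.random(1000)  # 1000 float64 values

with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, 'data.txt')
    npy_path = os.path.join(tmpdir, 'data.npy')

    np.savetxt(txt_path, data)  # human-readable, one number per line
    np.save(npy_path, data)     # raw binary values plus a small header

    txt_size = os.path.getsize(txt_path)
    npy_size = os.path.getsize(npy_path)
    restored = np.load(npy_path)

print('text file:  ', txt_size, 'bytes')
print('binary file:', npy_size, 'bytes')
print('restored exactly:', np.array_equal(restored, data))
```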