analitics

Pages

Sunday, October 2, 2016

The python CacheControl module - part 001.

This tutorials series want to be a better approach to understand the several mechanisms that HTTP provides for web cache validation. Let's start with the first part.
You can install it with pip
C:\>cd Python27
C:\Python27>cd Scripts
C:\Python27\Scripts>pip install cachecontrol
Collecting cachecontrol
  Downloading CacheControl-0.11.7.tar.gz
Requirement already satisfied (use --upgrade to upgrade): 
requests in c:\python27\lib\site-packages (from cachecontrol)
Building wheels for collected packages: cachecontrol
  Running setup.py bdist_wheel for cachecontrol ... done
  Stored in directory: C:\Users\GeorgeCatalin\AppData\Local\pip\\
Cache\wheels\9b\94\d2\1793b004461b5bc238a89e260cd2b9f770437c42424fdd0943
Successfully built cachecontrol
Installing collected packages: cachecontrol
Successfully installed cachecontrol-0.11.7
First test come with the default example and show all with the text.
import requests
from cachecontrol import CacheControl
sess = requests.session()
cached_sess = CacheControl(sess)
response = cached_sess.get('http://google.com')
print response

print response.text
...
The requests python module is an Apache2 Licensed HTTP library to allow you to send HTTP/1.1 requests.
This help you to add headers, form data, multipart files, and parameters with simple
Python dictionaries, and access the response data in the same way.

The theory part.
You can use CacheControl with the basic wrapper way or via a requests Transport Adapter.
The Transport Adapters provide a mechanism to define interaction methods for an HTTP service.
The code will come with this template (docs example):
sess = requests.Session()
sess.mount('http://', CacheControlAdapter())
This mean the CacheControl assumes you are using a requests.Session for your requests.
So the Transport Adapter will cover the HTTPCore and WSGICore.
Now, both (the wrapper and adapter classes) allow providing a custom cache store object.
This is used for storing your cached data.
The next step will be
from cachecontrol.caches import FileCache
sess = CacheControl(requests.Session(),
                    cache=FileCache('.webcache'))
The result will create a directory called .webcache and store a file for each cached request.
Also the CacheControl python module comes with a few storage backends for storing your cache objects.
First is DictCache is the default cache, next is FileCache is similar to the caching mechanism provided by httplib2 and the last is RedisCache uses a Redis database to store values.
One note about requesting the filecache extra can use dependency with: pip install cachecontrol[filecache].
The CacheControl’s support of ETags by returns a response with the appropriate If-None-Match header.
Seem the ETag support only takes effect when the time has expired.
The ETag or entity tag, is part of HTTP, the protocol for the World Wide Web and provides for web cache validation. You can also take a look at Hypertext Transfer Protocol (HTTP/1.1): Caching.
The documentation of cachecontrol python module tells us:
Caching is hard! It is considered one of the great challenges of computer science.
Yes! you can agree with that, because some parts need to be understand well.
This issues: Timezones, Cached Responses and Query String Params are the most important parts.

Any info about this issue will be grea, just put your comments.