Monday, October 3, 2016

The python CacheControl module - part 002.

Today was a hard day, so this tutorial is a short one.
Theory of HTTP:
HTTP specifies four response cache headers that you can set to enable caching:
  • Cache-Control
  • Expires
  • ETag
  • Last-Modified
These four headers help you cache responses under two different models:
  • Expiration Caching - used to cache your entire response for a specific amount of time (e.g. 24 hours), simple, but cache invalidation is more difficult;
  • Validation Caching - this is more complex and used to cache your response, but allows you to dynamically invalidate it as soon as your content changes.
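The expiration model above can be sketched as a simple freshness check. This is only a minimal illustration, assuming the lifetime comes from a max-age directive in Cache-Control; the helper names max_age_seconds and is_fresh are hypothetical and are not part of any library.

```python
import re
import time


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    match = re.search(r"max-age=(\d+)", cache_control or "")
    return int(match.group(1)) if match else None


def is_fresh(stored_at, cache_control, now=None):
    """Expiration model: a stored response is fresh while its age is below max-age."""
    now = time.time() if now is None else now
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        # No explicit lifetime: this sketch simply treats the response as stale.
        return False
    return (now - stored_at) < max_age


# A response stored at t=0 with a 24 hour lifetime (86400 seconds).
header = "public, max-age=86400"
fresh_after_one_hour = is_fresh(0, header, now=3600)
fresh_after_25_hours = is_fresh(0, header, now=90000)
print(fresh_after_one_hour)
print(fresh_after_25_hours)
```

A real cache also looks at Expires and at the other Cache-Control directives; the sketch leaves those out to keep the idea visible.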
First, note that this code is a raw example of how to access the cache of a page.
It comes with a simple class named DictCache. You can give it any name; it is a subclass of BaseCache.
The next step is to show how to access it.
One simple way is to view the page, as in the first session.
It gets more complex when you need to access data and attributes such as:
 'adapters', 'auth', 'cert', 'close', 'cookies', 'delete', 'get', 'get_adapter', 'head', 'headers', 'hooks', 'max_redirects', 'merge_environment_settings', 'mount', 'options', 'params', 'patch', 'post', 'prepare_request', 'proxies', 'put', 'rebuild_auth', 'rebuild_method', 'rebuild_proxies', 'redirect_cache', 'request', 'resolve_redirects', 'send', 'stream', 'trust_env', 'verify'
These become available in the second session of this source code, where the session object is passed to DictCache and stored as base.data:

import requests
from cachecontrol import CacheControl
from cachecontrol.cache import BaseCache

class DictCache(BaseCache):

    def __init__(self, init_dict=None):
        self.data = init_dict or {}

    def get(self, key):
        return self.data.get(key, None)

    def set(self, key, value):
        self.data.update({key: value})

    def delete(self, key):
        # pop with a default so deleting a missing key does not raise KeyError
        self.data.pop(key, None)

print "first session requests"
sess = requests.session()
cached_sess = CacheControl(sess)
response = cached_sess.get('http://google.com')
print '=================='
print 'see page by add this: print response.text'
print '=================='
print "second session BaseCache"
sess2 = requests.session()
# sess2 is passed as init_dict, so base.data is the session object itself
base = DictCache(sess2)
print '=================='
print "dir(base)"
print dir(base)
print '=================='
print "dir(base.data)"
print dir(base.data)
print '=================='
print "base.data.max_redirects"
print base.data.max_redirects
print '=================='
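For comparison, here is a minimal sketch of the same DictCache used as a plain key/value store, which is how a cache object is normally exercised. Two things are assumptions of this sketch and not taken from the listing above: the fallback to object when cachecontrol is not installed, and the optional expires argument, which newer releases appear to pass to set. The URL key is a placeholder.

```python
try:
    from cachecontrol.cache import BaseCache  # real base class, if installed
except ImportError:
    BaseCache = object  # stand-in so the sketch still runs without the package


class DictCache(BaseCache):
    """In-memory cache exposing the get/set/delete interface."""

    def __init__(self, init_dict=None):
        self.data = init_dict or {}

    def get(self, key):
        return self.data.get(key, None)

    def set(self, key, value, expires=None):
        # 'expires' is accepted and ignored here (assumed optional argument).
        self.data[key] = value

    def delete(self, key):
        # A default avoids KeyError when the key is already gone.
        self.data.pop(key, None)


cache = DictCache()
cache.set("http://example.com/", b"cached body")
stored = cache.get("http://example.com/")
print(stored)
cache.delete("http://example.com/")
after_delete = cache.get("http://example.com/")
print(after_delete)
```

To plug such an object into a session, you would pass it through the cache argument, as in CacheControl(sess, cache=DictCache()).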

Sunday, October 2, 2016

The python CacheControl module - part 001.

This tutorial series aims to be a better approach to understanding the several mechanisms that HTTP provides for web cache validation. Let's start with the first part.
You can install it with pip:
C:\>cd Python27
C:\Python27>cd Scripts
C:\Python27\Scripts>pip install cachecontrol
Collecting cachecontrol
  Downloading CacheControl-0.11.7.tar.gz
Requirement already satisfied (use --upgrade to upgrade): requests in c:\python27\lib\site-packages (from cachecontrol)
Building wheels for collected packages: cachecontrol
  Running setup.py bdist_wheel for cachecontrol ... done
  Stored in directory: C:\Users\GeorgeCatalin\AppData\Local\pip\Cache\wheels\9b\94\d2\1793b004461b5bc238a89e260cd2b9f770437c42424fdd0943
Successfully built cachecontrol
Installing collected packages: cachecontrol
Successfully installed cachecontrol-0.11.7
The first test uses the default example and prints the response and its text.
import requests
from cachecontrol import CacheControl
sess = requests.session()
cached_sess = CacheControl(sess)
response = cached_sess.get('http://google.com')
print response

print response.text
...
The requests Python module is an Apache2-licensed HTTP library that lets you send HTTP/1.1 requests.
It lets you add headers, form data, multipart files, and parameters with simple
Python dictionaries, and access the response data in the same way.

The theory part.
You can use CacheControl either through the basic wrapper or through a requests Transport Adapter.
Transport Adapters provide a mechanism to define interaction methods for an HTTP service.
The code follows this template (docs example):
import requests
from cachecontrol import CacheControlAdapter
sess = requests.Session()
sess.mount('http://', CacheControlAdapter())
This means that CacheControl assumes you are using a requests.Session for your requests.
So the Transport Adapter will cover the HTTPCore and WSGICore.
Both the wrapper and the adapter class accept a custom cache store object.
This object is used for storing your cached data.
The next step is:
import requests
from cachecontrol import CacheControl
from cachecontrol.caches import FileCache
sess = CacheControl(requests.Session(),
                    cache=FileCache('.webcache'))
This creates a directory called .webcache and stores a file for each cached request.
The CacheControl Python module also comes with a few storage backends for your cache objects.
DictCache is the default cache, FileCache is similar to the caching mechanism provided by httplib2, and RedisCache uses a Redis database to store values.
Note that FileCache needs an extra dependency, which you can install with: pip install cachecontrol[filecache].
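To make the idea of "a file for each cached request" concrete, here is a rough sketch of a file-per-key store. It is not the library's FileCache: the class name TinyFileCache, the SHA-256 file naming and the temporary directory are all assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile


class TinyFileCache(object):
    """Hypothetical file-per-key cache; not the library's FileCache."""

    def __init__(self, directory):
        self.directory = directory
        if not os.path.isdir(directory):
            os.makedirs(directory)

    def _path(self, key):
        # Hash the key so that any URL maps to a safe file name.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as fh:
                return fh.read()
        except IOError:
            return None  # nothing stored for this key

    def set(self, key, value):
        with open(self._path(key), "wb") as fh:
            fh.write(value)

    def delete(self, key):
        try:
            os.remove(self._path(key))
        except OSError:
            pass  # already gone


# A throwaway directory is used so the sketch leaves nothing behind in the project.
cache_dir = os.path.join(tempfile.mkdtemp(), ".webcache")
cache = TinyFileCache(cache_dir)
cache.set("http://example.com/", b"cached body")
files = os.listdir(cache_dir)
print(len(files))
```

The hashing step is the main design choice here: URLs contain characters that are not valid in file names, so the key is never used as a path directly.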
CacheControl supports ETags by sending a request with the appropriate If-None-Match header.
It seems that ETag support only takes effect once the cache time has expired.
The ETag, or entity tag, is part of HTTP, the protocol for the World Wide Web, and provides for web cache validation. You can also take a look at Hypertext Transfer Protocol (HTTP/1.1): Caching.
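The ETag exchange can be sketched without any network access. In this toy model the function names conditional_headers and server_reply are hypothetical, and the "server" is just a function that compares tags; it only illustrates the If-None-Match and 304 Not Modified flow, not how CacheControl implements it.

```python
def conditional_headers(cached_etag):
    """Headers a client sends to revalidate a cached response."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def server_reply(request_headers, current_etag, body):
    """Toy server: 304 with an empty body if the ETag matches, else 200 with the body."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body


cached = {"etag": '"v1"', "body": b"old page"}

# Content unchanged on the server: the cached body is reused.
status, headers, body = server_reply(
    conditional_headers(cached["etag"]), '"v1"', b"old page")
page = cached["body"] if status == 304 else body
print(status)

# Content changed on the server: a full response replaces the cached one.
status2, headers2, body2 = server_reply(
    conditional_headers(cached["etag"]), '"v2"', b"new page")
print(status2)
```

The saving comes from the 304 branch: the server confirms that the cached copy is still valid without sending the body again.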
The documentation of the cachecontrol Python module tells us:
Caching is hard! It is considered one of the great challenges of computer science.
Yes, you can agree with that, because some parts need to be understood well.
These issues are the most important: Timezones, Cached Responses and Query String Params.
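One way the query string issue can show up, assuming the cache is keyed by the full URL, is that the same parameters in a different order look like two different requests. The sketch below, written for Python 3, sorts the parameters before the URL is used; normalize_url is a hypothetical helper and the URLs are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url):
    """Return the URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment))


a = normalize_url("http://example.com/search?foo=one&bar=two")
b = normalize_url("http://example.com/search?bar=two&foo=one")
print(a)
print(a == b)
```

With requests, the same effect can be had by passing a sorted list of tuples as the params argument.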

Any information about these issues would be great; just leave a comment.