The Art of Integrating Plone with Webservices
What is a web service?
- Lets code running on one machine (the client) interact with code on another (the server)
- Transports messages over HTTP
Why web services?
- Often easier to integrate than to build your own
- Combine best-of-breed tools
Major categories
- RPC-style / Service-oriented architecture
  - Focus is on the action being performed.
- REST-style / Resource-oriented architecture
  - Focus is on the object being acted upon.
XML-RPC
- Passes messages in a simple XML-based format.
Sample request:
<?xml version="1.0"?>
<methodCall>
<methodName>examples.getStateName</methodName>
<params>
<param>
<value><i4>40</i4></value>
</param>
</params>
</methodCall>
Sample response:
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value><string>South Dakota</string></value>
</param>
</params>
</methodResponse>
- Support in the Python stdlib: xmlrpclib
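To see the wire format without making a network call, the stdlib module can build and parse these messages directly. This is a minimal sketch, assuming Python 3 (where xmlrpclib is named xmlrpc.client); the response string is the sample shown above.

```python
import xmlrpc.client

# Marshal a call to examples.getStateName(40) into an XML-RPC request body.
request_xml = xmlrpc.client.dumps((40,), methodname='examples.getStateName')

# Unmarshal it again: loads() returns a (params, methodname) pair.
params, method = xmlrpc.client.loads(request_xml)

# Parse the sample response; a response carries no method name.
response_xml = """<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value><string>South Dakota</string></value>
</param>
</params>
</methodResponse>"""
result, _ = xmlrpc.client.loads(response_xml)

print(request_xml)
print(method, params, result)
```

ServerProxy does this marshalling for you on every call; the sketch only makes the message format visible.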
Example: Querying PyPI
>>> import xmlrpclib
>>> from pprint import pprint
>>> client = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
>>> pprint(client.release_urls('Plone', '4.0.1'))
[{'comment_text': '',
'downloads': 177,
'filename': 'Plone-4.0.1.zip',
'has_sig': False,
'md5_digest': 'be72596d49295b7207f0a861ee3530ed',
'packagetype': 'sdist',
'python_version': 'source',
'size': 1507065,
'upload_time': <DateTime '20101004T02:30:01' at 10071a248>,
'url': 'http://pypi.python.org/packages/source/P/Plone/Plone-4.0.1.zip'}]
More info: http://wiki.python.org/moin/PyPiXmlRpc
Example: wsapi4plone
Provides an XML-RPC interface for interacting with a Plone site.
- post_object
- put_object
- get_object
- delete_object
- query
- get_schema
- get_types
- get_workflow
- set_workflow
- get_discussion
More info: http://pypi.python.org/pypi/wsapi4plone.core
SOAP
- "Big Web Services", described by the various WS-* standards from W3C and OASIS
- Passes XML-based messages like XML-RPC, but the format is more complicated (it can represent complex types)
- WSDL (Web Services Description Language): machine-readable XML description of the interface
- In Python:
  - soaplib
  - suds
Sample Request and Response
Sample request:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:ns0="http://cicero.azavea.com/"
xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header/>
<ns1:Body>
<ns0:GetOfficialsByAddress>
<ns0:authToken>FOO</ns0:authToken>
<ns0:address>1402 3rd Ave</ns0:address>
<ns0:city>Seattle</ns0:city>
<ns0:state>WA</ns0:state>
<ns0:postalCode>98101</ns0:postalCode>
<ns0:country>US</ns0:country>
<ns0:districtType>NATIONAL_UPPER</ns0:districtType>
<ns0:includeAtLarge>false</ns0:includeAtLarge>
</ns0:GetOfficialsByAddress>
</ns1:Body>
</SOAP-ENV:Envelope>
Sample response:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetOfficialsByAddressResponse xmlns="http://cicero.azavea.com/">
<GetOfficialsByAddressResult>
<ElectedOfficialInfo>
<ElectedOfficialID>326f9123-4196-49ff-a9ab-cca8194a12a8</ElectedOfficialID>
<AssemblyName />
(snip)
<FirstName>Maria</FirstName>
<MiddleInitial>E.</MiddleInitial>
<LastName>Cantwell</LastName>
(snip)
<LastUpdateDate>2009-03-26T00:00:00</LastUpdateDate>
</ElectedOfficialInfo>
</GetOfficialsByAddressResult>
</GetOfficialsByAddressResponse>
</soap:Body>
</soap:Envelope>
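A SOAP library hides the envelope and namespaces, but it can help to see what it takes to read such a response by hand. This is a rough sketch, assuming Python 3 and the stdlib ElementTree; the embedded XML is a trimmed copy of the sample response above with the "(snip)" parts left out.

```python
from xml.etree import ElementTree

SOAP_NS = 'http://schemas.xmlsoap.org/soap/envelope/'
CICERO_NS = 'http://cicero.azavea.com/'

# Trimmed copy of the sample response shown above.
response = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOfficialsByAddressResponse xmlns="http://cicero.azavea.com/">
      <GetOfficialsByAddressResult>
        <ElectedOfficialInfo>
          <FirstName>Maria</FirstName>
          <LastName>Cantwell</LastName>
        </ElectedOfficialInfo>
      </GetOfficialsByAddressResult>
    </GetOfficialsByAddressResponse>
  </soap:Body>
</soap:Envelope>"""

# Map short prefixes to the namespaces used in the document.
ns = {'soap': SOAP_NS, 'c': CICERO_NS}

root = ElementTree.fromstring(response.encode('utf-8'))
body = root.find('soap:Body', ns)

# Every element inside the result lives in the Cicero namespace.
officials = [
    (info.findtext('c:FirstName', namespaces=ns),
     info.findtext('c:LastName', namespaces=ns))
    for info in body.iter('{%s}ElectedOfficialInfo' % CICERO_NS)
]
print(officials)
```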
suds example: Azavea's Cicero API
Azavea provides a web service to look up information about elected officials for a given address:
>>> from suds.client import Client
>>> auth_client = Client('http://cicero.azavea.com/Azavea.Cicero.WebService.v2/AuthenticationService.asmx?WSDL')
>>> token = auth_client.service.GetToken(username, password)
>>> client = Client('http://cicero.azavea.com/Azavea.Cicero.WebService.v2/ElectedOfficialQueryService.asmx?WSDL')
# printing a client lists its service's available methods
>>> print client
Suds ( https://fedorahosted.org/suds/ ) version: 0.4 GA build: R699-20100913
Service ( ElectedOfficialQueryService ) tns="http://cicero.azavea.com/"
Prefixes (2)
ns0 = "http://cicero.azavea.com/"
ns1 = "http://microsoft.com/wsdl/types/"
Ports (2):
(ElectedOfficialQueryServiceSoap)
Methods (11):
GetOfficialsByAddress(xs:string authToken, xs:string address, xs:string city, xs:string state, xs:string postalCode, xs:string country, xs:string districtType, xs:boolean includeAtLarge, )
(snip)
>>> officials = client.service.GetOfficialsByAddress(token, '1402 3rd Ave', 'Seattle', 'WA', '98101', 'US', 'NATIONAL_UPPER', True)
>>> officials.ElectedOfficialInfo[0].FirstName
'Maria'
>>> officials.ElectedOfficialInfo[0].LastName
'Cantwell'
RESTful APIs
- reaction to "big web services"
- resource-oriented
- encourages direct use of features of HTTP (request methods, passing parameters in query string, caching, etc.)
- response representations may vary. XML and JSON are common.
- in Python:
  - urllib/urllib2 for transfer (stdlib)
  - ElementTree (stdlib), lxml, or some other XML library to parse XML
  - json (stdlib) to parse JSON
Example: Brown Paper Tickets API
Brown Paper Tickets provides an API for listing and registering for events.
Sample Response:
<?xml version="1.0"?>
<document>
<result>success</result>
<resultcode>000000</resultcode>
<note></note>
<event>
<title>My Event</title>
<link>http://www.brownpapertickets.com/event/120141</link>
<description>blah blah blah</description>
<event_id>120141</event_id>
<live>y</live>
(snip)
</event>
</document>
We can make a request to this service using urllib and parse the response using ElementTree:
>>> from urllib import urlencode, urlopen
>>> from xml.etree import ElementTree
>>> params = urlencode([('id', 'foo'), ('client', 'bar')])
>>> url = 'https://www.brownpapertickets.com/api2/eventlist?' + params
>>> res = urlopen(url).read()
>>> tree = ElementTree.fromstring(res)
>>> for node in tree.findall('event'):
...     print node.find('title').text
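The same two steps (build the URL, parse the XML) can be tried offline. This is a minimal sketch, assuming Python 3, where urlencode lives in urllib.parse; build_url and event_titles are illustrative helper names, and the XML is a trimmed copy of the sample response above.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree


def build_url(base, **params):
    """Append URL-encoded query parameters (sorted by name) to a base URL."""
    return base + '?' + urlencode(sorted(params.items()))


def event_titles(xml_text):
    """Return the title of every <event> element directly under the root."""
    tree = ElementTree.fromstring(xml_text)
    return [node.findtext('title') for node in tree.findall('event')]


sample = """<?xml version="1.0"?>
<document>
<result>success</result>
<event>
<title>My Event</title>
<event_id>120141</event_id>
</event>
</document>"""

url = build_url('https://www.brownpapertickets.com/api2/eventlist',
                id='foo', client='bar')
print(url)
print(event_titles(sample))
```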
Authentication
- Token or API key in request
- HTTP basic authentication
- AuthSub & OAuth (requires callback to receive token)
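The first two options can be sketched with the stdlib alone. This assumes Python 3; the URL, the api_key parameter name, and the credentials are placeholders, since every service picks its own. Nothing is sent over the network here; the requests are only constructed.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def with_api_key(url, key, param='api_key'):
    """Pass the key as a query parameter (the parameter name is service-specific)."""
    sep = '&' if '?' in url else '?'
    return Request(url + sep + urlencode({param: key}))


def with_basic_auth(url, username, password):
    """Send credentials in an Authorization header, as HTTP basic auth does."""
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    token = base64.b64encode(raw).decode('ascii')
    return Request(url, headers={'Authorization': 'Basic ' + token})


key_request = with_api_key('https://api.example.com/events', 'FOO')
auth_request = with_basic_auth('https://api.example.com/events',
                               'Aladdin', 'open sesame')
print(key_request.full_url)
print(auth_request.get_header('Authorization'))
```

Basic authentication only encodes the credentials, it does not encrypt them, so it should only be used over HTTPS.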
What could go wrong
- call times out
- call fails
- call succeeds but other code fails
- ZODB conflict errors
- long external calls tie up publisher threads
Timeouts & Deadlocks
- Python's default socket timeout is None (forever), which is pathological
- Can override with socket.setdefaulttimeout(), then catch socket.timeout
- But note that it is a global setting, not per-thread
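One way around the global setting is to put the timeout on the individual socket. This is a sketch of that alternative, not of socket.setdefaulttimeout() itself, assuming Python 3; read_with_timeout is an illustrative helper, and the local listener merely stands in for an unresponsive web service.

```python
import socket


def read_with_timeout(host, port, timeout=2.0):
    """Read from host:port, returning None if no data arrives in time."""
    # The timeout applies to this one socket, not to the whole process.
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None
    finally:
        sock.close()


# Demo: a local listener that completes the TCP handshake but never sends
# anything, so the read has to give up after the timeout.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
port = listener.getsockname()[1]

result = read_with_timeout('127.0.0.1', port, timeout=0.2)
listener.close()
print(result)
```

urlopen() also accepts a timeout argument, which gives the same per-call control for HTTP requests.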
Transactions
- In Zope, we're used to transactions being handled automatically:
  - new transaction for each request
  - resource managers exist for common resources
  - two-phase commit ensures atomicity
- But web services are not generally transactional
- Can do the non-transactional (e.g. web service) calls last:
  - If something local fails, the exception will cause the transaction to abort
  - Web service call never happens
  - Can use transaction.get().addAfterCommitHook() for this; the hook receives the commit status as its first argument
- What if we need to write something locally based on a response from a web service? (e.g. payment authorization)
  - If something fails _after_ the web service call, the transaction will abort but the web service call can't be undone
  - Workaround: Catch exceptions and make a new webservice call to undo the effect of the first (but what if _that_ fails?)
  - Workaround: Catch exceptions and log them (make sure your logging is foolproof!)
  - Workaround: Use an asynchronous task system like zc.async to queue the second part as a separate job that can be retried if it fails
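The first two workarounds can be combined: try to undo the remote call, and log loudly if even that fails. This is a framework-free sketch, assuming Python 3; charge, record, and refund are hypothetical callables supplied by the caller, not part of any real payment API.

```python
import logging

logger = logging.getLogger('payments')


def charge_and_record(charge, record, refund):
    """Call the remote service, then write locally; undo the charge on failure."""
    receipt = charge()            # remote call: cannot be rolled back
    try:
        record(receipt)           # local write: may raise
    except Exception:
        try:
            refund(receipt)       # compensating remote call
        except Exception:
            # The undo failed too; leave a trail for a human to follow up.
            logger.exception('Could not refund %r', receipt)
        raise                     # let the transaction abort as usual
    return receipt


# Demo: the local write fails, so the charge is refunded.
refunded = []


def failing_record(receipt):
    raise RuntimeError('local write failed')


try:
    charge_and_record(lambda: 'auth-123', failing_record, refunded.append)
except RuntimeError:
    pass
print(refunded)
```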
ConflictErrors
A ConflictError occurs when connection A tries to commit changes to an object that was modified by another transaction (from connection B) since the object was loaded by connection A.
Even worse variant of the last case (a local write that depends on a web service response):
- Web service call succeeds
- Write to ZODB fails with a ConflictError
- ZPublisher RETRIES THE REQUEST -- and the web service gets called again
- Much weeping and gnashing of teeth
This is exacerbated by the fact that remote web service calls tend to be slow, which makes transactions last longer and increases the risk of conflicts.
Possible ways to mitigate:
- Set the request's retry_max_count attribute to 0 (conflicting requests will fail hard instead of getting retried silently, but sometimes that's better)
- Handle remote calls as zc.async jobs, so they take place in a separate transaction
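The retry problem is easy to reproduce in a toy model. The sketch below is not ZPublisher code: ConflictError is a local stand-in for the ZODB exception, publish() imitates a publisher that reruns the request after a conflict, and the default of 3 retries is an assumption made for the demo. It assumes Python 3.

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""


def publish(handler, retry_max_count=3):
    """Toy publisher: rerun the whole request when the commit conflicts."""
    attempt = 0
    while True:
        try:
            return handler()
        except ConflictError:
            if attempt >= retry_max_count:
                raise
            attempt += 1


def make_handler(log, conflicts):
    """Build a request handler whose first `conflicts` commits fail."""
    state = {'remaining': conflicts}

    def handler():
        log.append('remote call')          # the web service call
        if state['remaining'] > 0:         # the local ZODB write
            state['remaining'] -= 1
            raise ConflictError()
        return 'committed'
    return handler


# With retries, one conflict means the web service is called twice.
retried = []
result = publish(make_handler(retried, conflicts=1))

# With retries disabled, the request fails hard after a single call.
failed_hard = []
try:
    publish(make_handler(failed_hard, conflicts=1), retry_max_count=0)
except ConflictError:
    pass
print(result, len(retried), len(failed_hard))
```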
Tools and techniques
Maintaining a pool of clients
- Generally you want one client object per thread, so that it can hold a session token and does not have to be reinitialized for every request
- Store in foreign_connections dictionary, an attribute of the ZODB connection
- Use _v_ attributes only as a fallback
- See "How This Package Maintains Persistent Connections" at http://pypi.python.org/pypi/alm.solrindex
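Outside of Zope, the one-client-per-thread idea can be sketched with threading.local. Note that this is a stand-in for illustration, not the foreign_connections approach described above; FakeClient and get_client are hypothetical names. Assumes Python 3.

```python
import threading

_local = threading.local()


def get_client(factory):
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, 'client', None)
    if client is None:
        client = _local.client = factory()
    return client


class FakeClient(object):
    """Hypothetical client; a real one would log in and keep a session token."""


seen = {}


def worker(name):
    # Two lookups in the same thread should yield the same object.
    seen[name] = (get_client(FakeClient), get_client(FakeClient))


threads = [threading.Thread(target=worker, args=(name,)) for name in ('a', 'b')]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(seen['a'][0] is seen['a'][1], seen['a'][0] is seen['b'][0])
```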
Server-side caching
- Goal: improve performance or reduce API usage by cutting out unnecessary web service requests
- plone.memoize provides various decorators to cache method results in different ways:
  - on the request object
  - in an object attribute
  - in an object volatile attribute
  - in a global RAM cache
- Can include something like time.time() // 3600 in the cache key so that entries expire after at most a given interval (here, one hour).
- But remember there are only two hard problems in computer science: Naming things, cache invalidation, and off-by-one errors.
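The time-bucket trick can be shown without Plone. The sketch below does not use plone.memoize; cache_for is a hypothetical decorator backed by a plain dict, and the clock is injectable only so the demo can move time forward. Assumes Python 3.

```python
import functools
import time


def cache_for(seconds, clock=time.time):
    """Cache results per argument tuple until the time bucket rolls over."""
    def decorator(func):
        state = {'bucket': None, 'values': {}}

        @functools.wraps(func)
        def wrapper(*args):
            bucket = clock() // seconds
            if state['bucket'] != bucket:
                # A new interval has started: forget everything cached so far.
                state['bucket'] = bucket
                state['values'] = {}
            if args not in state['values']:
                state['values'][args] = func(*args)
            return state['values'][args]
        return wrapper
    return decorator


# Demo with a fake clock so the expiry can be observed immediately.
now = [0.0]
calls = []


@cache_for(3600, clock=lambda: now[0])
def fetch_report(report_id):
    calls.append(report_id)          # stands in for a slow web service call
    return 'report %s' % report_id


fetch_report('a')
fetch_report('a')                    # served from the cache
now[0] = 3600.0
fetch_report('a')                    # new bucket, so fetched again
print(calls)
```

Because the bucket boundary is fixed, an entry cached just before the hour expires almost immediately; the interval is an upper bound on staleness, not a guaranteed lifetime.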
Asynchronous Loading
If you're fetching something from a remote server for display as part of a web page, load it in a separate request after the main page loads.
Advantages:
- Keeps the site from being perceived as slow
- Separates long-running remote calls from write transactions, so the risk of ConflictErrors is reduced.
- Sometimes can load directly from the external service to JavaScript instead of hitting your server.
Example: collective.googleanalytics
jQuery makes it easy:
jq(function () {
    jq('#analytics--1027659344').load('http://davisagli.com/blog/plone-4-in-the-news/@@analytics_async', {
        'report_ids': 'page-pageviews-sparkline,page-top-keywords-table',
        'profile_ids': 'ga:31264872',
        'request_url': 'http://davisagli.com/blog/plone-4-in-the-news',
        'date_range': 'month'
    }, function () {
        jq('#analytics--1027659344').css({
            'background-image': 'none',
            'height': 'auto'
        });
    });
});
Asynchronous Processing
plone.async.core makes it easy to queue asynchronous jobs to run via the zc.async infrastructure.
As discussed above, this can be used to avoid blocking the ZPublisher threads and keeping transactions open when long-running external calls are needed.
Simple usage example:
from Products.Five import BrowserView
import zc.async.job
from plone.async.core import getQueues
from time import sleep
def do_work(MAX):
    for x in xrange(1, MAX):
        sleep(0.1)
    print 'done!'
class Work(BrowserView):
    def __call__(self):
        queue = getQueues()['']
        job = zc.async.job.Job(do_work, 100)
        queue.put(job)
        # returns immediately; job runs in another thread
Testing Web Services
Some approaches:
- use a real connection (full system test)
- connect to a stub server (for an example, see zc.authorizedotnet)
- inject responses to particular calls (unit test)
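The third approach can be as simple as making the function that opens URLs replaceable. This is a minimal sketch, assuming Python 3; event_titles and fake_opener are illustrative names, and the canned body is a cut-down event list rather than a real service response.

```python
import io
from urllib.request import urlopen
from xml.etree import ElementTree


def event_titles(url, opener=urlopen):
    """Fetch an event list and return the event titles."""
    response = opener(url)
    tree = ElementTree.fromstring(response.read())
    return [node.findtext('title') for node in tree.findall('event')]


def fake_opener(url):
    """Test double: return a canned body instead of touching the network."""
    body = b'<document><event><title>My Event</title></event></document>'
    return io.BytesIO(body)


titles = event_titles('https://www.brownpapertickets.com/api2/eventlist',
                      opener=fake_opener)
print(titles)
```

Production code calls event_titles(url) and gets the real urlopen; tests pass a fake opener and never leave the process.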
Serving webservices from Zope
- XML-RPC
  - native support in Zope 2
  - overview at http://plone.org/documentation/manual/plone-community-developer-documentation/serving/xmlrpc
- SOAP
  - good overview at http://plone.org/documentation/kb/soap-support-for-plone
  - soaplib
  - z3c.soap
  - ZSI (Zolera SOAP Infrastructure)
  - etc.
- RESTful APIs
  - write browser views which serialize to JSON or other representations
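A JSON-serializing view might look roughly like the sketch below. It is written as a plain class so it runs without Zope; in Plone it would subclass BrowserView and be registered in ZCML. The Fake* classes and the choice of fields are assumptions made for the demo. Assumes Python 3.

```python
import json


class EventJSONView(object):
    """Sketch of a browser view that returns its context as JSON."""

    def __init__(self, context, request):
        self.context = context
        self.request = request

    def __call__(self):
        data = {'id': self.context.id, 'title': self.context.title}
        # Tell the client what representation it is getting.
        self.request.response.setHeader('Content-Type', 'application/json')
        return json.dumps(data, sort_keys=True)


# Minimal stand-ins for the objects Zope would pass to the view.
class FakeResponse(object):
    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name] = value


class FakeRequest(object):
    def __init__(self):
        self.response = FakeResponse()


class FakeContext(object):
    id = 'my-event'
    title = 'My Event'


request = FakeRequest()
body = EventJSONView(FakeContext(), request)()
print(body)
```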