Understanding P3P

Learn how P3P hopes to bring greater transparency to the way personal information is used over the Web.

The P Word

Privacy on the Web has always been an important bugbear for Internet users.

Used to be, most Web sites couldn't care less who you were, where you lived, or what your email address was. Then came the Internet boom, and suddenly everyone and his uncle was launching a Web site, a portal or an online store. And, in the war for eyeballs and clicks, your privacy suddenly became fair game - many Web sites started requiring users to provide detailed personal information before allowing them access, and using (sometimes even sharing) this information for targeted advertising (aka spam).

Faced with the potential loss of his privacy, and deluged with a constant barrage of banner ads and unsolicited commercial email, Joe Surfer hit back. The result: P3P.

P3P (not to be confused with PGP or PHP) is an acronym for the Platform for Privacy Preferences Project, a W3C initiative aimed at improving privacy practices on the Web. Its goal, though lofty, is pretty simple: a clearly-defined, open standard that defines how personal information is collected and used over the Web.

Needless to say, this is harder than it sounds. Web sites, many of which require demographic data to sell ad space or decide business strategy, tend to get overbearing and pushy when it comes to asking for personal details, sometimes refusing access to their content unless the user first fills out a detailed questionnaire. And, at the other end of the spectrum, Web users are concerned about the loss of privacy that occurs when these sites play fast and loose with the personal information they have in their massive databases.

P3P attempts to provide a solution to the problem, by providing greater information to Web users about how Web sites handle their personal information. It addresses privacy concerns at two levels, providing Web sites with a standard way of defining and publishing their privacy policies, and providing Web users with a way to access these policies and make informed choices about releasing personal information to the requesting party.

Over the next few pages, I'll be taking a closer look at P3P, explaining its rationale and goals, how it works and the problems associated with it. I'll warn you at the outset that the P3P specification is still under development, so things may change over the next few months - however, the following material should be sufficient to explain the basics.

Private Thoughts

Currently, privacy policies (when they exist) tend to be written in non-standard ways - some sites publish extremely precise privacy policies, crammed with so much legalese and fine print that reading them makes your head hurt, while others favour the spartan approach, providing next to no information on how they use personal information. Some sites merely log each client request, with no specific user information collected, while others ask for demographic data or track user clicks to generate a user profile. Similarly, some sites save user information to provide better service to the user when (s)he comes back the next time, while others collect user information and share it with other agencies in either aggregate or individual form.

P3P attempts to bring some standards and structure to the party, enabling sites to clearly and effectively communicate to users exactly how the information they provide will be used, and leaving it to the user to decide how to proceed.

Typically, a P3P-compliant Web site creates and publishes a privacy policy, using standard P3P-defined constructs, and places it on its Web site. This policy specifies, in clear and simple terms, the type of information collected by the site during the user's visit, as well as how the site plans to use the information. When a P3P-enabled Web browser connects to the Web site, it first looks for the site's privacy policy, analyzes it and then, depending on whether or not the site's published policy matches the user's comfort level, consummates or aborts the transaction.

This isn't necessarily a perfect solution - it implies, for one thing, that a user needs to specify his or her personal privacy preferences before attempting to access any Web site - but it does have merits. It allows the user to be aware of how personal information is going to be used before submitting it, offers him or her greater control over the process, and, by forcing a site to make its privacy policies public, implies greater accountability and transparency than currently prevails.

It's important to note, though, that P3P does not provide any mechanism to enforce the statements made in a site's privacy policy. Its focus is more on communicating stated policy accurately, thereby allowing for more knowledgeable decisions on the part of the user, and less on verifying the implementation of the policy. Enforcement of a site's privacy policy has more to do with the current legal framework than with P3P. As the W3C's P3P FAQ clearly states, "... P3P is intended to be complementary to legislative and self-regulatory programs...there is no reason why P3P and legislation should be exclusionary of each other..." (P3P and Privacy FAQ, W3C, 06/2001)

As a W3C project that is likely to impact Web users across the planet, P3P is a pretty important effort. Consequently, the W3C's P3P Working Group has solicited input from a large number of organizations to ensure that the specification is balanced and fair to all parties involved. Contributors to this process include some of the world's largest corporations, including AT&T, Citibank, Microsoft, IBM, and HP, as well as privacy advocates like Trust, and Privacy Alliance and TRUSTe. As a result of all this input, P3P has taken a while to come to fruition...and the effort hasn't been helped by the rapid changes in XML-based technologies (P3P uses XML as its expression language), which have in turn necessitated changes to the P3P specification.

A Matter Of Policy

P3P is implemented via two types of files, both expressed in XML: a policy reference, and one or more policy statements. Each of these file types has a distinct and unique role to play in the P3P paradigm.

The policy reference file specifies the location of the site's P3P policy (or policies), and provides information on which sections of the site are covered by which policy. This policy reference file is usually placed in a standard location on the Web server - currently defined as /w3c/p3p.xml - and the P3P specification also allows for the location of this file to be specified within HTTP header responses or embedded as part of the URL reference within a hyperlink.
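
To make the lookup order a little more concrete, here's a rough Python sketch of how a browser might work out where the policy reference file lives. It only covers two of the three options (the HTTP header and the standard location). The function name is my own invention, and the header format it looks for (a header named "P3P" carrying a policyref="..." value) is my reading of the drafts, so treat both as assumptions rather than gospel.

```python
import re
from urllib.parse import urljoin

# Standard location of the policy reference file, as described above.
WELL_KNOWN_PATH = "/w3c/p3p.xml"


def policy_ref_url(page_url, headers):
    """Return the URL of the policy reference file covering page_url.

    headers is a plain dict of HTTP response headers. The header name
    "P3P" and its policyref="..." syntax are assumptions in this sketch.
    """
    # Header names are case-insensitive, so normalise them first.
    p3p = {k.lower(): v for k, v in headers.items()}.get("p3p", "")
    match = re.search(r'policyref\s*=\s*"([^"]+)"', p3p)
    if match:
        # The header may carry a relative URL; resolve it against the page.
        return urljoin(page_url, match.group(1))
    # No usable header: fall back to the standard location on the same host.
    return urljoin(page_url, WELL_KNOWN_PATH)
```

Called with an empty header dictionary, the function simply falls back to /w3c/p3p.xml on the host that served the page.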

The real meat, though, lies in the policies specified within the policy reference file. These policies, which are again expressed using P3P-specific XML elements, contain detailed information on the type of information collected by the site, the manner in which it is used, the types of people who have access to it, and the period for which it is retained. It also provides information on the legal measures available to users who feel that their privacy has been violated, together with details of the remedies available.

In order to illustrate how this works, consider the following simple example of a policy reference file:

<meta xmlns="http://www.w3.org/2000/12/P3Pv1">
    <policy-references>
        <policy-ref about="/w3c/policy.xml#all">
            <include>/*</include>
        </policy-ref>
    </policy-references>
</meta>

This file specifies the name and location of the site's policy statement(s), enclosing each one within <policy-ref> tags. Within these tags, <include> and <exclude> tags are used to identify which areas of the site are covered by each policy. The example above specifies that the entire site is covered by a single policy, named "all" and stored in the file /w3c/policy.xml; however, it's also possible to build a more complex policy reference file, as demonstrated by the next example:

<meta xmlns="http://www.w3.org/2000/12/P3Pv1">
<expiry max-age="604800" />
    <policy-references>

        <policy-ref about="/w3c/policy.xml#gen">
            <include>/*</include>
            <exclude>/account/*</exclude>
            <exclude>/feedback/*</include>
        </policy-ref>

        <policy-ref about="/w3c/policy.xml#account">
             <include>/account/*</include>
        </policy-ref>

        <policy-ref about="/w3c/policy.xml#feedback">
             <include>/feedback/*</include>
        </policy-ref>
    </policy-references>
</meta>

In this case, we have three different policies, each one covering a different area of the site. Note also the <expiry> element at the beginning of the file, which specifies how long the policies are valid (in this example, seven days).
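
If you'd like to see the mapping in action, here's a small Python sketch that reads a three-policy reference file like the one above and works out which policy covers a given path. The function names are hypothetical, and I've used Python's fnmatch module as a stand-in for the wildcard rules, which is a simplification (fnmatch also gives special meaning to "?" and "[", for instance), so don't take this as the matching algorithm from the specification.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Namespace used by the policy reference files in this article.
NS = "{http://www.w3.org/2000/12/P3Pv1}"

REFERENCE_FILE = """
<meta xmlns="http://www.w3.org/2000/12/P3Pv1">
<expiry max-age="604800" />
    <policy-references>
        <policy-ref about="/w3c/policy.xml#gen">
            <include>/*</include>
            <exclude>/account/*</exclude>
            <exclude>/feedback/*</exclude>
        </policy-ref>
        <policy-ref about="/w3c/policy.xml#account">
            <include>/account/*</include>
        </policy-ref>
        <policy-ref about="/w3c/policy.xml#feedback">
            <include>/feedback/*</include>
        </policy-ref>
    </policy-references>
</meta>
"""


def find_policy(reference_xml, path):
    """Return the "about" value of the first policy covering path, or None."""
    root = ET.fromstring(reference_xml.strip())
    for ref in root.iter(NS + "policy-ref"):
        includes = [e.text.strip() for e in ref.findall(NS + "include")]
        excludes = [e.text.strip() for e in ref.findall(NS + "exclude")]
        included = any(fnmatchcase(path, p) for p in includes)
        excluded = any(fnmatchcase(path, p) for p in excludes)
        # A policy applies only if the path is included and not excluded.
        if included and not excluded:
            return ref.get("about")
    return None


def validity_days(reference_xml):
    """Convert the max-age value (in seconds) into days, or None if absent."""
    root = ET.fromstring(reference_xml.strip())
    expiry = next(root.iter(NS + "expiry"), None)
    if expiry is None:
        return None
    return int(expiry.get("max-age")) / 86400  # 86400 seconds in a day
```

With this file, a request for /account/login.html is excluded from the general policy and picked up by the "account" policy instead, while 604800 seconds works out to seven days.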

When a user attempts to access a URL on a P3P-compliant site, a P3P-compatible Web browser (like Internet Explorer 6.0, which includes primitive P3P support) will first look for the policy reference file (either in the standard location, the location specified in the HTTP response header, or the location stated in the referring hyperlink) to find out which policy applies to that URL. The policy reference file, which maps a specific policy statement to a particular section of the site, provides the browser with the location of the policy statement; the browser can then read this statement, evaluate whether the user's privacy will be violated by accessing the URL, and make a decision on how to proceed.
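
The final "make a decision" step depends entirely on how the browser stores the user's preferences, which is outside the scope of this article. Purely as an illustration, here's a Python sketch of one way the comparison might work. Both classes and the acceptable() function are hypothetical, invented for this example; real browsers are free to do something quite different.

```python
from dataclasses import dataclass


@dataclass
class PolicySummary:
    """What a site says it does with the data (hypothetical model)."""
    purposes: set
    recipients: set
    retention: str


@dataclass
class UserPreferences:
    """What the user is comfortable with (hypothetical model)."""
    allowed_purposes: set
    allowed_recipients: set
    allowed_retention: set


def acceptable(policy, prefs):
    """Return True only if every stated practice is within the user's limits."""
    return (
        policy.purposes <= prefs.allowed_purposes          # subset test
        and policy.recipients <= prefs.allowed_recipients  # subset test
        and policy.retention in prefs.allowed_retention
    )


# A user who accepts site development and in-house use, with no retention.
cautious_user = UserPreferences(
    allowed_purposes={"current", "develop"},
    allowed_recipients={"ours"},
    allowed_retention={"no-retention"},
)
```

A policy that declares only the "develop" purpose, the "ours" recipient and "no-retention" would pass this check; one that also shares data with a recipient the user hasn't approved would not.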

So that's the policy reference file. Next, let's look at an actual policy statement.

Data Overload

Here's an example of a simple policy:

<policies>

<policy name="feedback" discuri="http://www.melonfire.com/w3c/feedback_policy.html">

    <!-- who's collecting the information? -->
    <entity>
        <data-group>
                <data ref="#corp.name">Melonfire</data>
                <data ref="#corp.email">melonfire@mail.com</data>
        </data-group>
    </entity>

    <!-- statement explaining the type of information collected, and why? -->
    <statement>
        <purpose><develop required="always" /></purpose>
        <consequence>Melonfire uses your feedback to improve its content quality. </consequence>
        <recipient><ours/></recipient>
        <retention><no-retention /></retention>
        <data-group>
                <data ref="#visitor.name" optional="yes" />
                <data ref="#visitor.email" optional="no"/>
        </data-group>
    </statement>

    <!-- can users see the data the site holds about them? -->
    <access><none /></access>

    <!-- how are disputes resolved? -->
    <disputes-group>
        <disputes resolution-type="service" service="http://www.melonfire.com/cs/" short-description="Melonfire Customer Support">
            <remedies><correct/></remedies>
        </disputes>
    </disputes-group>

 </policy>

</policies>

This may look complicated, but it's actually pretty simple. The document is broken up into distinct sections, each one serving a particular purpose. Every policy begins and ends with <policy> tags; a single document may contain more than one policy, each one identified by a unique "name" attribute and carrying a "discuri" attribute that holds the URL of the human-readable version of the policy statement.

Within a policy, the <entity> section identifies the entity requesting the information (Melonfire), together with contact details. Next, the <statement> section explains why the information is being collected (in this case, for further development or improvement of the site), together with a list of the data elements collected (name and email address), how long they're stored for (here, only for the duration of the current interaction), and who uses them (the site owners only). The <access> element, which is mandatory, states whether users can view the identified data the site holds about them (here, they can't), while the <disputes-group> element provides information on the site's dispute resolution policy.
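
To show how a program might pick these sections apart, here's a short Python sketch that reads a trimmed-down copy of the feedback policy and pulls out the purpose, recipient, retention and data elements. The summarise() function is a hypothetical helper of my own; it only looks at the first <statement> in a policy and assumes the lowercase, namespace-free element names used in this article's example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the feedback policy (entity and disputes omitted).
POLICY_FILE = """
<policies>
<policy name="feedback" discuri="http://www.melonfire.com/w3c/feedback_policy.html">
    <statement>
        <purpose><develop required="always" /></purpose>
        <recipient><ours/></recipient>
        <retention><no-retention /></retention>
        <data-group>
            <data ref="#visitor.name" optional="yes" />
            <data ref="#visitor.email" optional="no"/>
        </data-group>
    </statement>
    <access><none /></access>
</policy>
</policies>
"""


def summarise(policies_xml, name):
    """Return a dict describing the named policy, or None if it is absent."""
    root = ET.fromstring(policies_xml.strip())
    for policy in root.iter("policy"):
        if policy.get("name") != name:
            continue
        statement = policy.find("statement")  # first statement only
        return {
            "discuri": policy.get("discuri"),
            # Each practice is expressed as an empty child element.
            "purposes": [e.tag for e in statement.find("purpose")],
            "recipients": [e.tag for e in statement.find("recipient")],
            "retention": [e.tag for e in statement.find("retention")],
            # Map each data reference to its "optional" flag.
            "data": {d.get("ref"): d.get("optional", "no")
                     for d in statement.iter("data")},
        }
    return None
```

Calling summarise(POLICY_FILE, "feedback") gives you a plain dictionary that a browser, or a curious user, could inspect without wading through the XML.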

In case you're wondering where the element names and values come from, most of them are defined and explained in the P3P specification. I won't get into the details of all the options here - you should look at the P3P specification if you're interested - though I will tell you that the choices presented are quite exhaustive, enabling a Web service provider to describe a site's privacy policy in all relevant detail.

Off Target

While the premise of P3P is certainly intriguing, the standard has nevertheless come in for its fair share of flak. Critics argue that P3P is a toothless tiger, often missing the forest for the trees, and is not likely to make any significant contribution to the privacy debate. Here's why:

P3P does not provide any enforcement mechanism for the statements made in a site's privacy policy. Its focus is more on communicating stated policy accurately, thereby allowing for more informed decision-making by the user, and less on verifying the implementation of the policy.

Critics argue that policies which cannot be verified or enforced in any way are largely useless, and suggest that P3P should also address enforcement issues. Since this takes P3P into the legal arena (enforcement of privacy policies is closely tied to the laws prevailing in different legal jurisdictions), and introduces a whole new level of complexity, it's unlikely that this will happen any time soon.

Web sites which lack P3P-compliant policies will likely be blocked by P3P-aware Web browsers. Or, to put it another way, a Web site which lacks a privacy policy might have difficulty attracting eyeballs to its pages, merely because the user's browser denies access to the site. The solution: requiring sites to publish P3P policies, effectively making it mandatory. Critics argue that this imposition is unacceptable.

By requiring users to define their privacy "comfort level", and using this definition as the decision criterion for accessing Web sites, critics argue that P3P makes the Web experience more complex. This complexity could end up hindering, rather than helping, new users.

Despite these and other criticisms, the W3C is moving forward with P3P, expecting it to evolve further in the future, growing and expanding into a platform that meets all user requirements in an efficient and simple manner.

Endgame

While P3P is not yet an official W3C recommendation, the specification is almost final, with the most recent Working Draft published in September 2001. Consequently, a number of companies, including IBM and Microsoft, have begun adding P3P support to upcoming products; you'll find early P3P support in Internet Explorer 6.0 and AT&T WorldNet browsers.

If you're interested in learning more about P3P, you might want to consider visiting the following links:

The P3P FAQ, at http://www.w3.org/P3P/p3pfaq.html

The W3C's official P3P specification, at http://www.w3.org/TR/2001/WD-P3P-20010928/

A W3C guide to defining and publishing a P3P-compliant privacy policy for your site, at http://www.w3.org/P3P/details.html

Examples and links to implementations of the P3P specification, at http://www.w3.org/P3P/implementations

And that's about it for the moment. I hope you found this article useful, and that it offered some broad insight into the why and how of P3P. Till next time...stay healthy!

Note: Examples are illustrative only, and are not meant for a production environment. YMMV!

This article was first published on 30 Nov 2001.