Clients commonly ask, “How do I create segments for anonymous visitors?” Tracking anonymous visitors and calculating appropriate segments is a deep subject worthy of its own blog post (check back later!), but before we can calculate segments for such an individual we must first answer another question: how do we uniquely identify a returning visitor who wants to remain anonymous? The problem becomes more complex when we acknowledge that visitors may turn off or delete cookies, that some mobile devices don’t allow cookies, and that local or regional laws (e.g. in the European Union) may restrict cookie usage in the first place.
While fingerprinting technology is fairly nascent, there are already some sites using the technology to identify anonymous visitors. The basic idea is straightforward: there are various characteristics of each computer that taken as a whole tend to make it unique. And these device characteristics are generally available via any modern browser.
Examples of such characteristics include (but are not limited to):
|System fonts||Mac, Linux, Windows, and mobile devices each ship with different default fonts. When you add your own fonts, you make your device “stand out from the rest” even further|
|Operating System||Mac, Linux, Windows, Android, iOS, etc.|
|Cookies enabled?||Inferred in HTTP, logged by server|
|Graphics capability||Different hardware supports different graphics standards|
|IP Address||Address mapped to location|
|City Name||Geographic name of the city|
|Country Name||Geographic name of the Country|
|Connection Speed||Internet connection speeds or bandwidths (high, medium, low)|
|Connection Type||Describes the data connection between the device or LAN and the internet. See the Connection Type mapping|
|IP Routing Type||Tells how the user is routed to the internet|
|Carrier Name||The name of the entity that manages the ASN entry|
|ASN||Globally unique number assigned to a network or group of networks that is managed by a single entity|
|Top-level Domain||The top-level domain of the URL. For example, .com in www.oracle.com. This is mapped through the Quova reference file.|
|Second-level Domain||The second-level domain of the URL|
If either Java or Flash is installed, even more characteristics can be collected, which further increases the probability that the combination is unique. If we then generate a hash string or vector per visitor based on these characteristics, we have in essence identified the visitor (or, at a minimum, we have identified with a high degree of probability a unique computing device, which may not be the same as an individual, since devices are commonly shared among several users). Once we have identified a unique, returning visitor/device, we are free to do all sorts of other things, like record current history and store/retrieve past behaviors that can drive what segments we calculate for this visitor.
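To make the hashing step concrete, here is a minimal Python sketch of turning a set of collected characteristics into a single identifier. The function name `fingerprint_id`, the characteristic names, and the choice of SHA-256 are all assumptions made for illustration; a real deployment would pick its own characteristics and hashing scheme.

```python
import hashlib
import json


def fingerprint_id(characteristics: dict) -> str:
    """Return a stable hex digest for a dict of device characteristics.

    Hypothetical helper: the characteristic names are illustrative only.
    """
    # Sort list-like values (e.g. fonts) so that ordering differences
    # between visits do not change the resulting identifier.
    normalized = {
        name: sorted(value) if isinstance(value, (list, tuple, set)) else value
        for name, value in characteristics.items()
    }
    # Sort keys so the same characteristics always serialize identically.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


device = {
    "os": "Windows",
    "fonts": ["Verdana", "Arial", "Calibri"],
    "cookies_enabled": True,
    "connection_type": "broadband",
}
print(fingerprint_id(device))
```

Note that an exact hash like this changes completely when any single characteristic changes, so on its own it can only recognize a device whose characteristics are unchanged since the last visit.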
A quote from the Forbes article linked below should convince you of the inevitable widespread adoption of this technology: “The head of online advertising for a major company said the decay of cookies over time, the growth of mobile phones and different kinds of portable devices, and Apple’s default settings all make fingerprinting the key for future online advertising.”
The typical fingerprinting solution for WCS (or any application, for that matter) would add a few lines of static HTML to the wrapper code to include a Flash shared object and/or image tags that collect additional device characteristics. The Flash code then makes an internal call to the application server, uploading the device characteristics. The database schema that stores this visitor/device data should allow a visitor to be bound to multiple devices (and, conversely, allow multiple known “visitors” per device) whenever they explicitly authenticate, for example when they log in. Additionally, if cookies are allowed, we can combine device characteristics with cookies to stitch sessions together, even across devices.
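A many-to-many schema of this kind can be sketched with Python's built-in `sqlite3` module. The table names (`device`, `visitor_device`), column names, and helper functions below are hypothetical and chosen only for illustration; they are not taken from any Oracle product.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device (
    fingerprint_id TEXT PRIMARY KEY,   -- hash/vector key for the device
    segment        TEXT                -- last segment calculated for it
);
CREATE TABLE visitor_device (
    visitor_id     TEXT NOT NULL,      -- known (logged-in) visitor
    fingerprint_id TEXT NOT NULL REFERENCES device(fingerprint_id),
    PRIMARY KEY (visitor_id, fingerprint_id)
);
""")


def record_device(fingerprint_id, segment=None):
    """Add a device row the first time a fingerprint is seen."""
    conn.execute(
        "INSERT OR IGNORE INTO device (fingerprint_id, segment) VALUES (?, ?)",
        (fingerprint_id, segment),
    )


def bind_visitor(visitor_id, fingerprint_id):
    """Bind a visitor to a device when the visitor explicitly logs in."""
    record_device(fingerprint_id)
    conn.execute(
        "INSERT OR IGNORE INTO visitor_device (visitor_id, fingerprint_id) "
        "VALUES (?, ?)",
        (visitor_id, fingerprint_id),
    )


bind_visitor("alice", "fp-laptop")
bind_visitor("alice", "fp-phone")
bind_visitor("bob", "fp-laptop")   # a shared device

devices_per_visitor = dict(conn.execute(
    "SELECT visitor_id, COUNT(*) FROM visitor_device GROUP BY visitor_id"
).fetchall())
visitors_per_device = dict(conn.execute(
    "SELECT fingerprint_id, COUNT(*) FROM visitor_device GROUP BY fingerprint_id"
).fetchall())
print(devices_per_visitor)
print(visitors_per_device)
```

The link table is what lets one visitor map to several devices and one device map to several visitors, and the two `GROUP BY` queries give the counts in each direction.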
The flow might look like the following:
- a new user arrives with a device that doesn’t match any other in the database, so we add a record to the device tracking table, perhaps keyed by a hash/vector in which each device characteristic is assigned a value contributing to an n-dimensional vector. That way, if a device changes just one characteristic (for example, a new font is added), the new vector will be “close” to the old vector (testable with a vector similarity measure such as a dot product or cosine similarity), and we have a chance of binding the two records together later in a data-mining/data-cleaning batch process. However, such precision may not be necessary for a given audience!
- we observe this device’s various site behaviors (e.g. this device appears to like women’s clothes) and calculate a segment for this device, storing it in the device tracking table (and in a cookie if allowed).
- if a visitor explicitly authenticates (assuming our site has that ability), we can loosely bind the device fingerprint id to the visitor id (recognizing that a known visitor might use multiple devices to access our site).
- over time our data should show how many visitors use each device, and conversely, how many devices are used by each visitor. As such, we likely will want to have at least two tables that can be joined together for various kinds of reports and analysis.
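The idea in the first step, that a device which changes one characteristic stays “close” to its earlier record, can be illustrated with a small Python sketch. It assumes each `name=value` pair is one dimension of a sparse binary vector and uses cosine similarity as the closeness measure. The function names, sample characteristics, and the 0.9 threshold are hypothetical choices for illustration, not a recommended tuning.

```python
import math


def to_features(characteristics: dict) -> set:
    """Flatten characteristics into 'name=value' tokens.

    Each token is one dimension of a sparse binary vector.
    """
    features = set()
    for name, value in characteristics.items():
        values = value if isinstance(value, (list, tuple, set)) else [value]
        for item in values:
            features.add(f"{name}={item}")
    return features


def cosine_similarity(a: set, b: set) -> float:
    """Cosine similarity of two binary vectors stored as token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


old = to_features({"os": "Windows", "cookies_enabled": True,
                   "fonts": ["Arial", "Calibri", "Verdana"]})
# Same device after one new font has been installed.
new = to_features({"os": "Windows", "cookies_enabled": True,
                   "fonts": ["Arial", "Calibri", "Verdana", "Comic Sans"]})
# A clearly different device.
other = to_features({"os": "Linux", "cookies_enabled": True,
                     "fonts": ["DejaVu Sans"]})

THRESHOLD = 0.9  # hypothetical cut-off for "probably the same device"
print(cosine_similarity(old, new) >= THRESHOLD)
print(cosine_similarity(old, other) >= THRESHOLD)
```

A batch job could compare new records against existing ones this way and flag pairs above the threshold as candidates for merging.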
A session variable that identifies the visitor for the duration of the session can then be used by subsequent page visits, so that the fingerprinting code only needs to be executed once per session. Note that there is a practical limit to evaluating potentially endless device characteristics: we don’t want to create significant latency while such evaluations/calculations are being made. As such, the goal is always to implement the lightest-weight code that will get the job done, which is not necessarily an easy task.
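A rough Python sketch of the once-per-session idea follows. The session is modelled as a plain dict, and `collect_characteristics` stands in for whatever (potentially slow) collection code the site uses; both are assumptions for illustration, since a real application would use its framework's session object.

```python
import hashlib
import json


def compute_fingerprint(characteristics: dict) -> str:
    """Hash the characteristics into a single identifier (illustrative)."""
    canonical = json.dumps(characteristics, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def visitor_id_for(session: dict, collect) -> str:
    """Return the session's fingerprint id, computing it only if missing."""
    if "fingerprint_id" not in session:
        # The expensive collection step runs only on the first page view.
        session["fingerprint_id"] = compute_fingerprint(collect())
    return session["fingerprint_id"]


calls = []  # records each time the collection step actually runs


def collect_characteristics() -> dict:
    calls.append(1)
    return {"os": "Mac", "cookies_enabled": False}


session = {}
first = visitor_id_for(session, collect_characteristics)
second = visitor_id_for(session, collect_characteristics)
print(first == second, len(calls))  # prints: True 1
```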
Even so, I believe that every demo of any Oracle product that purports to be about Experience Management should ship with, at the very minimum, a lightweight demo of this fingerprinting technique that developers could then extend to match their clients’ specific requirements.
I encourage you to explore the links at the end of this blog post. Some of these links demonstrate just how unique your device is!! (somewhat scary when you think about it).
- https://panopticlick.eff.org/browser-uniqueness.pdf (documenting their methodology)
- http://docs.oracle.com/cd/E27559_01/admin.1112/e27207/finger.htm (fingerprinting using Oracle Adaptive Access Manager)