We recently worked with a client, a multinational retailer with both a physical and an Internet presence, that needed a way to acquire certain business intelligence (BI) data from the Internet on a daily basis. After several unsuccessful attempts to build this functionality themselves, they came to us for a solution.
On the surface the requirements seemed difficult, and it was easy to see why the client’s own IT team had failed to find a solution. They had been thinking “inside the box”, however, and hadn’t considered third-party alternatives. The specifications required the application to perform all of these tasks:
Retrieve new product listings on competitors’ web sites.
Retrieve current pricing for all products listed on competitors’ web sites.
Retrieve the full text of competitors’ press releases and public financial reports.
Track all inbound links pointing to competitors’ web sites from other web sites.
Once acquired, the data needed to be processed for reporting purposes and then stored in the client’s data warehouse for future access.
After reviewing current web-based data acquisition technology, including “spiders” that crawl the Internet and return raw pages that must then be run through HTML filters, we determined that the Google API and web services offered the best solution.
The Google API provides remote access to the search engine’s exposed functionality through a communication layer based on the “Simple Object Access Protocol” (SOAP), a web services standard. Because SOAP is an XML-based technology, it is easily integrated into legacy web-enabled applications.
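To make the SOAP layer concrete, here is a minimal Python sketch of the request envelope for a search call. The operation name doGoogleSearch, the urn:GoogleSearch namespace and the parameter list follow Google’s published GoogleSearch.wsdl as we understand it; the helper names, the default parameter values and the SOAPAction header value are assumptions. Google has since retired this SOAP service, so the endpoint is left for the caller to supply.

```python
import urllib.request
from xml.sax.saxutils import escape

# RPC-style SOAP 1.1 envelope; the operation and namespace names come from
# GoogleSearch.wsdl, but the exact encoding details here are an assumption.
ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
{params}
    </ns1:doGoogleSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def build_search_envelope(key, query, start=0, max_results=10):
    """Return the XML text of a doGoogleSearch request (sketch)."""
    params = [
        ("key", "string", key),                 # the free developer's key
        ("q", "string", query),                 # e.g. "link:www.example.com"
        ("start", "int", str(start)),           # zero-based index of first result
        ("maxResults", "int", str(max_results)),
        ("filter", "boolean", "true"),          # collapse similar results
        ("restrict", "string", ""),
        ("safeSearch", "boolean", "false"),
        ("lr", "string", ""),                   # language restriction
        ("ie", "string", "latin1"),             # input encoding
        ("oe", "string", "latin1"),             # output encoding
    ]
    # escape() turns &, < and > in user-supplied values into XML entities.
    lines = "\n".join(
        f'      <{name} xsi:type="xsd:{xsd_type}">{escape(value)}</{name}>'
        for name, xsd_type, value in params
    )
    return ENVELOPE.format(params=lines)


def post_envelope(endpoint, envelope):
    """POST the envelope to a SOAP endpoint and return the raw response bytes."""
    request = urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": "urn:GoogleSearchAction",  # assumed from the WSDL binding
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Building the envelope from a fixed template keeps the request readable and easy to compare against the WSDL; a SOAP toolkit that reads the WSDL directly would generate the same structure automatically.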
The API met all of the requirements of the application in that it:
Provided a methodology for querying the Web using non-HTML interfaces.
Enabled us to schedule regular search requests designed to harvest new and updated information on the target subjects.
Provided data in a format that could be easily integrated with the client’s legacy systems.
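A scheduled job harvests only new and updated information if it remembers what earlier runs already saw. The sketch below is an assumption about how that bookkeeping might work, not part of the Google API: it stores a hash of each result’s content, keyed by URL, in a JSON file (a hypothetical state store) and reports only URLs that are new or whose content has changed since the previous run.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def load_state(path: Path) -> Dict[str, str]:
    """Read URL -> content-hash pairs saved by the previous run (empty on first run)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: Dict[str, str]) -> None:
    """Persist the URL -> content-hash pairs for the next scheduled run."""
    path.write_text(json.dumps(state, sort_keys=True), encoding="utf-8")


def new_or_updated(state: Dict[str, str], pages: Dict[str, str]) -> List[str]:
    """Return URLs that are new or whose content changed, updating `state` in place."""
    changed = []
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if state.get(url) != digest:
            state[url] = digest
            changed.append(url)
    return changed
```

A daily cron entry (or any similar scheduler) could call load_state, run the scheduled queries, pass the results to new_or_updated, and finish with save_state.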
Using the Google API, SOAP and WSDL, our developers were able to define messages that fetched cached pages, searched the Google document index and retrieved the responses without having to filter out HTML or reformat the data. The resulting data was then handed off to the client’s legacy systems for validation, reporting and further processing before reaching the data warehouse.
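Because the responses are structured XML rather than HTML results pages, extracting the fields needs only an XML parser. This sketch assumes a simplified doGoogleSearchResponse (real responses also carried xsi:type and SOAP array attributes, which this parser simply ignores) and reads the URL, title and snippet of each resultElements item. It also decodes a doGetCachedPage response, whose return value is the cached page encoded as base64. The sample documents are illustrative, not captured output.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class SearchResult(NamedTuple):
    url: str
    title: str
    snippet: str


def parse_search_response(xml_bytes: bytes) -> List[SearchResult]:
    """Extract result fields from a doGoogleSearch response; no HTML scraping needed."""
    root = ET.fromstring(xml_bytes)
    results = []
    # resultElements is a SOAP array whose members appear as <item> children.
    for elements in root.iter("resultElements"):
        for item in elements.findall("item"):
            results.append(SearchResult(
                url=item.findtext("URL", default=""),
                title=item.findtext("title", default=""),
                snippet=item.findtext("snippet", default=""),
            ))
    return results


def decode_cached_page(xml_bytes: bytes) -> bytes:
    """Return the page bytes carried base64-encoded in a doGetCachedPage response."""
    ret = ET.fromstring(xml_bytes).find(".//return")
    if ret is None or not ret.text:
        return b""
    return base64.b64decode(ret.text)


SAMPLE_SEARCH = b"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
 <SOAP-ENV:Body>
  <ns1:doGoogleSearchResponse xmlns:ns1="urn:GoogleSearch">
   <return>
    <estimatedTotalResultsCount>1</estimatedTotalResultsCount>
    <resultElements>
     <item>
      <URL>http://www.example.com/products/widget</URL>
      <title>Widget</title>
      <snippet>Our new widget ...</snippet>
     </item>
    </resultElements>
   </return>
  </ns1:doGoogleSearchResponse>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

SAMPLE_CACHED = b"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <SOAP-ENV:Body>
  <ns1:doGetCachedPageResponse xmlns:ns1="urn:GoogleSearch">
   <return xsi:type="xsd:base64Binary">aGVsbG8=</return>
  </ns1:doGetCachedPageResponse>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""
```

The parsed records can be handed straight to downstream systems, which is the point the paragraph above makes about avoiding HTML filters.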
During the Proof of Concept phase we ran tests that reliably identified and retrieved updated public relations and investor relations information, with results that exceeded the client’s expectations.
In our next test we retrieved the most recent product pages listed in Google and then ran another query to retrieve Google’s “cached page” versions of the same pages. We ran these two data sets through difference filters and were able to produce accurate reports of price increases and decreases, as well as identify new products.
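The difference-filter step can be sketched as a comparison of two price snapshots, one from the cached pages (older) and one from the current pages (newer). This assumes prices have already been extracted from each copy into dictionaries keyed by a product identifier; that extraction step, the identifiers and the PriceReport structure are hypothetical and are not part of the Google API.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Tuple

PriceChange = Tuple[str, Decimal, Decimal]  # (product id, old price, new price)


@dataclass
class PriceReport:
    increases: List[PriceChange] = field(default_factory=list)
    decreases: List[PriceChange] = field(default_factory=list)
    new_products: List[str] = field(default_factory=list)


def diff_prices(cached: Dict[str, Decimal], current: Dict[str, Decimal]) -> PriceReport:
    """Compare the cached (older) price snapshot with the current (newer) one."""
    report = PriceReport()
    for product_id in sorted(current):
        new_price = current[product_id]
        old_price = cached.get(product_id)
        if old_price is None:
            # Absent from the cached copy: treat as a newly listed product.
            report.new_products.append(product_id)
        elif new_price > old_price:
            report.increases.append((product_id, old_price, new_price))
        elif new_price < old_price:
            report.decreases.append((product_id, old_price, new_price))
    return report
```

Decimal is used instead of float so that currency values compare exactly, without binary rounding artifacts.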
For our final test we used the Google API’s support for the “link:” query operator to rapidly build lists of inbound links.
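Building an inbound-link list amounts to issuing a “link:” query and paging through the results. Each doGoogleSearch call returned at most ten results, so this sketch advances the start parameter in steps of ten and stops at the first short page. The search argument stands in for a hypothetical wrapper around doGoogleSearch that returns result URLs, and the default ceiling of 1,000 results reflects the service’s result limit as we understand it; treat both as assumptions.

```python
from typing import Callable, List
from urllib.parse import urlsplit

PAGE_SIZE = 10  # doGoogleSearch's maximum maxResults value

# Hypothetical wrapper signature: (query, start, max_results) -> result URLs.
SearchFn = Callable[[str, int, int], List[str]]


def inbound_links(domain: str, search: SearchFn, limit: int = 1000) -> List[str]:
    """Collect unique pages linking to `domain`, skipping the site's own pages."""
    query = f"link:{domain}"
    seen = set()
    links = []
    for start in range(0, limit, PAGE_SIZE):
        urls = search(query, start, PAGE_SIZE)
        for url in urls:
            # Skip internal links and duplicates returned on different pages.
            if urlsplit(url).hostname == domain or url in seen:
                continue
            seen.add(url)
            links.append(url)
        if len(urls) < PAGE_SIZE:
            break  # a short page means the result set is exhausted
    return links
```

Passing the search function in as a parameter keeps the paging logic testable without network access and independent of whichever SOAP client performs the actual call.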
These limited tests demonstrated that the Google API could produce the BI data the client requested, and that the data could be returned in a predefined format that eliminated the need for post-retrieval filtering.
The client was pleased with the results of our Proof of Concept phase and authorized us to proceed with building the solution. The application is now in daily use and is exceeding the client’s performance expectations by a wide margin.
Any developer familiar with SOAP and WSDL will have no problem using the Google API. Using the API requires a developer’s key, which is available at no charge. Go to http://www.google.com/apis/ for an overview and to register for your own key.