The huge volume of data on the Web has fueled an unrelenting need to locate the "right information at the right time," as well as to develop an integrated, comprehensive information source. This calls for tools that efficiently analyze and manage web data, and that manage web information from the database perspective. This comprehensive resource presents a data model called WHOM (Warehouse Object Model) to represent HTML and XML documents in the warehouse. It defines a set of web algebraic operators for building new web tables by extracting relevant data from the Web, as well as for generating new tables from existing ones. This "web-warehouse approach" incorporates modern and effective shared web data-management concepts, methods, and models.

Features & Benefits:
* Presents a simple and generic data model for representing metadata, structure, and content of web documents and hyperlinks
* Addresses schema-related issues for both HTML and XML data, with their associated challenges of irregularity and heterogeneity
* Describes a web algebra for manipulating warehoused data
* Utilizes numerous examples to illustrate various concepts of web data management and to simplify all key issues
* Highlights change management and knowledge discovery, two important applications of web warehouses

With its accessible style and emphasis on practicality, the book delivers an excellent survey of current principles for structured, web-based data-management technologies. Database-management systems developers, enterprise web-site developers, and applied R&D researchers will find the work an essential companion for new concepts, development strategies, and application models.
Table of Contents:
* Introduction
* Survey of Web data-management systems
* Node and link objects
* Predicates on node and link objects
* Imposing constraints on hyperlink structures
* Query mechanism for the Web
* Schemas for warehouse data
* WHOM-algebra
* Web data visualization
* Detecting and representing relevant Web deltas
* Knowledge discovery using Web bags
* The road ahead
* Index