Keywords: Book, Data Mining
Modeling the Internet and the Web: Probabilistic Methods and Algorithms provides a very good introduction to the problem of modeling the Internet and the Web, and to mining the huge amount of data they generate.
The topics covered include:
- Basic WWW technologies (URL, logs, search engines, ...)
- Web Graphs (Power-Law, the bow-tie structure, ...)
- Text Analysis (Indexing, vector-space model, Latent Semantic Analysis, Text Categorization)
- Link Analysis (Hubs and Authorities: HITS, PageRank)
- Advanced Crawling techniques
- Modeling and Understanding Human Behavior on the Web
- Commerce on the Web (automated recommender systems, web path analysis for purchase prediction)
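
To give a flavor of the Link Analysis topic, here is a minimal sketch of PageRank computed by power iteration. It is not taken from the book: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not prescribed values.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked from three pages, "d" from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

On this toy graph the most linked-to page ends up with the highest score and the page nobody links to ends up with the lowest, which is the intuition behind link-based ranking.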