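Computer science
Web mining
Data Web
World Wide Web
Information retrieval
Web modeling
Web intelligence
Context (archaeology)
Web page
Concept mining
Data mining
Paleontology
Biology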
Source
Journal: International Journal of Computational Intelligence Research
Date: 2012-07-04
Volume (issue): 8 (2): 95-105
Citations: 3
Abstract
In the context of web mining, large collections of web documents are mined to extract useful information. However, most web information is irrelevant: a typical web document consists of only 10-15% data and 85-90% tags. Previous researchers have proposed many methods for mining web documents, but all of them process documents without considering document size. When N documents of different sizes are mined, this makes the process time-consuming. In this paper, we propose a new web mining method called Web Mining using Divide and Conquer Approach (WMDCA). It consists of four phases: a document selection phase (a list of documents is selected), a preprocessing phase (large documents are divided, each document is cleaned, and all sub-documents are combined to create an XML cube), a web mining phase (our algorithm is applied to identify patterns), and a presentation phase (the discovered results are presented). Experiments were conducted on various web documents related to a single domain. The experimental results show that the proposed system discovers patterns in less time than existing web document mining methods.
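The abstract names the preprocessing steps (divide, clean, combine) but not how they are carried out. The following is therefore only a minimal sketch of what such a phase might look like, not the authors' WMDCA code. It assumes documents are HTML strings, that a "big" document is one longer than a hypothetical `MAX_CHARS` threshold, that "cleaning" means stripping tags, decoding entities, and collapsing whitespace, and that a simple `<corpus>/<doc>/<part>` XML tree can stand in for the XML cube, whose real structure is not described. The names `divide`, `clean`, and `combine` are illustrative.

```python
import html
import re
import xml.etree.ElementTree as ET

MAX_CHARS = 200  # hypothetical size threshold for a "big" document

# Naive tag matcher: assumes '>' never appears inside a tag's attributes.
TAG_RE = re.compile(r"<[^>]*>")


def divide(doc: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split a big document into sub-documents of at most max_chars characters.

    Cuts fall only just before a '<', so no tag is split in half. A single
    token (one tag plus the text after it) longer than max_chars is kept whole.
    """
    if len(doc) <= max_chars:
        return [doc]
    pieces: list[str] = []
    current = ""
    for token in re.split(r"(?=<)", doc):  # zero-width split needs Python 3.7+
        if current and len(current) + len(token) > max_chars:
            pieces.append(current)
            current = ""
        current += token
    if current:
        pieces.append(current)
    return pieces


def clean(sub_doc: str) -> str:
    """Remove markup tags, decode HTML entities, and collapse whitespace."""
    text = html.unescape(TAG_RE.sub(" ", sub_doc))
    return " ".join(text.split())


def combine(docs: dict[str, str], max_chars: int = MAX_CHARS) -> ET.Element:
    """Divide and clean every document, then merge the pieces into one XML tree.

    The <corpus>/<doc>/<part> layout is a hypothetical stand-in for the
    paper's XML cube.
    """
    root = ET.Element("corpus")
    for name, doc in docs.items():
        doc_el = ET.SubElement(root, "doc", id=name)
        for i, piece in enumerate(divide(doc, max_chars)):
            text = clean(piece)
            if text:  # skip sub-documents that contained only markup
                part = ET.SubElement(doc_el, "part", index=str(i))
                part.text = text
    return root


if __name__ == "__main__":
    corpus = {
        "small": "<html><body><p>Web mining</p></body></html>",
        "big": "".join(f"<p>item {i}</p>" for i in range(30)),
    }
    print(ET.tostring(combine(corpus), encoding="unicode"))
```

Because cuts happen only before a `<`, every sub-document holds whole tags and can be cleaned independently of the others before the results are merged. The mining and presentation phases are not sketched, since the abstract does not describe the pattern-discovery algorithm.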