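Phrase
Computer science
Timestamp
Subsequence
Noun phrase
Information retrieval
Domain (mathematical analysis)
Word (group theory)
Database
Term (time)
Natural language processing
Linguistics
Mathematics
Quantum mechanics
Physics
Noun
Mathematical analysis
Computer security
Philosophy
Bounded function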
Authors
Brian Lent, Rakesh Agrawal, Ramakrishnan Srikant
Source
Journal: Knowledge Discovery and Data Mining
Date: 1997-08-14
Pages: 227-230
Citations: 211
Abstract
We address the problem of discovering trends in text databases. Trends can be used, for example, to discover that a company is shifting interests from one domain to another. We are given a database V of documents. Each document consists of one or more text fields and a timestamp. The unit of text is a word and a phrase is a list of words. (We defer the discussion of more complex structures till the “Methodology” section.) Associated with each phrase is a history of the frequency of occurrence of the phrase, obtained by partitioning the documents based upon their timestamps. The frequency of occurrence in a particular time period is the number of documents that contain the phrase. (Other measures of frequency are possible, e.g. counting each occurrence of the phrase in a document.) A trend is a specific subsequence of the history of a phrase that satisfies the users’ query over the histories. For example, the user may specify a “spike” query to find those phrases whose frequency of occurrence increased and then decreased.
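As a rough illustration of the definitions in the abstract, the following Python sketch builds the history of a phrase by partitioning documents on their timestamps and counting the documents that contain the phrase in each period, then tests the history for a "spike". It is not the authors' system: the function names `phrase_history` and `is_spike`, the whitespace tokenization, the month-sized periods, and the reading of "spike" as a strict rise to a single peak followed by a strict fall are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def phrase_history(documents, phrase, period_of):
    """Return per-period document counts for `phrase`, in period order.

    `documents` is an iterable of (timestamp, text) pairs. A document is
    counted at most once per period, however often the phrase occurs in it.
    Tokenization by whitespace is an assumption of this sketch.
    """
    words = phrase.lower().split()
    n = len(words)
    counts = defaultdict(int)
    for timestamp, text in documents:
        tokens = text.lower().split()
        period = period_of(timestamp)
        counts[period] += 0  # keep periods with zero matches in the history
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            counts[period] += 1
    return [counts[p] for p in sorted(counts)]


def is_spike(history):
    """Assumed 'spike' shape: strictly increasing to one peak, then strictly decreasing."""
    if len(history) < 3:
        return False
    peak = history.index(max(history))
    if peak == 0 or peak == len(history) - 1:
        return False
    rising = all(a < b for a, b in zip(history[:peak], history[1:peak + 1]))
    falling = all(a > b for a, b in zip(history[peak:], history[peak + 1:]))
    return rising and falling


# Hypothetical toy data: (timestamp, text) pairs.
docs = [
    (date(1997, 1, 5), "neural network patents"),
    (date(1997, 2, 1), "a neural network for text"),
    (date(1997, 2, 9), "neural network trends"),
    (date(1997, 2, 20), "neural network neural network"),  # counted once
    (date(1997, 3, 3), "database systems"),
    (date(1997, 3, 9), "neural network again"),
]

history = phrase_history(docs, "neural network", lambda d: (d.year, d.month))
print(history, is_spike(history))
```

Counting each document once matches the frequency measure stated in the abstract; counting every occurrence instead, which the abstract mentions as an alternative, would only change the increment inside the loop.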