VOL 2, NO 3. e-ISSN: 2549-9904. ISSN: 2549-9610
INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION

BlogNewsRank: Finding and Ranking Frequent News Topics Using Social Media Factors

Harshitha H#, Dr. Mohammed Rafi#
#Department of Studies in CSE, University B. T College of Engineering (A Constituent College of VTU, Belagavi), India
E-mail: harshithadravid12@gmail.

Abstract: In early days, mass media sources such as news media used to inform us about daily events. Nowadays, social media services such as Twitter provide a huge amount of user-generated data, which has great potential to contain informative news-related content. For these resources to be useful, we have to find a way to filter noise and capture the content that, based on its similarity to the news media, is considered valuable. Even after noise is removed, information overload may still exist in the remaining data. Hence it is convenient to prioritize it for consumption. To achieve prioritization, information must be ranked in order of estimated importance, considering mainly three factors. First, the temporal prevalence of a particular topic in the news media is a factor of importance, and can be considered the media focus (MF) of a topic. Second, the temporal prevalence of the topic in social media indicates its user attention (UA). Last, the interaction between the social media users who mention this topic indicates the strength of the community discussing it, and can be regarded as the user interaction (UI) toward the topic. We propose an unsupervised framework, BlogNewsRank, which identifies news topics prevalent in both social media and the news media, and then ranks them by relevance using their degrees of MF, UA, and UI.

Keywords: Topic identification, Topic ranking, Social network analysis, Keyword extraction, Co-occurrence similarity measures, Graph clustering.

I. INTRODUCTION

Microblogs have become one of the most popular social media outlets. One microblogging service in particular, Twitter, is used by millions of people around the world, providing enormous amounts of user-generated data. One may assume that this source potentially contains information with equal or greater value than the news media, but one must also assume that, because of the unverified nature of the source, much of this content is useless. For social media data to be of any use for topic identification, we must find a way to filter uninformative information and capture only information which, based on its content similarity to the news media, may be considered useful or valuable.

The news media presents professionally verified occurrences or events, while social media presents the interests of the audience in these areas, and may thus provide insight into their popularity. Social media services like Twitter can also provide additional or supporting information to a particular news media topic. In summary, truly valuable information may be thought of as the area in which these two media sources topically intersect. Unfortunately, even after the removal of unimportant content, there is still information overload in the remaining news-related data, which must be prioritized for consumption. To assist in the prioritization of news information, news must be ranked in order of estimated importance.

The temporal prevalence of a particular topic in the news media indicates that it is widely covered by news media sources, making it an important factor when estimating topical relevance. This factor may be referred to as the MF of the topic. The temporal prevalence of the topic in social media, specifically in Twitter, indicates that users are interested in the topic and can provide a basis for the estimation of its popularity. This factor is regarded as the UA of the topic. Likewise, the number of users discussing a topic and the interaction between them also gives insight into topical importance, referred to as the UI. By combining these three factors, we gain insight into topical importance and are then able to rank the news topics accordingly.

Consolidated, filtered, and ranked news topics from both professional news providers and individuals have several benefits. The most evident use is the potential to improve the quality and coverage of news recommender systems or Web feeds, adding user popularity feedback.

To achieve its goal, BlogNewsRank uses keywords from news media sources (for a specified period of time) to identify the overlap with social media from that same period. We then build a graph whose nodes represent these keywords and whose edges depict their co-occurrences in social media. The graph is then clustered to clearly identify distinct topics. After obtaining well-separated topic clusters (TCs), the factors that signify their importance are calculated: MF, UA, and UI. Finally, the topics are ranked by an overall measure that combines these three factors.

II. BACKGROUND AND RELATED WORK

The main research areas applied in this paper include: topic identification, topic ranking, social network analysis, keyword extraction, co-occurrence similarity measures, and graph clustering.

When UA is estimated from website visits, the number of visits during a period of time represents the UA of the topic to which the site is related. Additionally, we believe that the relationship between social media users who discuss the same topics also plays a key role in topic relevance. Kwan et al. proposed a measure referred to as reciprocity, which attempts to detect the interaction between social media users and perceive their engagement in relation to a particular topic. Higher reciprocity means greater interaction between users, and thus topics with higher reciprocity should be considered more important because of their underlying community structure. We can inherently identify the power and influence of a well-structured community as opposed to a decentralized and unstructured one. Our method applies this logic: higher reciprocity is taken to indicate a more important topic.
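The reciprocity measure mentioned above can be illustrated with a small sketch. The paper does not give a formula here, so this assumes one common definition: the fraction of directed links between users that are returned in the opposite direction. The function name, the edge representation, and the toy user names are hypothetical and only serve the illustration.

```python
def reciprocity(edges):
    """Fraction of directed links whose reverse link also exists.

    Assumed definition (not taken from the paper): each (source, target)
    pair is one directed interaction between two users who mention the
    same topic; self-links are ignored.
    """
    links = {(u, v) for u, v in edges if u != v}
    if not links:
        return 0.0
    mutual = sum(1 for u, v in links if (v, u) in links)
    return mutual / len(links)


# Toy interaction graph (made-up users): ann and bob link to each other,
# while the other two links are one-way.
interactions = [("ann", "bob"), ("bob", "ann"), ("bob", "cat"), ("cat", "dan")]
print(reciprocity(interactions))  # 2 of the 4 links are reciprocated
```

Under this definition, a topic whose users mostly link to one another in both directions scores close to 1, while a topic discussed by unconnected users scores close to 0.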
Topic Identification

Much research has been carried out in the field of topic identification. Two traditional methods for detecting topics are LDA and PLSA. LDA is a generative probabilistic model that can be applied to different tasks, including topic identification. PLSA, similarly, is a statistical technique, which can also be applied to topic modeling. Another trending area of related research is the detection of "bursty" topics, that is, topics or events that occur in short, sudden episodes. Diao et al. proposed a method that uses a state machine to detect bursty topics in microblogs. Their method also determines whether user posts are personal or refer to a particular trending topic. Yin et al. also developed a model that detects topics from social media data, distinguishing between temporal and stable topics. These methods, however, only use data from microblogs and do not attempt to integrate them with real news. Additionally, the detected topics are not ranked by popularity.

Keyword Extraction

Concerning the field of keyword or informative term extraction, many unsupervised and supervised methods have been proposed. Unsupervised methods for keyword extraction rely solely on implicit information found in individual texts or in a text corpus. Supervised methods, on the other hand, make use of training datasets that have already been classified. Many methods have been proposed for keyphrase extraction, and most of them are based on machine learning. We use TextRank to extract keywords from the news media sources. Furthermore, TextRank does not require training or any document corpus for its operation.

Co-Occurrence Similarity

Since establishing the importance of the word-pair co-occurrence distribution in the actual corpus of tweets is of more interest to us, we did not employ Bollegala's or Chen's semantic similarity methods. In this paper, we tested other similarity measures, and found that the Dice similarity measure provided the best results.
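As a rough illustration of the Dice similarity measure on word-pair co-occurrence, the sketch below uses the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), where A and B are the sets of tweets containing each term. The paper does not spell out its exact formulation, so the tweet-level counting, the function name, and the toy corpus are assumptions.

```python
def dice_similarity(term_a, term_b, tweets):
    """Dice coefficient of two terms over a list of tokenized tweets.

    Assumed definition: 2 * |tweets with both| / (|tweets with a| + |tweets with b|).
    """
    with_a = {i for i, tokens in enumerate(tweets) if term_a in tokens}
    with_b = {i for i, tokens in enumerate(tweets) if term_b in tokens}
    total = len(with_a) + len(with_b)
    if total == 0:
        return 0.0
    return 2 * len(with_a & with_b) / total


# Toy corpus (made-up data): each tweet is reduced to a set of key terms.
tweets = [
    {"election", "vote", "senate"},
    {"election", "vote"},
    {"election", "storm"},
    {"storm", "coast"},
]

print(dice_similarity("election", "vote", tweets))   # 2 * 2 / (3 + 2)
print(dice_similarity("election", "storm", tweets))  # 2 * 1 / (3 + 2)
print(dice_similarity("vote", "coast", tweets))      # terms never co-occur
```

Values computed this way could serve as edge weights between key terms, with pairs that rarely co-occur receiving weights near 0.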
Topic Ranking

Another major concept that is incorporated into this paper is topic ranking. There are several means by which this task can be accomplished, traditionally by estimating how frequently and recently a topic has been reported by mass media. Ranking is the central problem for many information retrieval applications, such as document retrieval and collaborative filtering. Recently, a new research area called learning to rank has been emerging in machine learning. Learning to rank aims at automatically creating a model that can perform ranking on instances, using training data and machine learning techniques. Many learning to rank methods have been developed and applied to information retrieval. Shubhankar et al. developed an algorithm that detects and ranks topics in a corpus of research papers. They used closed frequent keyword-sets to form topics and a modification of the PageRank algorithm to rank them. Their work, however, does not integrate or collaborate with other data sources, as accomplished by BlogNewsRank.

Social Network Analysis

In the case of UA, Wang et al. estimated this factor by using anonymous website visitor data. Their method counts the number of times a site was visited during a particular period of time.

Graph Clustering

The main purpose of graph clustering in this paper is to identify and separate TCs. Newman proposed a new method to identify clusters based on modularity. Modularity is a measure designed to estimate the strength of division of a network into clusters. Networks that possess a high modularity value have dense connections between the nodes within each cluster, but sparse connections between nodes in different clusters. Newman's new algorithm calculated modularity as it progressed, making it simple to find the optimal clustering structure. Given their effectiveness, the concepts of betweenness and transitivity are both applied in our graph clustering algorithm.

III. BLOGNEWSRANK FRAMEWORK

The goal of our method, BlogNewsRank, is to identify, consolidate, and rank the most frequent topics discussed in both news media and social media during a specific period of time. The system framework is shown in the BlogNewsRank framework figure. To achieve its goal, the system must undergo four main stages.

Preprocessing: Key terms are extracted and filtered from news and social data corresponding to a particular period of time.

Key Term Graph Construction: A graph is constructed from the previously extracted key term set, whose vertices represent the key terms and edges represent the co-occurrence similarity between them. The graph, after processing and pruning, contains slightly joint clusters of topics popular in both news media and social media.

Graph Clustering: The graph is clustered in order to obtain well-defined and disjoint TCs.

Content Selection and Ranking: The TCs from the graph are selected and ranked using the three relevance factors (MF, UA, and UI).

Fig. BlogNewsRank framework

IV. EXPERIMENTS AND RESULTS

The testing dataset consists of tweets crawled from the Twitter public timeline and news articles crawled from popular news websites.

In this section, we show experiments performed on some of the variables and components used in our method, including the co-occurrence measure and the IQR coefficient. For the controlled experiments, the control dataset was divided into six partitions, with each partition representing ten days' worth of news and tweets.

Taking all results into consideration emphasizes the point that MF alone is a substandard estimator of what users find interesting or consider important, and should therefore not be used in this way. BlogNewsRank, on the other hand, proves to be more capable of performing this, and so we conclude that the information provided by BlogNewsRank can prove vital in commerce-based areas where the interest of users is paramount.

V. CONCLUSION

In this paper, we proposed an unsupervised method, BlogNewsRank, which identifies news topics prevalent in both social media and the news media, and then ranks them by taking into account their MF, UA, and UI as relevance factors. The temporal prevalence of a particular topic in the news media is considered the MF of a topic, which gives us insight into its mass media popularity. The temporal prevalence of the topic in social media, specifically Twitter, indicates user interest, and is considered its UA. Finally, the interaction between the social media users who mention the topic indicates the strength of the community discussing it, and is considered the UI. To the best of our knowledge, no other work has attempted to use either the interests of social media users or their social relationships to aid in the ranking of topics.

REFERENCES