International Journal of Electrical and Computer Engineering (IJECE)
February 2017, pp. 432-450
ISSN: 2088-8708
DOI: 10.11591/ijece.
Context Sensitive Search String Composition Algorithm using User Intention to Handle Ambiguous Keywords
Uma Gajendragadkar1, Sarang Joshi2
1 COEP, Savitribai Phule Pune University, Pune, Maharashtra, India
2 PICT, Savitribai Phule Pune University, Pune, Maharashtra, India
Article Info
Article history: Received Aug 5, 2016; Revised Nov 12, 2016; Accepted Nov 26, 2016
Keywords: Autocompletion, Context, Data mining, Search, User intention

ABSTRACT
Finding the required URL among the first few result pages of a search engine is still a challenging task.
This may require a number of reformulations of the search string, thus adversely affecting the user's search time.
Query ambiguity and polysemy are major reasons for not obtaining relevant results in the top few result pages.
Efficient query composition and data organization are necessary for getting effective results.
Context of the information need and the user intent may improve the autocomplete feature of existing search engines. This research proposes a Funnel Mesh-5 algorithm (FM5) to construct a search string taking into account the context of the information need and user intention, with three main steps: (1) predict user intention with user profiles and the past searches via a weighted mesh structure; (2) resolve ambiguity and polysemy of search strings with context and user intention; (3) generate a personalized disambiguated search string by query expansion encompassing user intention and predicted query.
Experimental results for the proposed approach and a comparison with direct use of a search engine are presented. A comparison of the FM5 algorithm with the K Nearest Neighbor algorithm for user intention identification is also presented.
The proposed system provides better precision for search results for ambiguous search strings with improved identification of the user intention.
Results are presented for an English language dataset as well as a Marathi (an Indian language) dataset of ambiguous search strings.

Copyright © 2017 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Uma Gajendragadkar, COEP
G7/9 Omkar Garden, Manikbaug, Pune, Maharashtra, India
Phone: 919822479128
Email: umagadkar@gmail.
INTRODUCTION
Current search engines churn a large volume of data to obtain meaningful information; however, the main challenge is to get relevant results in the top few result pages.
Search engines check for the presence of keywords in documents.
Mere presence of keywords in a document may not match the user's search intention and need.
User satisfaction increases when more relevant and exact information is presented in the top few results.
An appropriately composed query is the starting point for handling this challenge.
Performance of search engines can be improved with the use of appropriate keywords or prediction of such keywords.
Search engines use search logs and most popular queries; however, these are not sufficient to predict the user's interests or intention.
Users are of three types: Internet-skilled users, Internet-aware users and Internet-unskilled users.
Many times, users do not know the proper keywords for searching information and cannot express their information need or search intent.
As a result, search results often do not satisfy the user's information need.
This problem can be addressed by query expansion and reformulation.
Search engines provide autocompletions of queries based on popularity; however, they are inadequate.
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Although different users may use the same query keyword, their intent and context may be different.
Current search engines provide the same results to all users using the same keywords at a given point in time.
Personalization is desirable to better satisfy the needs of the user.
The following experiment illustrates this further.
If a user searches for 'Michael Jackson' then search engines return results for the famous singer Michael Jackson in the majority of result pages.
These results would be treated as irrelevant and incorrect if the user intent was to search for professor Michael Jackson.
Table 1. Example search query done on Google on 29th May 2015
Query String | Total Results | Search Results as Singer | Search Results as Professor | Search Results as Software Development | Search Results as VP
Michael Jackson | About 39,00,00,000 | First 13 pages and after | Page 17, 8th result | Page 13, last result | Page 16, 4th result
Michael Jackson professor | About 7,89,00,000 results | Page 3, 5th result | First page | Second page, 2nd result | Not present in the first 20 pages

As shown in Table 1, when one searches for the query string 'Michael Jackson', results for the singer 'Michael Jackson' are returned in the first 13 pages, whereas no result for the professor 'Michael Jackson' appears in those pages.
With each page containing 10 results, the relevant results start appearing after 130 result rows.
However, when a word 'professor' is added to the query string 'Michael Jackson', the results for professor Michael Jackson are seen in the first result page itself.
This demonstrates that if keywords based on user intention are used then better hits can be obtained in the first few search result pages.
Query expansion based on user intention has been shown to give better search results over large data sets like the Web.
Thus user intention can be used to disambiguate a query.
User context can include parameters such as 'gender', 'age', 'topic', 'location' etc.
It can be short-term or long-term.
In the proposed method, user intention is identified with the help of a user profile containing parameters like 'gender', 'profession', 'interests', 'location' and past searches.
The user intention identified with the FM5 algorithm is used to reformulate the query.
This paper brings together different IR (Information Retrieval) areas like QAC (Query Autocompletion), query personalization and automatic query expansion.
Our contributions are:
A novel user intention identification algorithm is proposed to predict user intention.
Query expansion is done using the identified user intention to get improved precision for ambiguous search strings.
Experimental evaluation of the method is conducted with a dataset collected from users.
The results reflect improvement in user intention identification and precision of search results.
Results of query expansion using the identified user intention are compared with the results of the Google search engine directly as a first baseline, and also with results obtained for ambiguous queries by Chirita et al. as a second baseline.
In this paper, Section 2 describes the related work.
Section 3 explains data description and how it is used by the proposed system while Section 4 describes the FM5 user intention identification algorithm.
Results and discussion are described in Section 5.
Conclusion is presented in Section 6.
RELATED WORK
Autocompletions and Personalization
Bhatia et al. present work where phrases and n-grams are mined from text collections and used for generating autocompletions.
Most popular completion, i.e. autocompletion based on the past popularity of queries in query logs, is modeled in Bar-Yossef and Kraus's work.
Commercial search engines use MPC (most popular completion) for query autocompletion.
Other query autocompletion methods include personalized autocompletion, context-based autocompletion using previous queries by the user, time-based autocompletion, and time- and context-based autocompletion.
Homologous queries and semantically related terms are used to generate autocompletions by Cai et al.
Personalization of query results by using the interests of users has been done by many researchers.
User preferences are collected by either implicit or explicit methods.
Gender and age are used for personalizing the results by Kharitonov and Serdyukov.
User context based on recent queries is generated and used to rank the query results in a session by Xiang et al.
Most of the research conducted personalizes the query results by reranking them using the user profile rather than personalizing query autocompletion.
IJECE, February 2017: 432-450
This paper proposes an algorithm that uses personalization for query completion or autocompletions. An improvement in autocompletion ranking through personalization is claimed in Shokouhi's work.
Shokouhi et al. also presented ranking of autocompletions with a time-sensitive approach as per their expected popularity.
Ambiguous queries are handled by Shokouhi et al. by providing user context in terms of session context.
Query suggestion is achieved by using click information along with previous queries in a session as context and then mining query log sessions for query reformulations.
This work is similar to ours, but it does not consider long-term user context; instead it focuses on session-based user context in terms of click information and previous queries.
User Intention
Many studies have tried to identify user intention in different ways.
Most of them try to categorize queries as informational, navigational and transactional, as proposed by Jansen et al.
Given a query suggestion, efforts have been made to understand the user intention using different means such as web search logs, previous users' search logs for the same query, clicked pages, the user's search session history, Wikipedia, Wordnet and Google n-gram.
Using search query logs of existing users to identify intention cannot guarantee the correctness of search results.
Search intent prediction along with query autocompletion is a less explored area.
According to Cheng et al., many searches are triggered by browsed web pages.
Kong et al. tried to predict search intent using news articles browsed recently before the search.
A large number of queries are triggered by news articles daily.
Predicting search intent using browsed pages alone is inadequate.
Our proposed method uses a live RSS newsfeed for query prediction and makes use of user profiles to predict the search intent.
Query Expansion
Query expansion is used to reformulate the original user query so as to improve retrieval of search results and better satisfy user needs.
One method is relevance feedback, which uses the returned results to add new terms related to the original query and the selected documents.
Other methods include adding relevant terms based on term frequency and document frequency from top-ranked documents, co-occurrence-based techniques, thesaurus-based techniques, desktop-specific techniques, and the probability of terms over search logs.
Our approach uses user-intention-based keyword addition to expand the original query and handle ambiguous query terms.
DATA DESCRIPTION
Data Collection Methodology and Data Resources
The system uses different types of data sources.
For the temporal contextual corpus, two elements are used. One is static contextual data based on the current month and the second is dynamic contextual data based on daily current events.
Based on the parameter 'period', a month-wise list of occasions from the Hindu and Christian calendars is taken and the list of their associated keywords is built.
For daily current events, the RSS news feed from Reuters is processed and a dataset of keywords is built.
The temporal data is refreshed every day and also at restart of the server.
This contextual data is generated for both English and Marathi, an Indian language used in the state of Maharashtra by more than 70 million people.
A Marathi n-gram dataset was also created by crawling Marathi websites for about four months and processing the web pages, and is available.
The proposed algorithm also uses data from various sources like Google n-gram and Wordnet for English, and Marathi Wordnet data.
How to use the above-described contextual data to mine possible query autocompletions is discussed by Uma Gajendragadkar et al.
Autocompletions for all sample test queries are collected from popular search engines for comparison.
This is done for each character key press of all the test queries.
User Intention Based Query Expansion
The 'K' user profiles returned by the KNN (K Nearest Neighbor) algorithm are used as input to the FM5 algorithm.
Let Z be the set of profiles, made up of the known user profiles Z' and the unknown user profiles (Equation 1), where P(Z') is the probability of the known user profiles and the probability of the unknown user profiles is defined analogously.
Let a_1, a_2, ..., a_n be the n query words (Equation 3) and let there be m identified intentions (Equation 4).
A trial is conducted by collecting random samples a_i.
For each sample query keyword a_i, there could be multiple user intentions stored in the intention matrix.
If a keyword has two possible intentions then they are indicated by 'T', and all other intentions that are not applicable are indicated by 'F'.
In total, 34 different user intentions are considered.
For example, consider the keyword 'Jaguar'. It can have two possible user intentions: 'Automobile' and 'Wildlife'.
Initially n = 0 and m = 0 when no query keyword is typed or predicted, and hence no user intentions are present.
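The 'T'/'F' intention matrix described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the four-intention list is a hypothetical subset of the 34 intentions, and the names `INTENTIONS`, `intention_row` and `possible_intentions` are introduced here and do not come from the paper.

```python
# Hypothetical subset of the 34 intentions, for illustration only.
INTENTIONS = ["Automobile", "Wildlife", "Music", "Sports"]

def intention_row(applicable):
    """Return a 'T'/'F' row marking which intentions apply to a keyword."""
    return ["T" if name in applicable else "F" for name in INTENTIONS]

# Intention matrix: one row per query keyword, one column per intention.
intention_matrix = {
    "Jaguar": intention_row({"Automobile", "Wildlife"}),
}

def possible_intentions(keyword):
    """List the intentions flagged 'T' for a keyword (empty if unknown)."""
    row = intention_matrix.get(keyword, ["F"] * len(INTENTIONS))
    return [name for name, flag in zip(INTENTIONS, row) if flag == "T"]

print(possible_intentions("Jaguar"))
```

A keyword that has not been seen yet simply maps to an all-'F' row, which mirrors the initial state in which no intentions are present.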
Learning Method and Knowledge Generation
An association-rule-based learning method is used for user intention identification.
Support and confidence for an intention are computed for the predicted keyword.
Association rule learning is used to find interesting relations between different parameters in the data.
It finds strong rules in data based on support and confidence measures.
If a rule such as {milk, sugar} ⇒ {coffee} is found in the data, it indicates that a customer is likely to buy coffee if the customer has bought both milk and sugar.
Association rules are used in many applications like market analysis, bioinformatics, web usage mining etc.
Minimum threshold values on support and confidence are used to find the interesting rules out of all possible rules. If I = {i_1, i_2, ..., i_n} is a set of items and D = {t_1, t_2, ..., t_m} is a set of transactions in the database, then a rule can be defined as U ⇒ V, where U, V ⊆ I and U ∩ V = ∅. Support can be calculated as the proportion of transactions containing the item set U.
For illustration, the item set {milk, sugar} has a support of 6/10 = 0.6 if it occurs in 60% of all transactions.
Confidence of the rule U ⇒ V can be calculated as the proportion of the transactions containing U that also contain V.
For illustration, if the rule {milk, sugar} ⇒ {coffee} has a confidence of 0.9, then the rule holds true in 90% of the transactions that contain milk and sugar.
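The support and confidence measures can be computed as in the following minimal sketch. The ten transactions are made up so that {milk, sugar} has a support of 0.6, as in the illustration above; the confidence value they produce (5/6) is a consequence of this made-up data and is not the 0.9 quoted in the text. The function names are assumptions introduced here.

```python
def support(transactions, itemset):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, lhs, rhs):
    """Proportion of transactions containing lhs that also contain rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Ten made-up transactions: 6 contain milk and sugar, 5 of those also contain coffee.
transactions = (
    [{"milk", "sugar", "coffee"}] * 5
    + [{"milk", "sugar"}]
    + [{"milk"}, {"sugar"}, {"coffee"}, {"bread"}]
)

print(support(transactions, {"milk", "sugar"}))
print(confidence(transactions, {"milk", "sugar"}, {"coffee"}))
```

Note that `confidence` divides by the support of the left-hand side, so it is only defined for rules whose left-hand side occurs at least once.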
Other user intention identification methods have a few drawbacks.
Using web search logs for intent identification does not guarantee correct outcomes, as the same query responses have been provided to all the users.
Using clicked pages is not very effective, as user clicks do not always mean that the result was relevant to the search intent.
User search session history works only for a session. No user intention prediction was done for ambiguous queries in the case of intent identification with Wikipedia.
Table 2 shows a few records from the sample data considered for learning the intent of a user for a given keyword.
Table 2. Example Data for Association Rule Mining
Columns: Word | Intent | Gender | Location | Profession | Interest
Intent: Automobile
Location: India, India, USA, India, India, India, USA, USA, India, India, India, India
Profession: Lawyer, Engineer, Lawyer, Farmer, Doctor, Lawyer, Engineer, Doctor, Engineer, Doctor, Farmer, Engineer, Engineer, Engineer
Interest: Cooking, Gardening, Books, Painting, Poems, Gardening, Poems, Art, Theatre, Photography, Wildlife, Sports, Movies
From the sample data, all rules having support and confidence above the threshold values are selected. These rules are used to learn about the possible intention of a user for a keyword.
The returned intentions are as in Equation 4; how one of them is selected is explained in Section 4 of the paper.
For the purpose of experimentation, two types of users are considered for the system: registered users and unregistered users.
The first case is when a user logs in to the system (a registered user with a known user profile in Z') and the second case is a user who does not log in to the system (an unregistered user with an unknown user profile), as in Equation 1.
In the second case, the user profile is not available, hence no personalization can be done and no learning happens.
In the first case, a user profile is created during registration to the search system.
This profile is created by obtaining user preferences through a set of questions.
The values are filled in from an explicit questionnaire asking questions like 'What is your Profession?', and the answer sets the value.
User preferences and the past searches done by the user are stored in this profile.
A bit vector representing the user profile is stored in the system for every registered user.
This vector of different parameters forms a key for each user.
The set of user profile parameters is considered when deciding the next probable alphabetic, numeric or symbol character.
The composition of the search string is done using elements of this set.
The system personalizes search strings based on these user preferences and learns from past searches. After a key is pressed in the search box, the system tries to predict the next character, initially using the past searches of the registered user and later using the pool of searches done by other users whose profiles are similar to the current user's.
This method is compared with KNN (K Nearest Neighbor).
The graphs in Figure 1 show the performance of KNN for user intention identification with different K values on sample data. As seen in Figure 1, KNN performs better with smaller K values for identifying the user intention, but the accuracy of the identification is low (about 39%).
To better predict the user intention, we have proposed the FM5 algorithm.
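For orientation, a KNN baseline of the kind used for comparison can be sketched as follows. The paper does not give its KNN configuration, so everything here is an assumption: profiles are represented as short bit vectors, Hamming distance is used as the distance measure, and the training records and the function names `hamming` and `knn_intention` are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_intention(query_profile, training, k):
    """Majority intention among the k training profiles nearest to query_profile.

    training is a list of (bit_vector, intention) pairs recorded for one keyword.
    """
    nearest = sorted(training, key=lambda rec: hamming(query_profile, rec[0]))[:k]
    votes = Counter(intention for _, intention in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 4-bit profiles and intentions for the keyword 'Jaguar'.
training = [
    ((0, 0, 0, 1), "Automobile"),
    ((0, 0, 1, 1), "Automobile"),
    ((1, 1, 0, 0), "Wildlife"),
    ((1, 1, 1, 0), "Wildlife"),
    ((0, 1, 0, 1), "Automobile"),
]

print(knn_intention((0, 0, 0, 1), training, k=3))
```

Because every profile bit counts equally in the distance, such a baseline cannot favour one profile parameter over another, which is the aspect the weighted meshes of FM5 are meant to address.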
Figure 1. Performance of KNN with Sample Data
PROPOSED METHOD
The objective is to find the appropriate user intention for a search string being entered by a user in the search box.
In the FM5 algorithm, user profiles are used to identify this intention.
For each user, a user profile is created during registration to the search system, as discussed in Section 3.
FM5 implements a funnel filter consisting of different meshes mapped to the user profile parameters.
Weights are applied to these meshes to disambiguate different user intentions of a query, as given in Equation 22.
The user can select the parameter or multiple parameters to be used.
If the user's current search intention is related to his or her 'Interest' rather than 'Profession', then only parameters like {'Interest', 'Gender', 'Location'} could be selected and other parameters like {'Profession'} could be left out. The set of parameters considered for the experiment is taken from the user profile.
For example, the user profile consists of 5 parameters: 'Profession', 'Interests', 'Gender', 'Location' and 'Past searches'.
For illustration purposes, the highest weight is assigned to the {'Profession'} parameter, followed by {'Gender', 'Interests', 'Location', 'Past searches'}. This is configurable and more parameters can be added to the funnel shown in Figure 2.
Figure 2. User Intention Identification Funnel
Let W be the set of weights.
The computation of these weights is explained in Equation 22.
Let Q be the prefix query input string, which is progressively extended by one character at a time to complete the search string.
q_1, q_2, ... are the characters that compose the search string, one for each character key press.
The initial state of this set is empty.
Each q_i can be any character ranging from 'a' to 'z' and '0' to '9', or a character like ':', '.', ',' etc.
Q is thus a partial search string.
The composition of the search string and the related selection of characters is done using elements of the parameter set, as per Equation 7.
Proposed Circular Structure for User Profile Parameters
The user profiles are organized in a circular linked list as shown in Figure 3.
A circular linked list is used because parameters can be added to or removed from the list easily and it is easy to traverse to reach an object.
Learning will add parameters to or remove them from the circular linked list.
The size of the circular linked list will increase or decrease accordingly.
Let 'r' be the radius of the circle on which various user profiles are arranged.
The pointer in the circular linked list is placed as per the weight calculated by association rule in terms of support, as shown in Equation 20.
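A circular linked list that grows and shrinks as parameters are added or removed can be sketched as below. This is a generic data-structure illustration, not the authors' implementation: the class names, the choice of a singly linked ring, and the example weights are all assumptions made for the sketch.

```python
class Node:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self.next = self  # a single node points to itself

class CircularList:
    """Singly linked circular list of profile parameters."""

    def __init__(self):
        self.head = None
        self.size = 0

    def add(self, name, weight):
        node = Node(name, weight)
        if self.head is None:
            self.head = node
        else:
            # Insert right after head; the ring stays closed.
            node.next = self.head.next
            self.head.next = node
        self.size += 1

    def remove(self, name):
        if self.head is None:
            return False
        prev, cur = self.head, self.head.next
        for _ in range(self.size):
            if cur.name == name:
                if self.size == 1:
                    self.head = None
                else:
                    prev.next = cur.next
                    if cur is self.head:
                        self.head = cur.next
                self.size -= 1
                return True
            prev, cur = cur, cur.next
        return False

    def __iter__(self):
        cur = self.head
        for _ in range(self.size):
            yield cur
            cur = cur.next

# Hypothetical weights, for illustration only.
ring = CircularList()
for name, w in [("Profession", 0.6), ("Gender", 0.3), ("Location", 0.7)]:
    ring.add(name, w)
ring.remove("Gender")
print([(n.name, n.weight) for n in ring])
```

Traversal visits each node exactly once by counting `size` steps, so the ring never needs a sentinel or a null terminator.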
Let C be the cost associated with elements of Q such that 0 ≤ C ≤ 1.
C = 1 when the partial or complete search string does not exist in the search set or the algorithm fails to predict the search string.
C = 0 when the search string is distinctly known, so no search string prediction and composition is required.
The sorting of search strings is done such that it always has the central tendency.
Figure 3. Circular Structure used for FM5 - User Intention Identification Algorithm
Since the search string is computed and mapped in the range [0, 1], the central tendency predicts the most likely search string.
Computation of the logical search string is done with the various parameters of the user profile. A greedy algorithm is used for intention selection using the cost C.
Each time, the mesh with the largest cost that is greater than or equal to the threshold value is selected, as explained in Equation 21.
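The greedy selection step can be illustrated with a short sketch. It assumes that each mesh has a single cost value and that selection stops as soon as the largest remaining cost falls below the threshold; the costs, the threshold of 0.5 and the function name `select_meshes` are hypothetical values chosen for the example, not figures from the paper.

```python
def select_meshes(costs, threshold):
    """Greedily pick meshes by descending cost, stopping below the threshold."""
    selected = []
    remaining = dict(costs)
    while remaining:
        name = max(remaining, key=remaining.get)  # greedy choice: largest cost
        if remaining[name] < threshold:
            break
        selected.append(name)
        del remaining[name]
    return selected

# Hypothetical mesh costs and threshold.
costs = {"Profession": 0.6, "Gender": 0.2, "Interests": 0.1, "Location": 0.7}
print(select_meshes(costs, threshold=0.5))
```

With these example values the meshes for Location and Profession pass the threshold, in that order, and the remaining meshes are skipped.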
Mathematical Model for Funnel Mesh 5 (FM5) Algorithm
The algorithm in pseudo code form is represented in Algorithm 1.
The rest of the section explains it in detail.
To illustrate, if a parameter is Profession then its value is one of {Engineer, Doctor, Lawyer, Architect, ...} or a combination of these.
The system assumes that one user has one profession.
If 6 bits are used to store the profession parameter then 2^6 = 64 combinations are possible.
For example, let the bit sequence '000001' indicate profession = Engineer.
The probability of choosing the correct profession is 1/2^t, where t = number of bits used to store the parameter.
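The bit-field arithmetic above can be checked with a few lines of code. Only the mapping '000001' = Engineer comes from the text; the other codes, and the assumption that a guess is drawn uniformly from all 2^t codes, are made here purely for illustration.

```python
T_BITS = 6  # bits reserved for the profession parameter, as in the text

# Hypothetical code assignment; only '000001' = Engineer comes from the text.
PROFESSION_CODES = {"Engineer": 0b000001, "Doctor": 0b000010, "Lawyer": 0b000011}

def encode(profession):
    """Return the profession as a fixed-width bit string."""
    return format(PROFESSION_CODES[profession], f"0{T_BITS}b")

combinations = 2 ** T_BITS      # number of distinct codes that fit in T_BITS bits
p_correct = 1 / combinations    # chance that a uniformly random code is the right one

print(encode("Engineer"), combinations, p_correct)
```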
Let X_3 be the past searches associated with the user profile vectors in the circular linked list, and let S_i be the past search strings associated with X_3i.
The weight W_i is the support value calculated using the association rule for search string S_i, as shown in Equation 20.
For S_i in X_3i, a list of parameters is given.
The weight is computed using an association rule with a combination of user profile parameters on the left-hand side and the keyword on the right-hand side, calculated over the circular linked list.

Algorithm 1. Funnel Mesh 5 (FM5) Algorithm

The support value is given by the number of records containing X divided by N, where N = total number of records in the circular linked list and X is a combination of parameters from X_1 whose support is being calculated.
Then, for the current user, starting with the prefix Q, the set of possible user intentions is defined (Equation 23).
In total, 34 intentions are considered for the experimental setup.
Table 3 shows the user intention taxonomy and example keywords for each of the intentions.
For example, the keyword 'Seed' falls under the intention 'Gardening' whereas 'Stanza' belongs to 'Poems'.
For a_i as in Equation 3, all search strings starting with the prefix Q would be considered.
Consider the set of search strings starting with the prefix Q.
The probability of choosing the correct next character is lowest initially. With each character key press q_{i+1}, this probability increases as the size of this set (the count of possible search strings) keeps on reducing, as shown in Figure 4.
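The shrinking of the candidate set with each key press can be seen in a small sketch. The list of past searches is made up, and treating every remaining candidate as equally likely is an assumption of this example rather than the paper's probability model.

```python
def candidates(search_strings, prefix):
    """Search strings that start with the typed prefix."""
    return [s for s in search_strings if s.startswith(prefix)]

# Made-up pool of past searches.
past_searches = [
    "jaguar car",
    "jaguar habitat",
    "java tutorial",
    "jazz music",
    "bond movie",
]

prefix = ""
for ch in "jag":
    prefix += ch
    remaining = candidates(past_searches, prefix)
    # Under a uniform assumption, each remaining string is equally likely.
    print(prefix, len(remaining), 1 / len(remaining))
```

Typing 'j' leaves four candidates and typing 'jag' leaves two, so the chance of picking the intended string under this uniform assumption rises from 1/4 to 1/2.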
As in Figure 4, for every parameter having a weight greater than the threshold, a mesh filtering is done; the user intention list keeps on reducing and the algorithm makes use of the central tendency to identify matching intentions.
From Equation 23, it is seen that there are initially 'h' intentions.
Hence the probability of choosing the matching intention will be P(MI) = 1/h, where MI = matching intention.
Figure 4. User Intention Identification
For the purpose of the experimental trial, a total of 34 intentions are considered, as shown in the user intention taxonomy in Table 3.
Table 3 shows example keywords for each of the intentions.
For example, the keyword 'seed' can fall under the 'Agriculture' intention and 'song' under 'Music'.
Table 3. User intention taxonomy and examples
Intentions: Social, Technical, Research, Political, Philosophy, Medical, Military, Religious, Scientific, Legal, New Generation, Agriculture, Bad meaning words, Music, Cooking, Sports, Gardening, Health, Books, Writing, Poems, Movies, Theater, Art, Craft, Painting, Travel, Hiking, Social work, Sculpting, Photography, Literature, Wildlife

Based on the weights calculated in Equation 22, parameters are chosen from the set X_5i for each past search string and a further filtering function is applied.
This function gives a reduced set of matching intentions from the initial 'h' intentions, as shown in Figure 4.
The function returns a multiset of matching intentions. With recursive application of the function, using the selected parameters, the set of candidate intentions shrinks further and the probability of choosing the matching intention changes accordingly.
After the last application of the function, the final user intention is chosen from whatever multiset of matching intentions is obtained.
For this, the multiplicity of each member of the multiset is found and the member with the highest multiplicity is chosen as the final user intention, where f denotes the frequencies of intentions in the multiset.

Query Expansion
Let Q be the original query selected by the user.
Let A be the set of ambiguous queries. Let C be the set of context-based words for each ambiguous word, where m is the maximum number of meanings associated with the word.
Query expansion patterns are used to expand the query selected by the user based on user intention.
Let there be a set of available query expansion patterns, and let a mapping function map the identified user intention to an expansion pattern from this set. The FM5 algorithm identifies the user intention for a query using association rules and user profiles.
The original query is modified according to the selected expansion pattern and given to the search engine.
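The mapping from an identified intention to an expansion pattern can be sketched as follows. The patterns and the added keywords are hypothetical; the paper's actual pattern set is not recoverable from the text, so this only shows the general shape of intention-based keyword addition.

```python
# Hypothetical expansion patterns keyed by intention; '{q}' is the original query.
EXPANSION_PATTERNS = {
    "Automobile": "{q} car",
    "Wildlife": "{q} animal",
    "Technology": "{q} programming",
}

def expand_query(query, intention):
    """Apply the intention's pattern; leave the query unchanged if none exists."""
    pattern = EXPANSION_PATTERNS.get(intention, "{q}")
    return pattern.format(q=query)

print(expand_query("Jaguar", "Automobile"))
print(expand_query("Jaguar", "Wildlife"))
```

Falling back to the unmodified query when no pattern is known keeps the behaviour identical to a direct search for intentions that have no expansion defined.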
Table 4. FM5 Intention Identification Examples
Keyword {User profile} | Cost – Mesh selected | Intentions of final user set after filtering | Matching Intention
Jaguar {User – Female, Engineer, Music, India} | 0.6 – Profession ⇒ Jaguar; 0.7 – Location ⇒ Jaguar | Automobile, Wildlife, Automobile, Automobile, Automobile | Automobile
Java {User – Female, Engineer, Sports, India} | 0.63 – Profession ⇒ Java; 0.6 – Gender ⇒ Java; 0.7 – Location ⇒ Java | Research, Technology, Technology | Technology
Bond {User – Male, Engineer, Movies, India} | 0.76 – Location ⇒ Bond | Movie, Legal, Movie, Movie, Legal, Movie | Movie
Table 4 lists 3 sample test cases for the FM5 algorithm.
In the first column, the test keyword and the profile of the user entering the test keyword are given.
In the second column, the association rule (mesh parameter) selected and its cost are given.
In the third column, the list of possible intentions obtained after filtering through the chosen meshes is given.
In the fourth column, the matching intention generated as output for the keyword is given. The first keyword, Jaguar, is entered by a user who is female and an engineer, has music as her interest and is located in India.
After computing the support for all possible association rules with user profile parameters on the LHS and the keyword on the RHS, only two rules are found to have support greater than or equal to the threshold.
Hence two meshes, profession and location, are applied.
After filtering, a set of 5 users is obtained.
From the past searches of these 5 users, intentions for the keyword 'Jaguar' are selected and are displayed in the third column.
Out of these 5, the most frequently occurring intention, 'Automobile', is returned as the matching intention.
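The Jaguar walk-through can be mirrored in a short sketch: users are filtered through the selected meshes and the most frequent intention among the survivors is returned. The user records are hypothetical, constructed so that five users pass the profession and location meshes with the intentions listed in the walk-through; the record layout and the function name `matching_intention` are assumptions of this example.

```python
from collections import Counter

def matching_intention(users, current, meshes, keyword):
    """Filter users through the selected meshes, then take the most frequent intention."""
    # Keep only users who agree with the current user on every selected mesh.
    similar = [u for u in users if all(u[m] == current[m] for m in meshes)]
    # Multiset of intentions recorded for this keyword in their past searches.
    intentions = [u["past"][keyword] for u in similar if keyword in u["past"]]
    if not intentions:
        return None
    return Counter(intentions).most_common(1)[0][0]

def user(profession, location, intention):
    """Build a hypothetical user record with one past search for 'Jaguar'."""
    return {"profession": profession, "location": location, "past": {"Jaguar": intention}}

users = [
    user("Engineer", "India", "Automobile"),
    user("Engineer", "India", "Wildlife"),
    user("Engineer", "India", "Automobile"),
    user("Engineer", "India", "Automobile"),
    user("Engineer", "India", "Automobile"),
    user("Doctor", "India", "Wildlife"),   # removed by the profession mesh
    user("Engineer", "USA", "Wildlife"),   # removed by the location mesh
]

current = {"profession": "Engineer", "location": "India"}
print(matching_intention(users, current, ["profession", "location"], "Jaguar"))
```

`Counter.most_common` resolves ties by first occurrence, so a production version would need an explicit tie-breaking rule; the text does not say which one the authors use.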
RESULTS AND DISCUSSION
The authors developed a questionnaire to collect the user profiles and the desired intents for the search strings, as shown in Figure 5.
For English dataset-1, 25 users and 15 ambiguous queries per user, with the desired intent for each query, were collected.
Thus, 375 queries and intentions were evaluated for the first dataset. For English dataset-2, 100 users and 20 ambiguous queries per user, with the desired intent for each query, were collected.
For English dataset-2, 2000 queries and intentions were evaluated overall.
For the Marathi dataset, 20 users and 18 ambiguous queries per user were evaluated.
Thus, the Marathi dataset contained 360 queries and intentions overall.
The survey was designed as a paper-and-pencil-based field survey to approach a large number of users, and a digital survey was also designed on the same lines.
The paper-based questionnaire was designed in two languages, i.e. English and Marathi.
To validate the proposed model, the questionnaire was distributed to collect the user profile information and desired intention while searching for various ambiguous queries.
The population of the study comprises engineers, doctors, farmers and lawyers.
A sample of 170 users was selected randomly.
After scrutiny of the filled questionnaires, 150 were found to be fit for the analysis.
The users comprised third-year engineering students from different streams of the College of Engineering Pune as well as engineers, doctors, farmers and lawyers from different locations in India.
Figure 5. User Survey Questionnaire designed to collect user profile and intentions
The system is evaluated for 25 users for English dataset-1 with about 15 ambiguous queries each.
The training data consists of about 55 user profiles and their past searches.
Table 6 shows user intention identification results for the FM5 algorithm and the KNN algorithm with different 'k' values: 5 (KNN5), 10 (KNN10) and 15 (KNN15).
Matched intentions indicate the total number of ambiguous test queries, over all test users, for which the algorithm gave an intention matching the desired intention of the user.
Unmatched intentions indicate the number of cases where the algorithm failed to identify the desired intention.
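The matched, unmatched and accuracy figures follow from a simple count, sketched below. The predicted and desired intentions are hypothetical and are not the paper's results; accuracy is taken here as matched divided by the total number of test queries, which is an assumption consistent with the description above.

```python
def evaluate(predicted, desired):
    """Return (matched, unmatched, accuracy) for paired intention lists."""
    matched = sum(p == d for p, d in zip(predicted, desired))
    unmatched = len(desired) - matched
    return matched, unmatched, matched / len(desired)

# Hypothetical outputs for four test queries, not the paper's data.
predicted = ["Automobile", "Movie", "Legal", "Technology"]
desired = ["Automobile", "Movie", "Movie", "Technology"]

print(evaluate(predicted, desired))
```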
The results for user intention identification obtained with FM5 are encouraging.
For English ambiguous dataset-1, an accuracy of about 75% is observed with FM5, whereas with KNN an accuracy of about 38.4% is observed for KNN5, 8% with KNN10 and 27% with KNN15.
Table 6. English Ambiguous Dataset Results
Intention/Method | FM5 | KNN15 | KNN10 | KNN5
Rows: Matched, Unmatched, Total, Accuracy

The first graph in Figure 6 shows matched intentions obtained for FM5 and KNN for different users with various queries on the English dataset.
'Matched' is the legend used for cases where the appropriate user intention is obtained and 'Unmatched' is the legend for cases where the algorithm failed to identify the appropriate user intention.
The second graph in Figure 6 shows a comparison of FM5 with the KNN algorithm for different values of 'k'.
FM5 gives better results than KNN.
Figure 6. Results for Ambiguous English Dataset-1
For the Marathi ambiguous dataset, the system is evaluated with 20 users with 18 ambiguous queries each, as shown in Table 7.
Thus, 360 queries and intentions are evaluated for the Marathi dataset.
An accuracy of about ...5% is observed with the FM5 algorithm, whereas with KNN an accuracy of about 36.7% is observed for KNN5, 28.3% with KNN10 and 24.2% with KNN15.
Table 7. Marathi Ambiguous Dataset Results
Intention/Method | FM5 | KNN15 | KNN10 | KNN5
Rows: Matched, Unmatched, Total, Accuracy

User intention identification for search strings with the FM5 algorithm gives encouraging results.
Figure 7 depicts the results for the Marathi dataset.
The first graph in Figure 7 shows the total number of matched intentions obtained with FM5 and KNN for each user.
The second graph in Figure 7 shows the total number of matched and unmatched intentions for all users with FM5 and KNN.
Figure 7. Results for Ambiguous Marathi Dataset
The accuracy observed with FM5 and KNN algorithm for English dataset is plotted in Figure 8.
The accuracy observed with FM5 and KNN algorithm for Marathi dataset is plotted in Figure 9.
Figure 8. Accuracy observed with FM5 User Intention Identification for English
Figure 9. Accuracy observed with FM5 User Intention Identification for Marathi
Further testing of the FM5 algorithm was done using English ambiguous dataset-2.
The system is further evaluated with 100 users for English dataset-2, with 20 ambiguous queries each, as discussed earlier.
Table 8 shows user intention identification results for the FM5 algorithm and for the KNN algorithm with 'k' values of 5 (KNN5), 10 (KNN10) and 15 (KNN15).
Matched intentions is the total number of ambiguous test queries, over all test users, for which the algorithm returned the intention the user desired.
Unmatched intentions is the number of cases where the algorithm failed to identify the desired intention.
The results obtained for English dataset-2 with FM5 show an improvement of 0.7% compared to English dataset-1.
This may be because a larger number of past searches is available when computing the parameters used to build the mesh.
For English ambiguous dataset-2, an accuracy of 75.9% is observed with FM5, whereas KNN gives about 39.3% with KNN5, 29.05% with KNN10, and 28.3% with KNN15.
Table 8. English Ambiguous Dataset-2 Results
Columns (Intention/Method): FM5, KNN15, KNN10, KNN5
Rows: Matched, Unmatched, Total, Accuracy
The first graph in Figure 10 shows Matched intentions obtained for FM5 and KNN for different users with various queries on English dataset-2.
'Matched' is the legend used for cases where the appropriate user intention is obtained, and 'Unmatched' is the legend for cases where the algorithm failed to identify the appropriate user intention.
The second graph in Figure 10 shows a comparison of FM5 with the KNN algorithm for different values of 'k' on English dataset-2.
Figure 10. Results for Ambiguous English Dataset-2
Table 9 shows the results of query expansion after user intention identification is done.
The top 50 URLs returned by the Google API were collected for each query after query expansion.
The web pages at these URLs were judged as either relevant or not relevant to the query under consideration.
Table 9 shows the average precision values obtained for the test queries after query expansion based on the identified user intention.
Table 9. Average Precision after Query Expansion
Columns (Query Results): Top 5, Top 10, Top 15, Top 20, Top 25, Top 30, Top 35, Top 40, Top 45, Top 50
Row: Average Precision
The metric used to evaluate the search results returned after query expansion is P@N = (number of relevant results in the top N) / N.
It indicates the fraction of the top N search results that are valuable.
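The P@N metric can be computed directly from per-result relevance judgements, as in the following minimal sketch. The function names `precision_at_n` and `mean_precision_at_n` and the sample judgements are hypothetical and chosen only for illustration; they are not the judgements collected in the experiments.

```python
def precision_at_n(relevance, n):
    """P@N for one query.

    relevance is a list of booleans in rank order (True = relevant).
    The denominator is always n, following P@N = relevant / N.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def mean_precision_at_n(per_query_relevance, n):
    """Average P@N over several queries."""
    values = [precision_at_n(r, n) for r in per_query_relevance]
    return sum(values) / len(values)


# Made-up judgements for two queries; not the paper's data.
query_a = [True, True, False, True, False, True, False, False, True, False]
query_b = [False, True, True, False, False, True, True, False, False, False]

for n in (5, 10):
    print(n, precision_at_n(query_a, n), mean_precision_at_n([query_a, query_b], n))
```

Evaluating the same function at N = 5, 10, ..., 50 and averaging over all test queries would give one average precision value per cutoff.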
Precision Improvement with FM5 and Query Expansion
The top 50 URLs returned by Google were collected for each query.
Each result was judged as either relevant or not relevant, and precision was calculated as P = (Relevant results) / (Returned results).
The graph in Figure 11 shows precision values for the top 5, top 10, and so on up to the top 50 results, obtained with our system and with direct use of the Google search engine.
On the X-axis, 1, 2, 3, ... indicate the average precision over all the queries given by the user for the top 5, top 10, top 15 search results, and so on.
From the observed values, the system shows significant improvement in average precision.
The results are compared with those obtained using the Google search engine and with those obtained for ambiguous queries by Chirita et al.
An improvement of about 60.47% in average precision is seen with FM5 and query expansion compared to the results obtained using Google.
This is our first baseline comparison.
The second baseline comparison is with the results obtained by Chirita et al., and an improvement of 40% is observed with the proposed method.
Figure 11. Improvement in Precision after Query Expansion
CONCLUSION
The goal of this research is to compose the search string by offering the user better autocompletions, which yield more relevant and less redundant results.
In this algorithm, personalization is used to add context and user intention to the search string composition.
The algorithm selects the most appropriate intention among the possible intentions for a keyword by using support as the weight.
The FM5 algorithm of intent identification via funneling builds upon the advantages of the simple k-nearest neighbor (KNN) algorithm.
The approach consists of identifying the user intention with FM5 and then expanding the original query based on this intention to obtain more relevant search results in the first few pages.
The FM5 user intention identification algorithm uses association rule mining with user profiles and shows improved performance compared to KNN.
When extended with query expansion patterns, FM5 shows improved average precision for ambiguous queries, giving better search results.
The system does not use explicit feedback or other strategies, such as clicked pages or session history, for determining user intention or for query expansion.
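The idea of choosing an intention by support and then expanding the query with it can be illustrated with the following minimal sketch. It is a simplified illustration under stated assumptions and not the FM5 algorithm itself: support is taken here as the fraction of a user's past searches that pair the keyword with a given intention, expansion simply appends the intention term, and the names `select_intention`, `expand_query` and the sample history are hypothetical.

```python
from collections import Counter


def select_intention(keyword, past_searches, candidates):
    """Pick the candidate intention with the highest support for keyword.

    past_searches is a list of (keyword, intention) pairs from one user.
    Support of keyword -> intention is assumed to be the fraction of all
    past searches containing both; the paper's exact weighting may differ.
    """
    total = len(past_searches)
    counts = Counter(i for k, i in past_searches if k == keyword)
    support = {c: (counts[c] / total if total else 0.0) for c in candidates}
    best = max(candidates, key=lambda c: support[c])
    return best, support


def expand_query(keyword, intention):
    """Naive expansion: append the identified intention to the keyword."""
    return f"{keyword} {intention}"


# Made-up search history for one user; not the paper's data.
history = [
    ("apple", "fruit"),
    ("apple", "company"),
    ("apple", "company"),
    ("java", "programming"),
]

best, support = select_intention("apple", history, ["fruit", "company"])
print(best, support)
print(expand_query("apple", best))  # apple company
```

In this sketch ties are broken by the order of `candidates`, which is an arbitrary choice; a fuller implementation would need an explicit tie-breaking rule and the additional context described in the paper.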
The proposed user intention identification algorithm, FM5, showed improved accuracy compared to KNN.
The proposed query expansion approach, using the user intention identified with FM5, showed improved average precision for ambiguous queries, giving better search results in the top 50 results.
Compared with direct use of the search engine, the proposed approach improved average precision by about 60.47%.
The proposed system provides better precision for ambiguous search strings, with improved identification of the user intention, for the English language dataset as well as the Marathi (an Indian language) dataset of ambiguous search strings.
REFERENCES