Web Usage Mining as a Tool to Identify User Behavioural Patterns to Design Effective E-Marketing Strategies for Tourism Businesses (The Case of an Egyptian Travel Agency)

 

 

ELROUBY ITEN

Tourism Department Faculty of Tourism & Hotels- Alexandria University

 

EL KASRAWY SAMAR

Tourism Department Faculty of Tourism & Hotels- Alexandria University

 

ATTIA ABIR

Tourism Department Faculty of Tourism & Hotels- Alexandria University

 

 

 

ABSTRACT

Web Usage Mining is the application of data mining techniques to discover interesting knowledge about Web users by investigating their behavioural usage patterns. By mining usage patterns, Web designers and tourism marketers can better serve Web users' needs. Usage data and browsing patterns reflect the identity of Web users. If thoroughly investigated, these data can be used to classify users and their preferences, to personalize Web sites accordingly and to provide dynamic recommendations, thereby supporting effective tourism e-marketing strategies.

The primary aim of this study is to examine and evaluate Web mining applications to develop tourist-based e-marketing strategies. This was accomplished by using the reports of Google Analytics, a Web analytics service used to analyze user behavioural patterns. Secondary log-file data on the customers of a travel agency were used to develop patterns. Multiple regression and correlation analyses were used to show the relationships between variables such as bounce rate, pageviews and pages visited.

The results showed a significant relationship between bounce rate and loading time, between bounce rate and pages visited, and between pages visited and loading time. The results also showed no significant relationship between pageviews and loading time.

Based on the research results, the Website of the case company was redesigned and a framework for an e-marketing strategy was introduced. The research also offers recommendations on how to use user behavioural patterns effectively to design e-marketing strategies.

Keywords: E-Marketing, Google Analytics, Web log files, Web usage mining, SOSTAC, Travel Agency

 

1 INTRODUCTION

The role of the Internet in promoting and distributing products and services has expanded rapidly in recent years. Because tourism is an information-intensive industry, the Internet and its World Wide Web have had an extensive impact on it. According to a wide range of researchers and practitioners, tourism is among the industries that can make the best use of the Internet's potential.

The content of a Website is therefore very important and must be updated regularly. Travellers search for information on tourism Websites, so the content and structure of these Websites are among the main factors that encourage repeat visits and affect purchase intentions (Horng et al., 2010).

Web mining, a type of data mining used in customer relationship management (CRM), takes advantage of the huge amount of information gathered by a Website to look for patterns in user behaviour (Searchwindowsserver, 2012). It is categorized into three active research areas, namely content mining, structure mining and usage mining (Liu et al., 2007).

In a world of highly competitive markets, business organizations need to develop effective decision support systems to guide decision-making processes (Chaovalitwongse et al., 2008). Web mining tools can help organizations examine past data, relate it to present events and thereby suggest future actions.

The growing number of Websites offering the same services challenges organizations to organize their content in a way that attracts customers. Modelling and analyzing Web navigational behaviours with Web mining analytics such as Web log analyzers provides organizations with large amounts of information that can be processed and analyzed for pattern discovery. The results of analyzing Website navigational behaviours are indispensable knowledge for business intelligence applications and Web-based personalization systems. Nevertheless, the dynamic nature of online navigational behaviours presents a serious challenge to intelligent information extraction.

The primary aim of the study is to examine and evaluate Web mining applications to develop tourist-based e-marketing strategies. More specifically:

1. To explore customer navigational behaviours using Web usage mining.

2. To identify uses of Web mining data to develop personalized e-marketing strategies.

The research therefore tests the following null hypotheses:

H1: There is no significant relationship between bounce rate and loading time.

H2: There is no significant relationship between pageviews and loading time.

H3: There is no significant relationship between pages visited and loading time.

H4: There is no significant relationship between bounce rate and pages visited.

 

2 RESEARCH BACKGROUND

A stream of researchers has devoted its work to investigating Web mining techniques and tools. Others have focused on Website design and the factors influencing the purchase intentions of online consumers. Only a few have explored the use of Web mining data in designing e-marketing strategies.

A study by Lee et al. (2005) focuses on one Web mining method, namely Web traversal pattern mining, which is used to discover users’ access patterns from Web logs, and on how these data can be used to satisfy users’ requirements. Several studies, such as those by Jalali et al. (2010) and Wang et al. (2005), focused on using Web usage mining (WUM) as a tool to analyze customer navigational behaviours in order to improve the efficiency of Websites. Liu et al. (2007) introduce a combined methodology of Web content mining and Web usage mining of Web server logs to categorize user navigational patterns and predict users’ future requests.

Intelligent systems, used as agents that analyze customers’ behaviours and business strategies, can help travel agencies build marketing strategies and overcome the threat of disintermediation. The goal of the work by Buyukozkan et al. (2011) was to propose an intelligent module that can be integrated into tourism Websites to help customers choose destinations during their decision-making process. A further study by Wang et al. (2007) proposes a method that can automatically mine key information from Web pages.

Although tourism is dominated by e-business systems and applications, and is a suitable candidate for them, relatively few attempts have been made to explore the large potential of Web mining in e-tourism. In their attempt to model the navigation behaviour of hotel guests, Schegg et al. (2005) analyzed the log files of 15 Swiss hotels. Their findings identified the average length of a visitor's stay on a site, views, search keywords, the top 10 search words and referring search engines.

Some researchers focused on evaluating tourist-based Websites. For example, Choi et al. (2007) attempted to identify the image representations of Macau on the Internet by analyzing the contents of different Web information sources: the official Macau tourism Website, tour operators' and travel agents' Websites, online travel magazines and online travel “blogs”.

A study by Liao et al. (2009) sheds light on customer relationship management as a competitive strategy that businesses should use to stay focused on the needs of their customers. The study implements a data mining algorithm to mine customer knowledge from a firm in Taiwan. The knowledge extracted through data mining revealed patterns that the case firm can use for new product development and customer relationship management.

A study by Pitman et al. (2010) introduced a workflow for using Web server log data in Web usage mining.

Building on a number of previous studies, Xiang et al. (2011) sought to identify patterns in online travel queries across tourist destinations, using transaction log files from a number of search engines.

The study by Olmeda et al. (2001) analyzes the potential uses of data mining techniques in tourism Internet marketing and electronic customer relationship management.

 

 

3 LOG FILES AND PATH TRAVERSAL PATTERNS

Web server log files, the main data source for Web usage mining, are generally stored in the Common Log Format. Log entries trace the path of the user from one page to another: each entry stores the user's IP address or domain name, the time, the access method (GET, POST, etc.) and the address of the page being accessed. This format was later expanded (Extended Log Format) to include more fields, such as the referrer address, i.e. the Web page from which the access originated (Boullosa et al., 2002).

Access logs are the source of information that records every transaction between the server and the browser. Users’ activities on a Website can be detected and analyzed using the Web server's access log. Figure 1 shows the different fields of an access log in the Common Log Format.

Figure 1: Fields of the Common Log Format.

Source: (Fong et al., 2002)
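The fields listed above can be extracted from each log line with a regular expression. The sketch below is a minimal illustration, assuming Apache-style Common Log Format lines; the optional trailing referrer and user-agent fields follow Apache's Combined Log Format, one common extension. The W3C Extended Log Format instead declares its fields in a `#Fields` header line and would need a different parser. The pattern and function names are our own, not part of any tool used in this study.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes.
# The optional trailing group covers the referrer and user-agent fields
# appended by Apache's "combined" format (an assumption about the server).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


example = parse_log_line(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
print(example["host"], example["method"], example["path"], example["status"])
```

Lines that do not match the pattern return `None`, so malformed entries can be filtered out before any pattern discovery.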

 

Log files are a valuable tool for Web developers to learn why and when clients are accessing the Website. Although log files may not immediately reveal the details of each visit, further analysis can extract meaningful and useful information from them. Several studies, such as Wang et al. (2011), have introduced approaches to visualizing the path traversal patterns of Web surfers.

Knowledge gained from Web usage mining can help organizations predict user behaviour within the site and identify the most visited pages and the sequence in which customers access them.
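Reconstructing the sequence of pages that customers access can be sketched as follows. The sketch assumes log entries have already been parsed into (IP address, timestamp, page) tuples, that a visitor is identified by IP address alone, and that a gap of more than 30 minutes starts a new session; these are common simplifying heuristics, not the procedure used in this study. The hypothetical functions `sessionize` and `traversal_stats` group requests into sessions and then count the most visited pages and the most frequent page-to-page transitions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed heuristic: a gap of more than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records):
    """Group (ip, timestamp, page) tuples into sessions (lists of pages)."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, index of its open session)
    for ip, ts, page in sorted(records, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous[0] > SESSION_TIMEOUT:
            sessions.append([page])  # start a new session for this visitor
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index].append(page)  # continue the open session
        last_seen[ip] = (ts, index)
    return sessions


def traversal_stats(sessions):
    """Count page visits and page-to-page transitions across sessions."""
    page_counts = Counter(page for s in sessions for page in s)
    transitions = Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))
    return page_counts, transitions


# Hypothetical requests, not taken from the case company's logs.
records = [
    ("10.0.0.1", datetime(2013, 10, 25, 10, 0), "/"),
    ("10.0.0.2", datetime(2013, 10, 25, 10, 1), "/"),
    ("10.0.0.2", datetime(2013, 10, 25, 10, 2), "/contact"),
    ("10.0.0.1", datetime(2013, 10, 25, 10, 5), "/tours"),
    ("10.0.0.1", datetime(2013, 10, 25, 11, 0), "/"),  # 55-minute gap
]
sessions = sessionize(records)
page_counts, transitions = traversal_stats(sessions)
print(sessions)
print(page_counts.most_common(3))
```

`page_counts.most_common(n)` gives the most visited pages, while `transitions` shows the order in which pages are typically accessed.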

 

 

4 RESEARCH METHODOLOGY

This study can be categorized as descriptive-analytical research, since it involves gathering data from a private tourism Website through Web usage mining techniques and software analytics, analyzing the data, discovering patterns and finally setting guidelines for an effective e-marketing strategy.

Web mining software analytics were used to extract data from the Website of a private tourism company. This was accomplished by analyzing the Web server's log files, which are a commonly available data source for learning about visitors’ navigational behaviours.

The secondary data provided by Google Analytics were analyzed in SPSS (version 20) to gain deeper and clearer insights into customer behavioural patterns. Statistical analyses such as correlation and regression were used to identify the relationships between variables and to test the hypotheses.

The final step involves setting up a framework for an e-marketing strategy based on the results of the previous steps, using the SOSTAC model, a framework for developing e-marketing strategies.

In this research, data collection was mainly based on a self-administered survey, since it involves analyzing and evaluating the Website of the case organization.

This research uses non-probability sampling; this purposive sampling method was chosen because of the nature of the research. The case company was purposely chosen because it is a well-known brick-and-mortar tour operator that has existed since 1955 and has a solid base of Internet-based services. In addition, the company was a Google Analytics subscriber.

 

 

4.1 Google Analytics Reports

Web analytics in general enable organizations to examine visitor traffic and visitor activity across their sites. Web analytics are a valuable tool for delivering dynamically targeted content and for justifying budgets based on historic and predictive modelling.

 

4.1.1 Visits

Visits are one of the most basic metrics and a starting point for analysis. Visits can give general insights into the Website traffic.

Examining the number of visits over different time spans, using monthly sums and means, shows the following:

Visits dropped remarkably between 6 October 2012 and 6 October 2013. The monthly sum of visits started at 10,000 and then decreased, with slight fluctuations, to 1,408 visits at the end of the tested period. Visits therefore dropped by about 86% from the beginning to the end of the period, a remarkably steep decline, as Figure 2 clearly shows. Consequently, the monthly mean of daily visits also dropped remarkably, from 322 visits to 46 visits in November 2013.

Figure 2: Graphical representation of the variable visits

Source: Based on the data of Google Analytics, 2013.

Comparing the number of visits in the year from October 2012 to October 2013 with the previous year shows that the overall number of visits dropped by approximately 13%, from 86,388 visits in 2011-2012 to 74,858 in 2012-2013 (Figure 3).

Figure 3: A comparison of the Visits figures of the yearly period starting from 06 October 2012-06 October 2013 and the previous year.

Source: Google Analytics, 2013.

 

4.1.2 Page views:

According to Stokes (2011), page views are “the number of times a page was successfully requested”. To improve the user experience, the information architecture and the relevancy of content on the site, it is important to keep an eye on the page views metric.

Examining the page views figures revealed that page views dropped remarkably between 6 October 2012 and 6 October 2013, in parallel with the downward trend in visits. Monthly page views started at 21,846 and then decreased, with slight fluctuations, to 2,787 at the end of the tested period. Page views therefore dropped by approximately 87% from the beginning to the end of the period, a steep decline. Compared with the previous year (2011-2012), overall page views dropped by 19%.
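The percentage drops quoted in Sections 4.1.1 and 4.1.2 follow from a simple relative-change formula. The short check below recomputes them from the figures reported in the text; it yields about 85.9% for monthly visits, 13.3% for the year-on-year visits comparison and 87.2% for monthly page views. The function name is our own.

```python
def percent_drop(start, end):
    """Percentage decrease from start to end (positive means a drop)."""
    return (start - end) / start * 100


# Figures quoted in Sections 4.1.1 and 4.1.2.
monthly_visits_drop = percent_drop(10000, 1408)
yearly_visits_drop = percent_drop(86388, 74858)
monthly_pageviews_drop = percent_drop(21846, 2787)

print(f"Monthly visits:     {monthly_visits_drop:.1f}%")
print(f"Yearly visits:      {yearly_visits_drop:.1f}%")
print(f"Monthly page views: {monthly_pageviews_drop:.1f}%")
```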

 

4.1.3 Pages / Visit (Page views per visit)

According to Stokes (2011), page views per visit are “the number of page views in a reporting period divided by the number of visits in that same period to get an average of how many pages are being viewed per visit”.

 

Table 1: Average page views per visit per month in 2013

 

Source: Adapted from Google Analytics, 2013.

Examining the variable page views per visit for the company shows no major changes in the monthly averages (Table 1). As page views per visit is a composite variable that can be split into total page views and total visits, the small changes that do occur could be due to fluctuations in either the monthly page views or the monthly visits.
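Because page views per visit is a ratio, it can stay stable even when both of its components fall sharply. The sketch below illustrates this with the first and last monthly totals quoted in Sections 4.1.1 and 4.1.2, assuming (our assumption) that each pair refers to the same month: page views fall by about 87% and visits by about 86%, yet the ratio only moves from about 2.18 to 1.98 pages per visit. The function name is hypothetical.

```python
def pages_per_visit(monthly_totals):
    """Return (per-month ratios, ratio for the whole period).

    monthly_totals is a list of (pageviews, visits) pairs, one per month.
    """
    per_month = [pageviews / visits for pageviews, visits in monthly_totals]
    total_pageviews = sum(pv for pv, _ in monthly_totals)
    total_visits = sum(v for _, v in monthly_totals)
    return per_month, total_pageviews / total_visits


# First and last monthly totals quoted in Sections 4.1.1 and 4.1.2.
per_month, overall = pages_per_visit([(21846, 10000), (2787, 1408)])
print([round(r, 2) for r in per_month], round(overall, 2))
```

Note that the period ratio is computed from the summed totals, which weights busy months more heavily than a simple average of the monthly ratios would.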

 

4.1.4 Bounce rate:

Bounce rate “(sometimes confused with exit rate) is an Internet marketing term used in Web traffic analysis. It represents the percentage of visitors who enter the site and ‘bounce’ (leave the site) rather than continue viewing other pages within the same site” (HMTWeb.com). Bounce rate is one of the most important metrics to observe. There are a few exceptions, but a high bounce rate usually indicates high dissatisfaction with the Website (Stokes, 2011).

High bounce rates can result from several factors, such as long loading times, poor content or an overly flashy Web layout. Several strategies can be used to reduce bounce rates. Bounce rates affect total page views, pageviews per visit and average visit duration.
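The difference between bounce rate and the often-confused exit rate can be made concrete with a simplified sketch. It assumes sessions have already been reconstructed as ordered lists of page paths and treats a bounce as a single-page session; Google Analytics bases its own figure on single-interaction sessions, so this is only an approximation, and the function names and sample sessions are hypothetical.

```python
from collections import Counter


def bounce_rate(sessions):
    """Share of sessions that viewed exactly one page (simplified bounce)."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)


def exit_rates(sessions):
    """For each page: sessions that ended there / all pageviews of that page."""
    views = Counter(page for s in sessions for page in s)
    exits = Counter(s[-1] for s in sessions if s)
    return {page: exits[page] / views[page] for page in views}


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [["/"], ["/", "/tours"], ["/tours"], ["/", "/tours", "/book"]]
print(bounce_rate(sessions))
print(exit_rates(sessions))
```

Bounce rate is a property of whole sessions, whereas exit rate is computed per page; every bounce is also an exit, but not every exit is a bounce.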

Table 2: Average Bounce rates per month for the Website

Source: Based on Google Analytics data, 2013

Examining the bounce rates of the case company (Table 2) shows only slight changes from month to month, with an average bounce rate of 72.60% for the year. This average was compared with the Alexa bounce rates, as of 31 October 2013, of several relevant e-mediaries and local and international travel agencies and tour operators (Alexa rates are averages over the preceding three months). The average bounce rates of these comparison cases range from 17% to 37%, so the bounce rate of the case company is far higher than theirs.

 

4.1.5 Hourly and daily overview:

Looking at the hourly overview of the case company's visits over a ten-day span from 25 October 2013 to 4 November 2013, the following can be noticed (Figure 4):

 

Figure 4: Hourly overview of the numbers of visits

Source: Google Analytics, 2013.

 

There is an hourly trend. Most visits occur between 9 am and 11 pm. Visits increase gradually from 9 am until they peak at 3 pm, then decrease gradually until 6 am, after which an increase until 11 pm can be noticed. This trend has implications for the timing of certain marketing activities, especially when linked with the gender and interests reports provided by Google Analytics. The weekday distribution of visits shows that most traffic is generated at the weekend (Thursdays and Fridays).
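The hourly and weekday views can be reproduced from raw visit start times by bucketing timestamps, which shows what these reports aggregate. The sketch assumes each visit's start time is available as a Python `datetime` in the site's local time zone; the sample timestamps and the function name are invented for illustration and are not the case company's data.

```python
from collections import Counter
from datetime import datetime

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def visits_by_hour_and_weekday(visit_starts):
    """Count visits per hour of day (0-23) and per weekday name."""
    by_hour = Counter(ts.hour for ts in visit_starts)
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in visit_starts)
    return by_hour, by_weekday


# Hypothetical visit start times.
starts = [
    datetime(2013, 10, 24, 15, 5),   # a Thursday
    datetime(2013, 10, 25, 15, 40),  # a Friday
    datetime(2013, 10, 25, 9, 10),   # a Friday
]
by_hour, by_weekday = visits_by_hour_and_weekday(starts)
peak_hour, _ = by_hour.most_common(1)[0]
print(peak_hour, by_weekday.most_common())
```

Scheduling decisions, such as when to publish a social media post or send a newsletter, can then be aligned with the peak hour and the busiest weekdays.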

 

 

4.1.6 Social media overview:

The social media overview report gives the user a glance at how social media platforms contribute to the overall activity of the company's Website.

The social media overview report shows that total visits over a one-year span amounted to 74,858, of which 459 came via social media platforms. These social visits contributed 2 conversions worth $2.00 out of a total of 412 site conversions ($412.00 total value).

A visit from a social referral may result in a conversion immediately, or it may assist in a conversion that occurs later. Referrals that generate conversions immediately are labelled “Last Interaction Social Conversions” in the graph. If a referral from a social source does not immediately generate a conversion, but the visitor returns later and converts, the referral is included in “Assisted Social Conversions” (Google Support, 2013). For the case company, “Last Interaction Social Conversions” and “Assisted Social Conversions” each accounted for 2 conversions.
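The distinction between the two social conversion types can be sketched as a counting rule over conversion paths. The sketch assumes each converting visitor's path is available as an ordered list of traffic sources ending with the visit in which the conversion happened, and that the set of social sources is known in advance. Google Analytics applies its own attribution logic, so the hypothetical function below only approximates the report. Following the description above, a conversion counts as a last-interaction social conversion when the final source is social and as an assisted social conversion when a social source appears earlier in the path; one path can count as both.

```python
# Hypothetical set of social sources; the real report uses Google's own list.
SOCIAL_SOURCES = {"facebook", "tripadvisor", "youtube"}


def social_conversion_credit(conversion_paths, social_sources=SOCIAL_SOURCES):
    """Count last-interaction and assisted social conversions.

    Each path is an ordered list of sources, the last one being the
    visit in which the conversion happened.
    """
    last_interaction = 0
    assisted = 0
    for path in conversion_paths:
        if not path:
            continue
        if path[-1] in social_sources:
            last_interaction += 1  # social referral converted immediately
        if any(source in social_sources for source in path[:-1]):
            assisted += 1  # social referral earlier, conversion later
    return last_interaction, assisted


# Invented conversion paths for illustration.
paths = [
    ["google", "facebook"],
    ["youtube", "google"],
    ["direct"],
    ["facebook", "youtube"],
]
print(social_conversion_credit(paths))
```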

 

 

4.1.7 Landing pages

The Landing Pages tab shows the top landing pages from social visits.

 

Table 3: Social landing pages

 

Source: Google Analytics, 2013

Table 3 shows that the company's homepage is the most popular shared landing page, with 62 visits over a one-year span. Table 4 shows that most visits originated from Facebook, followed by TripAdvisor, and that TripAdvisor has the highest average visit duration (00:04:45), the highest pageviews and the highest average pages/visit. This can help marketers focus their social media campaigns on high-quality channels.

 

 

 

Table 4: Breakdown of social networks related to the first landing page.

Source: Google Analytics, 2013.

 

4.1.8 Conversions

Google Analytics gives users the opportunity to identify the full value of traffic coming from social sites and to determine how social media platforms lead to direct conversions or assist in future conversions. Companies can, for example, measure the effect of a newly published video or blog post on traffic and check whether it was shared and led to conversions. In this case, social media contributed only two conversions over a year, from YouTube, generating a conversion value of $2.00. This should alert the case company to reconsider its social media strategy and try to harvest the wide-ranging benefits of social media and the viral effect created by these platforms.

 

4.1.9 Overview

This report provides marketers with an overview of conversion metrics for all goals and also for every goal separately.

The company's Website generated 418 total conversions with a total goal value of $418. The Sightseeing Reservation Goal has the most goal completions (198), while the Transfer Reservation Goal has not achieved any goal completions (Table 5).

 

Table 5: Conversion overview report

Source: Google Analytics, 2013.

 

 

4.1.10 Funnel Visualization

The marketer sets up a funnel that prospects are expected to follow in order to achieve a certain goal. At each stage, the marketer can see how many people enter at that stage, how many continue in the funnel from the previous stage, how many leave at that stage without completing, and, perhaps most importantly, where they go (Google Analytics, 2013). The case company has not set up any funnels.
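A funnel of the kind described here can be sketched as an ordered list of page paths checked against each session. The sketch assumes sessions are ordered lists of page paths and that a visitor must pass through the steps in order; the step paths (`/tours`, `/booking`, `/thank-you`) and function names are hypothetical, and Google Analytics' own funnel rules differ in detail (for example, by allowing visitors to enter at a later step).

```python
def reaches_in_order(session, steps):
    """True if the session visits all steps in the given order."""
    remaining = iter(session)
    # `step in remaining` consumes the iterator up to the match,
    # so each later step must appear after the earlier ones.
    return all(step in remaining for step in steps)


def funnel_report(sessions, steps):
    """Return (step, sessions reaching it, sessions lost before next step)."""
    reached = [
        sum(1 for s in sessions if reaches_in_order(s, steps[: i + 1]))
        for i in range(len(steps))
    ]
    report = []
    for i, step in enumerate(steps):
        # The final step is the goal itself, so no drop-off is reported.
        next_count = reached[i + 1] if i + 1 < len(steps) else reached[i]
        report.append((step, reached[i], reached[i] - next_count))
    return report


steps = ["/tours", "/booking", "/thank-you"]  # hypothetical funnel
sessions = [
    ["/", "/tours", "/booking", "/thank-you"],
    ["/tours", "/booking"],
    ["/tours"],
    ["/booking", "/thank-you"],
    ["/"],
]
for row in funnel_report(sessions, steps):
    print(row)
```

The drop-off column points to the step where prospects abandon the funnel, which is the information the case company currently cannot see.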

 

4.1.11 Goal Flow

The Goal Flow report visualizes the path visitors took through a funnel towards a goal. The final node in this report represents the goal, and the other nodes represent funnel steps (Google Analytics, 2013). By examining the goal flow report, users may find a page in the funnel that leads to a large number of exits, or find that visitors actually navigate differently from the path the marketer expected when developing the funnel. The analysis shows that 198 conversions took place for Goal 2. However, as the company did not set up any funnels, no funnel conversion rate can be determined. Figure 5 shows that Google is the source generating the most conversions for Goal 2, accounting for 85 conversions, followed by direct traffic, accounting for 2 conversions.

 

 

Figure 5: Goal flow for Goal 2 by source.

Source: Google Analytics, 2013.

 

4.1.12 Path length

The path length report shows the number of interactions that took place before a conversion. This indicates whether visitors need several clicks to reach the goal. If so, marketers should consider eliminating unnecessary pages to reduce confusion and help visitors quickly find the information they need. Table 6 shows the number of interactions and the associated conversions: 72% of the conversions took place after one interaction, 16% after two interactions and 5% after three interactions.

Visitors who convert after one interaction appear to be transferred directly to the landing page, which in this case is the last page in the funnel. The fact that 15% of the conversions happened after two interactions is also a plus for the company's Website.

 

Table 6: Path length

Source: Google Analytics, 2013
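The percentages in the path length report are simply the share of conversions at each number of interactions. A minimal sketch, assuming the number of interactions before each conversion is already known; the input values and the function name are invented for illustration:

```python
from collections import Counter


def path_length_distribution(interactions_per_conversion):
    """Map each path length to its share (%) of all conversions."""
    counts = Counter(interactions_per_conversion)
    total = len(interactions_per_conversion)
    return {length: 100 * n / total for length, n in sorted(counts.items())}


# Hypothetical path lengths, one entry per conversion.
print(path_length_distribution([1, 1, 1, 2]))
```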

 

5 HYPOTHESES TESTING

5.1 Using Pearson’s Correlation:

Different functional forms were tested, including linear, semi-log and double-log functions; the double-log function fitted the data best.

The Durbin-Watson statistics of the initial regressions were below 2 (a DW value significantly below 2 indicates positive autocorrelation). Therefore, a lagged dependent variable was introduced into the equation to reduce the effect of autocorrelation.
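The estimation strategy described above, a double-log (log-log) regression with a lagged dependent variable and a Durbin-Watson check on the residuals, can be sketched with NumPy ordinary least squares. This is an illustration under assumptions, not the SPSS procedure used in the study: the function name is hypothetical, the data are simulated with known coefficients, and `y` and `x` merely stand in for daily pageviews and loading time.

```python
import numpy as np


def fit_log_log_with_lag(y, x):
    """OLS of ln(y_t) on a constant, ln(y_{t-1}) and ln(x_t).

    Returns (coefficients, r_squared, durbin_watson); the coefficients are
    ordered [constant, lagged ln y, ln x].
    """
    ln_y = np.log(np.asarray(y, dtype=float))
    ln_x = np.log(np.asarray(x, dtype=float))
    dep = ln_y[1:]  # the first observation has no lagged value
    design = np.column_stack([np.ones(len(dep)), ln_y[:-1], ln_x[1:]])
    coef, *_ = np.linalg.lstsq(design, dep, rcond=None)
    resid = dep - design @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((dep - dep.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    # Durbin-Watson: squared successive residual differences over the SSR.
    # Values well below 2 point to positive autocorrelation.
    dw = np.sum(np.diff(resid) ** 2) / ss_res
    return coef, r_squared, dw


# Simulated data with known coefficients (not the case company's data).
rng = np.random.default_rng(0)
n = 400
x = np.exp(rng.normal(1.0, 0.3, n))  # stand-in for loading time
ln_y = np.empty(n)
ln_y[0] = 2.0
for t in range(1, n):
    ln_y[t] = (1.0 + 0.5 * ln_y[t - 1] - 0.3 * np.log(x[t])
               + rng.normal(0.0, 0.05))
coef, r2, dw = fit_log_log_with_lag(np.exp(ln_y), x)
print(np.round(coef, 3), round(float(r2), 3), round(float(dw), 3))
```

Because the simulated errors are independent, the estimates should land close to the true values (0.5 for the lag, -0.3 for ln x) and the Durbin-Watson statistic close to 2.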

 

 

Table 7: Pearson correlation of multiple variables

 

A Pearson correlation analysis was performed to examine the relationships between multiple variables: pageviews, loading time, visits, bounce rate, pages visited and time. The analysis showed the following relationships (Table 7):

1. Page views and loading time:

Pearson’s r is -0.269 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between loading time and pageviews. The negative sign indicates a negative relationship between the two variables, i.e. when loading time increases pageviews decrease, and when loading time decreases pageviews increase.

2. Page views and bounce rate:

Pearson’s r is -0.234 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between bounce rate and pageviews.

The negative sign indicates a negative relationship between the two variables, i.e. when bounce rate increases pageviews decrease, and when bounce rate decreases pageviews increase.

3. Page views and visits:

Pearson’s r is 0.910 and is highly significant (p < 0.001). This value is close to 1, which indicates a strong relationship between visits and pageviews.

The positive sign indicates a positive relationship between the two variables, i.e. when visits increase pageviews also increase, and when visits decrease pageviews decrease.

4. Page views and pages visited or pages visited and visits:

Correlations between these pairs of variables are not meaningful, because pages visited is a composite variable derived from visits and pageviews.

5. Page views and time:

Pearson’s r is -0.675 and is highly significant (p < 0.001). Its absolute value is relatively close to 1, which indicates a strong relationship between time and pageviews.

The negative sign indicates a negative relationship between the two variables, i.e. when the time index increases pageviews decrease, and when the time index decreases pageviews increase.

6. Loading time and bounce rate:

Pearson’s r is 0.279 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between the two variables.

The positive sign indicates a positive relationship between the two variables, i.e. when loading time increases bounce rate also increases, and when loading time decreases bounce rate decreases.

7. Loading time and visits:

Pearson’s r is -0.223 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between the two variables.

The negative sign indicates a negative relationship between the two variables, i.e. when loading time increases visits decrease, and when loading time decreases visits increase.

8. Loading time and pages visited:

Pearson’s r is -0.281 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between loading time and pages visited.

The negative sign indicates a negative relationship between the two variables, i.e. when loading time increases pages visited decrease, and when loading time decreases pages visited increase.

9. Loading time and time:

Pearson’s r is 0.321 and is highly significant (p < 0.001), indicating a moderate relationship between the two variables.

The positive sign indicates a positive relationship between the two variables, i.e. when the time index increases loading time also increases, and when the time index decreases loading time decreases.

10. Bounce rate and visits:

Pearson’s r is -0.097, which is significant only at the 0.1 level (p = 0.064). This value is close to 0, which indicates a weak negative relationship between the two variables.

11. Bounce rate and pages visited:

Pearson’s r is -0.711 and is highly significant (p < 0.001). Its absolute value is close to 1, which indicates a strong relationship between the two variables.

The negative sign indicates a negative relationship between the two variables, i.e. when bounce rate increases pages visited decrease, and when bounce rate decreases pages visited increase.

12. Bounce rate and time:

Pearson’s r is 0.269 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between bounce rate and time.

The positive sign indicates a positive relationship between the two variables, i.e. when the time index increases bounce rate increases, and when the time index decreases bounce rate decreases.

13. Visits and time:

Pearson’s r is -0.654 and is highly significant (p < 0.001). Its absolute value is relatively close to 1, which indicates a strong relationship between the two variables. The negative sign indicates a negative relationship, i.e. when the time index increases visits decrease, and when the time index decreases visits increase.

 

14. Time and pages visited:

Pearson’s r is -0.218 and is highly significant (p < 0.001). This value is close to 0, which indicates a weak relationship between time and pages visited. The negative sign indicates a negative relationship, i.e. when the time index increases pages visited decrease, and when the time index decreases pages visited increase.
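The coefficients and significance levels above come from SPSS. To illustrate what is being computed, the sketch below calculates Pearson's r and an approximate two-sided p-value from the t-statistic t = r·sqrt((n - 2)/(1 - r²)). It assumes roughly 362 daily observations (the sample size implied by the regression tables in Section 5.2) and uses a normal approximation to the t distribution, which is adequate only for large samples; the function names are our own. With n = 362, r = 0.269 gives p < 0.001, whereas an |r| of about 0.1 gives p ≈ 0.06.

```python
import math

import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return float(np.sum(xd * yd) / math.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2)))


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation to
    the t distribution, which is reasonable only for large n.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


print(approx_two_sided_p(0.269, 362))
print(approx_two_sided_p(-0.097, 362))
```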

 

 

 

5.2 Regression

a. Regression: Pageviews and Loading

SPSS generates several output tables for a regression analysis. The discussion focuses only on the tables and coefficients needed to interpret the regression output.

* Determining how well the model fits (Model summary table):

- R: the correlation between the observed and predicted values of the dependent variable. Here R is 0.914, which indicates a strong positive correlation between observed and predicted pageviews.

- R Square: the proportion of variance in the dependent variable (pageviews) that can be explained by the independent variables (loading time and lagged pageviews).

The R Square value is 0.835, which indicates that the model, i.e. loading time together with lagged pageviews, explains about 83.5% of the variability of the dependent variable (pageviews) (Table 8).

Table 8: Model Summary for the variables loading and pageviews

Model Summary(b)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .914(a) | .835 | .834 | .27216 | 2.416

a. Predictors: (Constant), lnloading, laglnpageviews

b. Dependent Variable: lnpageviews

Source: SPSS

** Statistical significance: (ANOVA table):

The F-ratio in the ANOVA table tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F = 910.2, p < .0005 (i.e., the regression model is a good fit for the data) (Table 9).

 
 

Table 9: ANOVA table for the variables loading and pageviews

ANOVA(b)

Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 134.838 | 2 | 67.419 | 910.189 | .000(a)
Residual | 26.592 | 359 | .074 | |
Total | 161.430 | 361 | | |

a. Predictors: (Constant), lnloading, laglnpageviews

b. Dependent Variable: lnpageviews

Source: SPSS
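The Model Summary and ANOVA tables are linked by standard OLS identities (R Square = regression SS / total SS; adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1); F = regression mean square / residual mean square), so the reported values can be cross-checked. The sketch below recomputes them from the sums of squares and degrees of freedom in Table 9; the function name is our own, and small differences from SPSS come from the rounding of the printed sums of squares.

```python
def fit_statistics(ss_regression, ss_residual, df_regression, df_residual):
    """Recover R Square, adjusted R Square and F from an ANOVA table."""
    ss_total = ss_regression + ss_residual
    n_minus_1 = df_regression + df_residual  # total degrees of freedom
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * n_minus_1 / df_residual
    f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, adj_r_squared, f_ratio


# Values from Table 9 (pageviews model).
r2, adj_r2, f = fit_statistics(134.838, 26.592, 2, 359)
print(round(r2, 3), round(adj_r2, 3), round(f, 1))
```

The recomputed values match the .835 and .834 reported in Table 8 and the F of about 910 in Table 9.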

***Parameter estimates (Coefficients table):

Table 10 presents the output of the Coefficients table:

Table 10: Coefficients table for the variables loading and pageviews

Coefficients(a)

Model 1 | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig.
(Constant) | .533 | .178 | | 2.996 | .003
laglnpageviews | .916 | .022 | .910 | 40.780 | .000
lnloading | -.015 | .028 | -.012 | -.542 | .588

a. Dependent Variable: lnpageviews

Source: SPSS

The Model column shows the predictor variables. The B column gives the unstandardized coefficients of the regression equation for predicting the dependent variable from the independent variables. The coefficient for loading time is -0.015. Because both pageviews and loading time enter the model as natural logarithms, this coefficient is an elasticity: for every 1% increase in loading time, a decrease of about 0.015% in pageviews is predicted, ceteris paribus (holding all other variables constant).

- t-values and Sig.: these are the t-statistics and their associated two-tailed p-values, used to test whether a given coefficient is significantly different from zero. The coefficient of lagged pageviews is significant (p < .0005), but the coefficient of loading time is not (p = .588 > .05), i.e. loading time cannot predict the variable pageviews.

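The linear regression showed the following:

- The regression model is a good fit for the data, as F = 910, p < .0005.

- The R-square value is 0.83, which shows that the predictors together explain 83% of the variability of the dependent variable (pageviews). Judging by the standardized coefficients, this explanatory power comes almost entirely from lagged pageviews (Beta = .910), while loading time (Beta = -.012) adds virtually nothing.

- The coefficient for loading time is -0.015, so a 1% increase in loading time is associated with a predicted decrease of about 0.015% in pageviews, holding all other variables constant; this estimate is not statistically significant.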

 

b. Regression: Loading and Pages visited

As stated previously, the Model Summary table indicates how well a regression model fits the data (Table 11):

Here the multiple correlation coefficient R is 0.47, which indicates a moderate positive relationship between the dependent variable and the predictors.

R-Square: This is the proportion of variance in the dependent variable (pages visited) that can be explained by the independent variables (loading time and lagged pages visited).

It can be deduced from the value of 0.22 that the independent variables together explain 22% of the variability of the dependent variable (pages visited).

 

Table 11: Model Summary for the variables loading and pages visited

Model Summary(b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson |
|---|---|---|---|---|---|
| 1 | .472(a) | .223 | .218 | .14964 | 2.116 |

a. Predictors: (Constant), lnloading, laglnpagesvisited

b. Dependent Variable: lnpagevisited

Source: SPSS
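The Adjusted R Square column corrects R-square for the number of predictors k relative to the number of observations n: adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1). For this model n = 362 (361 total degrees of freedom in Table 12) and k = 2. The minimal Python check below, using the rounded R-square from Table 11, reproduces the reported value to within rounding.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for n observations and k predictors (constant not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Table 11: R-square = .223, n = 362 observations, k = 2 predictors
print(round(adjusted_r_squared(0.223, 362, 2), 3))
```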

As mentioned previously, the F-ratio in the ANOVA table (Table 12) tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F(2, 359) = 51.41, p < .0005 (i.e., the regression model is a good fit for the data).

 

Table 12: ANOVA table for the variables loading and pages visited

ANOVA(b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 2.303 | 2 | 1.151 | 51.413 | .000(a) |
| Residual | 8.039 | 359 | .022 | | |
| Total | 10.341 | 361 | | | |

a. Predictors: (Constant), lnloading, laglnpagesvisited

b. Dependent Variable: lnpagevisited

Source: SPSS

The coefficient for loading time is -0.054, so a 1% increase in loading time is associated with a predicted decrease of about 0.054% in pages visited, holding all other variables constant. As p = .001 < .05, the loading-time coefficient is statistically significant, as is the lagged pages-visited term (p < .0005) (Table 13).

Table 13: The Coefficient table for the variables loading and pages visited

Coefficients(a)

| Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | .552 | .066 | | 8.402 | .000 |
| laglnpagesvisited | .397 | .049 | .396 | 8.150 | .000 |
| lnloading | -.054 | .016 | -.166 | -3.413 | .001 |

a. Dependent Variable: lnpagevisited

Source: SPSS

The regression model is a good fit for the data, as F = 51.41, p < .0005. The R-square value is 0.22, which shows that the predictors (loading time and lagged pages visited) together explain 22% of the variability of the dependent variable (pages visited). The coefficient for loading time is -0.054, so a 1% increase in loading time is associated with a predicted decrease of about 0.054% in pages visited, holding all other variables constant.
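The predictors laglnpageviews and laglnpagesvisited are the one-period lags of the logged dependent variables. The minimal Python sketch below shows one way such a design could be built from a daily series; the input values are hypothetical placeholders, not the travel agency's data. Because the first day has no lag, one row is lost, which is consistent with the 361 total degrees of freedom in Tables 9 and 12 (362 usable observations) against 362 in Table 15 (363 observations) for the bounce-rate model, which has no lagged term.

```python
import math


def build_lagged_design(y: list[float], x: list[float]) -> list[tuple[float, float, float]]:
    """Return rows (ln y_t, ln y_{t-1}, ln x_t) for t = 1..n-1; the first day has no lag."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rows = []
    for t in range(1, len(y)):
        # All values must be positive, since the model is specified in logarithms
        rows.append((math.log(y[t]), math.log(y[t - 1]), math.log(x[t])))
    return rows


# Hypothetical daily pageviews and average page-load times (seconds)
pageviews = [120, 135, 128, 150]
loading = [3.2, 2.9, 3.5, 3.1]
design = build_lagged_design(pageviews, loading)
print(len(design))  # one fewer row than days
```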

c. Multiple regression: Bounce as dependent variable and loading time and time index as independent variables

As mentioned previously, the Model Summary table (Table 14) indicates how well a regression model fits the data. Here the multiple correlation coefficient R is 0.34, which indicates a weak positive relationship between the dependent variable and the independent variables.

R-Square: This is the proportion of variance in the dependent variable (bounce rate) that can be explained by the independent variables. The R-square value is 0.116 (adjusted R-square 0.111), which indicates that the independent variables together explain about 11.6% of the variability of the dependent variable (bounce rate).

 

Table 14: Model Summary for the variables bounce, loading and time index

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .341(a) | .116 | .111 | .07632 |

a. Predictors: (Constant), lnloading, lntimeindex

Source: SPSS

The F-ratio in the ANOVA table (Table 15) shows that the independent variables statistically significantly predict the dependent variable, F(2, 360) = 23.68, p < .0005 (i.e., the regression model is a good fit for the data).

Table 15: ANOVA table for the variables bounce, loading and time index

ANOVA(a)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | .276 | 2 | .138 | 23.678 | .000(b) |
| Residual | 2.097 | 360 | .006 | | |
| Total | 2.372 | 362 | | | |

a. Dependent Variable: lnbounce

b. Predictors: (Constant), lnloading, lntimeindex

Source: SPSS

The following output is obtained from the Coefficients table (Table 16).

 

Table 16: Coefficients table of the variables bounce, loading and time index

Coefficients(a)

| Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -.509 | .028 | | -18.412 | .000 |
| lntimeindex | .017 | .004 | .207 | 3.958 | .000 |
| lnloading | .033 | .008 | .212 | 4.059 | .000 |

a. Dependent Variable: lnbounce

Source: SPSS

 

The coefficient for loading time is 0.033, so a 1% increase in loading time is associated with a predicted increase of about 0.033% in bounce rate. The coefficient for the time index is 0.017, so a 1% increase in the time index is associated with a predicted increase of about 0.017% in bounce rate. As p < .05 for both coefficients (p < .0005), it can be concluded that they are statistically significant.

The regression model is a good fit for the data, as F = 23.68, p < .0005. The R-square value is 0.116, which shows that the independent variables together explain about 11.6% of the variability of the dependent variable (bounce rate). Both loading time (0.033) and the time index (0.017) have positive and statistically significant coefficients.

 

Based on the results of the correlation and regression analyses, the following can be deduced:

H1: There is no significant relationship between bounce rate and loading time.

** H1 is rejected, as substantiated by the results of the correlation and regression analyses.

H2: There is no significant relationship between pageviews and loading time.

** H2 cannot be rejected, based on the results of the regression analysis.

H3: There is no significant relationship between pages visited and loading time.

** H3 is rejected, as substantiated by the results of the correlation and regression analyses.

H4: There is no significant relationship between bounce rate and pages visited.

** H4 is rejected, as substantiated by the results of the correlation analysis.

 

 

6 DEVELOPING AN E-MARKETING STRATEGY FRAMEWORK USING WEBMINING APPROACHES

In order to compete successfully in a market, it is essential to develop an integrated, coherent and customer-focused marketing strategy. The virtual space has its own characteristics that have to be taken into consideration when developing an e-marketing strategy. The following part examines how Web mining approaches and techniques can be incorporated into the SOSTAC model (Situation, Objectives, Strategy, Tactics, Action and Control). According to Chaffey and Smith (2008), this model can be divided into six stages that help electronic enterprises design their strategy.

First stage “Situation Review”:

A situation review is the first step in formulating an e-marketing strategy. It incorporates several analyses, e.g. competitor analysis and customer research. Web mining approaches can provide strategy developers with valuable information that helps them build their strategy.

a. The first part of the situation review examines the contribution of the Internet to the organization. The approach in this study suggests that analysts can use WUM software to extract useful data. Path traversal patterns and reports generated by WUM analytics show entry and exit points, user access patterns, association rules between pages and conversion rates, and can be used to predict future user navigational behaviour. This information gives a clear view of the real contribution of the Internet to the organization.

b. Web mining approaches can also be used in resource analysis, which reviews the capabilities of the organization in delivering its online services. Online market share can be revealed using tools such as Hitwise and Netratings. In addition, technology infrastructure resources can be reviewed: this includes assessing the performance and speed of the Website and the need for applications that enhance customer experience, such as on-site search or customization facilities. Concepts such as content management and customer relationship management can also be supported, using WSM for ranking and backlink analysis and WCM for fine-tuning content.

c. Customer research and building customer databases can also be performed using WUM tools. WUM software generates reports that include primary information about customer behaviour, such as entry hours and days and the nationality of Website users, as well as a fuller view of customer profiles (e.g. Alexa). A variety of Web mining algorithms are now available and can be used to generate more sophisticated customer KPIs (Key Performance Indicators).

d. WSM can also be of great help in competitor analysis. Page ranking, which assesses the popularity of a site, can give a clear view of the status of the Website compared with competing Websites.

e. In addition, information for intermediary analysis can be provided by WUM software that reports referrer pages. This can indicate whether intermediaries are playing an effective role in promoting the site.

f. The SWOT analysis can also be performed using the information generated by WUM, WSM and WCM tools.
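The WUM reports mentioned in items a, c and e above (entry and exit points, visit days and hours, referrer pages) can all be derived from the same sessionized log. The Python sketch below works on invented session records (the pages, times and the travel-portal.example referrer are placeholders) and uses only the standard library. A tool such as Google Analytics produces equivalent reports automatically; visitor nationality would additionally require IP geolocation, which is omitted here.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical sessions: (start time, referrer URL or "" for direct traffic, pages viewed in order)
sessions = [
    ("2013-10-07 09:15", "https://www.google.com/search?q=egypt+tours", ["home", "offers", "booking"]),
    ("2013-10-07 21:40", "", ["offers"]),
    ("2013-10-08 21:05", "https://travel-portal.example/egypt", ["home", "contact"]),
    ("2013-10-12 10:30", "https://www.google.com/search?q=nile+cruise", ["home", "offers"]),
    ("2013-10-13 21:55", "", ["blog"]),
]


def referrer_domain(url: str) -> str:
    """Reduce a referrer URL to its host name; empty referrers count as direct traffic."""
    host = urlparse(url).netloc.lower()
    if not host:
        return "(direct)"
    return host[4:] if host.startswith("www.") else host


def parse_time(stamp: str) -> datetime:
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M")


entry_pages = Counter(pages[0] for _, _, pages in sessions)
exit_pages = Counter(pages[-1] for _, _, pages in sessions)
visits_by_hour = Counter(parse_time(t).hour for t, _, _ in sessions)
visits_by_day = Counter(parse_time(t).strftime("%A") for t, _, _ in sessions)
referrers = Counter(referrer_domain(ref) for _, ref, _ in sessions)

print("Entry pages:", entry_pages.most_common())
print("Exit pages:", exit_pages.most_common())
print("Busiest hour:", visits_by_hour.most_common(1))
print("Visits by weekday:", dict(visits_by_day))
print("Referring domains:", referrers.most_common())
```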

Second stage “Setting objectives”:

A closer look at the main advantages of Internet marketing, which can at the same time serve as its objectives, shows that Web mining approaches can play a major role in achieving these objectives while maximizing the advantages.

Web mining approaches can be used to achieve e-marketing objectives. For example, WUM approaches such as path analysis can identify customer needs and preferences, and collaborative filtering techniques can be used to make personalized recommendations that maximize sales. Tracking customer navigational behaviour also helps developers identify the points where customers leave the site without reaching the goal. Restructuring the Web pages and removing unnecessary pages can then increase conversion rates.
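Points where customers leave before reaching the goal are commonly located with a funnel: an ordered list of steps and the number of sessions that reach each one. The Python sketch below assumes a hypothetical four-step booking funnel and invented sessions. For each step it reports how many sessions reached it (having passed the earlier steps in order) and the drop-off from the previous step.

```python
# Hypothetical booking funnel (page names are placeholders)
FUNNEL = ["offers", "package-details", "booking-form", "thank-you"]

sessions = [
    ["home", "offers", "package-details", "booking-form", "thank-you"],
    ["offers", "package-details"],
    ["home", "offers", "package-details", "booking-form"],
    ["offers"],
    ["home", "contact"],
]


def funnel_counts(sessions: list[list[str]], funnel: list[str]) -> list[int]:
    """Count sessions reaching each funnel step after passing all earlier steps in order."""
    counts = [0] * len(funnel)
    for pages in sessions:
        step = 0
        for page in pages:
            if step < len(funnel) and page == funnel[step]:
                counts[step] += 1
                step += 1
    return counts


counts = funnel_counts(sessions, FUNNEL)
for i, (step, n) in enumerate(zip(FUNNEL, counts)):
    if i == 0:
        print(f"{step}: {n} sessions")
    else:
        prev = counts[i - 1]
        drop = (prev - n) / prev if prev else 0.0
        print(f"{step}: {n} sessions ({drop:.0%} drop-off from previous step)")
```

The step with the largest drop-off is the natural first candidate for restructuring.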

Furthermore, WUM and WSM techniques can help organizations serve customers better. Personalized recommendations, tracking customer entry and exit points, clustering customers into groups and predicting customer navigational behaviour can be very helpful in customer relationship management strategies. In addition, using WSM for page ranking gives developers an idea of page popularity in the competitive virtual space.

WCM and WSM techniques and approaches can also help marketing strategy developers speak to their customers. WCM software can show relational word clusters and word associations. In addition, WSM can show page popularity by identifying forward links and backlinks; backlinks show how often a page is cited by other popular pages.

An organization can also save costs by using WUM and WSM techniques. Market basket analysis, personalization and association rule mining can help organizations plan cross-selling strategies, make personalized recommendations and avoid churn.

Last but not least, WSM techniques such as page rank can guide strategy developers in launching SEO or PPC campaigns to rank higher in the SERPs and thus maximize brand awareness. Using popular keywords in titles and meta tags can help a page rank higher.
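Personalized recommendations of the kind mentioned above are often produced by collaborative filtering. The minimal user-based sketch below compares users by the cosine similarity of their binary booking histories, then scores the items that similar users booked, but the target user has not, by the summed similarity. Users, items and histories are hypothetical; production recommender systems typically add rating weights, normalization and far larger, sparse data.

```python
import math

# Hypothetical user -> set of booked or viewed packages
bookings = {
    "u1": {"cairo-tour", "nile-cruise", "luxor-day-trip"},
    "u2": {"cairo-tour", "nile-cruise", "aswan-hotel"},
    "u3": {"red-sea-diving", "hurghada-hotel"},
    "u4": {"cairo-tour", "luxor-day-trip"},
}


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary item vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user: str, history: dict[str, set[str]], top_n: int = 3) -> list[tuple[str, float]]:
    """Score unseen items by the summed similarity of the other users who have them."""
    seen = history[user]
    scores: dict[str, float] = {}
    for other, items in history.items():
        if other == user:
            continue
        sim = cosine(seen, items)
        if sim == 0:
            continue
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(recommend("u4", bookings))
```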
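Link-based popularity of the kind WSM relies on can be illustrated with PageRank, computed by power iteration over a link graph. In the sketch below the graph is a hypothetical set of sites (the names are placeholders), the damping factor 0.85 is the value commonly used in the literature, and pages without outgoing links spread their score evenly over all pages. It is a simplified teaching sketch; commercial search engines combine many more signals.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simple PageRank by power iteration; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: an agency site, a partner blog and two directories
links = {
    "agency": ["partner-blog"],
    "partner-blog": ["agency", "directory-a"],
    "directory-a": ["agency"],
    "directory-b": ["agency", "partner-blog"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Pages cited by many well-ranked pages (here "agency") rise to the top, while a page nobody links to ("directory-b") keeps only the baseline score.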
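Market basket analysis and association rule mining look for items that are frequently bought together. The minimal Python sketch below scores all pairwise rules A -> B by support, confidence and lift on invented bookings; the product names and the two thresholds are assumptions for illustration. Full algorithms such as Apriori extend the same idea to larger itemsets and prune candidates by minimum support.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical bookings: the set of services in each customer's basket
baskets = [
    {"flight", "hotel", "nile-cruise"},
    {"flight", "hotel"},
    {"hotel", "desert-safari"},
    {"flight", "hotel", "desert-safari"},
    {"flight", "visa-service"},
]
MIN_SUPPORT = 0.4      # assumed thresholds for this illustration
MIN_CONFIDENCE = 0.6

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(frozenset(pair) for basket in baskets
                      for pair in combinations(sorted(basket), 2))

rules = []  # (antecedent, consequent, support, confidence, lift)
for pair, count in pair_counts.items():
    support = count / n
    if support < MIN_SUPPORT:
        continue
    for a, b in permutations(pair, 2):
        confidence = count / item_counts[a]          # P(b | a)
        lift = confidence / (item_counts[b] / n)     # confidence relative to P(b)
        if confidence >= MIN_CONFIDENCE:
            rules.append((a, b, support, confidence, lift))

for a, b, sup, conf, lft in sorted(rules, key=lambda r: -r[3]):
    print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}, lift={lft:.2f}")
```

A lift above 1 (here desert-safari -> hotel) suggests a genuine cross-selling opportunity rather than two merely popular items.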

Third and fourth steps “How do we get there” and “How exactly do we get there”:

After the e-marketing developer identifies the Internet marketing objectives, the strategies should be formulated, together with the ways of achieving these objectives. Several competitive strategies can meet the desired objectives. Table 17 shows several marketing objectives, the related Web mining approaches and techniques, and the marketing tactics that can be used to achieve them. The following review explains how the main marketing objectives can be accomplished using Web mining data extracted from Websites.

Table 17: Marketing strategic objectives and related Web mining techniques and approaches

Marketing strategic objectives

Definition of marketing strategic objectives (Chaffey and Smith, 2008)

Web mining approach

Web mining technique

Marketing tactics

Differentiation

Offering more value to customers to gain a competitive advantage.

WUM

WCM

- Web analytics reports.

- Recommender systems (using collaborative filtering)

- text mining

- Personalization

- enhance page

- WebPR

- e-mail marketing

- customer reviews

Product development

Products are developed according to customer needs.

WUM

 

- Path analysis

- Market basket analysis

- shopping cart analysis

-  association rule mining

- classifying customers

- cross-selling

- discount vouchers

- promotions on certain items

 

Customer acquisition

(Brand awareness)

The Internet is used to sell existing products to new customers.

Selling into new geographical areas, taking advantage of low-cost advertising opportunities without having to set up a sales infrastructure in the customers' countries.

WSM

WUM

WCM

- referrer pages

- path analysis

- page ranking

- content mining

- SEO (better ranking and landing pages)

- PPC

- affiliate marketing

- online ads

- enhance page

- WebPR

- ORM (Online Reputation Management)

 

Customer retention, to avoid churn

 

WUM

- clustering

- prediction

- recommender systems

- Personalization

- pop-ups

- reminder e-mail

- ORM (Online Reputation Management)

Focus, targeting and communication

Perform some functions to speak and listen to customers

WUM

- Web log analysis

- clustering, classification, collaborative filtering

- personalization

- SEO, PPC

- social media

- e-mail marketing

Cost leadership and value chain efficiencies

- decrease operation costs by attracting customers to do transactions online

- decrease marketing costs by delivering customized offers

WCM

WUM

- text mining

- Keyword analysis

- path analysis

- collaborative filtering

- SEO, PPC

 

- personalization

Market penetration

Selling existing products in existing markets.

WSM

- page ranking

- SEO

- PPC

- social media

- e-mail marketing

- WebPR

Diversification

 

Selling new products in new markets.

WCM

WUM

- text mining

- collaborative filtering

- enhance the page

- personalization

 

Revenue generation

Completing transactions online

WUM

- Entry and exit point identification

- shopping cart analysis

- path analysis

- enhance the page layout and navigation

- personalization

 

Channel partnership

Choosing highly ranked affiliates to host hyperlinks to the brand, driving traffic to the brand's own Website.

WSM

- page rank (back and in-links of affiliates)

- affiliate marketing

- PPC

 

Customer conversion and enhancing customer experience

 

- safe payments

- bounce rates

- convert visits into leads

 

WCM

WUM

- text mining

- path analysis

- personalization

- customer reviews and ratings

- tailored promotions

- improve on-site search engines

 

1- Differentiation: this strategic objective aims at offering customers different services, and Web mining approaches can help the organization differentiate itself from others. For instance, WUM techniques such as path analysis, which track customer navigational behaviour, help developers avoid unnecessary Webpages that distract users or make them leave the site. By using collaborative filtering algorithms, the organization can also recommend products and services based on the preferences of similar users. Differentiation tactics also include e-mail marketing and communicating with customers through WebPR activities.

2- Product development: Products should be developed according to customer needs. WUM techniques and algorithms such as path analysis, Web log analysis and shopping cart analysis can give insights into the products that customers favour and purchase most. The organization can then use marketing tactics such as cross-selling, discount vouchers and promotions on certain items.

3- Customer acquisition or brand awareness: can be accomplished through Search Engine Marketing (SEM) approaches such as SEO or PPC. SEM is a form of Internet marketing that aims at promoting Websites by increasing their chance of appearing in Search Engine Result Pages (SERPs), and it covers a number of techniques and strategies for enhancing a Website's visibility in SERPs (Xiang and Pan, 2011).

4- Customer retention (avoiding churn): to keep customers loyal to the brand, strategy developers must perform activities that optimize customer loyalty, including personalized offers, recommendations, pop-ups, after-sale follow-up e-mails and online reputation management.

To maintain customer loyalty through the activities mentioned above, the organization must keep a solid customer base. This can be achieved through WUM techniques such as analyzing cookies and Web logs and applying data mining techniques such as clustering, prediction and classification to the extracted data. Grouping customers makes them easier to target with e-mails, online ads and promotions.

5- Focus, targeting and communication: One of the most widely used marketing strategic objectives is focus and targeting. Segmenting customers, focusing on one cluster and directing marketing efforts to it can be beneficial. WUM techniques can be very useful in performing segmentation, clustering and classification. Collaborative filtering, which makes recommendations based on the similar preferences of previous customers, can also be very useful in this domain. Marketing tactics for focus and targeting include personalization, recommendation systems and e-mail marketing. Communication can be accomplished through SEO, PPC and monitoring social media activities (CGC - Customer Generated Content).
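Clustering customers into groups, as described in items 4 and 5, can be sketched with k-means on a few per-visitor features. The example assumes two hypothetical features (average pages per visit and average session length in minutes), invented values, k = 2 and fixed starting centroids so the result is reproducible. In practice the features would be standardized first, and a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math

# Hypothetical visitors: (average pages per visit, average session length in minutes)
visitors = {
    "v1": (2.0, 1.5), "v2": (1.5, 1.0), "v3": (2.5, 2.0),
    "v4": (9.0, 12.0), "v5": (11.0, 14.0), "v6": (10.0, 10.5),
}


def kmeans(points: list[tuple[float, float]], centroids: list[tuple[float, float]],
           iterations: int = 20) -> list[int]:
    """Plain k-means (Lloyd's algorithm); returns the cluster index of each point."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


names = list(visitors)
labels = kmeans([visitors[v] for v in names], centroids=[(1.0, 1.0), (10.0, 10.0)])
for name, label in zip(names, labels):
    print(name, "-> segment", label)
```

Each resulting segment (here, brief browsers versus engaged planners) can then receive its own e-mails, ads and promotions.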

 

 

 

Fifth step “The details of tactics, who does what and when”:

In this step, Web mining approaches cannot help developers assign responsibilities to employees, as this is more of an administrative task. Nevertheless, Web mining approaches can indicate how important a role external agencies play in promoting the organization's services and products. This can be examined from the referrer-page reports of Web log analyzers. The importance of external agencies or intermediaries can also be assessed by evaluating the effect of PPC campaigns on sales, brand awareness and conversion rates.

Sixth step “How do we monitor performance”:

The five diagnostic categories for e-marketing measurement are business contribution, marketing outcomes, customer satisfaction, customer behaviour and site promotion. Each can be described using key metrics. For example, business contribution can be measured by monitoring online revenue contribution, costs and profitability, while customer satisfaction can be measured through site usability, opinions and repeat visits and purchases.

These were the steps for developing an e-marketing strategy based on the framework of Chaffey and Smith (2008), with Web mining approaches used to extract information that can be successfully used to design e-marketing strategies.
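Several of these customer-behaviour metrics are simple ratios over sessions. The Python sketch below computes bounce rate (the share of single-page sessions, which is broadly how Google Analytics defines it), conversion rate and repeat-visitor rate from hypothetical session records. The `Session` fields and the conversion goal are assumptions, not the export format of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Session:
    visitor_id: str
    pages_viewed: int
    converted: bool  # e.g. a booking request was completed (assumed goal)


# Hypothetical sessions
sessions = [
    Session("a", 1, False), Session("b", 5, True), Session("a", 3, False),
    Session("c", 1, False), Session("d", 4, True), Session("b", 2, False),
]

total = len(sessions)
bounce_rate = sum(s.pages_viewed == 1 for s in sessions) / total
conversion_rate = sum(s.converted for s in sessions) / total

# Count sessions per visitor to find returning visitors
visits_per_visitor: dict[str, int] = {}
for s in sessions:
    visits_per_visitor[s.visitor_id] = visits_per_visitor.get(s.visitor_id, 0) + 1
repeat_visitor_rate = sum(n > 1 for n in visits_per_visitor.values()) / len(visits_per_visitor)

print(f"Bounce rate: {bounce_rate:.0%}")
print(f"Conversion rate: {conversion_rate:.0%}")
print(f"Repeat-visitor rate: {repeat_visitor_rate:.0%}")
```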

 

 

7 RECOMMENDATIONS

The recommendations can be used by private and public tourism organizations with Web presence to improve their position in the market.

a. Case company Website:

The company should focus on retaining its online customer base and try to attract its offline customers to its Website. More attention should be given to Website quality features, especially navigation and accessibility.

The company should integrate its online and offline marketing activities and link visits to the Website with special offers and promotions. Offline and online marketing activities should be kept in harmony to avoid channel conflict.

It is crucial that the company make the best use of Web mining results and activate the disabled features in Google Analytics. It is also recommended that the company use other Web analytics tools with features different from those of Google Analytics.

It is also recommended that the company use combined analytics, which link Web Structure Mining, Web Content Mining and Web Usage Mining, to make the utmost use of the extracted data. New dynamic decision tree models should be introduced to capture continuous changes in users` patterns.

The company should consider reducing its use of Flash and videos to reduce distraction and decrease loading time. SEO strategies should be carefully designed, including fine-tuned content in the landing pages, well-designed meta tags and an increased number of backlinks. It would also be useful for the company to be listed in official Website directories such as the Yellow Pages to achieve higher rankings.

The company should segment its customers using Web mining results for personalization and targeted marketing. More attention should be given to PPC campaigns and keyword choice. The company should focus on the returns generated by social media. Furthermore, effective ORM (Online Reputation Management) is crucial for viral exposure.

More attention should be given to technical considerations concerning new platforms such as mobile phones and tablets. The company should also consider promoting its Website in other search engines, such as Yahoo.

Diversifying the services offered on the Website, specializing in niche services and cross-selling can help the company overcome the instability of the tourism sector.

Affiliate campaigns and co-branding can be of great importance for effective marketing.

The company should take confident steps towards changing the site from a promotional model that builds brand awareness to an e-commerce model that supports secure e-payments, and should strengthen customer confidence in its online payments.

 

b. Private and public tourism organizations' Websites:

·      The goals of the tourism Website should be a part of the organization-wide strategy.

·      Web mining approaches have huge potential and should be used efficiently to guarantee a satisfactory customer experience.

·      Web analytics provide organizations with a wide range of information. This information should be used either to develop customer-focused e-marketing strategies or to optimize Web designs.

·      The objectives of the Websites should be clear, flexible and concise.

·      The data provided by Web analytics can be used either as raw data giving primary insights into customer behaviour or as input to further analysis that yields more sophisticated outcomes.

·      Website design should be given first priority, as Websites are the tool used by the company to transmit brand image.

·      Companies should be using SEO and PPC to rank higher in the SERP.

·      Social media is nowadays key to ranking higher, viral exposure, successful customer relationship management and effective online reputation management.

·      Marketers should pay attention to Website quality features and effectively combine them with Web mining outcomes to develop customer-based Websites.

 

 

REFERENCES:

1.     Alexa - The Web Information Company (2014). [ONLINE] Available at: http://www.alexa.com. [Accessed January 2014].

2.     Boullosa, J.R. and Xexéo, G. (2002), An Architecture for Web Usage Mining, International Conference on Computational Science, pp. 2273–2280

3.     Buyukozkan, G. and Ergun, B. (2011), Intelligent system applications in electronic tourism, Expert Systems with Applications, Vol. 38, pp. 6586–6598

4.     Chaffey, D. and Smith, P.R. (2008), E-Marketing excellence: Planning and optimizing your digital marketing (3rd ed.), Oxford: Butterworth-Heinemann.

5.     Chaovalitwongse, W., Pham, H., Hwang, S., Liang, Z. and Pham, C.H. (2008). Recent Advances in Reliability and Quality in Design. Chapter 21: Recent Advances in Data Mining for Categorizing Text Records, Springer.

6.     Choi, S., Lehto, X.Y. and Morrison, A.M. (2007), Destination image representation on the Web: Content analysis of Macau travel related Websites, Tourism Management, Vol. 28, pp. 118–129.

7.     Fong, J. and Wong, H.K. (2002), Online Analytical Mining of Path Traversal Patterns for Web Measurement, Journal of Database Management, Vol. 13(4), pp. 1-23.

8.     Google Analytics- A free- hosted Web Analytic (2013). [ONLINE] Available at: https://www.google.com/analytics/web/?hl=en#home/a24043095w46991877p47287236/. [Accessed October 2013].

9.     Google Analytics Guide (2013). [ONLINE] Available at: http://static.googleusercontent.com/media/www.google.com/en//grants/education/Google_Analytics_Training.pdf. [Accessed August 2013].

10.  Google Support - Analytics Help (2013). [ONLINE] Available at: https://support.google.com/analytics/answer/1713056?hl=en. [Accessed November 2013].

11.  HMTWeb, Glenn Gabe of G-Squared Interactive (GSQi) - Internet Marketing Consulting Services: Online Marketing, SEO, SEM, and Social Media Marketing (2013). [ONLINE] Available at: http://www.HMTWeb.com. [Accessed October 2013].

12.  Horng, J.S and Tsai, C.T. (2010), Government websites for promoting East Asian culinary tourism: A cross-national analysis, Tourism Management, Vol. 31, pp. 74–85.

13.  Jalali, M., Mustapha, N., Sulaiman, M. and Mamat, A. (2010), WebPUM: A Web-based recommendation system to predict user future movements, Expert Systems with Applications, Vol. 37, pp.6201–6212.

14.  Lee, Y., Yen, S. and Hsieh, M. (2005), A Lattice-Based Framework for Interactively and Incrementally Mining Web Traversal Patterns, J. Web. Infor. Syst., Vol. 1 (4).

15.  Liao, S., Chen, Y.S. and Deng, M. (2010), Mining customer knowledge for tourism new product development and customer relationship management, Expert Systems with Applications, Vol. 37, pp. 4212–4223.

16.  Liu, H. and Kešelj, V. (2007), Combined mining of Web server logs and Web contents for classifying user navigation patterns and predicting users’ future requests, Data & Knowledge Engineering, Vol. 61, pp. 304–330.

17.  Olmeda, I. and Sheldon, P.J. (2001), Data Mining Techniques and Applications for Tourism Internet Marketing, Journal of Travel & Tourism Marketing, Vol. 11(2/3), pp. 1-20.

18.  Pitman, A., Fuchs, M., Lexhagen, M. and Zanker, M. (2010), Web Usage Mining in Tourism - A Query Term Analysis and Clustering Approach. In: U. Gretzel et al. (Eds.): Information and Communication Technologies in Tourism 2010, Proceedings of the ENTER Conference 2010, pp. 393-403.

19.  Schegg, R., Steiner, Th., Gherissi-Labben, T. and Murphy, J. (2005), Using Log-File Analysis and Website Assessment to Improve Hospitality Websites, Information and Communication Technologies in Tourism, pp. 566-576.

20.  Searchwindowsserver - Windows Server information, news and tips (2012). [ONLINE] Available at: http://searchwindowsserver.techtarget.com. [Accessed February 2012].

21.  Stokes, R. (2011), eMarketing: The essential guide to digital marketing (4th ed.), Quirk (Pty) Ltd.

22.  Wang, X., Abraham, A. and Smitha, K.A. (2005), Intelligent Web traffic mining and analysis, Journal of Network and Computer Applications, Vol. 28, pp.147–165.

23.  Wang, C., Lu, J. and Zhang, G. (2007), Mining key information of Web pages: A method and its application, Expert Systems with Applications, Vol. 33, pp. 425–433.

24.  Wang, Y. and Lee, A. (2011), Mining Web navigation patterns with a path traversal graph, Expert Systems with Applications, Vol. 38, pp.7112–7122.

25.  Xiang, Z. and Pan, B. (2011), Travel queries on cities in the United States: Implications for search engine marketing for tourist destinations, Tourism Management, Vol. 32, pp.88–97.