A methodology based on Deep Learning for advert value calculation in CPM, CPC and CPA networks

Soft Computing

Abstract

In this research, we propose a methodology for calculating advert value in CPM, CPC and CPA networks. Accurately estimating this value lets all three types of network increase their income by selecting the most profitable advert. With higher income, publishers are better paid and advertisers receive improved services. To develop this methodology, we propose a system based on traditional Machine Learning (ML) methods and Deep Learning (DL) methods. The system has two inputs and one output. The inputs are the user visit and the data about the advertiser. The output is the advert value expressed in dollars. DL predicts model behavior more precisely for many supervised problems. From the three experiments carried out, we conclude that DL is a very efficient supervised method for classifying spam adverts and for estimating the click-through rate (CTR). In the prediction of online sales, DL neural networks (DLNN) showed, on average, worse performance than the cubist and random forest methods, but better performance than the model tree, model rules and linear regression methods.
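
The selection step described above can be illustrated with a minimal sketch. It is not the authors' system: it assumes, purely for illustration, that an advert's value per impression is the price divided by 1000 for CPM, the predicted click probability times the price for CPC, and the predicted click probability times the predicted action probability times the price for CPA. The `Advert` class, its field names and the two functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Advert:
    """Hypothetical advert record; field names are illustrative assumptions."""
    name: str
    model: str              # "CPM", "CPC" or "CPA"
    price: float            # dollars per 1000 impressions, per click, or per action
    p_click: float = 0.0    # predicted click-through rate (assumed model output)
    p_action: float = 0.0   # predicted probability of an action given a click


def expected_value(ad: Advert) -> float:
    """Expected income in dollars from showing the advert once (assumed formulas)."""
    if ad.model == "CPM":
        return ad.price / 1000.0
    if ad.model == "CPC":
        return ad.p_click * ad.price
    if ad.model == "CPA":
        return ad.p_click * ad.p_action * ad.price
    raise ValueError(f"unknown payment model: {ad.model}")


def most_profitable(ads: Iterable[Advert]) -> Advert:
    """Return the advert with the highest expected value per impression."""
    return max(ads, key=expected_value)


candidates = [
    Advert("banner", "CPM", price=2.0),
    Advert("text-link", "CPC", price=0.5, p_click=0.01),
    Advert("signup", "CPA", price=20.0, p_click=0.01, p_action=0.05),
]
best = most_profitable(candidates)
```

Expressing every advert on the same per-impression dollar scale is what makes adverts from the three payment models directly comparable; in practice the probabilities would come from the trained classifiers rather than fixed constants.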

Notes

  1. The number in parentheses refers to the year in which the company was founded.

  2. The Classification And Regression Training (Caret) package is a set of functions for creating and evaluating predictive models (caret Package 2015).

  3. R Studio is an integrated development environment oriented toward statistical computing and graphics (Studio 2015).

  4. Architectural depth refers to the number of levels of composition of nonlinear operations in the learned function (Najafabadi et al. 2015).

  5. Click-bots are programs that are installed on users’ computers without their knowledge and that generate clicks in order to harm the online advertising ecosystem (Miller et al. 2011).

  6. For the experiments, we used version 3.1.2 (2014-10-31) of the R Studio software, run on a computer with an Intel® Core™ i5-2400 CPU @ 3.10 GHz and 16 GB of RAM, under the Windows 7 Pro (Service Pack 1, 64-bit) operating system.

  7. Weka is data mining software written in Java that implements a collection of ML algorithms (Weka 2015).

  8. The H2O DL package is an open-source platform for in-memory machine learning (Installation 2015).

  9. To build this classification model, we used a dataset of 3279 adverts, of which 2821 were spam and 458 were not (Set 2015). Each sample had 1558 attributes. Twenty-eight percent of the instances contained noisy information or missing values. The missing values were treated with the “ReplaceMissingValues” function of the Weka software.

  10. The dataset used to build this classification model was composed of 45,840,617 samples. From this dataset, we randomly chose 10,000 samples applying a statistical scheme (Data Science 2015).

  11. The dataset consists of 751 products and 546 columns. Twelve columns are outputs that indicate the number of sales in each of the next 12 months. The remaining 534 columns correspond to the product features, which we treat as the model inputs. Of these inputs, 514 are categorical and 20 are numerical (Sales 2015).
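
As an illustration of the missing-value treatment mentioned in note 9, the sketch below fills numeric columns with the column mean and categorical columns with the column mode, which is the documented behavior of Weka's ReplaceMissingValues filter. It is a plain-Python approximation under the assumption that missing entries are encoded as `None`; the function name and the sample rows are hypothetical and are not taken from the dataset.

```python
from statistics import mean, mode


def replace_missing_values(rows):
    """Fill None entries column by column: mean for numeric, mode for categorical."""
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            # Nothing observed in this column, so there is nothing to impute from.
            fill.append(None)
        elif all(isinstance(v, (int, float)) for v in observed):
            fill.append(mean(observed))
        else:
            fill.append(mode(observed))
    return [
        [fill[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Hypothetical toy rows: (image width, image format).
sample = [
    [1.0, "gif"],
    [None, "jpg"],
    [3.0, None],
    [2.0, "jpg"],
]
cleaned = replace_missing_values(sample)
```

The fill values here are computed from the same rows they are applied to; when a separate test set is used, they would normally be estimated on the training data only.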

References

  • Agarwal D, Chen BC, Elango P (2009) Spatio-temporal models for estimating click-through rate. In: Proceedings of the 18th international conference on World wide web, ACM, pp 21–30

  • Arel I, Rose DC, Karnowski TP (2010) Deep machine learning-a new frontier in artificial intelligence research [research frontier]. IEEE Comput Intell Mag 5(4):13–18

  • Balakrishnan S, Chopra S, Melamed ID (2010) The business next door: Click-through rate modeling for local search. Machine Learning in Online Advertising p 14

  • Bauman K, Kornetova A, Topinsky V, Leshiner D (2010) Ctr prediction based on click statistic. In: Workshop: machine learning in online advertising, Citeseer, pp 8–13

  • Bax E, Kuratti A, Mcafee P, Romero J (2012) Comparing predicted prices in auctions for online advertising. Int J Ind Organ 30(1):80–88

  • Beheshti-Kashi S, Karimi HR, Thoben KD, Lütjen M, Teucke M (2015) A survey on retail sales forecasting and prediction in fashion markets. Syst Sci Control Eng: Open Access J 3(1):154–161

  • Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127

  • Bose I, Mahapatra RK (2001) Business data mining: a machine learning perspective. Inf Manag 39(3):211–225

  • caret Package T (2015) The caret package (short for classification and regression training). http://topepo.github.io/caret/index.html, [Online; accessed 05 July 2015]

  • Chen FL, Ou TY (2011) Constructing a sales forecasting model by integrating GRA and ELM: a case study for retail industry. Int J Electron Bus Manag 9(2):107

  • Cho CH, Cheon HJ (2004) Why do people avoid advertising on the internet? J Advert 33(4):89–97

  • Clark J, Koprinska I, Poon J (2003) A neural network based approach to automated e-mail classification. In: Null, IEEE, p 702

  • Dembczynski K, Kotlowski W, Weiss D (2008) Predicting ads click-through rate with decision rules. In: Workshop on targeting and ranking in online advertising, vol 2008

  • Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3

  • Documentation DLH (2015) Deep Learning-H2O 2.8.6.2 documentation. https://s3.amazonaws.com/h2o-release/h2o/rel-markov/1/docs-website/datascience/deeplearning.html, [Online; accessed 22 April 2015]

  • Duarte Torres S, Weber I, Hiemstra D (2014) Analysis of search and browsing behavior of young users on the web. ACM Trans Web 8(2):7

  • Fain DC, Pedersen JO (2006) Sponsored search: a brief history. Bull Am Soc Inf Sci Technol 32(2):12–13

  • Fang Z, Yue K, Zhang J, Zhang D, Liu W (2014) Predicting click-through rates of new advertisements based on the bayesian network. In: Mathematical problems in engineering 2014

  • Fawcett T (2006) An introduction to roc analysis. Pattern Recognit Lett 27(8):861–874

  • Feily M, Shahrestani A, Ramadass S (2009) A survey of botnet and botnet detection. In: Emerging security information, systems and technologies, 2009. SECURWARE’09. Third International Conference on, IEEE, pp 268–273

  • Fjell K (2010) Online advertising: pay-per-view versus pay-per-click with market power. J Revenue Pricing Manag 9(3):198–203

  • Gabrilovich E, Broder A, Fontoura M, Joshi A, Josifovski V, Riedel L, Zhang T (2009) Classifying search queries using the web as a source of knowledge. ACM Trans Web 3(2):5

  • Gandhi M, Jakobsson M, Ratkiewicz J (2006) Badvertisements: stealthy click-fraud with unwitting accessories. J Digit Forensic Pract 1(2):131–142

  • Goodman J, Yih WT (2006) Online discriminative spam filter training. In: CEAS, pp 1–4

  • Graepel T, Candela JQ, Borchert T, Herbrich R (2010) Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 13–20

  • Granitto PM, Furlanello C, Biasioli F, Gasperi F (2006) Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemom Intell Lab Syst 83(2):83–90

  • Grbovic M, Djuric N, Radosavljevic V, Bhamidipati N (2015) Search retargeting using directed query embeddings. In: Proceedings of the 24th international conference on world wide web companion, international world wide web conferences steering committee, pp 37–38

  • Guzella TS, Caminhas WM (2009) A review of machine learning approaches to spam filtering. Expert Syst Appl 36(7):10206–10222

  • Haghi HV, Tafreshi SM (2007) An overview and verification of electricity price forecasting models. In: Power engineering conference, 2007. IPEC 2007. International, IEEE, pp 724–729

  • Heckerman D, Horvitz E, Sahami M, Dumais S (1998) A bayesian approach to filtering junk e-mail. In: Proceeding of AAAI-98 workshop on learning for text categorization, pp 55–62

  • Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

  • Hu YJ, Shin J, Tang Z (2010) Pricing of online advertising: cost-per-click-through vs. cost-per-action. In: System sciences (HICSS), 2010 43rd Hawaii International Conference on, IEEE, pp 1–9

  • Hülsmann M, Borscheid D, Friedrich CM, Reith D (2012) General sales forecast models for automobile markets and their analysis. Trans MLDM 5(2):65–86

  • Installation HRS (2015) H2O installation in R Studio H2O 2.3.0.1283 documentation. http://docs.h2o.ai/h2oclassic/Ruser/Rinstall.html, [Online; accessed 8-April-2015]

  • Jakobsson M, Ramzan Z (2008) Crimeware: understanding new attacks and defenses. Addison-Wesley, Reading

  • Kim JH (2009) Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal 53(11):3735–3745

  • Kirubavathi G, Anitha R (2014) Botnets: a study and analysis. Computational intelligence, cyber security and computational models. Springer, Berlin, pp 203–214

  • Kondakindi G, Rana S, Rajkumar A, Ponnekanti SK, Parakh V (2014) A logistic regression approach to ad click prediction. Mach Learn Class Project

  • König AC, Gamon M, Wu Q (2009) Click-through prediction for news queries. In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 347–354

  • Kshetri N (2010) The economics of click fraud. IEEE Security Privacy 8(3):45–53

  • Kuhn M (2012) Variable selection using the caret package. URL http://cran.cermin.lipi.go.id/web/packages/caret/vignettes/caretSelection.pdf

  • Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26

  • Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, Berlin

  • Kumar R, Naik SM, Naik VD, Shiralli S, Sunil V, Husain M (2015) Predicting clicks: CTR estimation of advertisements using logistic regression classifier. In: Advance computing conference (IACC), 2015 IEEE International, IEEE, pp 1134–1138

  • Larochelle H, Erhan D, Courville A, Bergstra J, Bengio Y (2007) An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th international conference on machine learning, ACM, pp 473–480

  • Le QV (2013) Building high-level features using large scale unsupervised learning. In: Acoustics, speech and signal processing (ICASSP), 2013 IEEE international conference on, IEEE, pp 8595–8598

  • LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551

  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  • Lee H, Grosse R, Ranganath R, Ng AY (2009) Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th annual international conference on machine learning, ACM, pp 609–616

  • Lee J, Shi Y, Wang F, Lee H, Kim HK (2015) Advertisement clicking prediction by using multiple criteria mathematical programming. World Wide Web pp 1–18

  • Levin J, Milgrom P (2010) Online advertising: heterogeneity and conflation in market design. Am Econ Rev 100(2):603–607

  • Lohtia R, Donthu N, Hershberger EK (2003) The impact of content and design elements on banner advertising click-through rates. J Advert Res 43(04):410–418

  • Mangani A (2004) Online advertising: pay-per-view versus pay-per-click. J Revenue Pricing Manag 2(4):295–302

  • Markoff J (2012) Scientists see promise in deep-learning programs. New York Times

  • Metz CE (1978) Basic principles of roc analysis. Semin Nuclear Med 8:283–298

  • Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification

  • Miller B, Pearce P, Grier C, Kreibich C, Paxson V (2011) What’s clicking what? Techniques and innovations of today’s clickbots. In: International conference on detection of intrusions and malware, and vulnerability assessment. Springer, Berlin, pp 164–183

  • Miralles-Pechuán L, Ballester EM, Carrasco JMG (2014) Online advertising and the cpa model: challenges and opportunities. Int J Eng Manag Res 4:324–334

  • Miralles-Pechuán L, Rosso D, Brieva J (2015) Reconocimiento de dígitos escritos a mano mediante métodos de tratamiento de imagen y modelos de clasificación. Res Comput Sci 93(93):83–94

  • Mohamed Ar, Sainath TN, Dahl G, Ramabhadran B, Hinton GE, Picheny M, et al (2011) Deep belief networks using discriminative features for phone recognition. In: Acoustics, speech and signal processing (ICASSP), 2011 IEEE international conference on, IEEE, pp 5060–5063

  • Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. J Big Data 2(1):1–21

  • of Data Science KTH (2015) Data—display advertising challenge—Kaggle. www.kaggle.com/c/criteo-display-ad-challenge/data, [Online; accessed 16-July-2008]

  • Ponce H, Martínez-Villaseñor MdL, Miralles-Pechuán L (2016) A novel wearable sensor-based human activity recognition approach using artificial hydrocarbon networks. Sensors 16(7):1033

  • Ponce H, Miralles-Pechuán L, Martínez-Villaseñor MdL (2016b) A flexible approach for human activity recognition using artificial hydrocarbon networks. Sensors 16(11):1715

  • Ranadive A, Rizvi S, Daswani NM (2013) Malicious advertisement detection and remediation. US Patent 8,516,590

  • Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. In: Encyclopedia of database systems. Springer, Berlin, pp 532–538

  • Rey B, Kannan A (2010) Conversion rate based bid adjustment for sponsored search. In: Proceedings of the 19th international conference on world wide web, ACM, pp 1173–1174

  • Richardson M, Dominowska E, Ragno R (2007) Predicting clicks: estimating the click-through rate for new ads. In: Proceedings of the 16th international conference on world wide web. ACM, pp 521–530

  • Sales OP (2015) Description—online product sales | Kaggle. https://www.kaggle.com/c/online-sales, [Online; accessed 22 July 2015]

  • Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

  • Set IAD (2015) UCI machine learning repository: internet advertisements. https://archive.ics.uci.edu/ml/datasets/Internet+Advertisements, [Online; accessed 16 June 2015]

  • Sharma SK, Sharma V (2012) Comparative analysis of machine learning techniques in sale forecasting. Int J Comput Appl 53(6):51–54

  • Singh S, Kaur S (2015) Improved spambase dataset prediction using svm Rbf kernel with adaptive boost. Int J Res Eng Technol 4(6):383–386

  • Sonka M, Hlavac V, Boyle R (2014) Image processing, analysis, and machine vision. Cengage Learning, UK

  • Sparks ER, Talwalkar A, Franklin MJ, Jordan MI, Kraska T (2015) Tupaq: An efficient planner for large-scale predictive analytic queries. arXiv:1502.00068

  • Stone-Gross B, Stevens R, Zarras A, Kemmerer R, Kruegel C, Vigna G (2011) Understanding fraudulent activities in online ad exchanges. In: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, ACM, pp 279–294

  • Studio R (2015) R studio is free and open source data analysis. https://www.rstudio.com/, [Online; accessed 23 June 2015]

  • Tagami Y, Ono S, Yamamoto K, Tsukamoto K, Tajima A (2013) Ctr prediction for contextual advertising: learning-to-rank approach. In: Proceedings of the seventh international workshop on data mining for online advertising, ACM, p 4

  • Tappenden AF, Miller J (2009) Cookies: a deployment study and the testing implications. ACM Trans Web 3(3):9

  • Taylor GW, Hinton GE, Roweis ST (2006) Modeling human motion using binary latent variables. In: Advances in neural information processing systems, pp 1345–1352

  • Tretyakov K (2004) Machine learning techniques in spam filtering. Data Min Probl-Oriented Semin MTAT 3:60–79

  • Trofimov I, Kornetova A, Topinskiy V (2012) Using boosted trees for click-through rate prediction for sponsored search. In: Proceedings of the sixth international workshop on data mining for online advertising and internet economy, ACM, p 2

  • Tuzhilin A (2006) The lane’s gifts v. google report. Official Google Blog: Findings on invalid clicks, posted pp 1–47

  • Vasumati D, Vani MS, Bhramaramba R, Babu OY (2015) Data mining approach to filter click-spam in mobile Ad networks

  • Weka (2015) Weka 3: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/, [Online; accessed 11 June 2015]

  • Williams D, Hinton G (1986) Learning representations by back-propagating errors. Nature 323:533–536

  • Yin D, Mei S, Cao B, Sun JT, Davison BD (2014) Exploiting contextual factors for click modeling in sponsored search. In: Proceedings of the 7th ACM international conference on Web search and data mining, ACM, pp 113–122

  • Yoganarasimhan H (2015) Search personalization using machine learning. Available at SSRN 2590020

  • Zhang GP (2003) Time series forecasting using a hybrid arima and neural network model. Neurocomputing 50:159–175

  • Zhong Sh, Liu Y, Liu Y (2011) Bilinear deep learning for image classification. In: Proceedings of the 19th ACM international conference on multimedia, ACM, pp 343–352

  • Zucker J, Shapiro TR (2015) Systems and methods for optimizing marketing decisions based on visitor profitability. US Patent 20,150,193,830

Author information

Corresponding author

Correspondence to Luis Miralles-Pechuán.

Ethics declarations

Conflict of interest

Luis Miralles-Pechuán, Dafne Rosso, Fernando Jiménez and Jose M. García declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Funding

Funded in part by the Spanish Ministerio de Economía y Competitividad (MINECO) and the European Commission FEDER under grants TIN2013-45491-R and TIN2015-66972-C5-3-R.

Additional information

Communicated by H. Ponce.

About this article

Cite this article

Miralles-Pechuán, L., Rosso, D., Jiménez, F. et al. A methodology based on Deep Learning for advert value calculation in CPM, CPC and CPA networks. Soft Comput 21, 651–665 (2017). https://doi.org/10.1007/s00500-016-2468-4

  • DOI: https://doi.org/10.1007/s00500-016-2468-4
