A methodology based on Deep Learning for advert value calculation in CPM, CPC and CPA networks

Miralles-Pechuán, Luis; Rosso, Dafne; Jiménez, Fernando; García, Jose M.

doi:10.1007/s00500-016-2468-4

Focus
Published: 29 December 2016

Soft Computing, volume 21, pages 651–665 (2017)

Luis Miralles-Pechuán¹ (ORCID: orcid.org/0000-0002-7565-6894), Dafne Rosso¹, Fernando Jiménez² & Jose M. García²

Abstract

In this research, we propose a methodology for calculating advert value in CPM, CPC and CPA networks. Estimating this value accurately increases the income of all three network types, because the most profitable advert can be selected. With higher income, publishers are better paid and advertisers receive improved services. To develop this methodology, we propose a system based on traditional Machine Learning methods and Deep Learning (DL) methods. The system has two inputs and one output. The inputs are the user visit and the data about the advertiser. The output is the advert value expressed in dollars. Deep Learning predicts model behavior more precisely for many supervised problems. The three experiments carried out allow us to conclude that DL is a supervised method that is very efficient at classifying spam adverts and at estimating the click-through rate (CTR). In the prediction of online sales, DL neural networks (DLNN) have shown, on average, worse performance than the cubist and random forest methods, although better performance than the model tree, model rules and linear regression methods.
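
The abstract does not state the valuation formula, so the following is only a minimal sketch of one common way to place CPM, CPC and CPA adverts on a single dollar scale: expected revenue per impression. The function name `expected_value_per_impression`, its parameters and the example rates are hypothetical and are not taken from the paper; in a system like the one described, the click and conversion probabilities would presumably come from trained models rather than being supplied by hand.

```python
def expected_value_per_impression(pricing_model, price, p_click=0.0, p_conversion=0.0):
    """Expected revenue, in dollars, of showing an advert once.

    Assumption (not from the paper): value is the price paid per event
    multiplied by the probability that the event happens on one impression.
      CPM: price is paid per 1000 impressions.
      CPC: price is paid per click.
      CPA: price is paid per action, which requires a click first.
    """
    if pricing_model == "CPM":
        return price / 1000.0
    if pricing_model == "CPC":
        return price * p_click
    if pricing_model == "CPA":
        return price * p_click * p_conversion
    raise ValueError(f"unknown pricing model: {pricing_model!r}")


# Hypothetical candidate adverts competing for the same user visit.
candidates = {
    "cpm_ad": expected_value_per_impression("CPM", 2.0),
    "cpc_ad": expected_value_per_impression("CPC", 0.5, p_click=0.01),
    "cpa_ad": expected_value_per_impression(
        "CPA", 20.0, p_click=0.01, p_conversion=0.02
    ),
}

# The network would show the advert with the highest expected value.
best = max(candidates, key=candidates.get)
print(best, candidates[best])
```

With these made-up rates the CPC advert wins, even though the CPA advert pays far more per event, because its payment depends on two uncertain steps instead of one.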

Notes

The number in parentheses refers to the year in which the company was founded.
The Classification And Regression Training (Caret) package is a set of functions for creating and evaluating predictive models (caret Package 2015).
R Studio is an integrated development environment oriented toward statistical computing and graphics (Studio 2015).
Architectural depth refers to the number of levels needed to build a nonlinear operation on the learned function (Najafabadi et al. 2015).
Click-bots are programs that are installed on users’ computers without their knowledge and that generate clicks in order to harm the online advertising ecosystem (Miller et al. 2011).
For the experiments, we used version 3.1.2 (2014-10-31) of the R Studio Software. The software was run on a computer with an Intel® Core™ i5-2400 CPU @ 3.10 GHz and 16 GB of RAM, under the Windows 7 Pro operating system, Service Pack 1, 64 bit.
Weka is data mining software written in Java that implements a collection of ML algorithms (Weka 2015).
The H2O DL package is part of H2O, an open-source platform for in-memory machine learning (Installation 2015).
To build this classification model, we used a dataset of 3279 adverts, of which 2821 were spam and 458 were not (Set 2015). Each sample had 1558 attributes. Twenty-eight percent of the instances had noisy information or missing values. These missing values were treated with the “ReplaceMissingValues” function of the Weka software.
The dataset used to build this classification model was composed of 45,840,617 samples. From this dataset, we randomly chose 10,000 samples applying a statistical scheme (Data Science 2015).
The dataset consists of a total of 751 products and 546 features. The dataset has 12 outputs that indicate the number of sales for each of the next 12 months. The remaining 534 columns correspond to the product features, which we treat as the model inputs. Of these inputs, 514 are categorical and 20 are numerical (Sales 2015).
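
Two of the data preparation steps mentioned in these notes can be sketched briefly. Weka's ReplaceMissingValues filter fills missing numeric values with the column mean and missing nominal values with the column mode; the first function below imitates that behaviour on a plain list of rows. The notes do not say which statistical scheme was used to draw the 10,000 samples, so the second function uses reservoir sampling purely as an assumed stand-in that draws a uniform sample in one pass. Both function names and the toy data are hypothetical, and the authors worked with R and Weka rather than Python.

```python
import random
from collections import Counter


def replace_missing_values(rows, missing=None):
    """Fill missing cells per column: mean if numeric, most frequent value otherwise."""
    fills = []
    for column in zip(*rows):
        present = [v for v in column if v != missing]
        if present and all(isinstance(v, (int, float)) for v in present):
            fills.append(sum(present) / len(present))            # column mean
        elif present:
            fills.append(Counter(present).most_common(1)[0][0])  # column mode
        else:
            fills.append(missing)                                # nothing to learn from
    return [
        [fills[j] if v == missing else v for j, v in enumerate(row)]
        for row in rows
    ]


def reservoir_sample(stream, k, seed=0):
    """Draw k items uniformly at random from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


# Toy rows standing in for advert records (numeric size, nominal image type).
rows = [[1.0, "gif"], [None, "jpg"], [3.0, None], [5.0, "gif"]]
filled = replace_missing_values(rows)
print(filled)

# Sampling row indices from a large dataset without loading it all at once.
sample = reservoir_sample(range(100_000), 10)
print(len(sample))
```

Reservoir sampling is chosen here only because it needs a single pass and constant memory, which suits a source with tens of millions of rows; any other uniform sampling method would serve the same purpose.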

References

Agarwal D, Chen BC, Elango P (2009) Spatio-temporal models for estimating click-through rate. In: Proceedings of the 18th international conference on World wide web, ACM, pp 21–30
Arel I, Rose DC, Karnowski TP (2010) Deep machine learning-a new frontier in artificial intelligence research [research frontier]. IEEE Comput Intell Mag 5(4):13–18
Balakrishnan S, Chopra S, Melamed ID (2010) The business next door: Click-through rate modeling for local search. Machine Learning in Online Advertising p 14
Bauman K, Kornetova A, Topinsky V, Leshiner D (2010) Ctr prediction based on click statistic. In: Workshop: machine learning in online advertising, Citeseer, pp 8–13
Bax E, Kuratti A, Mcafee P, Romero J (2012) Comparing predicted prices in auctions for online advertising. Int J Ind Organ 30(1):80–88
Beheshti-Kashi S, Karimi HR, Thoben KD, Lütjen M, Teucke M (2015) A survey on retail sales forecasting and prediction in fashion markets. Syst Sci Control Eng: Open Access J 3(1):154–161
Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
Bose I, Mahapatra RK (2001) Business data mining: a machine learning perspective. Inf Manag 39(3):211–225
caret Package T (2015) The caret package (short for classification and regression training). http://topepo.github.io/caret/index.html, [Online; accessed 05 July 2015]
Chen FL, Ou TY (2011) Constructing a sales forecasting model by integrating GRA and ELM: a case study for retail industry. Int J Electron Bus Manag 9(2):107
Cho CH, Cheon HJ (2004) Why do people avoid advertising on the internet? J Advert 33(4):89–97
Clark J, Koprinska I, Poon J (2003) A neural network based approach to automated e-mail classification. In: Null, IEEE, p 702
Dembczynski K, Kotlowski W, Weiss D (2008) Predicting ads click-through rate with decision rules. In: Workshop on targeting and ranking in online advertising, vol 2008
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3
Documentation DLH (2015) Deep Learning-H2O 2.8.6.2 documentation. https://s3.amazonaws.com/h2o-release/h2o/rel-markov/1/docs-website/datascience/deeplearning.html, [Online; accessed 22 April 2015]
Duarte Torres S, Weber I, Hiemstra D (2014) Analysis of search and browsing behavior of young users on the web. ACM Trans Web 8(2):7
Fain DC, Pedersen JO (2006) Sponsored search: a brief history. Bull Am Soc Inf Sci Technol 32(2):12–13
Fang Z, Yue K, Zhang J, Zhang D, Liu W (2014) Predicting click-through rates of new advertisements based on the bayesian network. In: Mathematical problems in engineering 2014
Fawcett T (2006) An introduction to roc analysis. Pattern Recognit Lett 27(8):861–874
Feily M, Shahrestani A, Ramadass S (2009) A survey of botnet and botnet detection. In: Emerging security information, systems and technologies, 2009. SECURWARE’09. Third International Conference on, IEEE, pp 268–273
Fjell K (2010) Online advertising: pay-per-view versus pay-per-click with market power. J Revenue Pricing Manag 9(3):198–203
Gabrilovich E, Broder A, Fontoura M, Joshi A, Josifovski V, Riedel L, Zhang T (2009) Classifying search queries using the web as a source of knowledge. ACM Trans Web 3(2):5
Gandhi M, Jakobsson M, Ratkiewicz J (2006) Badvertisements: stealthy click-fraud with unwitting accessories. J Digit Forensic Pract 1(2):131–142
Goodman J, Yih WT (2006) Online discriminative spam filter training. In: CEAS, pp 1–4
Graepel T, Candela JQ, Borchert T, Herbrich R (2010) Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 13–20
Granitto PM, Furlanello C, Biasioli F, Gasperi F (2006) Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemom Intell Lab Syst 83(2):83–90
Grbovic M, Djuric N, Radosavljevic V, Bhamidipati N (2015) Search retargeting using directed query embeddings. In: Proceedings of the 24th international conference on world wide web companion, international world wide web conferences steering committee, pp 37–38
Guzella TS, Caminhas WM (2009) A review of machine learning approaches to spam filtering. Expert Syst Appl 36(7):10,206–10,222
Haghi HV, Tafreshi SM (2007) An overview and verification of electricity price forecasting models. In: Power engineering conference, 2007. IPEC 2007. International, IEEE, pp 724–729
Heckerman D, Horvitz E, Sahami M, Dumais S (1998) A bayesian approach to filtering junk e-mail. In: Proceeding of AAAI-98 workshop on learning for text categorization, pp 55–62
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hu YJ, Shin J, Tang Z (2010) Pricing of online advertising: cost-per-click-through vs. cost-per-action. In: System sciences (HICSS), 2010 43rd Hawaii International Conference on, IEEE, pp 1–9
Hülsmann M, Borscheid D, Friedrich CM, Reith D (2012) General sales forecast models for automobile markets and their analysis. Trans MLDM 5(2):65–86
Installation HRS (2015) H2O installation in R Studio H2O 2.3.0.1283 documentation. http://docs.h2o.ai/h2oclassic/Ruser/Rinstall.html, [Online; accessed 8-April-2015]
Jakobsson M, Ramzan Z (2008) Crimeware: understanding new attacks and defenses. Addison-Wesley, Reading
Kim JH (2009) Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal 53(11):3735–3745
Kirubavathi G, Anitha R (2014) Botnets: a study and analysis. Computational intelligence, cyber security and computational models. Springer, Berlin, pp 203–214
Kondakindi G, Rana S, Rajkumar A, Ponnekanti SK, Parakh V (2014) A logistic regression approach to ad click prediction. Mach Learn Class Project
König AC, Gamon M, Wu Q (2009) Click-through prediction for news queries. In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 347–354
Kshetri N (2010) The economics of click fraud. IEEE Security Privacy 8(3):45–53
Kuhn M (2012) Variable selection using the caret package. URL http://cran.cermin.lipi.go.id/web/packages/caret/vignettes/caretSelection.pdf
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, Berlin
Kumar R, Naik SM, Naik VD, Shiralli S, Sunil V, Husain M (2015) Predicting clicks: CTR estimation of advertisements using logistic regression classifier. In: Advance computing conference (IACC), 2015 IEEE International, IEEE, pp 1134–1138
Larochelle H, Erhan D, Courville A, Bergstra J, Bengio Y (2007) An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th international conference on machine learning, ACM, pp 473–480
Le QV (2013) Building high-level features using large scale unsupervised learning. In: Acoustics, speech and signal processing (ICASSP), 2013 IEEE international conference on, IEEE, pp 8595–8598
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lee H, Grosse R, Ranganath R, Ng AY (2009) Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th annual international conference on machine learning, ACM, pp 609–616
Lee J, Shi Y, Wang F, Lee H, Kim HK (2015) Advertisement clicking prediction by using multiple criteria mathematical programming. World Wide Web pp 1–18
Levin J, Milgrom P (2010) Online advertising: heterogeneity and conflation in market design. Am Econ Rev 100(2):603–607
Lohtia R, Donthu N, Hershberger EK (2003) The impact of content and design elements on banner advertising click-through rates. J Advert Res 43(04):410–418
Mangani A (2004) Online advertising: pay-per-view versus pay-per-click. J Revenue Pricing Manag 2(4):295–302
Markoff J (2012) Scientists see promise in deep-learning programs. New York Times
Metz CE (1978) Basic principles of roc analysis. Semin Nuclear Med 8:283–298
Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification
Miller B, Pearce P, Grier C, Kreibich C, Paxson V (2011) Whats clicking what? Techniques and innovations of todays clickbots. In: International conference on detection of intrusions and malware, and vulnerability assessment. Springer, Berlin, pp 164–183
Miralles-Pechuán L, Ballester EM, Carrasco JMG (2014) Online advertising and the cpa model: challenges and opportunities. Int J Eng Manag Res 4:324–334
Miralles-Pechuán L, Rosso D, Brieva J (2015) Reconocimiento de dígitos escritos a mano mediante métodos de tratamiento de imagen y modelos de clasificación. Res Comput Sci 93(93):83–94
Mohamed Ar, Sainath TN, Dahl G, Ramabhadran B, Hinton GE, Picheny M, et al (2011) Deep belief networks using discriminative features for phone recognition. In: Acoustics, speech and signal processing (ICASSP), 2011 IEEE international conference on, IEEE, pp 5060–5063
Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. J Big Data 2(1):1–21
of Data Science KTH (2015) Data—display advertising challenge—Kaggle. www.kaggle.com/c/criteo-display-ad-challenge/data, [Online; accessed 16-July-2008]
Ponce H, Martínez-Villaseñor MdL, Miralles-Pechuán L (2016) A novel wearable sensor-based human activity recognition approach using artificial hydrocarbon networks. Sensors 16(7):1033
Ponce H, Miralles-Pechuán L, Martínez-Villaseñor MdL (2016b) A flexible approach for human activity recognition using artificial hydrocarbon networks. Sensors 16(11):1715
Ranadive A, Rizvi S, Daswani NM (2013) Malicious advertisement detection and remediation. US Patent 8,516,590
Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. In: Encyclopedia of database systems. Springer, Berlin, pp 532–538
Rey B, Kannan A (2010) Conversion rate based bid adjustment for sponsored search. In: Proceedings of the 19th international conference on world wide web, ACM, pp 1173–1174
Richardson M, Dominowska E, Ragno R (2007) Predicting clicks: estimating the click-through rate for new ads. In: Proceedings of the 16th international conference on world wide web. ACM, pp 521–530
Sales OP (2015) Description—online product sales | Kaggle. https://www.kaggle.com/c/online-sales, [Online; accessed 22 July 2015]
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Set IAD (2015) UCI machine learning repository: internet advertisements. https://archive.ics.uci.edu/ml/datasets/Internet+Advertisements, [Online; accessed 16 June 2015]
Sharma SK, Sharma V (2012) Comparative analysis of machine learning techniques in sale forecasting. Int J Comput Appl 53(6):51–54
Singh S, Kaur S (2015) Improved spambase dataset prediction using svm Rbf kernel with adaptive boost. Int J Res Eng Technol 4(6):383–386
Sonka M, Hlavac V, Boyle R (2014) Image processing, analysis, and machine vision. Cengage Learning, UK
Sparks ER, Talwalkar A, Franklin MJ, Jordan MI, Kraska T (2015) Tupaq: An efficient planner for large-scale predictive analytic queries. arXiv:1502.00068
Stone-Gross B, Stevens R, Zarras A, Kemmerer R, Kruegel C, Vigna G (2011) Understanding fraudulent activities in online ad exchanges. In: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, ACM, pp 279–294
Studio R (2015) R studio is free and open source data analysis. https://www.rstudio.com/, [Online; accessed 23 June 2015]
Tagami Y, Ono S, Yamamoto K, Tsukamoto K, Tajima A (2013) Ctr prediction for contextual advertising: learning-to-rank approach. In: Proceedings of the seventh international workshop on data mining for online advertising, ACM, p 4
Tappenden AF, Miller J (2009) Cookies: a deployment study and the testing implications. ACM Trans Web 3(3):9
Taylor GW, Hinton GE, Roweis ST (2006) Modeling human motion using binary latent variables. In: Advances in neural information processing systems, pp 1345–1352
Tretyakov K (2004) Machine learning techniques in spam filtering. Data Min Probl-Oriented Semin MTAT 3:60–79
Trofimov I, Kornetova A, Topinskiy V (2012) Using boosted trees for click-through rate prediction for sponsored search. In: Proceedings of the sixth international workshop on data mining for online advertising and internet economy, ACM, p 2
Tuzhilin A (2006) The lane’s gifts v. google report. Official Google Blog: Findings on invalid clicks, posted pp 1–47
Vasumati D, Vani MS, Bhramaramba R, Babu OY (2015) Data mining approach to filter click-spam in mobile Ad networks
Weka (2015) Weka 3: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/, [Online; accessed 11 June 2015]
Williams D, Hinton G (1986) Learning representations by back-propagating errors. Nature 323:533–536
Yin D, Mei S, Cao B, Sun JT, Davison BD (2014) Exploiting contextual factors for click modeling in sponsored search. In: Proceedings of the 7th ACM international conference on Web search and data mining, ACM, pp 113–122
Yoganarasimhan H (2015) Search personalization using machine learning. Available at SSRN 2590020
Zhang GP (2003) Time series forecasting using a hybrid arima and neural network model. Neurocomputing 50:159–175
Zhong Sh, Liu Y, Liu Y (2011) Bilinear deep learning for image classification. In: Proceedings of the 19th ACM international conference on multimedia, ACM, pp 343–352
Zucker J, Shapiro TR (2015) Systems and methods for optimizing marketing decisions based on visitor profitability. US Patent 20,150,193,830

Author information

Authors and Affiliations

Universidad Panamericana, Campus México, Facultad de Ingeniería, Augusto Rodin 498, 03920, Ciudad de México, México
Luis Miralles-Pechuán & Dafne Rosso
Faculty of Computer Science, Universidad de Murcia, Murcia, Spain
Fernando Jiménez & Jose M. García

Corresponding author

Correspondence to Luis Miralles-Pechuán.

Ethics declarations

Conflict of interest

Luis Miralles-Pechuán, Dafne Rosso, Fernando Jiménez and Jose M. García declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Funding

Funded in part by the Spanish Ministerio de Economía y Competitividad (MINECO) and the European Commission FEDER under grants TIN2013-45491-R and TIN2015-66972-C5-3-R.

Additional information

Communicated by H. Ponce.

About this article

Cite this article

Miralles-Pechuán, L., Rosso, D., Jiménez, F. et al. A methodology based on Deep Learning for advert value calculation in CPM, CPC and CPA networks. Soft Comput 21, 651–665 (2017). https://doi.org/10.1007/s00500-016-2468-4

Published: 29 December 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00500-016-2468-4
