On Internet Traffic Classification: A Two-Phased Machine Learning Approach

Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, requiring additional packet-level information, host behaviour analysis, and specialized hardware...


Bibliographic Details
Main Authors:	Taimur Bakhshi, Bogdan Ghita
Format:	Article
Language:	English
Published:	Wiley 2016-01-01
Series:	Journal of Computer Networks and Communications
Online Access:	http://dx.doi.org/10.1155/2016/2048302

_version_	1832563608886706176
author	Taimur Bakhshi Bogdan Ghita
collection	DOAJ
description	Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, requiring additional packet-level information, host behaviour analysis, and specialized hardware, limiting their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine the unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0-based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
format	Article
id	doaj-art-01a2eddff4a244e497e83e03ddfe6de3
institution	Kabale University
issn	2090-7141 2090-715X
language	English
publishDate	2016-01-01
publisher	Wiley
record_format	Article
series	Journal of Computer Networks and Communications
spelling	doaj-art-01a2eddff4a244e497e83e03ddfe6de3 | 2025-02-03T01:12:55Z | eng | Wiley | Journal of Computer Networks and Communications | 2090-7141, 2090-715X | 2016-01-01 | 2016 | 10.1155/2016/2048302 | 2048302 | On Internet Traffic Classification: A Two-Phased Machine Learning Approach | Taimur Bakhshi, Bogdan Ghita (both: Center for Security, Communications and Network Research, University of Plymouth, Plymouth PL4 8AA, UK) | http://dx.doi.org/10.1155/2016/2048302
title	On Internet Traffic Classification: A Two-Phased Machine Learning Approach
url	http://dx.doi.org/10.1155/2016/2048302
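
Illustrative sketch (not part of the catalogued record): the two-phased approach summarized in the description can be outlined in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The flow features (bytes, packets, duration), the two applications, and the synthetic data are hypothetical; a small Gini-based decision tree stands in for C5.0, which is a different algorithm; adaptive boosting is omitted.

```python
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=5):
    """Stand-in for C5.0: a small tree with axis-aligned Gini splits."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = None
    for f in range(len(rows[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    l_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    r_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in l_idx], [labels[i] for i in l_idx],
                       depth + 1, max_depth),
            build_tree([rows[i] for i in r_idx], [labels[i] for i in r_idx],
                       depth + 1, max_depth))

def predict(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def synth(rng, centre, n=40):
    """Hypothetical flows scattered around a centre: (bytes, packets, seconds)."""
    return [tuple(rng.gauss(m, 0.02 * m) for m in centre) for _ in range(n)]

rng = random.Random(42)
flows_by_app = {  # assumed: one content-like and one supplementary-like flow type each
    "video": synth(rng, (5e6, 4000, 120)) + synth(rng, (2e3, 10, 1)),
    "web": synth(rng, (2e5, 150, 5)) + synth(rng, (1e3, 6, 0.5)),
}

# Phase 1 (unsupervised): cluster each application's flows independently.
rows, flow_classes = [], []
for app, flows in flows_by_app.items():
    for flow, cluster in zip(flows, kmeans(flows, k=2)):
        rows.append(flow)
        flow_classes.append(f"{app}:{cluster}")

# Phase 2 (supervised): train on even-indexed flows, test on odd-indexed ones.
tree = build_tree(rows[0::2], flow_classes[0::2])
hits = sum(predict(tree, r) == c for r, c in zip(rows[1::2], flow_classes[1::2]))
accuracy = hits / len(rows[1::2])
print(f"held-out accuracy: {accuracy:.2%}")
```

The cluster labels produced in the first phase become the class labels for the second, so the tree learns to tell apart flow classes within an application as well as the applications themselves. The synthetic clusters here are far better separated than real NetFlow records would be, so the printed accuracy says nothing about the figures reported in the description.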


Similar Items