Publications

BGP2Vec: Unveiling the Latent Characteristics of Autonomous Systems

Published in IEEE Transactions on Network and Service Management, 2022

In this paper, we present BGP2Vec, a novel approach to revealing the latent characteristics of ASes using neural-network-based embedding. We show that our embedding indeed captures important characteristics of ASes, and then show how the embedding can be used to solve two problems: ASN business-type classification and AS Type of Relationships (ToRs) inference.

Recommended citation: T. Shapira and Y. Shavitt, "BGP2Vec: Unveiling the Latent Characteristics of Autonomous Systems," in IEEE Transactions on Network and Service Management, doi: 10.1109/TNSM.2022.3169638. https://ieeexplore.ieee.org/document/9761992

AP2Vec: an Unsupervised Approach for BGP Hijacking Detection

Published in IEEE Transactions on Network and Service Management, 2022

In this paper, we introduce a novel approach for BGP hijacking detection that is based on the observation that during a hijack attack, the functional roles of ASNs along the route change. To identify a functional change, we build on previous work that embeds ASNs to vectors based on BGP routing announcements, and embed each IP address prefix (AP) to a vector representing its latent characteristics; we call this embedding AP2Vec. Then, we compare the embedding of a new route with the AP embedding that is based on the old routes to identify large differences.
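The comparison step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the paper: the ASN vectors are made-up toy values, a route is summarized by the mean of its ASN vectors, the prefix (AP) vector is the mean over its old routes, and the cosine-distance threshold of 0.5 is arbitrary. The helper names (`route_vector`, `is_suspicious`) are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def route_vector(route, asn_vectors):
    # Assumption: a route is summarized by averaging its ASN embeddings.
    return mean_vector([asn_vectors[asn] for asn in route])

def is_suspicious(new_route, old_routes, asn_vectors, threshold=0.5):
    # Assumption: the prefix embedding is the mean of its old route vectors.
    ap_vector = mean_vector([route_vector(r, asn_vectors) for r in old_routes])
    distance = cosine_distance(route_vector(new_route, asn_vectors), ap_vector)
    return distance > threshold

# Toy 2-D embeddings keyed by documentation ASNs (RFC 5398); values are invented.
asn_vectors = {
    64496: [1.0, 0.1],
    64497: [0.9, 0.2],
    64498: [0.8, 0.0],
    64499: [-1.0, 0.5],
    64500: [-0.8, 0.9],
}
old_routes = [[64496, 64497], [64496, 64498]]

print(is_suspicious([64497, 64498], old_routes, asn_vectors))
print(is_suspicious([64499, 64500], old_routes, asn_vectors))
```

In this toy setup the first route stays close to the prefix's historical embedding, while the second points in a very different direction and is flagged.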

Recommended citation: T. Shapira and Y. Shavitt, "AP2Vec: an Unsupervised Approach for BGP Hijacking Detection," in IEEE Transactions on Network and Service Management, doi: 10.1109/TNSM.2022.3166450. https://ieeexplore.ieee.org/document/9754706

Fast and lean encrypted Internet traffic classification

Published in Computer Communications, 2022

We suggest a novel approach for classification that extracts the most out of the two simple yet defining features of a flow: packet sizes and inter-arrival times. We employ a model that uses the inter-arrival times to parameterize the derivative of the flow hidden-state using a neural network (Neural ODE). We compare our results with a solution that uses the same data without the ODE solver and show the benefit of this approach.
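To make the idea of letting inter-arrival times drive the hidden-state derivative more concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper: the "network" is a single fixed tanh layer, the ODE is integrated with fixed-step Euler rather than an adaptive solver, the per-packet update rule is invented for illustration, and the weights `w`, `u` and the normalized packet sizes are all assumptions.

```python
import math

def derivative(h, w):
    """Stand-in for the learned network f(h): one tanh layer with weights w."""
    return [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in w]

def evolve(h, dt, w, steps=10):
    """Integrate dh/dt = f(h) over an inter-arrival time dt with Euler steps."""
    step = dt / steps
    for _ in range(steps):
        dh = derivative(h, w)
        h = [h_i + step * d_i for h_i, d_i in zip(h, dh)]
    return h

def update(h, packet_size, u):
    """Discrete update applied when a packet arrives (illustrative rule)."""
    return [math.tanh(h_i + u_i * packet_size) for h_i, u_i in zip(h, u)]

def encode_flow(packet_sizes, inter_arrival_times, w, u):
    """Alternate continuous evolution (time gaps) and discrete updates (sizes)."""
    h = [0.0] * len(u)
    for size, dt in zip(packet_sizes, inter_arrival_times):
        h = evolve(h, dt, w)
        h = update(h, size, u)
    return h

# Hypothetical fixed weights; in a real model these would be learned.
w = [[0.0, -1.0], [1.0, 0.0]]
u = [1.0, 0.5]

# Same packet sizes (normalized to [0, 1]), different gap before the second packet.
fast = encode_flow([0.5, 0.5], [0.0, 0.1], w, u)
slow = encode_flow([0.5, 0.5], [0.0, 2.0], w, u)
print(fast, slow)
```

The two flows carry identical packet sizes, so any difference between `fast` and `slow` comes only from the inter-arrival times, which is the signal a model without the ODE component would have to receive in some other way.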

Recommended citation: S. Roy, T. Shapira and Y. Shavitt, "Fast and lean encrypted Internet traffic classification," in Computer Communications, Volume 186, 2022, Pages 166-173, ISSN 0140-3664. https://www.sciencedirect.com/science/article/pii/S0140366422000408?via%3Dihub

SASA: Source-Aware Self-Attention for IP Hijack Detection

Published in IEEE/ACM Transactions on Networking, 2021

We introduce a deep learning system that examines the geography of traceroute measurements to detect malicious routes. We use multiple geolocation services, with various levels of confidence; each also suffers from location errors. Moreover, identifying a hijacked route is not sufficient, since an operator presented with a hijack alert needs an indication of why the route was flagged as problematic. Thus, we introduce a novel deep learning layer, called Source-Aware Self-Attention (SASA), which is an extension of the attention mechanism. SASA learns each data source’s confidence and combines this score with the attention of each router in the route to point out the most problematic one.

Recommended citation: T. Shapira and Y. Shavitt, "SASA: Source-Aware Self-Attention for IP Hijack Detection," in IEEE/ACM Transactions on Networking, doi: 10.1109/TNET.2021.3115935. https://ieeexplore.ieee.org/document/9556519

FlowPic: A Generic Representation for Encrypted Traffic Classification and Applications Identification

Published in IEEE Transactions on Network and Service Management (TNSM), 2021

Identifying the type of a network flow or a specific application has many advantages, such as supporting traffic engineering or detecting and preventing applications or application types that violate the organization’s security policy. The use of encryption, such as VPN, makes such identification challenging. Current solutions rely mostly on handcrafted features and then apply supervised learning techniques for the classification. We introduce a novel approach for encrypted Internet traffic classification and application identification by transforming basic flow data into an intuitive picture, a FlowPic, and then using known image classification deep learning techniques, CNNs, to identify the flow category (browsing, chat, video, etc.) and the application in use. We show that our approach can classify traffic with high accuracy, both for a specific application and for a flow category, even for VPN and Tor traffic. Our classifier can even identify with high success new applications that were not part of the training phase for a category; thus, new versions or applications can be categorized without additional training.
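As a rough illustration of turning basic flow data into a picture, the sketch below builds a small 2-D histogram with packet size on one axis and arrival time on the other. The window length, maximum packet size, and number of bins are assumptions chosen for readability and are not taken from this page; the function name `flowpic` is also only illustrative.

```python
def flowpic(packets, window=60.0, max_size=1500, bins=32):
    """Build a bins x bins count image from (arrival_time, size) pairs.

    Rows index packet-size bins, columns index time bins relative to the
    first packet. Packets outside the window or size range are skipped.
    """
    image = [[0] * bins for _ in range(bins)]
    if not packets:
        return image
    start = packets[0][0]
    for arrival, size in packets:
        t = arrival - start
        if not (0 <= t < window) or not (0 < size <= max_size):
            continue
        col = min(int(t / window * bins), bins - 1)
        row = min(int(size / max_size * bins), bins - 1)
        image[row][col] += 1
    return image

# Hypothetical flow: (arrival time in seconds, packet size in bytes).
packets = [(0.0, 100), (0.5, 1500), (0.6, 1500), (70.0, 40)]
image = flowpic(packets)
print(sum(map(sum, image)))  # number of packets that fell inside the window
```

The resulting grid can then be handed to any image classifier; the last packet in the toy flow arrives after the assumed 60-second window and is therefore dropped.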

Recommended citation: T. Shapira and Y. Shavitt, "FlowPic: A Generic Representation for Encrypted Traffic Classification and Applications Identification," in IEEE Transactions on Network and Service Management, doi: 10.1109/TNSM.2021.3071441. https://ieeexplore.ieee.org/document/9395707

Efficient Data-Dependent Learnability

Published in arXiv.org, 2020

The predictive normalized maximum likelihood (pNML) approach has recently been proposed as the min-max optimal solution to the batch learning problem where both the training set and the test data feature are individual, known sequences. This approach yields a learnability measure that can also be interpreted as a stability measure. This measure has shown some potential in detecting out-of-distribution examples, yet it has considerable computational costs. In this project, we propose and analyze an approximation of the pNML, which is based on influence functions. Combining both theoretical analysis and experiments, we show that when applied to neural networks, this approximation can detect out-of-distribution examples effectively. We also compare its performance to that achieved by conducting a single gradient step for each possible label.
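For readers unfamiliar with the pNML, its usual form in the literature can be written as below. The notation is ours and is not taken from this page: $z^N$ denotes the training set, $x$ the test feature, and $\hat{\theta}(z^N, x, y)$ the maximum-likelihood parameters obtained when the test pair $(x, y)$ is added to the training set.

```latex
% pNML predictor: for each candidate label, refit with (x, y) included,
% then normalize over all labels.
q_{\mathrm{pNML}}(y \mid x)
  = \frac{p_{\hat{\theta}(z^N, x, y)}(y \mid x)}
         {\sum_{y'} p_{\hat{\theta}(z^N, x, y')}(y' \mid x)}

% Learnability (regret) measure: log of the normalization factor.
\Gamma(z^N, x)
  = \log \sum_{y'} p_{\hat{\theta}(z^N, x, y')}(y' \mid x)
```

A large $\Gamma$ means that many labels can each be fit well once they are assumed, which is why it can serve as an out-of-distribution signal. The refit for every candidate label is also the source of the computational cost mentioned above.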

Recommended citation: Y. Fogel, T. Shapira and M. Feder, "Efficient Data-Dependent Learnability," arXiv, 2020. https://arxiv.org/abs/2011.10334

A Deep Learning Approach for IP Hijack Detection Based on ASN Embedding

Published in ACM SIGCOMM Workshop on Network Meets AI & ML (NetAI 2020), 2020

IP hijack detection is an important security challenge. In this paper we introduce a novel approach for BGP hijack detection using deep learning. Similar to natural language processing (NLP) models, we show that by using BGP route announcements as sentences, we can embed each AS number (ASN) to a vector that represents its latent characteristics. In order to solve this supervised learning problem, we use these vectors as an input to a recurrent neural network and achieve an excellent result: an accuracy of 99.99% for BGP hijack detection with 0.00% false alarm. We test our method on 48 past hijack events between the years 2008 and 2018 and detect 32 of them, and in particular, all the events within two years from our training data.
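The "routes as sentences" analogy can be illustrated by generating skip-gram style (center, context) training pairs from AS paths, as a word2vec-like model would from text. This is a minimal sketch under assumptions: the window size, the collapsing of repeated ASNs (AS-path prepending), and the example paths are illustrative choices, not details from the paper.

```python
def skipgram_pairs(as_paths, window=2):
    """Return (center ASN, context ASN) pairs from AS paths treated as sentences."""
    pairs = []
    for path in as_paths:
        # Assumption: collapse consecutive repeats of the same ASN (prepending).
        tokens = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for i, center in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs

# Hypothetical AS paths using documentation ASNs (RFC 5398), not real routing data.
paths = [
    [64496, 64497, 64497, 64498],
    [64499, 64497, 64498],
]
pairs = skipgram_pairs(paths, window=1)
print(len(pairs))
```

Pairs like these would then feed an embedding model, so that ASNs appearing in similar routing contexts end up with similar vectors.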

Recommended citation: T. Shapira and Y. Shavitt, "A Deep Learning Approach for IP Hijack Detection Based on ASN Embedding," ACM SIGCOMM Workshop on Network Meets AI & ML (NetAI 2020), New York, NY, USA, Aug 2020, pp. 35–41. https://dl.acm.org/doi/abs/10.1145/3405671.3405814

Unveiling the Type of Relationship Between Autonomous Systems Using Deep Learning

Published in NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium, 2020

The ToR inference problem has been widely investigated in the last two decades, mostly using heuristic algorithms. In this problem, we attempt to reveal the economic relationships between ASes, which have applications in network routing management and routing security. In this paper, we introduce a novel approach for ToR classification, which is based on embedding the AS numbers (ASN) in a high-dimensional space using neural networks. Similar to natural language processing (NLP) models, the embedding represents latent characteristics of the ASN and its interactions on the Internet. The embedding coordinates of each AS are represented by a vector; thus, we call our method BGP2VEC. In order to solve the supervised learning problem presented, we use these vectors as an input to an artificial neural network and achieve a state-of-the-art accuracy of 95.2% for ToR classification.

Recommended citation: T. Shapira and Y. Shavitt, "Unveiling the Type of Relationship Between Autonomous Systems Using Deep Learning," NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 2020, pp. 1-6. https://ieeexplore.ieee.org/document/9110358

FlowPic: Encrypted Internet Traffic Classification is as Easy as Image Recognition

Published in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2019

Identifying the type of a network flow or a specific application has many advantages, but has become harder in recent years due to the use of encryption, e.g., by VPN and Tor. Current solutions rely mostly on handcrafted features and then apply supervised learning techniques for the classification. We introduce a novel approach for encrypted Internet traffic classification by transforming basic flow data into a picture, a FlowPic, and then using known image classification deep learning techniques, Convolutional Neural Networks (CNNs), to identify the flow category (browsing, chat, video, etc.) and the application in use. We show using the UNB ISCX datasets that our approach can classify traffic with high accuracy. We can identify a category with very high accuracy even for VPN and Tor traffic. We classified VPN traffic with high success even when the training was done on non-VPN traffic. Our categorization can identify with good success new applications that were not part of the training phase. We can also use the same CNN to classify applications with an accuracy of 99.7%. Overall, our approach achieves significantly better performance than previous work, and can handle classification problems that were not studied in the past.

Recommended citation: T. Shapira and Y. Shavitt, "FlowPic: Encrypted Internet Traffic Classification is as Easy as Image Recognition," IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Paris, France, 2019, pp. 680-687. https://ieeexplore.ieee.org/document/8845315