
Understanding and Unbiasing IPv6 Hitlists

On this website we present additional information about our IMC papers "Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists" and "Rusty Clusters? Dusting an IPv6 Research Foundation", and provide access to our IPv6 Hitlist Service.

IPv6 Hitlist Service

We provide an IPv6 Hitlist Service where we publish responsive IPv6 addresses, address categories, aliased prefixes, and non-aliased prefixes to interested researchers. The IPv6 Hitlist Service consists of two parts: an openly accessible service and a registration-first service.

Hitlist addresses

This graph shows the development of the full, aliased and non-aliased hitlist over time.

Responsive addresses

Here, the development of the different protocol responses over time is shown. We scan five different protocols; an additional graph shows the number of IP addresses that respond to at least one of the protocols.
The plot of responsive addresses is an updated version: we removed all responses to the UDP/53 scan that were injected by the Great Firewall of China. The original plot and additional information are provided here: We provide both filtered and unfiltered historic UDP/53 files for registered users.

Categorized addresses

Lastly, the network categories represented in the Hitlist can be seen. Via the dropdown the categories represented in the hitlist, the total responsive addresses, or the responsive addresses per protocol may be chosen. The categories are filtered with PeeringDB. For the details, see the 2023 Paper.

Openly Accessible Service

You can use the weekly generated list of responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes without registration: The responsive addresses include addresses from non-aliased prefixes only. Please see the notes about aliased prefixes below to make use of them.

Registration-First Service

We provide additional data which can be used to conduct in-depth research on IPv6 networks and addresses. This includes: To get free access to this registration-first service, you can send a quick registration email. We use the gathered data for statistical purposes and might very occasionally send a survey or other requests for feedback.

Referencing the Hitlist Service

If you are using data from the IPv6 Hitlist Service in your publication, please refer to it with the following references:
@inproceedings{gasser2018clusters,
   title = {Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists},
   author = {Gasser, Oliver and Scheitle, Quirin and Foremski, Pawel and Lone, Qasim and Korczynski, Maciej and Strowes, Stephen D. and Hendriks, Luuk and Carle, Georg},
   booktitle = {Proceedings of the 2018 Internet Measurement Conference},
   year = {2018},
   location = {Boston, MA, USA},
   numpages = {15},
   doi = {10.1145/3278532.3278564},
   publisher = {ACM},
   address = {New York, NY, USA},
}
@inproceedings{zirngibl2022rustyclusters,
   title = {Rusty Clusters? Dusting an IPv6 Research Foundation},
   author = {Zirngibl, Johannes and Steger, Lion and Sattler, Patrick and Gasser, Oliver and Carle, Georg},
   booktitle = {Proceedings of the 2022 Internet Measurement Conference},
   year = {2022},
   location = {Nice, France},
   numpages = {15},
   doi = {10.1145/3517745.3561440},
   publisher = {ACM},
   address = {New York, NY, USA},
}
@inproceedings{steger2023targetacquired,
   title = {Target Acquired? Evaluating Target Generation Algorithms for IPv6},
   author = {Steger, Lion and Kuang, Liming and Zirngibl, Johannes and Carle, Georg and Gasser, Oliver},
   booktitle = {Proceedings of the Network Traffic Measurement and Analysis Conference (TMA)},
   year = {2023},
   month = jun,
   location = {Naples, Italy},
   publisher = {},
}

The same reference applies to the open and registered service. [bib], [bib] and [bib]

Software and Tools

During our IPv6 hitlist analysis, we developed software to analyze and understand IPv6 hitlists. We publish the following software and tools for use by the scientific community:

ZMapv6

We extend the original ZMap to add IPv6 capabilities. ZMapv6 supports the following new IPv6-specific probe modules: ZMapv6 can read IPv6 target addresses from a file or from standard input.
Source: github.com/tumi8/zmap

zesplot

zesplot is a tool to visualize IPv6 networks. It uses the concept of squarified treemaps and plots IPv6 networks in a space-filling way. Note that unlike a Hilbert curve visualizing IPv4 address space, zesplot does not plot the entire IPv6 address space.
Source: github.com/zesplot/zesplot

Entropy Clustering

Entropy Clustering is a tool to find and visualize clusters in IPv6 addressing schemes.
Source: github.com/pforemski/entropy-clustering

Entropy/IP

Entropy/IP is a tool to find patterns in IPv6 addresses and to generate addresses based on these patterns. Entropy/IP was presented at the 2016 Internet Measurement Conference. For more information, see the Entropy/IP website. The Entropy/IP software is published by Akamai.
Source: github.com/akamai/entropy-ip
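
To illustrate what "patterns in IPv6 addresses" can mean in practice, here is a minimal sketch that computes the normalized Shannon entropy of each of the 32 hex nibbles across a set of addresses. It is not the Entropy/IP implementation; the function name `nibble_entropies` and the example addresses are assumptions made for this illustration only.

```python
import ipaddress
import math
from collections import Counter


def nibble_entropies(addresses):
    """Return 32 normalized entropies (0.0 to 1.0), one per hex nibble.

    Hypothetical helper for illustration; not part of Entropy/IP.
    """
    # Expand every address to its full 32-nibble hexadecimal form.
    rows = [ipaddress.IPv6Address(a).exploded.replace(":", "") for a in addresses]
    if not rows:
        return []
    total = len(rows)
    entropies = []
    for pos in range(32):
        counts = Counter(row[pos] for row in rows)
        # Shannon entropy in bits: sum of p * log2(1/p) over observed values.
        bits = sum((c / total) * math.log2(total / c) for c in counts.values())
        # A nibble carries at most log2(16) = 4 bits, so divide to normalize.
        entropies.append(bits / 4.0)
    return entropies


# Example: 16 addresses that differ only in the last nibble.
example = [f"2001:db8::{i:x}" for i in range(16)]
profile = nibble_entropies(example)
print(profile[0], profile[31])
```

Nibbles that are constant across the input have entropy 0, while nibbles that take all 16 values uniformly have entropy 1. Such a per-nibble profile is one simple way to see which parts of an addressing scheme are fixed and which vary.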

New Entropy/IP Generator

This generator uses the output of Entropy/IP to generate IPv6 addresses which follow the specified model.
Source: github.com/pforemski/eip-generator

Longest prefix matching for aliased prefixes

To make use of our published lists of aliased and non-aliased prefixes for your custom IPv6 address list, you need to perform longest prefix matching. This ensures that your addresses are matched to the longest aliased or non-aliased prefix. For your convenience we publish a simple Python script and a Go tool for this purpose.
Python source: aliases-lpm.py
Go source: aliases-lpm.go
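
The idea of longest prefix matching can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the published aliases-lpm.py script; the function names `build_prefix_table` and `longest_prefix_match` and the example prefixes are assumptions chosen for this sketch.

```python
import ipaddress


def build_prefix_table(aliased, non_aliased):
    """Map (network address as int, prefix length) to an 'aliased' flag.

    Hypothetical helper; if a prefix appears in both lists, the
    non-aliased entry wins because it is inserted last.
    """
    table = {}
    for is_aliased, prefixes in ((True, aliased), (False, non_aliased)):
        for prefix in prefixes:
            net = ipaddress.IPv6Network(prefix, strict=False)
            table[(int(net.network_address), net.prefixlen)] = is_aliased
    return table


def longest_prefix_match(address, table):
    """Return (matching prefix, is_aliased) or None if nothing matches."""
    value = int(ipaddress.IPv6Address(address))
    # Try the most specific prefix length first, so the first hit is the longest.
    for plen in range(128, -1, -1):
        mask = ((1 << plen) - 1) << (128 - plen)
        key = (value & mask, plen)
        if key in table:
            return ipaddress.IPv6Network(key), table[key]
    return None


# Example with documentation prefixes (not real hitlist data).
table = build_prefix_table(
    aliased=["2001:db8::/32"],
    non_aliased=["2001:db8:1::/48"],
)
for addr in ("2001:db8:1::1", "2001:db8:2::1", "2001:db9::1"):
    print(addr, longest_prefix_match(addr, table))
```

Checking all 129 prefix lengths per address is simple but not fast; for large address lists, a trie or restricting the loop to prefix lengths that actually occur in the table would be a natural refinement.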

Distance Clustering

This method generates IPv6 address candidates from a seed list. It extends more densely clustered address regions that show high entropy in the last nibble(s) of the address. Note, these regions are not fully responsive but only densely populated.
Python source: distance-clustering
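
The general idea of extending densely clustered regions can be sketched as follows. This is a simplified illustration and not the published distance-clustering tool: the function name `distance_cluster_candidates` and the thresholds `max_gap` and `min_size` are hypothetical and chosen only for the example.

```python
import ipaddress


def distance_cluster_candidates(seeds, max_gap=16, min_size=4):
    """Generate unseen addresses inside dense clusters of seed addresses.

    Hypothetical sketch: consecutive seeds (in numeric order) belong to the
    same cluster if they are at most `max_gap` apart; clusters with at least
    `min_size` seeds are filled in with all addresses between their ends.
    """
    values = sorted({int(ipaddress.IPv6Address(s)) for s in seeds})
    candidates = []

    def flush(cluster):
        # Only sufficiently populated clusters are extended.
        if len(cluster) < min_size:
            return
        known = set(cluster)
        for value in range(cluster[0], cluster[-1] + 1):
            if value not in known:
                candidates.append(str(ipaddress.IPv6Address(value)))

    cluster = []
    for value in values:
        # A gap larger than max_gap closes the current cluster.
        if cluster and value - cluster[-1] > max_gap:
            flush(cluster)
            cluster = []
        cluster.append(value)
    flush(cluster)
    return candidates


# Example with documentation addresses (not real hitlist data).
seeds = ["2001:db8::1", "2001:db8::3", "2001:db8::4", "2001:db8::6",
         "2001:db8:ffff::1"]
print(distance_cluster_candidates(seeds))
```

The generated candidates are only likely targets: as noted above, dense regions are not fully responsive, so candidates still need to be probed before they are added to a hitlist.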

GFW filter

We provide a script that removes responses injected by the Great Firewall of China (GFW) from the output of UDP/53 scans.
Python source: filter_gfw.py

Target Acquired? Evaluating Target Generation Algorithms for IPv6

Abstract. Internet measurements are a crucial foundation of IPv6-related research. Due to the infeasibility of full address space scans for IPv6 however, those measurements rely on collections of reliably responsive, unbiased addresses, as provided e.g., by the IPv6 Hitlist service. Although used for various use cases, the hitlist provides an unfiltered list of responsive addresses, the hosts behind which can come from a range of different networks and devices, such as web servers, customer-premises equipment (CPE) devices, and Internet infrastructure.
In this paper, we demonstrate the importance of tailoring hitlists in accordance with the research goal in question. By using PeeringDB we classify hitlist addresses into six different network categories, uncovering that 42% of hitlist addresses are in ISP networks. Moreover, we show the different behavior of those addresses depending on their respective category, e.g., ISP addresses exhibiting a relatively low lifetime. Furthermore, we analyze different Target Generation Algorithms (TGAs), which are used to increase the coverage of IPv6 measurements by generating new responsive targets for scans. We evaluate their performance under various conditions and find generated addresses to show vastly differing responsiveness levels for different TGAs.
Paper. Read the final version of our paper here: [PDF]
Authors. Lion Steger, Liming Kuang, Johannes Zirngibl, Georg Carle, Oliver Gasser.

Rusty Clusters? Dusting an IPv6 Research Foundation

Abstract. The long-running IPv6 Hitlist service is an important foundation for IPv6 measurement studies. It helps to overcome infeasible, complete address space scans by collecting valuable, unbiased IPv6 address candidates and their responsiveness. However, the Internet itself is a quickly changing ecosystem that can affect long-running services, potentially inducing new biases and obscurities into ongoing data collection means. Frequent analyses but also updates are necessary to enable a valuable service to the community.
In this paper, we show that the existing hitlist is highly impacted by the Great Firewall of China and we offer a cleaned view on its development. While the accumulated input shows an increasing bias towards some networks, the cleaned set of responsive addresses is well distributed and shows a steady increase.
Although it is a best practice to remove aliased prefixes from IPv6 hitlists, we show that this removes not only single hosts responsive to complete prefixes, but also major content delivery networks. More than 98% of all IPv6 addresses announced by Fastly are labeled as aliased and Cloudflare prefixes hosting more than 10M domains are excluded. Depending on the hitlist usage, e.g., higher layer protocol scans, inclusion of addresses from these providers can be valuable.
Lastly, we evaluate different new address candidate sources, including target generation algorithms. We show that a combination of different methodologies is able to identify 5.6M new, responsive addresses. This accounts for an increase by 174% and combined with the current IPv6 Hitlist, we identify 8.8M responsive addresses.
Paper. Read the final version of our paper here: [PDF]
Authors. Johannes Zirngibl, Lion Steger, Patrick Sattler, Oliver Gasser, Georg Carle.

Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists

Abstract. Network measurements are an important tool in understanding the Internet. Due to the expanse of the IPv6 address space, exhaustive scans as in IPv4 are not possible for IPv6. In recent years, several studies have proposed the use of target lists of IPv6 addresses, called IPv6 hitlists.
In this paper, we show that addresses in IPv6 hitlists are heavily clustered. We present novel techniques that allow IPv6 hitlists to be pushed from quantity to quality. We perform a longitudinal active measurement study over 6 months, targeting more than 50 M addresses. We develop a rigorous method to detect aliased prefixes, which identifies 1.5 % of our prefixes as aliased, pertaining to about half of our target addresses. Using entropy clustering, we group the entire hitlist into just 6 distinct addressing schemes. Furthermore, we perform client measurements by leveraging crowdsourcing.
To encourage reproducibility in network measurement research and to serve as a starting point for future IPv6 studies, we publish source code, analysis tools, and data.
Paper. Read the final version of our paper at arXiv.org: [abstract] and [PDF].
Authors. Oliver Gasser, Quirin Scheitle, Paweł Foremski, Qasim Lone, Maciej Korczyński, Stephen D. Strowes, Luuk Hendriks, Georg Carle.

Additional Plots

We provide additional plots for in-depth analysis accompanying the evaluations in our paper.

Interactive zesplot plots

Entropy clustering plots

Responsiveness over time

Reproducibility

Target Acquired? Evaluating Target Generation Algorithms for IPv6

We publish data and scripts to reproduce our analysis at the TUM university library to guarantee long-term availability.
Dataset DOI: 10.14459/2023mp1709953

NOTE: The dataset is missing some scripts needed for reproducibility.

Please find the scripts, without the data, in our GitHub repository.

Rusty Clusters? Dusting an IPv6 Research Foundation

We publish data and scripts to reproduce our analysis at the TUM university library to guarantee long-term availability.
Dataset DOI: 10.14459/2022mp1686542

Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists

We publish data and scripts to reproduce our analysis at the TUM university library to guarantee long-term availability.
Dataset DOI: 10.14459/2018mp1452739

Reproducibility update 2019-04-10: We updated the requirements.txt file and provide additional information in the README.md file. You can download the updated files from this directory.

SIGCOMM Artifacts Evaluation Badges: We were awarded all three badges by the SIGCOMM Artifacts Evaluation Committee, underlining that our published data and scripts are available, functional, and reusable.

SIGCOMM Artifacts Evaluation available badge
SIGCOMM Artifacts Evaluation functional badge
SIGCOMM Artifacts Evaluation reusable badge

Contact

Oliver Gasser: oliver.gasser [AT] mpi-inf.mpg.de
Johannes Zirngibl: zirngibl [AT] net.in.tum.de
Lion Steger: stegerl [AT] net.in.tum.de

Partners

TUM logo
IITiS logo
TU Delft logo
University Grenoble Alpes logo
RIPE NCC logo
UTwente logo
MPI logo

External Data Providers

IPinfo logo
Rapid7 logo
RIPE Atlas logo