On this website we present additional information about our IMC papers "Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists" and "Rusty Clusters? Dusting an IPv6 Research Foundation", and provide access to our IPv6 Hitlist Service.
IPv6 Hitlist Service
We provide an IPv6 Hitlist Service where we publish responsive IPv6 addresses, address categories, aliased prefixes, and non-aliased prefixes to interested researchers.
The IPv6 Hitlist Service consists of two parts: an openly accessible service and a registration-first service.
Hitlist addresses
This graph shows the development of the full, aliased and non-aliased hitlist over time.
Responsive addresses
This graph shows the development of responses per protocol over time. We scan five different protocols; an additional graph shows the number of IP addresses that respond to at least one of the protocols.
The plot of responsive addresses is an updated version.
We removed all responses to the UDP/53 scan injected by the Great Firewall of China.
The original plot and additional information are provided here:
We provide both filtered and unfiltered historic UDP/53 files for registered users.
Categorized addresses
Lastly, this graph shows the network categories represented in the hitlist.
Via the dropdown, you can choose between the categories represented in the hitlist, the total responsive addresses, and the responsive addresses per protocol.
The categories are filtered with PeeringDB. For details, see the 2023 paper.
Select the data type:
Openly Accessible Service
You can use the weekly generated list of responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes without registration:
The responsive addresses include addresses from non-aliased prefixes only.
Please see the notes about aliased prefixes below to make use of them.
Registration-First Service
We provide additional data which can be used to conduct in-depth research on IPv6 networks and addresses.
This includes:
Daily input addresses for scanning, including aliased and non-responsive addresses
Daily responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes
Raw ZMap output files including TCP options (e.g. usable for further alias analysis)
Access to complete historical data
To get free access to this registration-first service, you can send a quick
registration email.
We use the gathered data for statistical purposes and might very occasionally send a survey or other requests for feedback.
Referencing the Hitlist Service
If you are using data from the IPv6 Hitlist Service in your publication, please refer to it with the following references:
@inproceedings{gasser2018clusters,
title = {Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists},
author = {Gasser, Oliver and Scheitle, Quirin and Foremski, Pawel and Lone, Qasim and Korczynski, Maciej and Strowes, Stephen D. and Hendriks, Luuk and Carle, Georg},
booktitle = {Proceedings of the 2018 Internet Measurement Conference},
year = {2018},
location = {Boston, MA, USA},
numpages = {15},
doi = {10.1145/3278532.3278564},
publisher = {ACM},
address = {New York, NY, USA},
}
@inproceedings{zirngibl2022rustyclusters,
title = {Rusty Clusters? Dusting an IPv6 Research Foundation},
author = {Zirngibl, Johannes and Steger, Lion and Sattler, Patrick and Gasser, Oliver and Carle, Georg},
booktitle = {Proceedings of the 2022 Internet Measurement Conference},
year = {2022},
location = {Nice, France},
numpages = {15},
doi = {10.1145/3517745.3561440},
publisher = {ACM},
address = {New York, NY, USA},
}
@inproceedings{steger2023targetacquired,
title = {Target Acquired? Evaluating Target Generation Algorithms for IPv6},
author = {Steger, Lion and Kuang, Liming and Zirngibl, Johannes and Carle, Georg and Gasser, Oliver},
booktitle = {Proceedings of the Network Traffic Measurement and Analysis Conference (TMA)},
year = {2023},
month = jun,
location = {Naples, Italy},
publisher = {},
}
The same references apply to both the open and the registration-first service. [bib], [bib] and [bib]
Software and Tools
During our IPv6 hitlist analysis we developed software to analyze and understand IPv6 hitlists.
We publish the following software and tools for use by the scientific community:
We extend the original ZMap to add IPv6 capabilities.
ZMapv6 supports the following new IPv6-specific probe modules:
ICMPv6 Echo Request
IPv6 TCP SYN (any port)
IPv6 UDP (any port and payload)
ZMapv6 can read IPv6 target addresses from a file or from standard input.
Source: github.com/tumi8/zmap
zesplot
zesplot is a tool to visualize IPv6 networks. It uses the concept of
squarified treemaps and plots IPv6 networks in a space-filling way. Note
that unlike a Hilbert curve visualizing IPv4 address space, zesplot does
not plot the entire IPv6 address space.
Source: github.com/zesplot/zesplot
Entropy/IP is software that finds patterns in IPv6 addresses and generates addresses based on these patterns.
Entropy/IP was presented during the 2016 Internet Measurement Conference.
For more information see the Entropy/IP website.
The Entropy/IP software is published by Akamai.
Source: github.com/akamai/entropy-ip
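To illustrate what "patterns in IPv6 addresses" can mean in practice, the following minimal sketch computes the normalized Shannon entropy of each of the 32 hex nibbles across a set of addresses. It is not taken from the Entropy/IP code base; the function name nibble_entropy and the example addresses are assumptions made for illustration only.

```python
import ipaddress
import math
from collections import Counter


def nibble_entropy(addresses):
    """Return 32 normalized Shannon entropies, one per hex nibble position.

    Hypothetical helper for illustration; not part of Entropy/IP.
    """
    # Expand every address to its full 32-nibble hexadecimal form.
    rows = [ipaddress.IPv6Address(a).exploded.replace(":", "") for a in addresses]
    total = len(rows)
    entropies = []
    for pos in range(32):
        counts = Counter(row[pos] for row in rows)
        # Shannon entropy in bits for the values seen at this position.
        bits = sum((c / total) * math.log2(total / c) for c in counts.values())
        # A nibble carries at most 4 bits, so dividing by 4 maps to 0..1.
        entropies.append(bits / 4.0)
    return entropies


if __name__ == "__main__":
    sample = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"]
    for position, value in enumerate(nibble_entropy(sample)):
        print(position, round(value, 3))
```

Positions with entropy near 0 are constant across the input (for example a shared prefix), while positions near 1 vary almost uniformly. Segmenting addresses along such boundaries is one way to describe an addressing scheme.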
New Entropy/IP Generator
This generator uses the output of Entropy/IP to generate IPv6 addresses which follow the specified model.
Source: github.com/pforemski/eip-generator
Longest prefix matching for aliased prefixes
To make use of our published lists of aliased and non-aliased prefixes for your custom IPv6 address list, you need to perform longest prefix matching.
This ensures that your addresses are matched to the longest aliased or non-aliased prefix.
For your convenience we publish a simple Python script and a Go tool for this purpose.
Python source: aliases-lpm.py
Go source: aliases-lpm.go
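The idea of longest prefix matching can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the published aliases-lpm.py script; the function names, the in-memory table layout, and the example prefixes are assumptions.

```python
import ipaddress


def build_prefix_table(aliased, non_aliased):
    """Map prefix length -> {network address as int: is_aliased}.

    Hypothetical helper; the inputs are iterables of prefix strings.
    """
    table = {}
    for is_aliased, prefixes in ((True, aliased), (False, non_aliased)):
        for text in prefixes:
            net = ipaddress.IPv6Network(text, strict=False)
            table.setdefault(net.prefixlen, {})[int(net.network_address)] = is_aliased
    return table


def longest_match(table, address):
    """Return (prefix, is_aliased) for the longest matching prefix, or None."""
    addr = int(ipaddress.IPv6Address(address))
    # Try the most specific prefix lengths first.
    for length in sorted(table, reverse=True):
        mask = ((1 << length) - 1) << (128 - length)
        key = addr & mask
        if key in table[length]:
            return ipaddress.IPv6Network((key, length)), table[length][key]
    return None


if __name__ == "__main__":
    # Example prefixes from the documentation range, chosen for illustration.
    table = build_prefix_table(["2001:db8::/32"], ["2001:db8:1::/48"])
    for candidate in ["2001:db8:1::5", "2001:db8:2::1", "2001:db9::1"]:
        print(candidate, longest_match(table, candidate))
```

In this example the /48 overrides the covering /32, so an address inside 2001:db8:1::/48 is reported as non-aliased even though the surrounding /32 is aliased. Addresses that match no prefix return None and need to be handled by the caller.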
Distance Clustering
This method generates IPv6 address candidates from a seed list.
It extends more densely clustered address regions that show high entropy in the last nibble(s) of the address.
Note that these regions are only densely populated, not fully responsive.
Python source: distance-clustering
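As a rough illustration of extending dense regions, the sketch below groups seed addresses by everything except their last nibble and fills in the unseen addresses of any group that already contains several seeds. This is a deliberately simplified stand-in and not the published distance-clustering implementation; the function name, the fixed /124 block size, and the min_seeds threshold are all assumptions.

```python
import ipaddress
from collections import defaultdict


def extend_dense_regions(seeds, min_seeds=4):
    """Generate unseen addresses in /124 blocks that hold at least min_seeds seeds.

    Simplified, hypothetical stand-in for illustration only.
    """
    blocks = defaultdict(set)
    for text in seeds:
        addr = int(ipaddress.IPv6Address(text))
        # Upper 124 bits identify the block, lower 4 bits are the last nibble.
        blocks[addr >> 4].add(addr & 0xF)

    candidates = []
    for block, nibbles in sorted(blocks.items()):
        if len(nibbles) < min_seeds:
            continue  # too sparse to count as a dense region
        for nibble in range(16):
            if nibble not in nibbles:
                candidates.append(str(ipaddress.IPv6Address((block << 4) | nibble)))
    return candidates


if __name__ == "__main__":
    seeds = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4",
             "2001:db8:1::1"]
    print(extend_dense_regions(seeds))
```

The generated candidates are only guesses: they still have to be scanned, since a densely populated region is not necessarily fully responsive.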
GFW filter
We provide a script that removes responses injected by the GFW from the output of UDP/53 scans.
Python source: filter_gfw.py
Target Acquired? Evaluating Target Generation Algorithms for IPv6
Abstract. Internet measurements are a crucial foundation of
IPv6-related research. Due to the infeasibility of full address space
scans for IPv6 however, those measurements rely on collections
of reliably responsive, unbiased addresses, as provided e.g., by
the IPv6 Hitlist service. Although used for various use cases, the
hitlist provides an unfiltered list of responsive addresses, the hosts
behind which can come from a range of different networks and
devices, such as web servers, customer-premises equipment (CPE)
devices, and Internet infrastructure.
In this paper, we demonstrate the importance of tailoring
hitlists in accordance with the research goal in question. By
using PeeringDB we classify hitlist addresses into six different
network categories, uncovering that 42% of hitlist addresses are
in ISP networks. Moreover, we show the different behavior of
those addresses depending on their respective category, e.g., ISP
addresses exhibiting a relatively low lifetime. Furthermore, we
analyze different Target Generation Algorithms (TGAs), which are
used to increase the coverage of IPv6 measurements by generating
new responsive targets for scans. We evaluate their performance
under various conditions and find generated addresses to show
vastly differing responsiveness levels for different TGAs.
Paper. Read the final version of our paper here: [PDF]
Authors. Lion Steger, Liming Kuang, Johannes Zirngibl, Georg Carle, Oliver Gasser.
Rusty Clusters? Dusting an IPv6 Research Foundation
Abstract. The long-running IPv6 Hitlist service is an important
foundation for IPv6 measurement studies. It helps to overcome infeasible,
complete address space scans by collecting valuable, unbiased IPv6 address
candidates and their responsiveness. However, the Internet itself is a
quickly changing ecosystem that can affect long-running services, potentially
inducing new biases and obscurities into ongoing data collection means.
Frequent analyses but also updates are necessary to enable a valuable service
to the community.
In this paper, we show that the existing hitlist is highly impacted by the
Great Firewall of China and we offer a cleaned view on its development. While
the accumulated input shows an increasing bias towards some networks, the
cleaned set of responsive addresses is well distributed and shows a steady
increase.
Although it is a best practice to remove aliased prefixes from IPv6 hitlists,
we show that this removes not only single hosts responsive to complete
prefixes, but also major content delivery networks. More than 98% of all
IPv6 addresses announced by Fastly are labeled as aliased and Cloudflare
prefixes hosting more than 10M domains are excluded. Depending on the
hitlist usage, e.g., higher layer protocol scans, inclusion of addresses
from these providers can be valuable.
Lastly, we evaluate different new address candidate sources, including
target generation algorithms. We show that a combination of different
methodologies is able to identify 5.6M new, responsive addresses. This
accounts for an increase by 174% and combined with the current IPv6 Hitlist,
we identify 8.8M responsive addresses.
Paper. Read the final version of our paper here: [PDF]
Authors. Johannes Zirngibl, Lion Steger, Patrick Sattler, Oliver Gasser, Georg Carle.
Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists
Abstract. Network measurements are an important tool in understanding the
Internet. Due to the expanse of the IPv6 address space, exhaustive
scans as in IPv4 are not possible for IPv6. In recent years, several
studies have proposed the use of target lists of IPv6 addresses, called
IPv6 hitlists.
In this paper, we show that addresses in IPv6 hitlists are heavily
clustered. We present novel techniques that allow IPv6 hitlists to
be pushed from quantity to quality. We perform a longitudinal
active measurement study over 6 months, targeting more than 50 M
addresses. We develop a rigorous method to detect aliased prefixes,
which identifies 1.5 % of our prefixes as aliased, pertaining to about
half of our target addresses. Using entropy clustering, we group the
entire hitlist into just 6 distinct addressing schemes. Furthermore,
we perform client measurements by leveraging crowdsourcing.
To encourage reproducibility in network measurement research
and to serve as a starting point for future IPv6 studies, we publish
source code, analysis tools, and data.
Paper. Read the final version of our paper at arXiv.org:
[abstract] and
[PDF].
Authors. Oliver Gasser, Quirin Scheitle, Paweł Foremski, Qasim Lone,
Maciej Korczyński, Stephen D. Strowes, Luuk Hendriks, Georg Carle.
Additional Plots
We provide additional plots for in-depth analysis accompanying the evaluations in our paper.
Reproducibility update 2019-04-10
We updated the requirements.txt file and added further information to the README.md file.
You can download the updated files from this directory.
SIGCOMM Artifacts Evaluation Badges
We were awarded all three badges by the SIGCOMM Artifacts Evaluation Committee, underlining that our published data and scripts are available, functional, and reusable.