Understanding and Unbiasing IPv6 Hitlists
On this website we present additional information about our IMC papers
"Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists" and
"Rusty Clusters? Dusting an IPv6 Research Foundation", and provide access to our IPv6 Hitlist Service.
IPv6 Hitlist Service
We provide an IPv6 Hitlist Service where we publish
responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes to interested researchers.
The IPv6 Hitlist Service consists of an openly accessible service and a registration-first service.
The plot of responsive addresses is an updated version.
We removed all responses to the UDP/53 scan that were injected by the Great Firewall of China.
The original plot and additional information are provided here:
We provide both filtered and unfiltered historic UDP/53 files for registered users.
Openly Accessible Service
You can use the weekly generated list of responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes without registration:
The responsive addresses include addresses from non-aliased prefixes only.
Please see the notes about aliased prefixes below to make use of them.
Registration-First Service
We provide additional data which can be used to conduct in-depth research on IPv6 networks and addresses.
This includes:
- Daily input addresses for scanning, including aliased and non-responsive addresses
- Daily responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes
- Raw ZMap output files including TCP options (e.g. usable for further alias analysis)
- Access to complete historical data
To get free access to this registration-first service, you can send a quick registration email.
We use the gathered data for statistical purposes and might very occasionally send a survey or other requests for feedback.
Referencing the Hitlist Service
If you are using data from the IPv6 Hitlist Service in your publication, please refer to it with the following references:
@inproceedings{gasser2018clusters,
title = {Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists},
author = {Gasser, Oliver and Scheitle, Quirin and Foremski, Pawel and Lone, Qasim and Korczynski, Maciej and Strowes, Stephen D. and Hendriks, Luuk and Carle, Georg},
booktitle = {Proceedings of the 2018 Internet Measurement Conference},
year = {2018},
location = {Boston, MA, USA},
numpages = {15},
doi = {10.1145/3278532.3278564},
publisher = {ACM},
address = {New York, NY, USA},
}
@inproceedings{zirngibl2022rustyclusters,
title = {Rusty Clusters? Dusting an IPv6 Research Foundation},
author = {Zirngibl, Johannes and Steger, Lion and Sattler, Patrick and Gasser, Oliver and Carle, Georg},
booktitle = {Proceedings of the 2022 Internet Measurement Conference},
year = {2022},
location = {Nice, France},
numpages = {15},
doi = {10.1145/3517745.3561440},
publisher = {ACM},
address = {New York, NY, USA},
}
The same references apply to both the open and the registration-first service. [bib] and [bib]
During our IPv6 hitlist analysis we developed software to analyze and understand IPv6 hitlists.
We publish the following software and tools for use by the scientific community:
ZMapv6
We extend the original ZMap to add IPv6 capabilities.
ZMapv6 supports the following new IPv6-specific probe modules:
- ICMPv6 Echo Request
- IPv6 TCP SYN (any port)
- IPv6 UDP (any port and payload)
ZMapv6 can read IPv6 target addresses from a file or from standard input.
Source:
github.com/tumi8/zmap
zesplot
zesplot is a tool to visualize IPv6 networks. It uses the concept of
squarified treemaps and plots IPv6 networks in a space-filling way. Note
that unlike a Hilbert curve visualizing IPv4 address space, zesplot does
not plot the entire IPv6 address space.
Source:
github.com/zesplot/zesplot
Entropy Clustering
Entropy Clustering is software to find and visualize clusters in IPv6 addressing schemes.
Source:
github.com/pforemski/entropy-clustering
Entropy/IP
Entropy/IP is software to find patterns in IPv6 addresses and generate addresses based on these patterns.
Entropy/IP was presented during the 2016 Internet Measurement Conference.
For more information see the
Entropy/IP website.
The Entropy/IP software is published by Akamai.
Source:
github.com/akamai/entropy-ip
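To give an intuition for the entropy-based tools above, the following minimal sketch computes the Shannon entropy of each of the 32 hex nibbles across a set of IPv6 addresses. It is not taken from Entropy/IP or Entropy Clustering; the function name nibble_entropy and the normalization to the range 0 to 1 are assumptions made for this illustration.

```python
import ipaddress
import math
from collections import Counter


def nibble_entropy(addresses):
    """Return 32 normalized entropies, one per hex nibble position.

    Hypothetical helper for illustration only, not part of any published tool.
    """
    # Expand every address to its full 32 hex characters (colons removed).
    rows = [ipaddress.IPv6Address(a).exploded.replace(":", "") for a in addresses]
    total = len(rows)
    entropies = []
    for pos in range(32):
        counts = Counter(row[pos] for row in rows)
        # Shannon entropy in bits for the values seen at this position.
        bits = sum((c / total) * math.log2(total / c) for c in counts.values())
        # A nibble carries at most 4 bits, so divide by 4 to normalize.
        entropies.append(bits / 4.0)
    return entropies
```

Positions with an entropy close to 0 are constant across the input (for example a shared network prefix), while positions close to 1 look uniformly random.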
New Entropy/IP Generator
This generator uses the output of Entropy/IP to generate IPv6 addresses which follow the specified model.
Source:
github.com/pforemski/eip-generator
Longest prefix matching for aliased prefixes
To make use of our published lists of aliased and non-aliased prefixes for your custom IPv6 address list, you need to perform longest prefix matching.
This ensures that your addresses are matched to the longest aliased or non-aliased prefix.
For your convenience we publish a simple Python script and a Go tool for this purpose.
Python source:
aliases-lpm.py
Go source:
aliases-lpm.go
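The idea behind longest prefix matching can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the published aliases-lpm.py script; the function names build_table and longest_match and the in-memory prefix lists are assumptions for this example.

```python
import ipaddress


def build_table(aliased, non_aliased):
    """Map (network as int, prefix length) to True (aliased) or False.

    Hypothetical helper; the inputs are iterables of prefix strings.
    """
    table = {}
    for label, prefixes in ((True, aliased), (False, non_aliased)):
        for text in prefixes:
            net = ipaddress.IPv6Network(text)
            table[(int(net.network_address), net.prefixlen)] = label
    return table


def longest_match(address, table):
    """Return (prefix, is_aliased) for the most specific match, or None."""
    addr = int(ipaddress.IPv6Address(address))
    # Try the most specific prefix length first and stop at the first hit.
    for length in range(128, -1, -1):
        mask = ((1 << length) - 1) << (128 - length)
        key = (addr & mask, length)
        if key in table:
            return str(ipaddress.IPv6Network(key)), table[key]
    return None
```

With a table built from one aliased /48 and one covering non-aliased /32, an address inside the /48 is reported as aliased, while other addresses in the /32 are reported as non-aliased. For very large address lists, a trie-based structure would likely be faster than probing up to 129 prefix lengths per address.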
Distance Clustering
This method generates IPv6 address candidates from a seed list.
It extends more densely clustered address regions that show high entropy in the last nibble(s) of the address.
Note that these regions are only densely populated, not fully responsive.
Python source:
distance-clustering
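As a rough illustration of the idea, the sketch below groups sorted seed addresses into clusters whenever neighbouring addresses are numerically close, and then proposes the unseen addresses inside each sufficiently large cluster as candidates. This is a simplified assumption of how such a method can work, not the published distance-clustering code; the function name and the parameters max_gap and min_size, including their default values, are hypothetical.

```python
import ipaddress


def distance_cluster_candidates(seeds, max_gap=16, min_size=4):
    """Generate candidate addresses inside dense clusters of seed addresses.

    max_gap and min_size are illustrative values, not tuned thresholds.
    """
    values = sorted({int(ipaddress.IPv6Address(s)) for s in seeds})

    # Split the sorted seeds wherever two neighbours are further apart
    # than max_gap.
    clusters, current = [], []
    for value in values:
        if current and value - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(value)
    if current:
        clusters.append(current)

    known = set(values)
    candidates = []
    for cluster in clusters:
        if len(cluster) < min_size:
            continue  # too sparse to be worth extending
        # Fill the gaps between the first and last seed of the cluster.
        for value in range(cluster[0], cluster[-1] + 1):
            if value not in known:
                candidates.append(str(ipaddress.IPv6Address(value)))
    return candidates
```

The generated candidates still have to be probed, since a densely populated region is not necessarily fully responsive.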
GFW filter
We provide a script that removes responses injected by the Great Firewall of China (GFW) from the output of UDP/53 scans.
Python source:
filter_gfw.py
Rusty Clusters? Dusting an IPv6 Research Foundation
Abstract. The long-running IPv6 Hitlist service is an important
foundation for IPv6 measurement studies. It helps to overcome infeasible,
complete address space scans by collecting valuable, unbiased IPv6 address
candidates and their responsiveness. However, the Internet itself is a
quickly changing ecosystem that can affect long-running services, potentially
inducing new biases and obscurities into ongoing data collection means.
Frequent analyses but also updates are necessary to enable a valuable service
to the community.
In this paper, we show that the existing hitlist is highly impacted by the
Great Firewall of China and we offer a cleaned view on its development. While
the accumulated input shows an increasing bias towards some networks, the
cleaned set of responsive addresses is well distributed and shows a steady
increase.
Although it is a best practice to remove aliased prefixes from IPv6 hitlists,
we show that this removes not only single hosts responsive to complete
prefixes, but also major content delivery networks. More than 98% of all
IPv6 addresses announced by Fastly are labeled as aliased and Cloudflare
prefixes hosting more than 10M domains are excluded. Depending on the
hitlist usage, e.g., higher layer protocol scans, inclusion of addresses
from these providers can be valuable.
Lastly, we evaluate different new address candidate sources, including
target generation algorithms. We show that a combination of different
methodologies is able to identify 5.6M new, responsive addresses. This
accounts for an increase by 174% and combined with the current IPv6 Hitlist,
we identify 8.8M responsive addresses.
Paper. Read the final version of our paper here:
[PDF]
Authors.
Johannes Zirngibl,
Lion Steger,
Patrick Sattler,
Oliver Gasser,
Georg Carle.
Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists
Abstract. Network measurements are an important tool in understanding the
Internet. Due to the expanse of the IPv6 address space, exhaustive
scans as in IPv4 are not possible for IPv6. In recent years, several
studies have proposed the use of target lists of IPv6 addresses, called
IPv6 hitlists.
In this paper, we show that addresses in IPv6 hitlists are heavily
clustered. We present novel techniques that allow IPv6 hitlists to
be pushed from quantity to quality. We perform a longitudinal
active measurement study over 6 months, targeting more than 50 M
addresses. We develop a rigorous method to detect aliased prefixes,
which identifies 1.5 % of our prefixes as aliased, pertaining to about
half of our target addresses. Using entropy clustering, we group the
entire hitlist into just 6 distinct addressing schemes. Furthermore,
we perform client measurements by leveraging crowdsourcing.
To encourage reproducibility in network measurement research
and to serve as a starting point for future IPv6 studies, we publish
source code, analysis tools, and data.
Paper. Read the final version of our paper at arXiv.org:
[abstract] and
[PDF].
Authors. Oliver Gasser,
Quirin Scheitle,
Paweł Foremski,
Qasim Lone,
Maciej Korczyński,
Stephen D. Strowes,
Luuk Hendriks,
Georg Carle.
Additional Plots
We provide additional plots for in-depth analysis accompanying the evaluations in our paper.
Interactive zesplot plots
Entropy clustering plots
Responsiveness over time
Reproducibility
Rusty Clusters? Dusting an IPv6 Research Foundation
The dataset will be available soon
We publish data and scripts to reproduce our analysis at the
TUM university library to guarantee long-term availability.
Dataset DOI:
10.14459/2022mp1686542
Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists
We publish data and scripts to reproduce our analysis at the
TUM university library to guarantee long-term availability.
Dataset DOI:
10.14459/2018mp1452739
Reproducibility update 2019-04-10
We updated the requirements.txt file and added further information to the README.md file.
You can download the updated files from this directory.
SIGCOMM Artifacts Evaluation Badges
We were awarded all three badges by the SIGCOMM Artifacts Evaluation Committee, underlining that our published data and scripts are available, functional, and reusable.
Oliver Gasser: oliver.gasser [AT] mpi-inf.mpg.de
Johannes Zirngibl: zirngibl [AT] net.in.tum.de
Partners