Abstract:
The TorDNSEL project is concerned with identifying individual hosts as valid and accessible Tor exit relays. Each Tor exit relay has an associated exit policy governing what traffic may leave the Tor circuit and go out as requests to the internet. A public database that can be easily queried or scraped would be of huge benefit to the Tor community and to services that are interested in whether clients originate from the Tor network, such as Wikipedia and IRC networks.
My primary interest is the TorDNSEL rewrite. The existing implementation is unmaintained and written in Haskell; I would like to rework it from the ground up in Python, using the TorFlow interface.
To ensure the honesty of advertised exit nodes, the program must actively build circuits over the Tor network to the intended exit node and verify the IP address and exit policies listed in the cached router list. This will be accomplished via TorFlow, most likely using the NetworkScanners tool SoaT. If more detailed checking is required than SoaT provides, it will be modified and extended to suit the new requirements.
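To make the idea concrete, here is a rough sketch of such an active check. It uses the stem controller library and PySocks rather than TorFlow/SoaT, purely for illustration; the fingerprint, control and SOCKS ports, and echo URL below are assumptions, not part of the proposed implementation.

    # Sketch only: pin outgoing streams to a candidate exit and compare the
    # address a remote IP-echo service sees with the descriptor's advertised IP.
    import json
    import socket
    import urllib.request

    import socks                      # PySocks
    from stem.control import Controller

    EXIT_FPR = "0123456789ABCDEF0123456789ABCDEF01234567"   # hypothetical exit
    ECHO_URL = "https://check.torproject.org/api/ip"         # any IP-echo service

    with Controller.from_port(port=9051) as controller:      # assumed control port
        controller.authenticate()
        advertised = controller.get_server_descriptor(EXIT_FPR).address

        # Ask Tor to route new streams through the candidate exit only.
        controller.set_conf("ExitNodes", EXIT_FPR)

        # Send the request through Tor's SOCKS port (assumed to be 9050).
        socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
        socket.socket = socks.socksocket
        observed = json.loads(urllib.request.urlopen(ECHO_URL).read())["IP"]

        print("advertised", advertised, "observed", observed,
              "match" if advertised == observed else "MISMATCH")

SoaT already performs much more thorough exit scanning than this; the sketch only illustrates the "verify what the network actually sees" principle the rewrite would build on.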
Since new relays entering the Tor network are almost immediately available for use, it is important that new relays are checked and added as quickly as possible. Testing ExitPolicy honesty may be time-consuming for certain relays, destinations, and services.
To improve exit honesty checking latency, hosts that have complex exit policies may be checked incrementally; for example, if popular services such as http/https/domain/ssh are allowed by the relay's ExitPolicy, these should be checked first and the relay can be marked as "honest (preliminary)", so that partial results may be listed earlier pending more thorough circuit testing.
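A rough sketch of this incremental strategy: test the most widely used ports first so that a preliminary verdict can be published while the rest of a complex ExitPolicy is still being exercised. The port priorities and the test_exit_to() helper are hypothetical placeholders.

    POPULAR_PORTS = [80, 443, 53, 22]          # http, https, domain, ssh

    def check_relay(relay, allowed_ports, test_exit_to):
        """Yield (status, results) pairs as testing of one relay progresses."""
        results = {}
        first_pass = [p for p in POPULAR_PORTS if p in allowed_ports]
        remainder  = [p for p in allowed_ports if p not in POPULAR_PORTS]

        for port in first_pass:
            results[port] = test_exit_to(relay, port)   # hypothetical circuit test
        if first_pass and all(results[p] for p in first_pass):
            # Partial result: can be listed immediately as "honest (preliminary)".
            yield "honest (preliminary)", dict(results)

        for port in remainder:
            results[port] = test_exit_to(relay, port)
        yield ("honest" if all(results.values()) else "dishonest"), results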
Scraping of all known exit relays and their exit policies.
As noted in the Tor dir-spec v3, it will not scale in the future for every Tor client and directory cache to know the IP address of every other router. We need to be able to obtain this data accurately, up to date and in bulk, from an authoritative source, and the DNSEL should be able to verify all exit relays in the shortest time span allowable.

Currently, users can check the TorBulkExitList on check.torproject.org or perform DNS queries against exitlist.torproject.org, but this is not ideal for all consumers of the data: the queries are expensive and must be done one at a time over the network. While this matters less for services that perform infrequent exit checks, I propose that this mechanism can actually harm anonymity. An adversary that can track these queries (one with access to the network of the querying service, or one running a rogue DNSEL implementation, should the system become more distributed) can determine, for example, which exit nodes are currently serving a large number of IRC connections. This is a bigger concern for services that perform frequent queries across a wide range of IP addresses. By encouraging these queries to be done locally, we can improve network latency, throughput, and anonymity together.
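For comparison, a single remote query against the existing exitlist service looks roughly like the following. The query-name layout follows my reading of the ip-port format in Tor's torel-design.txt (reversed source IP, destination port, reversed destination IP), and the zone and "listed" answer convention are assumptions here.

    import socket

    def is_tor_exit(source_ip, dest_ip, dest_port,
                    zone="ip-port.exitlist.torproject.org"):
        rev = lambda ip: ".".join(reversed(ip.split(".")))
        name = "%s.%d.%s.%s" % (rev(source_ip), dest_port, rev(dest_ip), zone)
        try:
            # DNSBL convention: an A record in 127.0.0.0/8 means "listed".
            return socket.gethostbyname(name).startswith("127.0.0.")
        except socket.gaierror:    # NXDOMAIN: not a known exit for this target
            return False

    print(is_tor_exit("1.2.3.4", "192.0.2.10", 80))

Each such lookup is a round trip over the network, per address, per destination; that is the cost (and the information leak) that local bulk data is meant to avoid.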
On the other hand, requiring a service that only needs to query a few addresses daily to download the entire exit set would waste resources on both ends. Thus, both single-exit queries and bulk exit description lists must be provided; raw queries would be used by services that are simply doing manual checks of possible Tor addresses, or infrequent automated checks.
The primary consumers of bulk data are services that need to do frequent automated checks and benefit strongly from local caching of the data; prominent examples are Wikipedia and large IRC networks such as Freenode.
A particularly important goal is to ensure that Tor users who also happen to run Tor relays are not automatically blocked by services simply because they are relays or exit nodes. If a service can ascertain that an IP address corresponds to a Tor relay, but that relay's exit policy would not allow Tor traffic to reach the service, it ideally should not block access to its resources. The cheaper and easier it is for a service operator to validate this kind of information, the more likely the service is to use it and collaborate with the Tor community. And the more the service is used, the more likely we are to get feedback, whether about the data format, false positives or negatives, invalid or incorrect exit policies, etc.
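As a toy illustration of that point: a service operator who has a relay's ExitPolicy can tell whether Tor traffic could actually reach the service before deciding to block the relay's IP. The policy grammar below is deliberately simplified (only "accept/reject *:port" and "*:*" rules), not the full policy syntax.

    def can_exit_to(policy_lines, dest_port):
        for line in policy_lines:
            action, _, target = line.partition(" ")
            host, _, port = target.partition(":")
            if port == "*" or int(port) == dest_port:
                return action == "accept"
        # Real policies normally end with a *:* catch-all, so this is rarely hit.
        return True

    relay_policy = ["accept *:80", "accept *:443", "reject *:*"]
    # False: this relay cannot exit to IRC, so an IRC network gains nothing
    # by blocking its address.
    print(can_exit_to(relay_policy, 6667))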
For each weekly milestone, appropriate documentation should be written to coincide with the completed work. For example, end-user tools should have manual pages at the very least, and preferably include a LaTeX manual. Milestones that are primarily experimental in nature should include complete descriptions and proposals in plain-text where appropriate. All source code will be thoroughly commented and include documentation useful to developers.
April 26 - May 24 (Pre-SoC): Get up to speed with Tor directory and caching architecture, pick apart existing Haskell implementation of TorDNSEL, and master TorFlow.
May 31 (end of week 1): Have a working mechanism for compiling as much testable information about exit relays as possible. This data must be easily accessible for subsequent work. This may be taken, adapted, or abstracted from existing data directory crawling in TorFlow.
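A possible starting point for this milestone, if the data ends up coming straight from a Tor data directory rather than through TorFlow's own crawling: pull nicknames, advertised addresses, and exit-policy lines out of cached-descriptors. The path is an assumption and the parsing is deliberately naive.

    import os

    def iter_exit_candidates(path=os.path.expanduser("~/.tor/cached-descriptors")):
        relay = None
        with open(path) as fh:
            for line in fh:
                words = line.split()
                if not words:
                    continue
                if words[0] == "router":        # router <nick> <address> <ports...>
                    if relay:
                        yield relay
                    relay = {"nickname": words[1], "address": words[2], "policy": []}
                elif words[0] in ("accept", "reject") and relay:
                    relay["policy"].append(line.strip())
            if relay:
                yield relay

    for r in iter_exit_candidates():
        if any(p.startswith("accept") for p in r["policy"]):
            print(r["nickname"], r["address"], len(r["policy"]), "policy lines")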
June 14 (week 3): Working implementation of tests using TorFlow, especially the ExitAuthority tools. This will probably be the most time-consuming period; it may take up to a week longer than anticipated.
June 21 (week 4): Be able to produce consistent, constantly updating exit lists with tested and untested exit policies listed. Find Tor developer guinea pigs to test and hunt for glaring holes in exit relay honesty testing and verification. :)
June 28 (week 5): Begin proof-of-concept production of bulk data formats (raw, SQLite, and JSON), all of which should expose the same underlying data. Consultations should be made with consumers of such data (Freenode, Wikipedia, etc.) to ensure the data presentation neither includes unnecessary information nor omits information that would be useful to them.
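One possible shape for these bulk formats is sketched below; the field names and schema are placeholders pending that feedback from consumers.

    import json
    import sqlite3

    record = {
        "fingerprint": "0123456789ABCDEF0123456789ABCDEF01234567",  # hypothetical
        "address": "192.0.2.44",
        "published": "2010-06-28 12:00:00",
        "exit_policy": ["accept *:80", "accept *:443", "reject *:*"],
        "verified": "honest (preliminary)",
    }

    # JSON: one object per relay, easy to consume from any language.
    print(json.dumps(record, indent=2))

    # SQLite: the same data, queryable locally without contacting the DNSEL.
    db = sqlite3.connect("exitlist.db")
    db.execute("""CREATE TABLE IF NOT EXISTS exits
                  (fingerprint TEXT PRIMARY KEY, address TEXT,
                   published TEXT, exit_policy TEXT, verified TEXT)""")
    db.execute("INSERT OR REPLACE INTO exits VALUES (?, ?, ?, ?, ?)",
               (record["fingerprint"], record["address"], record["published"],
                json.dumps(record["exit_policy"]), record["verified"]))
    db.commit()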
July 12 (week 7): Integrate existing functionality and data access methods into a Python API that is usable for consumers and the DNSEL application itself. Style should be similar to TorFlow where possible.
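A very rough sketch of what the consumer-facing side of that API might look like; every class and method name here is a placeholder to be refined during this milestone, not an agreed interface.

    class ExitList:
        """Local, periodically refreshed view of verified exit relays."""

        def __init__(self, relays):
            # relays: iterable of dicts as produced by the bulk formats above
            self._by_address = {r["address"]: r for r in relays}

        def is_exit(self, ip):
            """True if ip is currently a verified Tor exit relay."""
            return ip in self._by_address

        def policy_for(self, ip):
            """The relay's exit-policy lines, or None if ip is not an exit."""
            relay = self._by_address.get(ip)
            return relay["exit_policy"] if relay else None

    exits = ExitList([{"address": "192.0.2.44",
                       "exit_policy": ["accept *:80", "reject *:*"]}])
    print(exits.is_exit("192.0.2.44"), exits.policy_for("192.0.2.44"))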
July 16 (midterm evals): Completed specifications of TorDNSEL operation, basic data formats, and delivery methods. Completed first proof-of-concept implementation. First major review with mentors and as much of the Tor developer community as time permits.
July 19 (week 8): Work on designing and testing exit list cache update mechanisms. Start with something similar to cached-descriptors.new journaling, and work up to something useful for the other data formats. Integrate the mechanism into the API.
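A sketch of the journaling idea borrowed from cached-descriptors.new: append updates to a small journal as they arrive, and fold them into the main list periodically. The file names and the JSON-lines journal format are assumptions for illustration only.

    import json
    import os

    EXIT_LIST = "exit-list.json"
    JOURNAL   = "exit-list.new"

    def append_update(relay):
        # Cheap append for each newly tested or re-tested relay.
        with open(JOURNAL, "a") as fh:
            fh.write(json.dumps(relay) + "\n")

    def compact():
        # Merge journal entries into the main list; newest entry wins.
        relays = {}
        if os.path.exists(EXIT_LIST):
            with open(EXIT_LIST) as fh:
                relays = {r["fingerprint"]: r for r in json.load(fh)}
        if os.path.exists(JOURNAL):
            with open(JOURNAL) as fh:
                for line in fh:
                    r = json.loads(line)
                    relays[r["fingerprint"]] = r
            os.remove(JOURNAL)
        with open(EXIT_LIST, "w") as fh:
            json.dump(list(relays.values()), fh)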
July 26 (week 9): Solidify main scrape/check application and perform as much real-world testing as time permits, adjusting for major setbacks, if any.
August 2 (week 10): Make adjustments based on feedback from (hopefully) several real-world consumers of TorDNSEL data. Generally polish and improve usability of core application(s).
August 9 ("pencils down"):: Start pumping out documentation and comprehensive code and review.
August 16 ("okay really, pencils down"):Major remaining kinks should be ironed out; polish specification and documentation and begin writing final evaluations. Plan for future maintenance of TorDNSEL.
Code from almost any project I've worked on is available at http://git.spanning-tree.org/. Some of my better code:
The Tor Project interests me primarily from architectural and information security perspectives; my primary focus in information security has always been authentication and authorization - verifying the identity of a user in order to explicitly or implicitly control access to machine and network resources. Public-key cryptography and secure hashes are, at their core, tools for authenticating a party or a piece of data, essentially pinning an identity down.
Tor greatly interests me because it has the opposite goal; it tries to ensure that pinning down the identity of any particular user is (ideally) impossible or at least greatly hindered for any non-global adversary. Protecting the rights of network users by preserving their anonymity is an incredibly important and complicated goal, and Tor's role in increasing anonymity of internet access in the face of many types of adversaries is extremely valuable. To this end, I hope that my contributions will be found useful by the Tor project, its users, and those working to protect these end users.
While nearly all of the projects I've worked on have been free software, my experience working directly with the free software community at large is minimal. I have contributed briefly to the KDE project, working on their display configuration application, and submitted patches to other open source projects (QoSient's Argus netflow tools and Google's ipaddr-py, for example). I have collaborated with various universities in New England on development of the Nautilus project (http://nautilus.oshean.org/) and its main subproject, Periscope (http://nautilus.oshean.org/wiki/Periscope), while working at the OSHEAN non-profit consortium.
I sincerely look forward to working with the vibrant development community of the Tor project and hope to gain more experience in collaborating with an experienced group of developers.
I will be working part-time at the University of Rhode Island Information Security Office, and will have one summer class for five weeks starting in late May. I don't anticipate either will significantly affect my involvement with the Tor project.
While I am confident I can produce a working initial implementation of the DNSEL in the time allotted, I anticipate it will need more work at the end of the summer. One of my primary goals for the project is to make it easier to maintain, as its operation will have to be adjusted to fit changes in the Tor architecture. Making the project more accessible to other maintainers will allow for greater collaboration and improvement where development on the current implementation has stagnated.
I will do my best to communicate with my mentors and the Tor developer community at large as frequently and directly as possible, via #tor-dev and the mailing lists. I also hope to inform others of more major milestones in the project via a blog or web page, and keep detailed documentation and progress updates on the Tor wiki.
I am currently attending the University of Rhode Island. This is my fourth year in college and second at URI; I am a Computer Engineering major, intending to graduate next year and obtain my masters degree the following year. My primary interests are low-level software development and systems programming, networking, information security, and signal processing.
You can contact me at hbock@ele.uri.edu; my nickname on IRC is hbock.