Tor: Volunteer

Seven things everyone can do now:

We need users like you to try Tor out, and let the Tor developers know about bugs you find or features you don't find.
Please consider running a server to help the Tor network grow.
We especially need people with Windows programming skills to run an exit server on Windows, to help us debug.
Run a Tor hidden service and put interesting content on it.
Take a look at the Tor GUI Competition, and come up with ideas or designs to contribute to making Tor's interface and usability better. Free T-shirt for each submission!
Tell your friends! Get them to run servers. Get them to run hidden services. Get them to tell their friends.
Consider joining the Electronic Frontier Foundation. More EFF donations mean more freedom in the world, including more Tor development.

Installers

Extend our NSIS-based Windows installer to include Privoxy. Include a preconfigured config file to work well with Tor. We might also want to include FreeCap -- is it stable enough and useful enough to be worthwhile?
Develop a way to handle OS X uninstallation that is more automated than telling people to manually remove each file.
Our RPM spec file needs a maintainer, so we can get back to the business of writing Tor. If you have RPM fu, please help out.

Usability and Interface

We need a way to intercept DNS requests so they don't "leak" while we're trying to be anonymous. (This happens because the application does the DNS resolve before going to the SOCKS proxy.) One option is to use Tor's built-in support for doing DNS resolves; but you need to ask via our new socks extension for that, and no applications do this yet. A nicer option is to use Tor's controller interface: you intercept the DNS resolve, tell Tor about the resolve, and Tor replies with a dummy IP address. Then the application makes a connection through Tor to that dummy IP address, and Tor automatically maps it back to the original query.
People running servers tell us they want to have one BandwidthRate during some part of the day, and a different BandwidthRate at other parts of the day. Rather than coding this inside Tor, we should have a little script that speaks via the Tor Controller Interface, and does a setconf to change the bandwidth rate. Perhaps it would run out of cron, or perhaps it would sleep until appropriate times and then do its tweak (that's probably more portable). Can somebody write one for us and we'll put it into tor/contrib/?
We have a variety of ways to exit the Tor network from a particular country, but they all require specifying the nickname of a particular Tor server. It would be nice to be able to specify just a country, and have something automatically pick. This requires having some component that knows what country each Tor node is in. The script on serifos manually parses whois entries for this. Maybe geolocation data will also work?
Speaking of geolocation data, somebody should draw a map of the Earth with a pin-point for each Tor server. Bonus points if it updates as the network grows and changes.
Tor provides anonymous connections, but we don't support keeping multiple pseudonyms in practice (say, in case you frequently go to two websites and if anybody knew about both of them they would conclude it's you). We should find a good approach and interface for handling pseudonymous profiles in Tor. See this post and followup for details.
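
For the BandwidthRate scheduling script requested above, a minimal Python sketch might look like the following. It assumes a Tor version that speaks the text-based control protocol (AUTHENTICATE and SETCONF commands, answered with replies starting with 250), a control port on 127.0.0.1:9051, and no control password. The schedule values and all function names are hypothetical, chosen only for illustration.

```python
import socket
from datetime import datetime

# Hypothetical schedule: (start_hour, bytes_per_second), sorted by start_hour.
# Each rate applies from its start hour until the next entry's start hour.
SCHEDULE = [(0, 512000), (8, 51200), (18, 204800)]


def rate_for_hour(hour, schedule=SCHEDULE):
    """Return the rate whose window contains `hour` (0-23)."""
    current = schedule[-1][1]  # before the first entry, the last rate still applies
    for start, rate in schedule:
        if hour >= start:
            current = rate
    return current


def setconf_command(rate):
    """Build the control-protocol line that changes BandwidthRate."""
    return "SETCONF BandwidthRate={}\r\n".format(rate)


def apply_rate(rate, host="127.0.0.1", port=9051):
    """Send the new rate to a running Tor (assumes no control password)."""
    with socket.create_connection((host, port), timeout=10) as conn:
        reader = conn.makefile("r", encoding="ascii", newline="\r\n")
        for command in ("AUTHENTICATE\r\n", setconf_command(rate)):
            conn.sendall(command.encode("ascii"))
            reply = reader.readline()
            if not reply.startswith("250"):
                raise RuntimeError("Tor refused {!r}: {}".format(
                    command.strip(), reply.strip()))


def run_once():
    """Apply the rate for the current local hour; suitable for an hourly cron job."""
    apply_rate(rate_for_hour(datetime.now().hour))
```

Calling run_once() from cron once an hour is the simpler of the two approaches mentioned above; a sleeping daemon would call it in a loop instead. A real script would probably also need to adjust BandwidthBurst, since Tor is likely to reject a BandwidthRate that exceeds it.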

Documentation

Please volunteer to help maintain this website: code, content, css, layout. Step one is to hang out on the IRC channel until we get to know you.
We have too much documentation --- it's spread out too much and duplicates itself in places. Please send us patches, pointers, and confusions about the documentation so we can clean it up.
Help translate the web page and documentation into other languages. See the translation guidelines if you want to help out. We also need people to help maintain the existing (Italian and German) translations.
Investigate Privoxy vs. FreeCap vs. SocksCap for win32 clients. Are there usability or stability issues that we can track down and resolve, or at least inform people about?
Can somebody help Matt Edman with the documentation and how-tos for his Windows Tor Controller?
Evaluate, create, and document a list of programs that can be routed through Tor.
We need better documentation for dynamically intercepting connections and sending them through Tor. tsocks (Linux) and freecap (Windows) seem to be good candidates.
We have a huge list of potentially useful programs that interface to Tor. Which ones are useful in which situations? Please help us test them out and document your results.

Coding and Design

We recommend Privoxy as a good scrubbing web proxy, but it's unmaintained and still has bugs, especially on Windows. While we're at it, what sensitive information is not kept safe by Privoxy? Are there other scrubbing web proxies that are more secure?
tsocks appears to be unmaintained: we have submitted several patches with no response. Can somebody volunteer to start maintaining a new tsocks branch? We'll help.
Some popular clients that people use with Tor include Gaim and xchat. These programs support socks, but they don't support socks4a or socks5-with-remote-dns. Please write a patch for them and submit it to the appropriate people. Let us know if you've written the patch but you're having trouble getting it accepted.
Right now the hidden service descriptors are being stored on just a few directory servers. This is bad for privacy and bad for robustness. To get more robustness, we're going to need to make hidden service descriptors even less private because we're going to have to mirror them onto many places. Ideally we'd like to separate the storage/lookup system from the Tor directory servers entirely. Any reliable distributed storage system will do, as long as it allows authenticated updates. As far as we know, no implemented DHT code supports authenticated updates. What's the right next step?
Tor exit servers need to do many DNS resolves in parallel. But gethostbyname() is poorly designed --- it blocks until it has finished resolving a query --- so it requires its own thread or process. So Tor is forced to spawn many separate DNS "worker" threads. There are some asynchronous DNS libraries out there, but historically they are buggy and abandoned. Are any of them stable, fast, clean, and free software? (Remember, Tor uses OpenSSL, and OpenSSL is (probably) not compatible with the GPL, so any GPL libraries are out of the running.) If so (or if we can make that so), we should integrate them into Tor. See Agl's post for one potential approach. Also see c-ares and libdnsres.
Tor 0.1.1.x includes support for hardware crypto accelerators via OpenSSL. Nobody has ever tested it, though. Does somebody want to get a card and let us know how it goes?
Long ago, we added dmalloc support to Tor, to track leaks. But we never quite got it working. Is dmalloc unfit for the job? Look at the --with-dmalloc configure option and go from there.
Because Tor servers need to store-and-forward each cell they handle, high-bandwidth Tor servers end up using dozens of megabytes of memory just for buffers. We need better heuristics for when to shrink/expand buffers. Maybe this should be modelled after the Linux kernel buffer design, where you have many smaller buffers that link to each other, rather than monolithic buffers?
How do ulimits work on Win32, anyway? We're having problems, especially on older versions of Windows, with people running out of file descriptors, connection buffer space, etc. (We should handle WSAENOBUFS as needed, look at the MaxConnections registry entry, look at the MaxUserPort entry, and look at the TcpTimedWaitDelay entry. We may also want to provide a way to set them as needed. See bug 98.)
Encrypt identity keys on disk, and implement passphrase protection for them. Right now they're just stored in plaintext.
Patches to Tor's autoconf scripts. First, we'd like our configure.in to handle cross-compilation, e.g. so we can build Tor for obscure platforms like the Linksys WRT54G. Second, we'd like the --with-ssl-dir option to disable the search for SSL libraries.
Implement reverse DNS requests inside Tor (already specified in Section 5.4 of tor-spec.txt).
Perform a security analysis of Tor with "fuzz". Determine if there are good fuzzing libraries out there for what we want. Win fame by getting credit when we put out a new release because of you!
How hard is it to patch bind or a DNS proxy to redirect requests to Tor via our tor-resolve socks extension? What about to convert UDP DNS requests to TCP requests and send them through Tor?
Tor uses TCP for transport and TLS for link encryption. This is nice and simple, but it means all cells on a link are delayed when a single packet gets dropped, and it means we can only reasonably support TCP streams. We have a list of reasons why we haven't shifted to UDP transport, but it would be great to see that list get shorter.
We're not that far from having IPv6 support for destination addresses (at exit nodes). If you care strongly about IPv6, that's probably the first place to start.
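
To make the socks4a item above more concrete (the patch requested for clients such as Gaim and xchat), here is a small Python sketch of what a SOCKS4a CONNECT request looks like on the wire. The point is that the hostname travels to the proxy unresolved, so the application never does its own DNS lookup. The function names are hypothetical, and the empty user ID is an illustration choice.

```python
import struct


def socks4a_connect_request(hostname, port, user_id=b""):
    """Build a SOCKS4a CONNECT request that carries the hostname unresolved."""
    header = struct.pack(">BBH", 4, 1, port)  # version 4, command 1 (CONNECT), port
    placeholder_ip = bytes([0, 0, 0, 1])      # 0.0.0.x with x != 0 means "hostname follows"
    return (header + placeholder_ip + user_id + b"\x00"
            + hostname.encode("idna") + b"\x00")


def socks4a_request_granted(reply):
    """An 8-byte reply whose second byte is 90 means the request was granted."""
    return len(reply) == 8 and reply[0] == 0 and reply[1] == 90
```

A client would write the request bytes to the proxy socket (for Tor, typically its local SOCKS port), read eight bytes back, and pass them to socks4a_request_granted before sending application data. Plain SOCKS4 differs only in that the client fills the IP field with an address it resolved itself, which is exactly where the leak comes from.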

Research

The "website fingerprinting attack": make a list of a few hundred popular websites, download their pages, and make a set of "signatures" for each site. Then observe a Tor client's traffic. As you watch him receive data, you quickly approach a guess about which (if any) of those sites he is visiting. First, how effective is this attack on the deployed Tor codebase? Then start exploring defenses: for example, we could change Tor's cell size from 512 bytes to 1024 bytes, we could employ padding techniques like defensive dropping, or we could add traffic delays. How much of an impact do these have, and how much usability impact (using some suitable metric) is there from a successful defense in each case?
The "end-to-end traffic confirmation attack": by watching traffic at Alice and at Bob, we can compare traffic signatures and become convinced that we're watching the same stream. So far Tor accepts this as a fact of life and assumes this attack is trivial in all cases. First of all, is that actually true? How much traffic of what sort of distribution is needed before the adversary is confident he has won? Are there scenarios (e.g. not transmitting much) that slow down the attack? Do some traffic padding or traffic shaping schemes work better than others?
The "run two servers and wait attack": Tor clients pick a new path periodically. If the adversary runs an entry and an exit, eventually some Alice will build a circuit that begins and ends with his nodes. The current Tor threat model assumes the end-to-end traffic confirmation attack is trivial, and instead aims to limit the chance that the adversary will be able to see both sides of a circuit. One way to help this is helper nodes -- Alice picks a small set of entry nodes and uses them always. But in reality, Tor nodes disappear sometimes. So it would seem that the attack continues, albeit slower than before. How much slower?
The "routing zones attack": most of the literature thinks of the network path between Alice and her entry node (and between the exit node and Bob) as a single link on some graph. In practice, though, the path traverses many autonomous systems (ASes), and it's not uncommon that the same AS appears on both the entry path and the exit path. Unfortunately, to accurately predict whether a given Alice, entry, exit, Bob quad will be dangerous, we need to download an entire Internet routing zone and perform expensive operations on it. Are there practical approximations, such as avoiding IP addresses in the same /8 network?
Tor doesn't work very well when servers have asymmetric bandwidth (e.g. cable or DSL). Because Tor has separate TCP connections between each hop, if the incoming bytes are arriving just fine and the outgoing bytes are all getting dropped on the floor, the TCP push-back mechanisms don't really transmit this information back to the incoming streams. Perhaps Tor should detect when it's dropping a lot of outgoing packets, and rate-limit incoming streams to regulate this itself? I can imagine a build-up and drop-off scheme where we pick a conservative rate-limit, slowly increase it until we get lost packets, back off, repeat. We need somebody who's good with networks to simulate this and help design solutions; and/or we need to understand the extent of the performance degradation, and use this as motivation to reconsider UDP transport.
A related topic is congestion control. Is our current design sufficient once we have heavy use? Maybe we should experiment with variable-sized windows rather than fixed-size windows? That seemed to go well in an ssh throughput experiment. We'll need to measure and tweak, and maybe overhaul if the results are good.
To let dissidents in remote countries use Tor without being blocked at their country's firewall, we need a way to get tens of thousands of relays, not just a few hundred. We can imagine a Tor client GUI that has a "help China" button at the top that opens a port and relays a few KB/s of traffic into the Tor network. (A few KB/s shouldn't be too much hassle, and there are few abuse issues since they're not being exit nodes.) But how do we distribute a list of these volunteer clients to the good dissidents in an automated way that doesn't let the country-level firewalls intercept and enumerate them? Probably needs to work on a human-trust level.
Tor circuits are built one hop at a time, so in theory we have the ability to make some streams exit from the second hop, some from the third, and so on. This seems nice because it breaks up the set of exiting streams that a given server can see. But if we want each stream to be safe, the "shortest" path should be at least 3 hops long by our current logic, so the rest will be even longer. We need to examine this performance / security tradeoff.
It's not that hard to DoS Tor servers or dirservers. Are client puzzles the right answer? What other practical approaches are there? Bonus if they're backward-compatible with the current Tor protocol.
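
For readers unfamiliar with client puzzles, the following Python sketch shows one well-known form, a hashcash-style proof of work. It is not part of the Tor protocol and not a proposed design; the use of SHA-256, the 8-byte counter encoding, and the function names are all assumptions made for illustration. The server hands out a random challenge, the client must find a counter whose hash has a given number of leading zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, counter, difficulty):
    """Server side: one hash checks the client's claimed solution."""
    digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Client side: brute-force a counter; expect roughly 2**difficulty hashes."""
    counter = 0
    while not verify(challenge, counter, difficulty):
        counter += 1
    return counter


def new_challenge():
    """Server side: a fresh random challenge per connection attempt."""
    return os.urandom(16)
```

The asymmetry is the whole idea: solving costs the client many hashes while verifying costs the server one. Whether that cost is acceptable for slow clients, and whether it actually deters a well-resourced attacker, are the open questions the item above asks about.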

Drop by the #tor IRC channel at irc.oftc.net or email tor-volunteer@freehaven.net if you want to help out!