Adding Karsten's metrics project to the volunteer page
Damian Johnson

Damian Johnson committed on 2012-02-27 16:08:12
Showing 1 changed file with 24 additions and 0 deletions.

@@ -543,6 +543,11 @@ meetings around the world.</li>
     Karsten Loesing.
     </p>
     
+    <p>
+    <b>Project Ideas:</b><br />
+    <i><a href="#metricsSearch">Searchable Tor descriptor and Metrics data archive</a></i> (Python/Django?)
+    </p>
+    
     <a id="project-torstatus"></a>
     <h3><a href="https://trac.torproject.org/projects/tor/wiki/projects/TorStatus">TorStatus</a> (<a
     href="https://gitweb.torproject.org/torstatus.git">code</a>)</h3>
@@ -968,6 +973,25 @@ meetings around the world.</li>
     </li>
     -->
     
+    <a id="metricsSearch"></a>
+    <li>
+    <b>Searchable Tor descriptor and Metrics data archive</b>
+    <br>
+    Priority: <i>Medium</i>
+    <br>
+    Effort Level: <i>Medium</i>
+    <br>
+    Skill Level: <i>Medium</i>
+    <br>
+    Likely Mentors: <i>Karsten</i>
+    <p>The <a href="https://metrics.torproject.org/data.html">Metrics data archive</a> of Tor relay descriptors and other Tor-related network data has grown to over 100G in size, bz2-compressed.  We have developed two search interfaces: the <a href="https://metrics.torproject.org/relay-search.html">relay search</a> finds relays by nickname, fingerprint, or IP address in a given month; <a href="https://metrics.torproject.org/exonerator-beta.html">ExoneraTor</a> finds whether a given IP address was a relay on a given day.</p>
+    
+    <p>We'd like to have a more general search application for Tor descriptors and metrics data.  There are more <a href="https://metrics.torproject.org/formats.html">descriptor types</a> that we'd like to include in the search.  The search application should handle most of them and understand some semantics like what's a timestamp, what's an IP address, and what's a link to another descriptor.  Users should then be able to search for arbitrary strings or limit their search to given time periods or IP address ranges.  Descriptors that reference other descriptors should contain links, and descriptors should be able to say from where they are linked.  The goal is to make the archive easily browsable.</p>
+    
+    <p>The search application shall be separate from the metrics website and shouldn't rely on the metrics website codebase.  The search application will contain hourly updated descriptor data from the metrics website via rsync.  Programming language and database system are not specified yet, though there's a slight preference for Python/Django and Postgres for maintenance reasons.  If there are good reasons to pick something else, e.g, some NoSQL variant or some search application framework, that's fine, too.  Further requirements are that lookups should be really fast and that changes to the search application can be implemented in reasonable time.</p>
+    
+    <p>Applications for this project should come with a design of the proposed search application, ideally with a proof-of-concept based on a subset of the available data to show that it will be able to handle the 100G+ of data.</p>
+    
     <a id="unitTesting"></a>
     <li>
     <b>Improve our unit testing process</b>
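
Note on the added project idea: the description asks for a search application that understands "what's a timestamp, what's an IP address, and what's a link to another descriptor." A minimal Python sketch of that kind of token classification could look like the following. The function names, the sample line, and the rule that any 40-character hex string counts as a reference to another descriptor are assumptions made for illustration; none of them come from this commit or from the metrics codebase.

```python
import ipaddress
import re
from datetime import datetime

# Assumption: a 40-character hex string is treated as a digest or
# fingerprint that could link to another descriptor.
HEX_DIGEST = re.compile(r"^[0-9A-Fa-f]{40}$")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def classify_token(token):
    """Return 'ip', 'digest', or 'text' for one whitespace-separated token."""
    try:
        ipaddress.ip_address(token)
        return "ip"
    except ValueError:
        pass
    if HEX_DIGEST.match(token):
        return "digest"
    return "text"


def find_timestamps(line):
    """Return a datetime for every valid 'YYYY-MM-DD HH:MM:SS' substring."""
    found = []
    for match in TIMESTAMP.findall(line):
        try:
            found.append(datetime.strptime(match, "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            # Looks like a timestamp but is not a real date; skip it.
            continue
    return found


def annotate(line):
    """Collect the timestamps, IP addresses, and digests found in one line."""
    tokens = line.split()
    return {
        "timestamps": find_timestamps(line),
        "ips": [t for t in tokens if classify_token(t) == "ip"],
        "digests": [t for t in tokens if classify_token(t) == "digest"],
    }


# Hypothetical line, not copied from a real descriptor.
SAMPLE = ("r ExampleRelay 0123456789ABCDEF0123456789ABCDEF01234567 "
          "2012-02-27 16:08:12 192.0.2.1 9001")
result = annotate(SAMPLE)
```

An index built from such annotations would let the time-period and IP-range filters mentioned in the description run against typed columns instead of raw strings, though a real proposal would need to follow the actual descriptor formats.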