Add Ahmia project idea

GSoC project idea from Juha Nurmi.

Damian Johnson authored on 31/01/2017 17:52:35
Showing 1 changed file
@@ -1114,6 +1114,68 @@ ideas.
     </p>
     </li>
 
+    <a id="ahmiaSearch"></a>
+    <li>
+    <b>Ahmia - Hidden Service Search</b>
+    <br>
+    Language: <i>Python, Django</i>
+    <br>
+    Likely Mentors: <i>Juha Nurmi (numes), George (asn)</i>
+    <p>
+    Ahmia is open-source search engine software for Tor hidden services
+    (deep or dark web sites). You can test the running search engine at
+    ahmia.fi. For more information see our <a
+    href="https://blog.torproject.org/category/tags/ahmiafi">blog post about
+    Ahmia's GSoC 2014 development</a>.
+    </p>
+
+    <p>
+    Ahmia is a working search engine that indexes, searches, and catalogs
+    content published on Tor Hidden Services. Furthermore, it is an environment
+    to share meaningful insights, statistics, and news about the Tor
+    network itself. In this context, there is a lot of work to do.
+    </p>
+
+    <p>
+    The Ahmia web service is written using the Django web framework. As a
+    result, the server-side language is Python. On the client-side, most of the
+    pages are plain HTML. There are some pages that require JavaScript, but the
+    search itself works without client-side JavaScript.
+    </p>
+
+    <p>
+    There are several possible directions for this project, including...
+    </p>
+
1150
+    <ol>
1151
+      <li>Automate blacklisting (very important)<br />
1152
+        <ul>
1153
+          <li>Fetch a list of child abuse media sites</li>
1154
+          <li>Remove these sites from the search results</li>
1155
+        </ul>
1156
+      </li>
1157
+      <li>Add hidden services funtion (very important)<br />
1158
+        <ul>
1159
+          <li>You can add onions using HTML form</li>
1160
+          <li>Call the crawler immidiately when a new site is added</li>
1161
+        </ul>
1162
+      </li>
1163
+      <li>Elasticsearch<br />
1164
+        <ul>
1165
+          <li>Must be updated to 5.X.X sooner or later</li>
1166
+          <li>Adjust the settings</li>
1167
+          <li>Automatically remove data older than, for instance, 90 days</li>
1168
+        </ul>
1169
+      </li>
1170
+      <li>Maintainance<br />
1171
+        <ul>
1172
+          <li>Update all software dependencies</li>
1173
+          <li>Automate crash recovery for Tor, Elasticsearch and crawler</li>
1174
+        </ul>
1175
+      </li>
1176
+    </ol>
1177
+    </li>
1178
+
1117 1179
 <!--
1118 1180
     <a id=""></a>
1119 1181
     <li>