Hackathon Lets Students Explore Dark Web

Noah Johnson presents some of his team's findings from the Dark Web Hackathon, photo by Todd Richmond/RAND Corporation

Chart by Pedro Lima

Rouslan Karimov's team analyzed the location of cryptomarket activity

Todd Richmond/RAND Corporation

December 4, 2019

"Have you ever wondered what dark and weird things lurk on the Dark Web? Well wonder no more! Thanks to RAND's Dark Web Observatory, you can now safely plumb the murky depths of the internet."

So began an invitation to participate in the Tech and Narrative Lab's latest hackathon, an offer that ten students could hardly refuse.

Hardika Dayalani (cohort '18), who was an engineer "in a past career" and had participated in hackathons in college, encouraged two of her "cubicle neighbors" — Noah Johnson ('18) and Pedro Lima ('19) — to join her on a team.

The other two teams consisted of Carlos Calvo Hernandez, David DeSmet, and Keller Scholl (all cohort '19); and Megan Franco ('17), Rouslan Karimov ('15), Omair Khan ('18), and Hillary Reininger ('16).

Dayalani said none of her teammates knew much about machine learning going into the event, but she added that also made it interesting. "One of us had to learn it — one of the briefs was to develop an ML algorithm to get a better understanding of the listings — so I decided to take on that part."

Specifically, Professor Osonde Osoba, the hackathon's organizer, asked participants to "tell a useful, interesting, or illuminating story about vendor behaviors and offerings on cryptomarkets, and use machine learning to develop models to improve the classification of goods or to identify vendor patterns."

Osoba provided participants with two years of data from RAND's Dark Web Observatory, which scrapes listings from eight cryptomarkets and uses some artificial intelligence algorithms to sort the data into 11 categories.

Lima said the data were eye-opening: "It was my first contact with the dark web, and I was surprised by how dispersed it is. At one point we looked at how many different sellers there are out there, and they're all over the place, and our data were from only a small fraction of the dark web."

"There are hundreds of thousands of listings," Johnson said. "We had to be able to navigate that and know how to explore data. It was a great learning experience because of the creativity and openness, and because it was quantitatively rigorous."

He also said he appreciated that the guidelines were pretty broad: "Here's some data, here are some clear deliverables we want. Dig around and explore, put together a presentation and make it policy relevant."

Osoba explained that, as with past hackathons, the goal was for the students to "showcase their data-wrangling, data visualization, and data modeling skills while plumbing two years of cryptomarket postings that RAND researchers had scraped from the dark web."

Dayalani said that, after Johnson did an initial review of the data, the team decided to focus on drug sales on the dark web.

"About 60 percent of the categorized listings, by our calculation, were drugs," Johnson said. "There was also jewelry and counterfeit goods, among others, but that didn't really seem as policy relevant."

Dayalani added, "I'd worked on drug-related research once before, but from a service-provider side. I thought it was an interesting problem to explore."

Lima said this was his first hackathon, "and in the process I learned how to muddle through and wrangle with the data. Once we did we were able to get some interesting insights."

It was also the first hackathon in which first-years were invited to participate, but Lima said that didn't pose a problem for him.

"The timing of the hackathon was the same as other classes and OJT projects, but this was something that let me think about new things. It didn't add a burden; it was just a really fun and productive skills-based activity."

One of the team's key observations was that many items that weren't properly categorized were in fact drugs. "AI is all about looking for familiar words," Johnson explained, "and there's a lot of slang on the web. Lots of drug postings in the Dark Web Observatory were categorized as 'other' or weren't categorized at all."

Dayalani said Johnson created an initial dictionary of drug terms, and she applied it to the 6 percent of listings that had been uncategorized and the 3 percent that were categorized as "other." The results, she said, were that about 69 percent of the uncategorized items for sale on the cryptomarkets were drugs, as were 34 percent of the "other" items.
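
For readers curious about what a dictionary approach like the one described above might look like, here is a minimal sketch in Python. It is not the team's code: the term list, the sample listing titles, and the function names `looks_like_drug` and `drug_share` are all hypothetical, and the article does not say how the team actually matched terms against listings. The sketch assumes simple whole-word matching on listing titles.

```python
import re

# Hypothetical slang terms for illustration only; the team's actual
# dictionary is not given in the article.
DRUG_TERMS = {"weed", "kush", "mdma", "xanax", "lsd"}


def looks_like_drug(title, terms=DRUG_TERMS):
    """Return True if any dictionary term appears as a whole word in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return not words.isdisjoint(terms)


def drug_share(titles, terms=DRUG_TERMS):
    """Fraction of listing titles that match at least one dictionary term."""
    if not titles:
        return 0.0
    hits = sum(looks_like_drug(t, terms) for t in titles)
    return hits / len(titles)


# Made-up listing titles standing in for uncategorized records.
uncategorized = [
    "Premium Kush 5g free shipping",
    "Vintage watch replica",
    "100x Xanax bars",
    "Ebook bundle",
]
share = drug_share(uncategorized)
print(f"{share:.0%} of sample listings match the dictionary")
```

A word list like this only catches slang that someone has already thought to include, which is one reason a team might treat it as a first pass before training a machine learning model.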

Johnson and Lima both said the data reminded them of "a Craigslist for illegal things"; Dayalani compared the listings to eBay. All three agreed it was fascinating to see what was listed and how.

"The vendors are all just trying to sell their products," Dayalani said. "There's a lot of slang, and by far the biggest category was marijuana," which Johnson said accounted for at least half of the drugs for sale.

Dayalani said that, in their final presentation, "Noah made the observation that, if you legalize marijuana, people won't need to go on the dark web to look for it, which could reduce further illegal activity. Because once you go on the dark web, you can find all sorts of other illegal things. It's a slow creep, a slippery slope."

— Monica Hertzman