6 Degrees of Urban Dictionary
- 2 Devlogs
- 9 Total hours
6 Degrees of Wikipedia but for.. well.. Urban Dictionary
6 Degrees of Wikipedia but for.. well.. Urban Dictionary
It has been nearly a week since my first devlog and a few (three) major changes have happened.
First change, I realized I am a stupid dumb idiot. The code I had for finding the path was incredibly inefficient.
Second change, I then realized, again, I am a stupid dumb idiot, the code I had was only finding one path for every set of terms.
And the third major change, I now have an API in place.
When I made the code for the BFS (breadth-first search) I literally just copy-pasted the code from the crawler and changed a few lines. This, was terrible at being BFS.
Apart from a few little bugs that I ironed out early, the model for crawling sucked at BFS. Essentially, the crawler crawled a link, add the hyperlinks in there to the queue and then grabbed the newest term from the queue. However, because there were 24 different threads all running the same code there were tons of locks which were put in place to prevent two threads from processing the same term and therefore wasting time and resources. This however, is not the best for BFS, because the data already exists locally in a database, each term lookup is pretty much instantaneous and therefore having several threads all doing it at once and waiting for each other (due to the locks) is incredibly inefficient. This setup was replaced by a far simpler setup that just does all the things in a loop.
Due to the way I wrote the code, the BFS search was only finding one route for a given set of terms, this was a simple fix and now it works fine. (This change wasn’t major in itself but it made a big difference)
If I was going to incorporate this with a frontend I obviously needed an API. So, that’s what I made.
This is an incredibly simple API made using the FastAPI framework. I didn’t really need to do much for this. I did a bit of restructuring in main.py to help facilitate the new API and then it was as simple as adding a call which needs two inputs, sending it off to main and then returning the result. (As well as adding Prometheus for future analytics)
request:
GET /paths?source=toby&target=world+peace
response:
[
[
"toby",
"hard",
"damn",
"emerald",
"world peace"
],
[
"toby",
"t",
"subway",
"windsor",
"world peace"
],
etc..
]
I still need to make a lot of stuff, most notable ALL of the frontend. I also spent a LOT of time just messing around and seeing different paths to things instead of doing actual coding. (Dog attached)
Hi there, I’m Toby, and this is the first devlog for my new project titled “6 Degrees of Urban Dictionary” (6DoUD for short).
The title is technically WIP but it’s pretty much here to stay.
This project is based off of (in concept, not code) Jacob Wenger’s 6 Degrees of Wikipedia. If you are unfamiliar with the concept, you choose a starting Wikipedia article and a target article, and the website finds the shortest path between them by following only hyperlinks between pages.
It’s a fairly simple concept but super cool in practice and super fun to mess around with. This got me thinking, what if there was something like 6DoW, but for another website?
That ‘other website’ turned out to be Urban Dictionary.
If you’ve never heard of it, Urban Dictionary is a website where users can upload their own definitions for words and phrases. One interesting feature of Urban Dictionary is that, similar to Wikipedia articles, the definitions have hyperlinks that link to other definitions.
This essentially creates a massive interlocking web of definitions similar to the Wikipedia articles.
This eventually turned into the basis of 6 Degrees of Urban Dictionary. Find the shortest path between two definitions by only using the hyperlinks on the definitions in between.
Despite being early in development, the core functionality is already up and running.
At its very core 6DoUD treats Urban Dictionary as a graph. Each term is represented as a node, with the hyperlinks on these definitions connecting these nodes. This allows the website to essentially be modeled as a large network of interconnecting terms. (Of course this graph isn’t actually visualized, my computer would die..)
To build this network I made a crawler which uses the Urban Dictionary API to retrieve definitions and examples for a given term. Using regular expressions, the crawler extracts all linked terms and stores the relationships inside of a SQLite database.
As the crawler discovers new terms, it continues exploring them recursively, gradually building a larger graph of connected entries. To improve performance the crawler runs across multiple threads, allowing many terms to be processed simultaneously.
Once enough data has been collected, the second part of the project takes over: finding the shortest path between two terms. This is done using Breadth-First Search (BFS), a graph traversal algorithm that guarantees the shortest route is found.
At this stage, the core concept is fully functional. Given two terms, the system can search through the collected graph and determine the shortest sequence of links connecting them.
While the backend functionality is largely complete, there is still a significant amount of work before the project is ready for end users.
Currently, the project is designed primarily for testing and development. The next step is creating an interface that makes the system easy and intuitive to use.
I plan to build a web frontend inspired by 6 Degrees of Wikipedia. Users will be able to enter a starting term and a destination term without needing to interact directly with the underlying Python code.
One feature I would like to implement is a visual graph display. Rather than presenting users with a simple list of terms, the application will display the discovered
route as an interactive network of connected nodes.
To connect the frontend with the backend, I will also develop an API which exposes the pathfinding functionality. This will allow the frontend and backend to remain separate whilst still communicating efficiently.
Thanks for taking the time to read through my devlog, attached is a fairly relevant xkcd comic.