Guess it is time to write my first post using Hugo. Yesterday I downloaded a torrent consisting of 2 years worth of 4chan posts, the plan was to mess with it and use the data to train a chatbot. Dealing with big datasets is always fun because even the easiest tasks tend to get complicated, for example extracting the data from a ~3 GB tar.gz compressed archive was a challenge by itself. …