Guess it is time to write my first post using Hugo. Yesterday I downloaded a torrent containing two years' worth of 4chan posts; the plan was to mess with it and use the data to train a chatbot. Dealing with big datasets is always fun because even the easiest tasks tend to get complicated. For example, extracting the data from a ~3 GB tar.gz archive was a challenge in itself. Running “tar -xzvf archive.tar.gz” resulted in tar (or the Linux kernel) eating all available memory to use as cache. Once free RAM was down to ~200 MB, my workstation started lagging so badly that even Xorg froze for a couple of seconds every 20 seconds or so. To solve the issue, I ran the following commands: …