02 / 25 / 2022

The YouTube Crawler

Social media platforms are readily available online to anyone of legal age with internet access. YouTube is one such platform, where multimedia of widely varied content goes viral. To study trends and the associated psychological reactions of crowds, one first needs to deploy specialized software called “crawlers”. On this particular platform, the crawler should access the communication services that the platform provides, and do so legally and ethically.

From a deeper technical perspective, this specialized software is a kind of wrapper. The name derives from the fact that the tool wraps around the freely accessible Application Programming Interfaces (APIs) provided by YouTube. In other words, the company and the third parties interested in the data each cover half the distance and meet in the middle to enable this data transaction. By offering APIs, YouTube signals that it wants to keep a reasonably open approach toward the public, whether individuals or companies.

Nevertheless, an open approach does not permit uncontrolled abuse, so YouTube thoroughly describes both the legal rights of each party and the ethical restrictions that users should abide by. The ethics of crawling dictate that the interested third party should always use the offered APIs rather than scrape content directly from webpages with hardcoded logic. The third party should also take care not to abuse the company’s resources and create bottlenecks, e.g. in bandwidth or server socket availability. On the legal side, YouTube supports several forms of multimedia license and has additionally created its own combination of legal terms called the “standard YouTube license”. Consequently, it may be legal to retrieve metadata for every video on the platform with the company’s consent, but the actual videos may be reused only when they are explicitly marked with a Creative Commons license.
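One simple way to avoid creating bottlenecks is to enforce a minimum delay between consecutive API requests. The sketch below is only an illustration of that idea, not the mechanism used by the crawler described here: the class name `Throttle`, its parameters, and the chosen interval are all assumptions made for this example.

```python
import time


class Throttle:
    """Hypothetical helper: enforces a minimum interval between requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A crawler loop would call `throttle.wait()` immediately before each API request, so that bursts of keyword searches are spread out over time.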

CERTH abided by all these rules, and the newly engineered YouTube crawler is currently fully functional. The tool lets users of the Memory Center Platform feed desired keywords to the crawler through custom APIs. A keyword search then retrieves relevant video metadata, always restricted to the “Creative Commons” license type. Next, a second framework downloads those videos and stores them, together with the previously retrieved metadata, in a local database for future reuse.
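The keyword search and metadata storage steps can be sketched roughly as follows. This is a minimal illustration, not CERTH's actual implementation: it assumes the YouTube Data API v3 `search` endpoint with its `videoLicense=creativeCommon` filter, and it uses SQLite as a stand-in for the local database. The function names, the table layout, and the selected metadata fields are hypothetical. The download step is left out because the source does not name the second framework.

```python
import sqlite3
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(keywords, api_key, max_results=25):
    """Build a search request limited to Creative Commons videos."""
    params = {
        "part": "snippet",
        "q": " ".join(keywords),
        "type": "video",                   # required when videoLicense is set
        "videoLicense": "creativeCommon",  # only CC-licensed videos
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def parse_search_response(payload):
    """Extract a few metadata fields from a decoded JSON search response."""
    records = []
    for item in payload.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if not video_id:
            continue  # skip anything that is not a video result
        snippet = item.get("snippet", {})
        records.append({
            "video_id": video_id,
            "title": snippet.get("title", ""),
            "channel": snippet.get("channelTitle", ""),
            "published_at": snippet.get("publishedAt", ""),
        })
    return records


def store_metadata(conn, records):
    """Insert or update metadata rows in a hypothetical 'videos' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        "video_id TEXT PRIMARY KEY, title TEXT, "
        "channel TEXT, published_at TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO videos "
        "VALUES (:video_id, :title, :channel, :published_at)",
        records,
    )
    conn.commit()
```

In use, the URL returned by `build_search_url` would be fetched with any HTTP client, the JSON body decoded and passed to `parse_search_response`, and the resulting records handed to `store_metadata`. Keying the table on the video ID means that repeating a keyword search updates existing rows instead of duplicating them.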

In conclusion, the YouTube crawler is a powerful, sophisticated tool that addresses requirements from multiple points of view. It enriches the multimedia content pool in a legitimate way, supporting the effort to create immersive experiences that bring communities and individuals closer and cultivate intimacy.

The Twitter Crawler