YouTube RSS processing
As I mentioned in the RSS feeds post, it is entirely possible to get an RSS feed from a YouTube channel. The problem is that a channel sometimes posts videos on several topics & you only want to hear about one of them (or a few; the processing principles are the same).
As a target, let's pick the Numberphile channel. Go to the main channel page & press F12 to open the browser's developer tools, then search the page source for channelid & you'll find that it's "UCoxcjq-8xIDTYp3uz647V5A". We'll need that in a minute.
Open Huginn & create a new agent (type is "RSS agent"). Call it what you like (I used "Numberphile") & edit the JSON to
{
  "expected_update_period_in_days": "1",
  "clean": "false",
  "url": "https://www.youtube.com/feeds/videos.xml?channel_id=UCoxcjq-8xIDTYp3uz647V5A"
}
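If you want to follow more than one channel, the only part of that URL that changes is the channel ID. Here is a minimal Python sketch of building it; the function name channel_feed_url is my own invention for illustration & has nothing to do with Huginn.

```python
from urllib.parse import urlencode

# Base address of YouTube's per-channel feed, taken from the agent config above.
FEED_BASE = "https://www.youtube.com/feeds/videos.xml"


def channel_feed_url(channel_id: str) -> str:
    """Return the feed URL for a channel ID (hypothetical helper)."""
    return f"{FEED_BASE}?{urlencode({'channel_id': channel_id})}"


# The Numberphile channel ID found with F12 earlier.
numberphile_feed = channel_feed_url("UCoxcjq-8xIDTYp3uz647V5A")
print(numberphile_feed)
```

Paste the printed URL into the "url" option of the RSS agent.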
The problem is that if you do a dry run with the JSON above, you get output like this...
[
{
"id": "yt:video:p-HN_ICaCyM",
"url": "https://www.youtube.com/watch?v=p-HN_ICaCyM",
"urls": [
"https://www.youtube.com/watch?v=p-HN_ICaCyM"
],
"links": [
{
"href": "https://www.youtube.com/watch?v=p-HN_ICaCyM",
"rel": "alternate"
}
],
"title": "The Troublemaker Number - Numberphile",
"description": null,
"content": "Dr Harini Desiraju discusses Somos Sequences and a number which breaks a streak.\nMore links & stuff in full description below ↓↓↓\n\nDr Harini Desiraju is a postdoctoral fellow at The University of Sydney. This video was recorded at MSRI.\n\nLike sequences - see these videos with Neil Sloane: http://bit.ly/Sloane_Numberphile\n\nNumberphile is supported by the Mathematical Sciences Research Institute (MSRI): http://bit.ly/MSRINumberphile\n\nWe are also supported by Science Sandbox, a Simons Foundation initiative dedicated to engaging everyone with the process of science. https://www.simonsfoundation.org/outreach/science-sandbox/\n\nAnd support from The Akamai Foundation - dedicated to encouraging the next generation of technology innovators and equitable access to STEM education - https://www.akamai.com/company/corporate-responsibility/akamai-foundation\n\nNUMBERPHILE\nWebsite: http://www.numberphile.com/\nNumberphile on Facebook: http://www.facebook.com/numberphile\nNumberphile tweets: https://twitter.com/numberphile\nSubscribe: http://bit.ly/Numberphile_Sub\n\nVideos by Brady Haran\n\nPatreon: http://www.patreon.com/numberphile\n\nNumberphile T-Shirts and Merch: https://teespring.com/stores/numberphile\n\nBrady's videos subreddit: http://www.reddit.com/r/BradyHaran/\n\nBrady's latest videos across all channels: http://www.bradyharanblog.com/\n\nSign up for (occasional) emails: http://eepurl.com/YdjL9",
"image": null,
"enclosure": null,
"authors": [
"Numberphile (https://www.youtube.com/channel/UCoxcjq-8xIDTYp3uz647V5A)"
],
"categories": [
],
"date_published": "2022-05-23T14:13:57+00:00",
"last_updated": "2022-05-29T03:55:28+00:00"
},
All that I want in my RSS reader is the video URL, the title & the first line of the description, which is actually in the "content" field & only goes as far as the first "\n" in the string.
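To make the goal concrete, here is a rough Python sketch of the trimming I'm after, run against a shortened copy of the event above. This is not how Huginn does it internally; the helper names first_line & slim_event are made up for the illustration.

```python
def first_line(content):
    """Return everything before the first newline (empty string if content is missing)."""
    return (content or "").split("\n")[0]


def slim_event(event):
    """Keep only the three fields wanted in the RSS reader (hypothetical helper)."""
    return {
        "title": event["title"],
        "link": event["url"],
        "description": first_line(event.get("content")),
    }


# Shortened copy of the dry-run event; the real "content" string is much longer.
event = {
    "title": "The Troublemaker Number - Numberphile",
    "url": "https://www.youtube.com/watch?v=p-HN_ICaCyM",
    "content": "Dr Harini Desiraju discusses Somos Sequences and a number "
               "which breaks a streak.\nMore links & stuff in full description below",
}

slim = slim_event(event)
print(slim["description"])
```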
So, we need to process things a bit...
Huginn uses a "trigger" agent to filter the events that another agent feeds to it. So, create one of those & set the "source" to the agent that we just created. For the sake of an example filter, let's say that we are only interested in videos about prime numbers, so we'll pass only those on to the next step using this JSON for the trigger agent.
{
  "expected_receive_period_in_days": "1",
  "keep_event": "true",
  "rules": [
    {
      "type": "regex",
      "value": ".*Prime.*",
      "path": "title"
    }
  ],
  "message": "Looks like your pattern matched in '{{value}}'!"
}
In the output snippet above you can see that the title is "The Troublemaker Number - Numberphile", so that wouldn't match. (Incidentally, the regex means "match 'Prime' with any number of characters (.*), including none, before or after it".)
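If you want to try a pattern before putting it into the trigger agent, a quick Python check along these lines will do. Two assumptions to flag: the second title is a made-up sample, & I believe Huginn compiles trigger patterns case-insensitively, which is why the sketch passes re.IGNORECASE. Drop the flag if your install behaves differently. Python's regex dialect is also not identical to Ruby's, although for a simple pattern like this one they agree.

```python
import re

# Same pattern as the trigger agent rule. IGNORECASE is an assumption about
# how Huginn compiles the pattern, not something stated in this post.
PATTERN = re.compile(r".*Prime.*", re.IGNORECASE)


def passes_filter(title):
    """Return True if the title would be passed on by the rule."""
    return PATTERN.search(title) is not None


titles = [
    "The Troublemaker Number - Numberphile",        # from the dry run above
    "A Made-Up Video About Prime Gaps - Numberphile",  # hypothetical sample
]

for title in titles:
    print(passes_filter(title), title)
```

Because search() looks for the pattern anywhere in the string, plain "Prime" would select the same titles as ".*Prime.*" here.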
Now we need a data output agent to produce the RSS feed for us. Create one & tell it to receive data from the trigger agent. Edit the JSON to something like this
{
  "secrets": [
    "Prime Numbers Only"
  ],
  "expected_receive_period_in_days": 2,
  "template": {
    "title": "Youtube channel Feed",
    "description": "Simple notification of new youtube content",
    "item": {
      "title": "{{title}}",
      "description": "{{content | split: '\\n' | first }}",
      "link": "{{url}}",
      "Last Updated": "{{last_updated}}"
    }
  },
  "ns_media": "true"
}
So, our RSS output copies the title of the video & the link that plays it. For the description, it reads the content part of the original feed, breaks it into chunks at the newline character ('\n') & returns only the first chunk. Note that the split filter needs ' characters around the string that we split on & that the "\" character needs to be escaped with another "\" so that it survives the JSON parsing & gets seen by the template.
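The double backslash is a JSON matter rather than anything special about Huginn, & you can see the effect with Python's json module. This sketch only shows what the JSON parser hands on to the template; what the template engine then does with it is outside the sketch.

```python
import json

# The "description" line from the agent config, exactly as typed (two backslashes).
escaped = r'''{"description": "{{content | split: '\\n' | first }}"}'''
# The same line with a single backslash, for comparison.
unescaped = r'''{"description": "{{content | split: '\n' | first }}"}'''

with_escape = json.loads(escaped)["description"]
without_escape = json.loads(unescaped)["description"]

# With the escape, the template still contains a backslash followed by "n".
print(repr(with_escape))
# Without it, JSON has already turned the pair into a real newline character.
print(repr(without_escape))
```

So the second "\" is what keeps the backslash intact long enough for the split filter to see it.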
You can, of course, mess with the filtering & content of your RSS output to suit your needs. Mine are pretty simple. :)