After some investigation, I found that the feed response does not conform to the standard: it contains a newline at the start of the feed, and has a JavaScript block at the end of the response.
While investigating further with Liferea, I found that it supports a very useful feature: a conversion filter. After some trial and error, I formulated a sed command that removes all of the invalid content from the source:
sed -e '1d' -e 's/<\/rss>.*$/<\/rss>/' -e '/<\/rss>/q'

The first expression deletes the first line, the second removes all characters after </rss> on the same line, and the third quits as soon as it encounters a line containing </rss>, which effectively discards all content after the matching line.
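To see the filter at work outside Liferea, here is a minimal sketch that runs it against a made-up feed. The sample content and the function name clean_feed are assumptions for illustration only, not taken from the real feed; the sample simply mimics the two problems described above (a blank first line and a script block after the closing tag).

```shell
# clean_feed is a hypothetical wrapper name; the sed expressions are the
# ones from the post, unchanged.
clean_feed() {
  sed -e '1d' -e 's/<\/rss>.*$/<\/rss>/' -e '/<\/rss>/q'
}

# Made-up malformed feed: an empty first line, then the XML, then a
# JavaScript block that starts on the same line as </rss>.
sample=$(printf '\n<?xml version="1.0"?>\n<rss version="2.0">\n<channel><title>Example</title></channel>\n</rss><script>\nvar x = 1;\n</script>\n')

# Pipe the sample through the filter and show what remains.
cleaned=$(printf '%s\n' "$sample" | clean_feed)
printf '%s\n' "$cleaned"
```

With this sample, the output should start at the XML declaration and end at the closing </rss> tag. Note that the filter deletes the first line unconditionally, so it is only appropriate for a feed that really does begin with a stray line.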