2014-05-21, 10:30
I am using a FRITZ!Box 7390 with attached USB storage as a poor man's NAS to serve video content to my LAN. The NAS throughput is around 3.5 Mbit/s, so it's pretty slow but sufficient to stream almost everything. However, scraping new content, such as weekly updated TV shows, is very slow, especially over Wi-Fi. That's why I wanted to look into the current scraping method. I'm not a programmer and I have no idea how scraping is currently implemented, but it seems to me there's room for improvement.
So my idea was to speed it up with checksums. My folder tree is pretty big, so XBMC has to check a lot of directories for changes. What if, for every directory, XBMC stored a checksum calculated from its contents, and the scraper only rescanned a folder's contents when that checksum had changed? Obviously, larger trees would benefit more from this than smaller ones.
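A minimal sketch of such a per-directory checksum in Python, assuming that "contents" means the directory listing (each entry's name, size and modification time) rather than the file data, so nothing has to be read from the video files themselves. The folder still has to be listed, but that is far cheaper than re-scraping it. `directory_checksum` is a hypothetical name, not an existing XBMC function.

```python
import hashlib
import os


def directory_checksum(path: str) -> str:
    """Hypothetical helper: hash the listing of one directory (not file data).

    Each entry contributes its name, type, size and modification time, so
    adding, removing, renaming or replacing a file changes the checksum.
    """
    digest = hashlib.sha1()
    with os.scandir(path) as it:
        # Sort by name so the checksum does not depend on listing order.
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        st = entry.stat(follow_symlinks=False)
        record = (
            f"{entry.name}\0{entry.is_dir(follow_symlinks=False)}"
            f"\0{st.st_size}\0{st.st_mtime_ns}\n"
        )
        digest.update(record.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()
```

Comparing the stored value with a freshly computed one tells the scraper whether the folder needs a closer look.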
Example:
I have two main folders, MOVIES and TV. The MOVIES folder is not updated that often, so if its checksum remains the same, the scraper would skip it. Under MOVIES, I have folders by genre, like SF, HORROR, COMEDY, etc. The same principle would apply one level down: if only the contents of one genre folder change, only that folder would be rescanned. Alternatively, one could create folders such as A-E, F-H, etc. to benefit from the idea.
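For the MOVIES example to work, a folder's checksum has to change whenever anything below it changes, not only its direct entries. One way to get that (my assumption: a Merkle-tree style hash) is to fold each subfolder's checksum into its parent's. An unchanged checksum on MOVIES then means the whole subtree is unchanged and the scraper can skip it. The function names are hypothetical, and this version still walks the whole tree to compute the checksums; the saving is in which folders get re-scraped.

```python
import hashlib
import os


def tree_checksums(root: str) -> dict[str, str]:
    """Hypothetical helper: Merkle-style checksum for every folder under root.

    A folder's checksum covers its own files (name, size, mtime) plus the
    checksums of its subfolders, so a change deep in the tree also changes
    the checksum of every folder above it.
    """
    result: dict[str, str] = {}

    def visit(path: str) -> str:
        digest = hashlib.sha1()
        with os.scandir(path) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                record = f"D\0{entry.name}\0{visit(entry.path)}\n"
            else:
                st = entry.stat(follow_symlinks=False)
                record = f"F\0{entry.name}\0{st.st_size}\0{st.st_mtime_ns}\n"
            digest.update(record.encode("utf-8", "surrogateescape"))
        result[path] = digest.hexdigest()
        return result[path]

    visit(root)
    return result


def changed_folders(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Folders whose checksum differs from the stored one (or that are new).

    Because checksums propagate upwards, an unchanged folder implies an
    unchanged subtree, so none of its subfolders can appear in this list.
    Folders that disappeared are not reported here.
    """
    return sorted(path for path, value in new.items() if old.get(path) != value)
```

Removed folders do not appear in the new snapshot, so a real scanner would also have to compare in the other direction to clean up the library.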
In my TV folder, there are about 30 TV shows. Again, reading only the checksums should be lightning fast.
The checksum values could be stored either as a file in each folder (for portability) or in the XBMC roaming folder; this could be a preference setting.
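A sketch of the two storage options behind a single boolean setting. The file name `.xbmc-checksum`, the `store_in_folders` flag and the central JSON file are all hypothetical; nothing here reflects how XBMC actually stores its data.

```python
import json
import os

# Hypothetical file name used by the per-folder option.
PER_FOLDER_FILE = ".xbmc-checksum"


def save_checksums(checksums: dict[str, str], store_in_folders: bool,
                   central_file: str) -> None:
    """Write folder -> checksum either into each folder or into one JSON file."""
    if store_in_folders:
        for folder, value in checksums.items():
            with open(os.path.join(folder, PER_FOLDER_FILE), "w", encoding="utf-8") as f:
                f.write(value)
    else:
        with open(central_file, "w", encoding="utf-8") as f:
            json.dump(checksums, f, indent=2, sort_keys=True)


def load_checksums(folders: list[str], store_in_folders: bool,
                   central_file: str) -> dict[str, str]:
    """Read stored checksums; folders without a stored value are left out."""
    if store_in_folders:
        result: dict[str, str] = {}
        for folder in folders:
            path = os.path.join(folder, PER_FOLDER_FILE)
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    result[folder] = f.read().strip()
        return result
    if not os.path.exists(central_file):
        return {}
    with open(central_file, encoding="utf-8") as f:
        return json.load(f)
```

With the per-folder option, the checksum calculation must ignore the checksum file itself; otherwise writing it would change the very folder it describes. Per-folder files travel with the tree if it is moved to another drive, while the central file in this sketch is keyed by full path and would have to be rebuilt after such a move.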