Req Scanning to/updating library - performance issues and a suggestion to improve
#1
tl;dr
I'm guessing my problem must be all the old episodes/documentaries I have that cannot be scanned/resolved by TheTVDB, and that those are being searched for before all the most recently added episodes and shows?


I've been using MySQL (running on a 32GB i7 ZFS-based server over a gigabit LAN) for a few years now with a variety of Kodi endpoints, and I've noticed that scanning my TV share (and my Movie share to a lesser extent) for new content often takes more than 30 minutes to complete. There doesn't seem to be a default sort order for scanning directories/folders (I've 'tried' looking through the source on GitHub, so I could be wrong). Since most users are interested in the content they added most recently, wouldn't it make sense to scan by date modified/created in descending order? The scan would then pick up the most recently changed folders first and populate them in the Recently Added view/widget that most people now use. This would at least give the perception of a quick scan.
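To illustrate the suggestion, here is a minimal C++17 sketch of ordering folders newest-first before scanning them. It is not Kodi's scanner code: `DirEntry`, `SortNewestFirst` and `ListDirsNewestFirst` are hypothetical names, and it assumes the share is reachable as a local or mounted path through `std::filesystem` rather than through Kodi's own SMB layer.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical record: one entry per show/season folder (not a Kodi type).
struct DirEntry {
  std::string path;
  fs::file_time_type modified;
};

// Put the most recently modified folders first; stable_sort keeps the
// original order for folders with identical timestamps.
void SortNewestFirst(std::vector<DirEntry>& dirs) {
  std::stable_sort(dirs.begin(), dirs.end(),
                   [](const DirEntry& a, const DirEntry& b) {
                     return a.modified > b.modified;
                   });
}

// Collect the immediate sub-folders of a local (or mounted) root and
// return them newest-first, ready to be handed to a scanner.
std::vector<DirEntry> ListDirsNewestFirst(const fs::path& root) {
  std::vector<DirEntry> dirs;
  for (const auto& entry : fs::directory_iterator(root)) {
    if (entry.is_directory())
      dirs.push_back({entry.path().string(), entry.last_write_time()});
  }
  SortNewestFirst(dirs);
  return dirs;
}
```

One caveat: on typical POSIX filesystems a directory's modification time only changes when entries directly inside it are added, removed or renamed, so a new episode in a season folder updates that season folder's timestamp but not necessarily the show folder's.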

In xbmc/video/VideoInfoScanner.h I can see a hash being used to detect whether a directory has been modified, so using those hashes along with the descending timestamps should make scanning very quick. In the video107.path table, the majority of my folders (3397 of 3740) have hash values. Are the hashes still being used in this way?

Code:
   static int GetPathHash(const CFileItemList &items, std::string &hash);

   /*! \brief Retrieve a "fast" hash of the given directory (if available)
    Performs a stat() on the directory, and uses modified time to create a "fast"
    hash of the folder. If no modified time is available, the create time is used,
    and if neither are available, an empty hash is returned.
    In case exclude from scan expressions are present, the string array will be appended
    to the md5 hash to ensure we're doing a re-scan whenever the user modifies those.
    \param directory folder to hash
    \param excludes string array of exclude expressions
    \return the md5 hash of the folder
    */
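A rough sketch of the "fast hash" idea that comment describes, assuming the timestamps have already been obtained from stat() as seconds since the epoch (0 meaning unavailable). `FastDirectoryHash` is a hypothetical name, and `std::hash` stands in for the MD5 that Kodi actually uses, so the output will not match the values stored in the path table.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Illustration only, NOT Kodi's implementation.
// modifiedTime / createTime: seconds since epoch, 0 = not available.
// std::hash is a stand-in for MD5.
std::string FastDirectoryHash(std::int64_t modifiedTime,
                              std::int64_t createTime,
                              const std::vector<std::string>& excludes) {
  // Prefer the modified time, fall back to the create time,
  // and return an empty hash if neither is available.
  const std::int64_t stamp = modifiedTime != 0 ? modifiedTime : createTime;
  if (stamp == 0)
    return "";

  // Append every exclude expression so that editing the excludes
  // also changes the hash and forces a rescan.
  std::string input = std::to_string(stamp);
  for (const auto& expr : excludes)
    input += "\n" + expr;

  std::ostringstream out;
  out << std::hex << std::hash<std::string>{}(input);
  return out.str();
}
```

Because only the folder's own timestamp (plus the excludes) feeds the hash, anything that changes that timestamp, or a client that reports it differently, would change the hash and trigger a rescan of that folder.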
Is there an easy way to change this? I've never been a C++ developer, but I'm happy to take a look and try to fix/improve it if someone can give me a few pointers on where the actual directory search takes place.

EDIT: I just did a scan for new content (with the debug log enabled) and it took about 90 minutes to scan my TV shows. After that I ran another scan, which took about 10 minutes, and quickly noticed in the debug log that many directories were skipped this time:
Code:
DEBUG: VideoInfoScanner: Skipping dir 'smb://chipper/tank_television/him and her/Him.and.Her.S03E01.WS.PDTV.XviD/' due to no change

So this suggests that one of my six or so Kodi devices is generating an invalid checksum, which then causes the other boxes to rescan all the directories again. I'm sure they all have exactly the same sources.xml and passwords.xml files, so I wonder whether running Update Library on one of my boxes (as opposed to Scan for new content) could be causing the directory checksums to change. Or does anyone know whether LibreELEC, Shield TV, AFTV, Win10 and Ubuntu see a different directory listing when scanning over SMB onto a ZFS-hosted share, e.g. hidden files, last-accessed dates, etc.?
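To show why a single mismatched client could cause the full rescans seen above, here is a toy model of the "skip if unchanged" check. `PathHashTable` and `NeedsRescan` are hypothetical: the map stands in for the hash column of the path table, and for simplicity the new hash is stored immediately rather than after the folder has actually been scanned.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the path -> hash column of the path table.
using PathHashTable = std::map<std::string, std::string>;

// Returns true if the folder must be (re)scanned and records the hash
// this client computed; returns false for "Skipping dir ... due to no change".
bool NeedsRescan(PathHashTable& table, const std::string& path,
                 const std::string& currentHash) {
  if (currentHash.empty())
    return true;  // no usable hash: always scan
  auto it = table.find(path);
  if (it != table.end() && it->second == currentHash)
    return false;  // stored hash matches: skip this folder
  table[path] = currentHash;  // overwrite whatever another client stored
  return true;
}
```

If two boxes compute different hashes for the same folder, each one overwrites the other's stored value, so whichever box scans next finds a mismatch and rescans the folder, even though nothing on disk has changed.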



Cheers
D.
#2
Hello @Anastrophe

If you have that debug log, I would be interested to see it, so I can discover what is holding up the scan. You don't mention how many titles you have, which can also be a factor, along with your OS, Kodi version and scraper versions, all of which can be found in the log.

I doubt those suggested changes will be made, but it is possible to scrape in several ways:

1. Run Update Library, which scans all sources for new content (I am guessing this is the method you use)
2. Scrape an individual title only (broken in v17.4, update to v17.5 required)
3. Scrape a particular source only

I assume you are not aware of all three scraping methods.

Here is a wiki page that describes these functions for music; refer to Section 2. The same procedures apply to videos (I haven't got around to creating the video page yet).
http://kodi.wiki/view/Update_Music_Library#Scan_Library

Does this help resolve some of your concerns?
#3
Hi Karrellen,

Thanks for the reply; I am aware of the various scanning modes. I often initiate a scan through the Update Library option in the iOS/Android remote app when I'm not in a rush to see the Latest Added; however, when I am in a rush, I often start a scan from the root of my TV Show share instead of navigating directory by directory. I have about 14,000 videos in 3,700 directories.

The debug log of my previous scan (where all the checksums needed to be regenerated) was 20MB, compared to 1.8MB when I ran the scan again on the same machine. I'll see if I can track down what's invalidating the MD5 checksums before publishing the logs.