Slow and inefficient http/https source parsing?
#1
<moved this from the general support forum, as I'm not sure anyone who would know the answer to my questions hangs out there>
------------------

Hi,

I'm new to the forum, but have been using XBMC for a while (and have dabbled in Plex as well). I prefer XBMC due to its multi-platform compatibility and its maturity compared to Plex. I was also very excited to find http and https source creation capability in XBMC, which is still lacking in Plex (Plex supports local disk, smb:// and ftp sources only).

I have all my content on a HTTPS server on the Internet. I have basic auth, certs, etc all configured (I have administered a number of apache servers in my lifetime, that was the easy part).

I have XBMC installed on a few machines, and have even set it up on relatives' computers. All point to my HTTPS content on the server, and all content is streamed. It works VERY well, except for one thing - the speed of looking through folders in file mode, or of indexing content. A directory with 500 movies takes upwards of 24 hours to index! And going in through file mode into any source/folder with more than about 50 items takes upwards of 5 minutes... it is very frustrating, to say the least.

I started looking into what might be causing this, and the logs on the webserver lead me to believe that XBMC does the following when it opens an http/https source (the curl library is used for the http/https calls):

1. Do a 'GET' for the top level directory of the configured HTTPS source
2. For each item returned, do a 'HEAD' method call for the content, initiating a new http connection every time!?!

Let's call the two steps above a 'LIST' action ;-)

In addition to the above, when an indexing process is being performed (library update), I see XBMC perform exactly one full cycle of the two steps above for each movie being indexed!? In other words, it seems to do a 'LIST' as above to get a listing of all the files/directories in the source folder, then for each item in there - scrape/index and update the library, followed by another 'LIST', and so on.

The above seems to be an *extremely* inefficient way to do this, considering I can get a full file/directory listing (even a recursive one) of all the content there using a tool like 'wget' in mere seconds, over a single https connection... For a directory with 500 movie files, that means it will do 501 full 'LIST' actions, each one (including each 'HEAD' call) made over a brand new https connection... just the time to set up and tear down that many http connections to a server that is any reasonable amount of latency away from the client is a lot of wasted time...
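The access pattern inferred above can be sketched as follows. This is an illustration of the HTTP traffic, not XBMC's actual code; the host, paths, and function name are hypothetical. The point is that one keep-alive connection can carry the GET and all the HEADs, avoiding a fresh TLS handshake per request:

```python
import http.client

def head_all(host, listing_path, file_paths, conn_cls=http.client.HTTPSConnection):
    """One GET for the directory listing, then one HEAD per file,
    all over a single connection. This only works if the server keeps
    the connection alive (i.e. does not answer 'Connection: close')."""
    conn = conn_cls(host)
    conn.request("GET", listing_path)
    listing = conn.getresponse().read()
    sizes = {}
    for path in file_paths:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # HEAD bodies are empty; draining frees the connection for reuse
        sizes[path] = resp.getheader("Content-Length")
    conn.close()
    return listing, sizes
```

With keepalive working, the per-request cost drops to one round trip; without it, each HEAD pays a full TCP + TLS setup, which is exactly the wasted time described above.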

EDIT: after some more log parsing - I think the majority of the unnecessary calls to the web server (all over brand new http connections) are due to XBMC trying to get the mimetype for the content...

So, is that behavior by design, is it a bug, or is it a side-effect of the browsing/indexing initially being written for local or mounted filesystem calls, and then 'ported' to leverage curl for a remote http/https 'filesystem'?

Can mimetypes be set or 'cached' in some way locally, so XBMC doesn't need to go back to the server every time?
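For what it's worth, the mimetype can usually be derived locally from the file extension with no network round trip at all. A minimal sketch of that idea (this is just an illustration of what local mimetype resolution would look like, not how XBMC actually does it):

```python
import mimetypes

def local_mimetype(url, fallback="application/octet-stream"):
    """Guess a MIME type from the URL's extension, with no HEAD request.
    Falls back to a generic type when the extension is unknown."""
    mtype, _encoding = mimetypes.guess_type(url)
    return mtype or fallback

# e.g. a .avi path typically resolves to video/x-msvideo,
# matching what the server's HEAD response reports anyway
local_mimetype("https://example.org/media/Movies/Some%20Movie%20(1994).avi")
```

The trade-off is that a local guess can be wrong when the extension lies about the content (note the `.m2ts` file in the log below that the server itself reports as `text/plain`), so a server HEAD is more authoritative - just expensive.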

Can someone point me to which part of the code implements this?

cheers,
jordan

debug log from XBMC on the client side (Dharma RC1 running on Windows) - this is for ONE item of content. I had to wrap it in php tags, as I couldn't post it raw due to it having '45 smileys' ;-) :

PHP Code:
15:56:30 T:372 M:1945710592   DEBUG: CFileCurl::GetMimeType - https://username@xbmcserver.org:443/media/Movies/Some%20Movie%20(1994).avi -> video/x-msvideo
15:56:30 T:372 M:1945714688   DEBUG: Curl::Debug About to connect() to xbmcserver.org port 443 (#0)
15:56:30 T:372 M:1945714688   DEBUG: Curl::Debug   Trying 1.2.3.4...
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug connected
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Connected to xbmcserver.org (1.2.3.4) port 443 (#0)
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSL re-using session ID
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSLv3, TLS handshake, Client hello (1):
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSLv3, TLS handshake, Server hello (2):
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSLv3, TLS change cipher, Client hello (1):
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSLv3, TLS handshake, Finished (20):
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSLv3, TLS change cipher, Client hello (1):
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSLv3, TLS handshake, Finished (20):
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug SSL connection using DHE-RSA-AES256-SHA
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Server certificate:
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug         subject: C=XX; ST=YYY; L=ZZZ; O=asdf; OU=Home; CN=xbmcserver.org; emailAddress=postmaster@xbmcserver.org
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug         start date: 2008-11-28 22:56:52 GMT
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug         expire date: 2018-11-26 22:56:52 GMT
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug         issuer: C=XX; ST=YYY; L=ZZZ; O=asdf; OU=Home; CN=asdf Root CA; emailAddress=postmaster@xbmcserver.org
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug         SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Server auth using Basic with user 'username'
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug HEAD /media/Movies/Some%20Other%20Movie%20(1984).avi HTTP/1.1
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Authorization: Basic asdfiowekasdldfs
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug User-Agent: XBMC/10.0-rc1 r35325 (Windows; Windows 7 64-bit (WoW); build 7600; http://www.xbmc.org)
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Host: xbmcserver.org:443
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Accept: */*
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Referer:
15:56:30 T:372 M:1945694208   DEBUG: Curl::Debug Connection: keep-alive
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug HTTP/1.1 200 OK
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Date: Fri, 19 Nov 2010 22:56:40 GMT
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Server: Apache
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Last-Modified: Mon, 15 Nov 2010 01:25:20 GMT
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug ETag: "20068002-77328000-4950d503b0000"
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Accept-Ranges: bytes
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Content-Length: 1999798272
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Connection: close
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Content-Type: video/x-msvideo
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug Closing connection #0
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug SSLv3, TLS alert, Client hello (1):
15:56:31 T:372 M:1948655616   DEBUG: Curl::Debug . 

with a little less verbosity on the client side:

13:14:02 T:3636 M:2148401152 DEBUG: CFileCurl::GetMimeType - https://someserver.org:443/video/Movies/...2009).m2ts -> text/plain; charset=UTF-8
13:14:03 T:3636 M:2144280576 DEBUG: CFileCurl::GetMimeType - https://someserver.org:443/video/Movies/...(1999).avi -> video/x-msvideo
13:14:04 T:3636 M:2146635776 DEBUG: CFileCurl::GetMimeType - https://someserver.org:443/video/Movies/...(2007).avi -> video/x-msvideo
13:14:04 T:3636 M:2158243840 DEBUG: CFileCurl::GetMimeType - https://someserver.org:443/video/Movies/...(2008).avi -> video/x-msvideo

Apache logs on the server side - this is the beginning of a library update run. It repeats in full for every item in the listing:

1.2.3.4 - username [19/Nov/2010:22:56:26 +0000] "GET /videos/Movies/ HTTP/1.1" 200 72530
1.2.3.4 - username [19/Nov/2010:22:56:29 +0000] "HEAD /videos/Movies/8%20Mile%20(2002).mp4 HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:30 +0000] "HEAD /videos/Movies/12%20Angry%20Men%20(1957).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:31 +0000] "HEAD /videos/Movies/12%20Monkeys%20(1995).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:31 +0000] "HEAD /videos/Movies/36%20Hours%20(1965).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:32 +0000] "HEAD /videos/Movies/50%20First%20Dates%20(2004).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:33 +0000] "HEAD /videos/Movies/A%20Bright%20Shining%20Lie%20(1998).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:34 +0000] "HEAD /videos/Movies/A%20Civil%20Action%20(1998).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:35 +0000] "HEAD /videos/Movies/A%20Clockwork%20Orange%20(1971).mp4 HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:36 +0000] "HEAD /videos/Movies/A%20Few%20Good%20Men%20(1992).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:37 +0000] "HEAD /videos/Movies/A%20History%20of%20Violence%20(2005).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:37 +0000] "HEAD /videos/Movies/A%20Serious%20Man%20(2009).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:38 +0000] "HEAD /videos/Movies/Affliction%20(1997).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:39 +0000] "HEAD /videos/Movies/Against%20the%20Wall%20(1994).avi HTTP/1.1" 200 -
1.2.3.4 - username [19/Nov/2010:22:56:40 +0000] "HEAD /videos/Movies/Amadeus%20(1984).avi HTTP/1.1" 200 -
#2
The normal GetDirectory call will be doing the GET + HEADs as you say (mimetype fetching). It's likely that each additional one during a library update is due to it looking for thumbs and fanart etc. in the folder. I'm pretty sure that fanart and thumb fetching uses a cached directory listing, but perhaps something else doesn't. Your best bet is to step through the code in VideoInfoScanner.cpp while scanning and see which bit hits the underlying filesystem rather than the cache.

One simple "fix" would be to up the default caching behaviour - see how last.fm etc. does it in the LastFMDirectory.cpp class. Basically you set the default caching behaviour to CACHE_ALWAYS and clear the cache when you hit the root. You'll still get slow browsing the first time you access the share, but after that it will be fast.
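The caching scheme described above amounts to a directory-listing cache that is only invalidated at the root. A hedged sketch of the idea in Python (XBMC's real implementation is the C++ directory cache; the names here are illustrative):

```python
class DirectoryCache:
    """CACHE_ALWAYS-style directory cache: every listing is kept until
    the whole cache is cleared (e.g. when the user re-enters the root)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable doing the slow GET + per-file HEADs
        self._cache = {}

    def get_directory(self, url):
        if url not in self._cache:       # slow path: hit the server once
            self._cache[url] = self._fetch(url)
        return self._cache[url]          # fast path on every later visit

    def clear(self):
        """Call when browsing back to the root of the share."""
        self._cache.clear()
```

With this policy only the first visit to each folder pays the network cost, which matches the "slow the first time, fast after that" behaviour described above.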

Then you can work on speeding up the initial fetch itself.
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#3
15:56:31 T:372 M:1948655616 DEBUG: Curl::Debug Connection: close

Your server is responding that we should close the connection after the HEAD request so we do. I'm not even sure encrypted http sessions can maintain the session.

Normally we would maintain the session for all the head requests.
#4
Oh, and you should have a look at WebDAV support in XBMC instead; it should be a lot better for your use case. Set the source to browse using davs:// instead of https://.
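Part of why WebDAV helps here: a single PROPFIND request with `Depth: 1` returns the name, size, and content type of every entry in a directory in one round trip, replacing the GET-plus-one-HEAD-per-file pattern entirely. A sketch of what such a request looks like (illustrative only, not XBMC's code; the property set follows RFC 4918):

```python
def build_propfind(path):
    """Build a minimal WebDAV PROPFIND request for one directory listing."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<propfind xmlns="DAV:"><prop>'
        '<displayname/><getcontentlength/><getcontenttype/>'
        '</prop></propfind>'
    )
    headers = {
        "Depth": "1",  # this resource plus its immediate children only
        "Content-Type": 'application/xml; charset="utf-8"',
        "Content-Length": str(len(body)),
    }
    return "PROPFIND", path, headers, body
```

The server answers with a single multistatus XML document covering the whole folder, so a 500-movie directory costs one request instead of 501.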
#5
Look for keepalive options for your server.
#6
elupus wrote: Look for keepalive options for your server.

Well, I'll be damned. I should have checked keepalives, but I ALWAYS set that to On, so it didn't even cross my mind. Just checked, and it was set to Off. Not sure if an update changed it under me, or what... I bumped up my MaxKeepAliveRequests while I was there, too...
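For anyone hitting the same thing, these are the Apache directives involved (httpd.conf / apache2.conf; the values below are illustrative, not my actual settings):

```apache
KeepAlive On
# 0 means unlimited; otherwise this should exceed the number of files
# per directory, or the connection gets dropped mid-listing
MaxKeepAliveRequests 1000
# seconds an idle persistent connection is held open
KeepAliveTimeout 15
```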

Tested, and now at least all the reads from the server are done over one connection, so it is noticeably faster! It takes about a minute to list the dir (500 entries), so assuming scraper lookups are relatively quick, I'm looking at about 10 hours to index it - that fits in a night, so I'm much happier.

However, it still does a full directory dump after indexing every movie, so I think I'll investigate the caching options suggested by jmarshall next, as well as WebDAV. I have set up WebDAV in the past and it didn't seem to work great over a high-latency connection, but I think the client I was using was probably mostly to blame...

Thanks for the suggestions!
Reply
