Req Ember bad scraping choice when picking same title shows + feature request to fix this
#1
Hi
I've noticed that Ember does a poor job when scraping a show that shares its title with another show.

For example, I have a show in a folder named "titans":
https://www.thetvdb.com/series/Titans-2018/seasons/1

When Ember scraped the show, it picked an older, less relevant show called "Titans" that aired in 2000 instead of the one airing in 2018:
https://www.thetvdb.com/series/titans

The choice algorithm should be changed so that it prioritizes
-the more recent show, which is more likely the one we are watching, rather than one that aired in 2000.

It would also be good to have a setting that lets us tell the algorithm to choose
-the most recent show or the older show, based on our own preference
-the most popular (most likely) choice
-an English-language show, or one from a certain country (e.g. USA), over one from another country.
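
To make the request concrete, here is a minimal sketch of how such a preference setting could rank same-title results. It is written in Python purely for illustration and is not Ember's actual code; `Candidate`, `pick_best`, and the `year`, `country` and `popularity` fields are hypothetical names, and it assumes the provider returns those values for each search result.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    """One search result from a provider (hypothetical structure)."""
    title: str
    year: int
    country: str = ""
    popularity: float = 0.0


def pick_best(candidates: Iterable[Candidate], title: str,
              prefer: str = "newest",
              country: Optional[str] = None) -> Optional[Candidate]:
    """Pick one result among same-title shows according to user preference."""
    if prefer not in ("newest", "oldest", "popular"):
        raise ValueError(f"unknown preference: {prefer}")
    pool = list(candidates)
    # Prefer exact (case-insensitive) title matches; otherwise use all results.
    same = [c for c in pool if c.title.casefold() == title.casefold()] or pool
    if not same:
        return None

    def rank(c: Candidate):
        # A preferred country, if set, outranks the year/popularity criterion.
        country_hit = 1 if country and c.country == country else 0
        if prefer == "newest":
            main = c.year
        elif prefer == "oldest":
            main = -c.year
        else:
            main = c.popularity
        return (country_hit, main)

    return max(same, key=rank)


results = [Candidate("Titans", 2000), Candidate("Titans", 2018)]
print(pick_best(results, "titans").year)                   # newest preferred
print(pick_best(results, "titans", prefer="oldest").year)  # oldest preferred
```

Whether the country preference should outrank the year, as it does in this sketch, is a design choice that could itself be a setting.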

I am using the scrape button's "custom scrape function" with these settings:
Scrape type - All - Automatic (Force Best Match)
Modifier type - All items

I find that it keeps picking the older show when the same title exists more than once,
or choosing a show from a foreign country over one from the USA.
#2
It's a good idea. Currently I use the search results list as it comes from the provider. I think IMDb uses some sort of "pre-sorting" based on popularity; TVDb and TMDb probably sort by year.
Please create a feature request on the tracker, otherwise I'll forget it: Link
#3
(2019-05-20, 11:34) DanCooper Wrote: It's a good idea. Currently I use the search results list as it comes from the provider. I think IMDb uses some sort of "pre-sorting" based on popularity; TVDb and TMDb probably sort by year.
Please create a feature request on the tracker, otherwise I'll forget it: Link
Thanks.

Yeah, I am using only the TVDB scraper.
If I enable all the other scrapers, would it slow down the scraping process?
I only use TVDB since it is a database site focused on TV shows.

I can enable the IMDB+TVDB scrapers if that helps with this problem.

I have created the feature request as suggested

http://embermediamanager.thebuggenie.com...es/EMM-157
#4
Yes, it will slow down the process. To get all information from TMDb I have to run one API call per episode. TMDb has a limit of 30 API calls in 10 seconds, then the scraper has to wait 30 seconds. If you scrape a whole TV show with many episodes, it takes ages to finish. IMDb has no API at all, so I have to parse each season webpage to get the episode URLs, then parse each episode page to get all the episode information. That also takes ages.
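
As a rough illustration of how the waiting adds up, here is a small Python sketch that estimates only the forced pause time. It assumes one API call per episode and that each full batch of 30 calls is followed by one 30-second pause before the next batch can start; that is one reading of the limit described above, not a statement of how the scraper is implemented. The function name and parameters are made up for the example, and the network time of the calls themselves comes on top.

```python
def forced_wait_seconds(episodes: int, calls_per_batch: int = 30,
                        pause: int = 30) -> int:
    """Estimate total pause time when one API call is made per episode."""
    if episodes <= 0:
        return 0
    # A pause is only needed between batches, so the last batch adds none.
    pauses = (episodes - 1) // calls_per_batch
    return pauses * pause


for n in (30, 100, 400):
    print(n, "episodes:", forced_wait_seconds(n), "seconds of forced waiting")
```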

Another big issue with multiple TV show scrapers is that they don't all handle episodes the same way. For SpongeBob, TVDb uses double/triple episodes while IMDb has an episode number for each one. For American Pickers, TVDb uses the year as the season number while TMDb uses normal season numbers. It works for many famous TV shows, but there are also many exceptions.
#5
(2019-05-21, 11:30) DanCooper Wrote: Yes, it will slow down the process. To get all information from TMDb I have to run one API call per episode. TMDb has a limit of 30 API calls in 10 seconds, then the scraper has to wait 30 seconds. If you scrape a whole TV show with many episodes, it takes ages to finish. IMDb has no API at all, so I have to parse each season webpage to get the episode URLs, then parse each episode page to get all the episode information. That also takes ages.

Another big issue with multiple TV show scrapers is that they don't all handle episodes the same way. For SpongeBob, TVDb uses double/triple episodes while IMDb has an episode number for each one. For American Pickers, TVDb uses the year as the season number while TMDb uses normal season numbers. It works for many famous TV shows, but there are also many exceptions.

Yeah, I also noticed that episode naming or order is sometimes different on IMDB compared to TVDB...
That is why I chose only the TVDB scraper: for faster scraping and a consistent episode order.

So is the feature request I created clear enough, and is it possible to implement for the TVDB scraper or the other scrapers in Ember?

If it can fix this problem of Ember consistently scraping older, less relevant TV shows, that would be good.
#6
Hey, I have been trying to change the title of my feature request (http://embermediamanager.thebuggenie.com...es/EMM-157) to something like
"Setting to improve choice algorithm for better choice matching same title"

But for some reason, after a while it reverts to the title "Better" (which was a typo).
I also tried to clean up and reformat the text, but it reverts to the old text.

Why is this happening, and can you delete or fix it so I can submit the feature request again?
Is this a bug, or is it just because I don't have rights to edit the title or content once it has been submitted on thebuggenie?

====
update: okay never mind

I need to click save changes to keep the edits
#7
bump

Can you assign someone to the bug tracker issue and also set a "Targetted for" value?
It was created a while ago and would be a good feature for improving the scraping results.


http://embermediamanager.thebuggenie.com...es/EMM-157
#8
The problem of the Ember scraper prioritizing older or foreign shows is still occurring in Ember version 1.92.

I just tried scraping some shows, e.g. "Normal People":

Normal People (TV Mini-Series 2020) ; https://www.imdb.com/title/tt9059760/

It got scraped as
"Normal People (2001) / aka (Os Normais)",  https://www.imdb.com/title/tt0287254/?ref_=fn_al_tt_1

As you can see, the scraper is still choosing an older or foreign show over a newer show.

I think people who are scraping for a show called "Normal People" are more likely watching the newer show.

There are many more examples of same-title shows getting these bad scraping results.