Javascript is required
#1
Rephrasing the topic so that it does not violate the forum rules:


How can I get the HTML source of sites that require JavaScript without using Selenium WebDriver? Is there any other way?

Skipmode A1 Wrote: You can run JavaScript with Node.js; see https://github.com/Anorov/cloudflare-scrape

However, this requires Node.js to be present on the computer that runs your Kodi add-on.


I use Windows, and I installed cfscrape for Python 2.7:
Code:
python -m pip install -U pip cfscrape

I also installed Node.js:
Code:
https://nodejs.org/dist/v7.9.0/node-v7.9.0-x86.msi
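
Before testing cfscrape it may be worth confirming that Python can actually start Node.js. The sketch below assumes cfscrape locates Node.js by running the `node` executable from the PATH; `node_version` is a hypothetical helper name, not part of cfscrape.

```python
import subprocess


def node_version(command="node"):
    """Return the version string printed by `command --version`, or None."""
    try:
        output = subprocess.check_output([command, "--version"])
    except (OSError, subprocess.CalledProcessError):
        # OSError: the executable was not found on the PATH.
        # CalledProcessError: it started but exited with a non-zero status.
        return None
    return output.decode("ascii", "replace").strip()


if __name__ == "__main__":
    version = node_version()
    if version is None:
        print("node was not found on the PATH")
    else:
        print("node found: " + version)
```

If this reports that node was not found, restarting IDLE after the Node.js installation may help, since a running process keeps the PATH it was started with.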

I'm trying to get the source code of a site where JavaScript is required, e.g. sites protected by Cloudflare.
My first test in Python IDLE (Windows):
Code:
import cfscrape

scraper = cfscrape.create_scraper()
print scraper.get('http://somesite.com.Cloudflare-anti-bot').content

But it still did not get past the JavaScript check. Could you give me an example of how to use cfscrape with Node.js?

Thanks
#2
Here is some code that seemed to work when I was messing with it a while back:

Code:
# Excerpt from inside an add-on method. It assumes requests, sys, xbmcgui and
# BeautifulSoup are imported elsewhere; SETTINGS and LANGUAGE are add-on helpers.
        # Make a session
        sess = requests.session()

        # Set cookies for cookie-firewall and nsfw-switch
        cookies = {"Cookie": "cpc=10"}
        if SETTINGS.getSetting('nsfw') == 'true':
            cookies["nsfw"] = "1"

        # Determine if cloudflare protection is active or not.
        # Test the text directly: str() on a unicode page can fail in Python 2.
        html_source = sess.get(self.video_list_page_url, cookies=cookies).text
        cloudflare_active = "cloudflare" in html_source

        # Get the page
        if cloudflare_active:
            try:
                import cfscrape
            except ImportError:
                # cfscrape is not installed
                xbmcgui.Dialog().ok(LANGUAGE(30000), LANGUAGE(30513))
                sys.exit(1)
            try:
                # returns a CloudflareScraper instance
                scraper = cfscrape.create_scraper(sess)
            except Exception:
                xbmcgui.Dialog().ok(LANGUAGE(30000), LANGUAGE(30514))
                sys.exit(1)
            try:
                html_source = scraper.get(self.video_list_page_url).content
            except Exception:
                xbmcgui.Dialog().ok(LANGUAGE(30000), LANGUAGE(30515))
                sys.exit(1)

        # Parse response
        soup = BeautifulSoup(html_source)
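
Outside of an add-on, the same "try plain requests first, fall back to cfscrape" idea can be sketched as one small function. This is only an illustration under assumptions: `fetch_html` is a hypothetical name, the session and the scraper factory are passed in so that nothing Kodi-specific is needed, and the check for the word "cloudflare" is the same crude test used above.

```python
def fetch_html(url, session, make_scraper, cookies=None):
    """Fetch url with session; retry through a scraper if Cloudflare shows up.

    session      -- object with a requests-style get(url, cookies=...) method
    make_scraper -- callable that takes the session and returns a scraper,
                    for example cfscrape.create_scraper
    """
    html = session.get(url, cookies=cookies).text
    # Crude detection: the challenge page mentions "cloudflare".
    if "cloudflare" not in html:
        return html
    # The scraper is only created when the first attempt was blocked.
    scraper = make_scraper(session)
    return scraper.get(url).text
```

With the real libraries installed, a call would look like `fetch_html(url, requests.session(), cfscrape.create_scraper)`. As in the excerpt above, the cookies are only sent with the first request.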
#3
Any update on this? Thanks.
#4
How fast is this method?
I've tried something like this, but it's slow. It takes 25-30 seconds before I get the HTML content.
(Cloudflare protection)

Code:
import cfscrape
scraper = cfscrape.create_scraper()
print scraper.get("http://somesite.com").content
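
One way to narrow down where the time goes is to time each request while reusing a single scraper. The sketch below assumes that the scraper keeps its cookies between requests the way a requests session does, so that the challenge would only need to be solved on the first request; whether that holds for a given site is something to measure, not a guarantee. `timed_get` and `fetch_all` are hypothetical helper names.

```python
import time


def timed_get(scraper, url):
    """Return (body, seconds) for a single scraper.get(url) call."""
    start = time.time()
    body = scraper.get(url).content
    return body, time.time() - start


def fetch_all(scraper, urls):
    """Fetch every URL with the same scraper; return a list of (url, seconds)."""
    timings = []
    for url in urls:
        _, seconds = timed_get(scraper, url)
        timings.append((url, seconds))
    return timings
```

Passing `cfscrape.create_scraper()` and a few URLs from the same site to `fetch_all` shows whether only the first request is slow or every request is.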