Script for Better scraping, File name, and Organization
#1
I rewrote this script because a few people asked me about it (and Google doesn't allow searches in the URL anymore).
I used it on my personal system.
MegaUpload

I used to have all my movies in one folder. XBMC would scrape it and get it wrong more times than I would have liked. I found Google was pretty good at getting it right from the file names. So the script uses Google to find the IMDb link and puts it in an NFO file. It also renames the file to the name IMDb uses and puts it in a folder with the same name, in a format I like: MovieName (MovieYear).
You will need to set what kind of files you want it to look for. It's currently set to .m4v because that's all I have.
This uses Lynx, so it will need to be installed.
There is a random delay of 10 to 19 seconds between searches, just because I think it makes it look less like an automated attack on Google or IMDb. All this can be changed.
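
If you want more than one file type, or a different delay, something along these lines might work. This is only a rough sketch: list_movie_files and random_delay are made-up helper names that are not in the script, and the extensions shown are just examples.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original script.

# Print every file in the current folder that has one of the given
# extensions, one name per line. Usage: list_movie_files m4v mkv avi
list_movie_files() {
    local ext f
    for ext in "$@"; do
        for f in *."$ext"; do
            # An unmatched pattern is left as literal text, so skip it.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}

# Print a pause length in seconds: at least $1, plus up to $2 - 1 extra.
# Defaults to the 10-19 second range.
random_delay() {
    local min=${1:-10} spread=${2:-10}
    echo $(( RANDOM % spread + min ))
}
```

With IFS set to a newline, as the script does, a loop like `for x in $(list_movie_files m4v mkv)` would visit each file, and `sleep "$(random_delay 10 10)"` would pause between searches. The rest of the script would then need to take the extension from each file (for example `${x##*.}`) instead of from a single variable.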

Code:
clear;
IFS=$'\n'

#Script written by PyRo1509 on XBMC forum
#This script uses Lynx browser
#It will read your file names and search them with google.
#It'll take the first IMDb link it finds. If your file names are already pretty good then you shouldn't have a problem. Just make sure that it's the first IMDb result.
#It'll create a folder with the name IMDb has (ex: The Terminator (1984))
#Then it'll create an nfo in that folder with the IMDb link and move the movie into that folder as well.
#It was created for Mac but can run on Linux (Maybe windows)
#--To Run on a Mac or Linux
#Make sure Lynx is installed
#Place the script in the Movie Folder and then run it via terminal
#Example: bash OrganizeAndRename.sh
#--To Run on Windows
#Easiest way to get this script working on windows is:
#Download and Install VirtualBox (it's free)
#Download a Live CD of Ubuntu (It's free)
#Start up the Virtual Machine with the Ubuntu CD (It's free!)
#Choose "try Ubuntu", it'll start a temp linux
#Go into terminal and type "sudo apt-get install lynx"
#It should then install lynx, you're almost done
#Place the script in the movie folder
#go to your movie folder in the terminal and run the script
#Example: bash OrganizeAndRename.sh

GetMovieNameFromIMDB(){
    #Use lynx to dump the page source of the IMDb page.
    #From there we will extract the Movie name and year from the page title
    #EX "The Terminator (1984)"
    lynx -accept_all_cookies -force_html -source -dump "$1" > NewMovieFileNameTemp.PyRo

    #Scan Source for Title Brackets
    Title=`grep "<title>" -m 1 NewMovieFileNameTemp.PyRo`;
    
    #Erase Everything Else
    Title=${Title% - IMDb</title>*};
    Title=${Title%</title>*};
    Title=${Title#*<title>IMDb - };
    Title=${Title#*<title>};
    Title=${Title//"&#x27;"/"'"}
    
    echo "$Title"
}

GetMovieURL(){
    #echo "Parse the results for the first IMDb link"
    IMDbNumber=`grep -m 1 "www.imdb.com/title/tt" "$1"`;
    IMDbNumber=${IMDbNumber%/*};
    IMDbNumber=${IMDbNumber#*www.imdb.com/title/tt};
    echo "http://www.imdb.com/title/tt"$IMDbNumber"/"
}

GetMovieName(){
    ABSfileNoExtension=${1%.$2};
    fileNoExtension=${ABSfileNoExtension##*/};
    echo "$fileNoExtension"
}

LynxWrite(){
    #Write one "key" line per character of $1 into the Lynx command script $2
    StringLength=${#1}
    for ((k=0; k < StringLength; k++ ))
    do
        l=${1:k:1}
        if [ "$l" = " " ]; then
            echo "key <space>" >> "$2";
        else
            echo "key $l" >> "$2";
        fi
    done
}

LynxClear(){
    for k in {1..300}
    do
        echo "key <delete>" >> $1
    done
}

EraseIfThere(){
    if [ -f "$1" ]; then
        rm "$1"
    fi
}


CreateCmdLogForGoogleSearch(){
#This writes a Lynx command script that runs the Google search, because Google no longer allows direct searching from the URL


#Move to the search input field
for k in {1..10}
do
    echo "key Down Arrow" >> $2
done
#Enter in the Movie name
searchQ="${1} site:imdb.com"
LynxWrite "$searchQ" "$2"

#Move to accept the search
echo "key Down Arrow" >> $2
echo "key ^J" >> $2

#Print Output to file
echo "key p" >> $2
echo "key ^J" >> $2
#Erase existing filename
LynxClear $2
#Erase old file if there
EraseIfThere "SearchOutput.PyRo"
#Write in new file name
LynxWrite "SearchOutput.PyRo" $2
echo "key ^J" >> $2

#Create file for Source
EraseIfThere "SearchOutputSource.PyRo"
echo "key \\" >> $2
echo "key p" >> $2
echo "key ^J" >> $2
LynxClear $2
LynxWrite "SearchOutputSource.PyRo" $2
echo "key ^J" >> $2

#Quit Lynx
echo "key q" >> $2
echo "key y" >> $2
}

LynxLogFile="LynxCmdLog.PyRo"
searchForFileType="m4v"
for x in *."$searchForFileType"
do
    #Skip the literal pattern when no files match
    [ -e "$x" ] || continue
    MovieName=`GetMovieName "$x" "$searchForFileType"`
    echo "$MovieName"
    EraseIfThere "$LynxLogFile"
    CreateCmdLogForGoogleSearch "$MovieName" "$LynxLogFile"
    lynx -accept_all_cookies -force_html -cmd_script="$LynxLogFile" http://www.google.com

    MovieURL=`GetMovieURL "SearchOutput.PyRo"`
    echo "$MovieURL"
    NewMovieFileName=`GetMovieNameFromIMDB "$MovieURL"`
    
    #Now that we have the MovieURL and a new file name we can Create the folder, nfo, and rename the movie
    mkdir "$NewMovieFileName"
    echo "$MovieURL" > "$NewMovieFileName/$NewMovieFileName.nfo"
    mv "$x" "$NewMovieFileName/$NewMovieFileName.$searchForFileType"
    
    #Random pause of 10 to 19 seconds
    timeDelay=$(( RANDOM % 10 + 10 ))
    echo "Time delay set at $timeDelay seconds"
    sleep "$timeDelay"

done
rm *.PyRo
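
One thing to watch for: the title from IMDb is used directly as the folder and file name, so a title containing a slash or a colon could make mkdir or mv fail or misbehave. Here is a minimal sketch of one way around that, assuming a made-up helper called sanitize_title that is not in the script. The replacement characters are just my choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original script.

# Make a title safe to use as a folder and file name.
sanitize_title() {
    local title=$1
    # "/" would be read as a path separator, so swap it for "-"
    title="${title//\//-}"
    # ":" is awkward on Mac and Windows file systems, so swap it for " -"
    title="${title//:/ -}"
    printf '%s\n' "$title"
}
```

It could be applied right after the title lookup, for example NewMovieFileName=`sanitize_title "$NewMovieFileName"`, before the mkdir line.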
#2
Are there any updates regarding the script?

I am still looking for a bash script that scrapes the folders, renames them according to IMDb, and creates an nfo with the IMDb link.