Thanks for your input. It is greatly appreciated.
ProphetVX Wrote:Simple question, but when you are scanning and processing the information.. Is your process multi-threaded? With VB.NET it's quite common for developers to not take advantage of threading, it'll make a huge difference when dealing with large datasets. I assume given the complexity of the project you've already taken this route, but even just 1 thread handling the file I/O and then another thread processing a queue would increase performance exponentially.
You are correct. Currently I'm not doing any threading. Here is the basic process that accounts for most of the import function. I admit my experience creating multi-threaded apps is limited. How would you recommend threading this?
Once I find an Artist path, I generate a list of all music files within that folder and all subfolders. I then scan each of these files for tagging metadata and write everything discovered for the file to the database. After all files have been scanned, I examine the combined tagging info to look for inconsistencies. Then I write the information obtained for this artist into the artist table and if naming conflicts are found, write that information to the database as well before moving on to the next artist folder and repeating the process.
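To make the question concrete, here is a rough sketch of how I picture the two-thread split you describe: one thread does the file reads and pushes tag records onto a queue, and a second thread pulls from the queue and does the processing and database writes. It's written in Python only to keep it short; I assume the same shape would work in VB.NET with a thread-safe queue. `read_tags`, `import_artist` and the dictionary layout are made-up placeholders, not my actual code.

```python
import queue
import threading

SENTINEL = None  # marks the end of the file list


def read_tags(path):
    # Hypothetical stand-in for the real tag reader (the disk I/O part).
    return {"path": path, "artist": path.split("/")[0]}


def producer(paths, q):
    # I/O thread: read tags for each file and hand them to the queue.
    for p in paths:
        q.put(read_tags(p))
    q.put(SENTINEL)


def consumer(q, results):
    # Processing thread: take records off the queue until the sentinel arrives.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        results.append(item)  # placeholder for the per-file database write


def import_artist(paths):
    # Bounded queue so the reader cannot run far ahead of the writer.
    q = queue.Queue(maxsize=100)
    results = []
    t_io = threading.Thread(target=producer, args=(paths, q))
    t_db = threading.Thread(target=consumer, args=(q, results))
    t_io.start()
    t_db.start()
    t_io.join()
    t_db.join()
    return results
```

The consistency check across an artist's files would still run after both threads finish, since it needs the combined tagging info for the whole folder.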
ProphetVX Wrote:Another way to speed up the processing is to make sure your regex is compiled and not interpreted every time it goes through a loop etc. Your application can take a severe performance hit if the interpreter needs to translate the regex pattern every time it uses it.
Rather than simply use regex, I wrote a custom function to parse the filename and compare it to the folder structure definition. It seemed to be nearly as fast and offered more flexibility for inexact matches (items that did not conform to the folder structure pattern defined before the import). I'll have to take another look at this and see how much of a performance hit or gain I get from revising it. Up to this point my main concern has been functionality, but soon enough optimization will take on a more important role.
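For comparison when I do revisit it, this is the kind of thing I understand the compiled-regex suggestion to mean: build the pattern once, outside the loop, and reuse it for every file. Again a Python sketch for brevity; in .NET the equivalent would be constructing the `Regex` once with `RegexOptions.Compiled`. The `Artist/Album/NN - Title.ext` layout and the names `TRACK_PATTERN` and `parse_path` are just assumptions for illustration, not my real folder structure definition.

```python
import re

# Hypothetical layout: "Artist/Album/NN - Title.ext".
# Compiled once at module load, then reused for every path.
TRACK_PATTERN = re.compile(
    r"^(?P<artist>[^/]+)/(?P<album>[^/]+)/"
    r"(?P<track>\d+) - (?P<title>.+)\.(?P<ext>\w+)$"
)


def parse_path(path):
    # Return the named parts, or None when the path does not fit the layout.
    m = TRACK_PATTERN.match(path)
    return m.groupdict() if m else None
```

Paths that come back as `None` are the inexact matches my custom function currently handles, so a regex like this would probably only replace the fast path.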
ProphetVX Wrote:In saying that for a large collection taking an hour in the initial import isn't necessarily slow. The biggest cost is the I/O, and unfortunately much of the time there is little you can do to optimise it.
Yeah, I agree. I think the disk I/O is causing most of the slowdown.