I will keep a separate library for each sport, so differentiating sports is not an issue: each sport must be in its own folder. I think that is fine because people usually only collect one or two sports, not all of them. I'll try to explain how I think we can do this; comments appreciated.
Then I was thinking of having two options for each sport: one for "all events in the same folder" (a) and another for "each subfolder is an event" (b).
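As a rough illustration of the two options, the names to be parsed could be collected like this. It is only a sketch: `list_event_candidates` and its `one_folder_per_event` flag are hypothetical names I made up, not anything that exists in the add-on.

```python
from pathlib import Path


def list_event_candidates(sport_dir, one_folder_per_event):
    """Return the names to be matched against thesportsdb.

    Option (a): every file in the sport folder is an event.
    Option (b): every subfolder of the sport folder is an event.
    """
    root = Path(sport_dir)
    if one_folder_per_event:
        # Option (b): use the subfolder names.
        return sorted(p.name for p in root.iterdir() if p.is_dir())
    # Option (a): use the file names without their extension.
    return sorted(p.stem for p in root.iterdir() if p.is_file())
```

Either way the result is a plain list of strings, so the matching steps do not need to care which option the user picked.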
Then define a sort of priority:
1) If the folder (or the file, in (a)) has a tag like [thesportsdb:1234], then assume 1234 is the event ID and we are done. [B]finished![/B]
2) The next step is getting the season. Try to match 4digits/4digits or 2digits/2digits. If there's a match, we know the season and move to 3); if not, move to 10).
3) If we are here, we know the season. The next step is finding the league. Try to find the league of the event by checking the name against all available leagues for the sport we are looking at. If a league is found, move to 4); if not, move to 5).
4) If we are here, we already know the league and the season, so try to find the round by matching "roundX", "round_x", "round x" or "round-x", where the word "round" can be replaced by a translation such as "jornada" (in PT) or the equivalent in another language. If we find a round, we move to 6); if not, to 7).
5) If we are here, we don't know which league we are talking about. In this case we will try to grab the home and the away team by matching something like "(.+?) vs. (.+?)", "(.+?) vs (.+?)" or "(.+?) x (.+?)". If we can't get a result, then the event won't be scraped and the user is notified. If we can match the teams, we'll search thesportsdb for both teams and grab the league each team belongs to. If they are the same, we can then assume the event belongs to this league. In this case we have found the league and can move to 3).
6) If we get here, we know the season, league and round. Try to find a date string in the file/folder name to complement the available information. Then search thesportsdb, filtering the results by round and date, or by round only. If a match is found -> [B]finished![/B]; if not, move to 7).
7) If we are here, we know the league and season but we don't know the round. We first do the same as in 6) and try to find a date string. Then we do the same as in 5) to try to match the home and away teams. Then we search thesportsdb for events in the league we know that feature the home and away teams and, if a date string is available, use it to filter the results as well. We are likely to find a single match ([B]finished![/B]) or multiple matches (show the user the different results, let them decide and [B]finished![/B]); otherwise we have no matches and report that we couldn't grab the event.
10) If we are here, we don't know the season but we know the league. So we go back to the procedure above and apply some logic. First look for the round, then for the date. If we find a round, we know the league and round -> match the home and away teams. If we get the home and away teams, search thesportsdb by league, home/away teams, round and date. If there is no round, match by date, league and home/away teams. If there is a match -> [B]finished![/B]; if there are multiple matches -> let the user decide and [B]finished![/B]; if there is no match -> inform the user we couldn't do it.
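To make the matching in the steps above a bit more concrete, here is a minimal sketch of the string parsing only (no thesportsdb calls). All function names and the exact regexes are my own assumptions. In particular, I am assuming the season separator in a file name is "-" or "/", and the list of "round" translations is just an example.

```python
import re

# Hypothetical patterns for illustration; tune them to real file names.
EVENT_ID_RE = re.compile(r"\[thesportsdb:(\d+)\]", re.IGNORECASE)
# 4digits-4digits or 2digits-2digits, not glued to other digits or
# separators, so a date such as 2015-08-14 is not mistaken for a season.
SEASON_RE = re.compile(
    r"(?<![\d/-])(?:(\d{4})[/-](\d{4})|(\d{2})[/-](\d{2}))(?![\d/-])"
)
ROUND_WORDS = ("round", "jornada")  # assumed list, extend per language
ROUND_RE = re.compile(
    r"(?:%s)[ _-]?(\d+)" % "|".join(ROUND_WORDS), re.IGNORECASE
)
TEAMS_RE = re.compile(r"^(.+?)\s+(?:vs\.?|x)\s+(.+)$", re.IGNORECASE)


def find_event_id(name):
    """Step 1: explicit [thesportsdb:ID] tag."""
    m = EVENT_ID_RE.search(name)
    return int(m.group(1)) if m else None


def find_season(name):
    """Step 2: return (start, end) as strings, or None."""
    m = SEASON_RE.search(name)
    if not m:
        return None
    if m.group(1):
        return m.group(1), m.group(2)
    return m.group(3), m.group(4)


def find_round(name):
    """Step 4: roundX, round_x, round x, round-x (or a translation)."""
    m = ROUND_RE.search(name)
    return int(m.group(1)) if m else None


def find_teams(text):
    """Step 5: 'home vs away', 'home vs. away' or 'home x away'.

    Assumes text holds only the teams part; stripping the league,
    season and round from a full name is left out of this sketch.
    """
    m = TEAMS_RE.match(text.strip())
    return (m.group(1).strip(), m.group(2).strip()) if m else None
```

For example, `find_season("Liga_15-16_jornada-7")` gives `("15", "16")` and `find_round` on the same name gives `7`. The order in which these helpers are called would follow the priority list, with the thesportsdb lookups in between.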
@zag please have a look at my requests on the thesportsdb forum.