Kodi Community Forum

Full Version: [Release] Parsedom and other functions
oh crap I came in on the wrong page and you've discussed that already woops
newatv2user Wrote:All of 'em :)

You never know what codes you may encounter. So I'd go for a comprehensive list of all codes that might be encountered.

This has proven slightly difficult in Python.

I've run tests with a random sampling of HTML entities, taking the first and the last character in each of the four groups on this page:

http://www.ascii.cl/htmlcodes.htm

And a random sampling of other entities.

- Beautiful Soup
Fails with the entity for ' (apostrophe).
I wouldn't want to use it anyways.

- htmlentitydefs:
Also fails 5 tests, including the first and last char in all four groups.

- HTMLParser unescape
Passes all tests.
Is not documented.
May only support 128 entities on some systems.

None of these options appeal to me, but I'm inclined to try HTMLParser in 0.9.1 to see if it is something we can rely on.

Note: &nbsp; will be returned as u"\xa0". That could be replaced with a regular space, but since \xa0 is the correct code point for a non-breaking space (it is Latin-1, not ASCII), I'm not sure that would be the right implementation.
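
For reference, a minimal sketch of the kind of check described above. It assumes a current Python 3, where the documented html.unescape function does this job; it is a stand-in for the undocumented HTMLParser unescape method in the list, not the code that was actually tested here. The sample entities are illustrative picks, not the test set from the linked page.

```python
import html

# Illustrative entities only: named, decimal numeric, and the &nbsp; case.
samples = ["&nbsp;", "&#39;", "&amp;", "&euro;", "&#8364;"]

# Decode each entity with the standard library.
decoded = {entity: html.unescape(entity) for entity in samples}

# &nbsp; decodes to the non-breaking space U+00A0, not to a regular space.
nbsp = decoded["&nbsp;"]

# Optional normalisation, if a regular space is preferred in the output.
as_plain_space = nbsp.replace("\xa0", " ")

for entity, char in decoded.items():
    print(entity, repr(char))
```

Whether to apply the last replacement is a design choice: it makes string comparisons easier but throws away the distinction the page author made.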

Any suggestions?
I'll try that. Thanks.
newatv2user Wrote:I'll try that. Thanks.

I've updated the post, since I was done analyzing for now. My edit crossed with your post.

I do intend to implement your feature request.

You are free to play with any of the solutions of course, but by 0.9.1 there will be something better in place.

ETA: It is in trunk.
two questions:

1. Is there no notion of a tree in this DOM parser? Say I want to get an element based on some info from one of its children. In Beautiful Soup you just walk to the parent, but how can I do this with parseDOM?

2. Is it possible to use regex in attributes? E.g. I want to match the class "somethingsomething-toggle-something" so I write {"class":".*toggle.*"} but it seems to just completely ignore the attribute and match everything.
ventech Wrote:two questions:

1. Is there no notion of a tree in this DOM parser? Say I want to get an element based on some info from one of its children. In Beautiful Soup you just walk to the parent, but how can I do this with parseDOM?

2. Is it possible to use regex in attributes? E.g. I want to match the class "somethingsomething-toggle-something" so I write {"class":".*toggle.*"} but it seems to just completely ignore the attribute and match everything.

1. There is no way to "Walk" the tree.

2. It should be possible to do "toggle.*". I will investigate and fix for 0.9.1.
TobiasTheCommie Wrote:1. There is no way to "Walk" the tree.

2. It should be possible to do "toggle.*". I will investigate and fix for 0.9.1.

great. thanks
TobiasTheCommie!

Great work. Using Parsedom along with SimpleDownloader is going to save me a crap load of time on my addon projects. Thanks :D
As it should be :)
Looking at the source code, it's obvious why regex in names and attribute keys/values fails. They're just concatenated into the main expression, so when you write ".*" it will match across the entire document. Replace the dot with [^\"\'] and it all works as expected. I think this is something that could be done by the library.
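
The effect is easy to reproduce outside the library. The sketch below is not the parseDOM source: build_pattern and restrict_dots are hypothetical helpers that only mimic the concatenation described above, and the sample HTML is made up.

```python
import re

# Made-up sample document: only the second div has "toggle" in its class.
html = '<div class="plain">one</div><div class="x-toggle-y">two</div>'


def build_pattern(attr, value):
    # Hypothetical stand-in for the library's approach: the caller's value is
    # concatenated into the expression without any escaping.
    return '<div[^>]*' + attr + '=["\']' + value + '["\'][^>]*>'


def restrict_dots(value):
    # Naive rewrite of every "." so it can no longer cross a quote character.
    return value.replace('.', '[^"\']')


greedy = re.findall(build_pattern('class', '.*toggle.*'), html)
fixed = re.findall(build_pattern('class', restrict_dots('.*toggle.*')), html)

print(greedy)  # one match that starts at the first div and runs into the second
print(fixed)   # only the opening tag of the second div
```

Note that the blanket replacement also rewrites dots the caller meant literally (for example an escaped "\."), so a real fix in the library would need to be more careful than this.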

Also, can't say I'm too happy about the class wrapping you're using here. Is this just knowledge porting from other languages or something intentional?
ventech Wrote:Looking at the source code, it's obvious why regex in names and attribute keys/values fails. They're just concatenated into the main expression, so when you write ".*" it will match across the entire document. Replace the dot with [^\"\'] and it all works as expected. I think this is something that could be done by the library.
Yes, this issue will be fixed in the library for the next release.

ventech Wrote:Also, can't say I'm too happy about the class wrapping you're using here. Is this just knowledge porting from other languages or something intentional?

I'm not sure exactly what you are referring to with this.

This is basically some code for the YouTube plugin that we just extracted out as a dependency, because we also wanted to use it in our other plugins. So there is some legacy from that (some of which has been fixed in trunk).

But until version 1.0.0 is released you should be able to convince me of almost anything, as long as it makes sense. So do tell me what you find lacking or faulty, and let's get it fixed. :)
TobiasTheCommie Wrote:Yes, this issue will be fixed in the library for the next release.
OK, good. Is it implemented? If so, where do you host the code so I can get it?


TobiasTheCommie Wrote:I'm not sure exactly what you are referring to with this.

This is basically some code for the YouTube plugin that we just extracted out as a dependency, because we also wanted to use it in our other plugins. So there is some legacy from that (some of which has been fixed in trunk).

But until version 1.0.0 is released you should be able to convince me of almost anything, as long as it makes sense. So do tell me what you find lacking or faulty, and let's get it fixed. :)
Well, my suggestion is to remove the class and make it a module, to take advantage of Python's module system. Correct me if I'm wrong, but to me the CommonFunctions class looks completely stateless, so the class makes no sense in Python. What I want to write is just "import common" and "from common import log", without doing the whole object creation and property setting all the time. I think this is much more pythonic, don't you agree?

Another 'problem' with not using the module system: let's say you have multiple modules in your addon (which I do) that all use the common functions. Then each of them will have its own, yet identical, instance of CommonFunctions. With modules you do not.
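
To illustrate the difference, here is a small self-contained sketch. The CommonFunctions class below is a toy stand-in, not the real one, and "common_demo" is a made-up module built in memory so that the example needs no extra files.

```python
import importlib
import sys
import types


class CommonFunctions:
    """Toy stateless helper class, in the style being criticised."""

    def log(self, message):
        return 'LOG: ' + message


# Every file that instantiates the class gets its own object.
instance_a = CommonFunctions()
instance_b = CommonFunctions()

# Build a throwaway module in memory ("common_demo" is a made-up name).
common_demo = types.ModuleType('common_demo')


def log(message):
    return 'LOG: ' + message


common_demo.log = log
sys.modules['common_demo'] = common_demo

# Repeated imports return the one cached module object.
first_import = importlib.import_module('common_demo')
second_import = importlib.import_module('common_demo')

print(instance_a is instance_b)       # False
print(first_import is second_import)  # True
print(first_import.log('hello'))      # LOG: hello
```

In a real addon the module would simply be a file on the import path; the sys.modules trick is only there to keep the sketch in one piece.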
ventech Wrote:OK, good. Is it implemented? If so, where do you host the code so I can get it?
A fix for this has not been committed yet.

ETA: I just looked at our tests, and we do have a test for wild cards that does pass.
http://tc.tobiasussing.dk/jenkins/job/Co...rd_search/

You can get the latest code with mercurial from: http://hg.tobiasussing.dk/hgweb.cgi/commonxbmc/

ventech Wrote:Well, my suggestion is to remove the class and make it a module, to take advantage of Python's module system. Correct me if I'm wrong, but to me the CommonFunctions class looks completely stateless, so the class makes no sense in Python. What I want to write is just "import common" and "from common import log", without doing the whole object creation and property setting all the time. I think this is much more pythonic, don't you agree?

Another 'problem' with not using the module system: let's say you have multiple modules in your addon (which I do) that all use the common functions. Then each of them will have its own, yet identical, instance of CommonFunctions. With modules you do not.
Ahh, I see what you are talking about. We will look into this.

ETA: The next version will be stateless. This has been committed to our trunk.
The Python convention is lower case for modules and camel case for classes, so it should really be 'common_functions' or preferably just 'common', if you want to follow convention.

I have another feature request for improving xml support, more precisely: a way of getting the CDATA content. parseDOM does not do this, and stripTags removes the entire thing. The minidom solution is not exactly elegant so I'd rather have parseDOM strip this automatically.
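
Something along these lines is what I have in mind. This is only a sketch with plain regular expressions, not a parseDOM feature: extract_cdata and unwrap_cdata are hypothetical helper names, and the sample XML is made up.

```python
import re

# Non-greedy match so that each CDATA section is captured separately;
# DOTALL lets a section span several lines.
CDATA_PATTERN = re.compile(r'<!\[CDATA\[(.*?)\]\]>', re.DOTALL)


def extract_cdata(xml_text):
    """Return the content of every CDATA section in xml_text."""
    return CDATA_PATTERN.findall(xml_text)


def unwrap_cdata(xml_text):
    """Replace each CDATA section with its raw content."""
    return CDATA_PATTERN.sub(lambda match: match.group(1), xml_text)


sample = '<item><title><![CDATA[Tom & Jerry]]></title></item>'
print(extract_cdata(sample))
print(unwrap_cdata(sample))
```

Running unwrap_cdata before the usual tag matching would keep the text available, at the cost of exposing any markup-like characters that the CDATA section was protecting.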

Also, how about moving your code to github or something, so people can contribute..?
ventech Wrote:The Python convention is lower case for modules and camel case for classes, so it should really be 'common_functions' or preferably just 'common', if you want to follow convention.
Hm, not a bad idea, we might do this.

ventech Wrote:I have another feature request for improving xml support, more precisely: a way of getting the CDATA content. parseDOM does not do this, and stripTags removes the entire thing. The minidom solution is not exactly elegant so I'd rather have parseDOM strip this automatically.
I will have to investigate if that is really doable.

As such, parseDOM isn't meant to do XML, and it has other glaring omissions (it will not match the tag attribute in <element tag=False> because the value is missing quotation marks). But we are planning to move away from minidom, elementtree, et al. and try to use parseDOM in BlipTV/Vimeo/YouTube for the XML we do encounter.
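
As a rough illustration of the unquoted case, the sketch below reads an attribute value whether or not it is quoted. It is not how parseDOM works internally; attribute_value is a hypothetical helper written for this example.

```python
import re


def attribute_value(tag_html, name):
    """Return the value of attribute `name` in one opening tag, or None."""
    pattern = (
        r'(?<![\w-])' + re.escape(name) +           # whole attribute name only
        r'''\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))'''
    )
    match = re.search(pattern, tag_html)
    if match is None:
        return None
    # Exactly one of the three groups (double, single, unquoted) is set.
    return next(group for group in match.groups() if group is not None)


print(attribute_value('<element tag=False>', 'tag'))
print(attribute_value('<element tag="False">', 'tag'))
```

The third alternative in the pattern is the one that handles values without quotation marks; it stops at whitespace or at the closing angle bracket.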

So, I will look into it, since we do want (some) XML support.

ventech Wrote:Also, how about moving your code to github or something, so people can contribute..?
Never gonna happen.

Our repository is at http://hg.tobiasussing.dk/hgweb.cgi/commonxbmc/ and patches are of course welcome. But github is NOT gonna happen.