HTML Parser

Post by **NTech** » Thu May 24, 2018 9:52 pm

Here are some HTML parser functions written in Just BASIC. Eventually, they will include everything needed to drive a rendering engine.
It will include:

--GetNextElement() ~Finds the first element in a string and returns it along with the text before and after it. The text after the element may contain more elements, so it is returned as well for further parsing.

--GetAttributes() ~Processes the attributes of the element found by GetNextElement().

I believe this is all that is needed to parse an HTML file for a rendering engine. Please reply if you find any bugs or have a way to improve things. Once these parsing functions are complete, I am also going to work on a JB rendering engine!
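GetAttributes() is not written yet, so the following is only a rough sketch of the idea, not the finished function. It assumes the element text has the form `name attr="value" attr=value`, and every variable name in it (`elem$`, `attrs$`, `attrName$`, `value$`, and so on) is made up for illustration. Attributes without a value, such as `disabled`, and single-quoted values are not handled.

```basic
'Hypothetical sketch: split an element's text into its name and attributes.
'Assumes attributes look like name="value" or name=value.
q$ = chr$(34) 'A double quote character.
elem$ = "a href=" + q$ + "page.html" + q$ + " title=" + q$ + "Home page" + q$

'The element name runs up to the first space (or to the end if there is none).
sp = instr(elem$, " ")
if sp = 0 then sp = len(elem$) + 1
print "Element name: "; left$(elem$, sp - 1)
attrs$ = mid$(elem$, sp + 1)

'Each pass of the loop consumes one name=value pair from attrs$.
eq = instr(attrs$, "=")
while eq > 0
    attrName$ = trim$(left$(attrs$, eq - 1))
    attrs$ = mid$(attrs$, eq + 1)
    if left$(attrs$, 1) = q$ then
        'Quoted value: runs up to the closing quote.
        closeQ = instr(attrs$, q$, 2)
        if closeQ = 0 then closeQ = len(attrs$) + 1
        value$ = mid$(attrs$, 2, closeQ - 2)
        attrs$ = mid$(attrs$, closeQ + 1)
    else
        'Unquoted value: runs up to the next space.
        sp = instr(attrs$, " ")
        if sp = 0 then sp = len(attrs$) + 1
        value$ = left$(attrs$, sp - 1)
        attrs$ = mid$(attrs$, sp + 1)
    end if
    print attrName$; " = "; value$
    eq = instr(attrs$, "=")
wend

end
```

In a real GetAttributes() the pairs would have to be returned to the caller rather than printed, for example as a delimited string in the same way GetNextElement() returns its three parts.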

Post by **NTech** » Thu May 24, 2018 9:54 pm

Version 1.00

Includes code demonstrating how to use it. All you need to do is paste the function into your own code.

Code:

'HTML Parser | GetNextElement

'We pass a line of HTML to the parser. It returns the text before the first element,
'the element itself, and the text after the element, separated by backticks (`).
'This will not detect attributes. A separate function will have to process
'what this function has detected as the element, to find attributes if there are any.
a$=HTML.GetNextElement$("text before element <element> text within element </element> text after element")
print "The text before the detected element is: ";word$(a$,1,"`")
print "The element detected is: ";word$(a$,2,"`")
print "The text after the detected element is: ";word$(a$,3,"`")

end

FUNCTION HTML.GetNextElement$(str$) 'Parses text before the next element, the next element, and the text after the next element.
                                    'Returns "" if str$ contains no complete <...> element.

    'Find the "<" that opens the next element.
    LeftMargin = instr(str$, "<")
    if LeftMargin = 0 then exit function

    'Find the ">" that closes it, searching from the "<" onwards.
    RightMargin = instr(str$, ">", LeftMargin)
    if RightMargin = 0 then exit function

    'Everything before the "<" is plain text.
    beforeElement$ = left$(str$, LeftMargin-1)

    'The element is everything between "<" and ">".
    elem$ = mid$(str$, LeftMargin+1, RightMargin-LeftMargin-1)

    'We will now return the string passed to us without the element we have just processed.
    postElement$ = mid$(str$, RightMargin+1)

    'We have found an element!
    'Note: the three parts are separated by backticks, so HTML that itself contains a backtick will confuse word$().
    HTML.GetNextElement$ = beforeElement$ + "`" + elem$ + "`" + postElement$

END FUNCTION
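
Since the text after an element may contain more elements, the same search can simply be repeated on whatever is left. Below is a small sketch of that loop. To keep it self-contained it does not call HTML.GetNextElement$; it does the search inline with instr(), which also avoids the backtick-separated result. The variable names (`rest$`, `openPos`, `closePos`) are only illustrative.

```basic
'Sketch: walk through every element in a line of HTML.
rest$ = "Intro <b>bold</b> and <i>italic</i> end"

openPos = instr(rest$, "<")
while openPos > 0
    closePos = instr(rest$, ">", openPos)
    if closePos = 0 then exit while 'Unclosed "<": treat what is left as plain text.

    print "Text:    "; left$(rest$, openPos - 1)
    print "Element: "; mid$(rest$, openPos + 1, closePos - openPos - 1)

    'Carry on with whatever follows the element we just handled.
    rest$ = mid$(rest$, closePos + 1)
    openPos = instr(rest$, "<")
wend

'Whatever remains contains no further complete elements.
print "Text:    "; rest$

end
```

A rendering engine would presumably do something with each piece instead of printing it, such as drawing the text and switching styles when it meets an opening or closing element.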
