
Solved: Parse HTML File With VBScript


Hello all,

I am writing a script to bulk copy/rename files, and need a little help. I really don’t have any experience with vbscript, however I think I’ve done pretty well thus far getting to where I am.

I need help writing a portion of code that will loop until it finds a given file name. For example, the part of the html file looks like this:

 

<td width=544 valign=top style='width:407.7pt;padding:0cm 5.4pt 0cm 5.4pt'>
  <p class=OHHpara align=left style='text-align:left'><span style='font-size:
  11.0pt'><a href="#">Servicing Agreement</a></span></p>

EDIT: Not sure why, but the # in the href should be a link to a local document, in this case 08.pdf. It keeps changing to a # when I edit it.

I need help getting the script to locate the given file name, in this case “08.pdf”, which I already have stored in a variable. I then need it to get the link text that follows, in this case “Servicing Agreement”, and store it in a new variable. Here the text is all on one line, but other entries are longer and span two or more lines. So I need everything between the > that closes the tag containing the file name and the next <.

I think I can manage the copying/renaming myself, but I am lost as to how to correctly parse this file… Can anyone help?

I can post the code I have so far, as long as you all promise not to laugh ;).. It’s a Frankenstein of examples I’ve found online and is very likely not optimal, but so far does what I need I think.

Thank you,

Matt


message edited by Matt123


1 Answer

  1. Well, I said I’d do code, so here it is. I’m really rusty and I always go overboard when I have a GUI, so the script ends up being bigger than it needs to be. Normally you’d never bother showing IE. It’d simplify matters a bit, as you wouldn’t need to use events like I end up doing.

    Out of 50 lines, 2 of them are basic initialization. 5 of them are involved with getting the information out of the HTML. 15 are handling the new file name and the actual rename. The rest are about using IE as a UI.

    To use: Run the script, then drag the HTML file onto the IE window that spawns. Repeat as needed.

    'The setup. All work is handled by IE_DocumentComplete.
    initing = True
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ie = WScript.CreateObject("InternetExplorer.Application", "IE_")
    ie.RegisterAsDropTarget = True : ie.AddressBar = False
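    ie.Navigate "about:blank"
    'Wait for the blank page to finish loading before touching ie.Document.
    Do While ie.Busy Or ie.ReadyState <> 4 '4 = READYSTATE_COMPLETE
      WScript.Sleep 50
    Loop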
    With ie.Document
      Dim msg : Set msg = .createElement("div")
      msg.innerText = "Please drag/drop the web page here"
      .body.appendChild msg
    End With
    initing = False : ie.Visible = True
    
    While True 'Script exit handled by IE_OnQuit
      WScript.Sleep 100
    Wend
    
    Sub IE_DocumentComplete(pDisp, URL)
      If initing Then Exit Sub
      ie.RegisterAsDropTarget = False
      dir = fso.GetParentFolderName(URL) & "\"
      
      For Each a In ie.Document.getElementsByTagName("a")
        RenameFile dir, a.getAttribute("href", 2), a.innerText
      Next 'a
      WScript.Echo "Done. You can close IE, or drag/drop another file."
      ie.RegisterAsDropTarget = True
    End Sub
    
    Sub IE_OnQuit()
      WScript.Quit
    End Sub
    
    Sub RenameFile(sDir, sOld, sNew)
      'Characters that Windows does not allow in file names.
      invalids = Array(":", "\", "/", "*", "?", "<", ">", "|", """")
      For Each c in invalids
        sNew = Replace(sNew, c, "")
      Next 'c
      sNew = Trim(sNew)
    
      ext = "." & fso.GetExtensionName(sOld)
      If fso.FileExists(sDir & sNew & ext) Then
        cnt = 1
        Do While fso.FileExists(sDir & sNew & " (" & cnt & ")" & ext)
          cnt = cnt + 1
        Loop
        sNew = sNew & " (" & cnt & ")"
      End If
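      'Skip links that do not point at a file in this folder (e.g. "#" or web URLs).
      If fso.FileExists(sDir & sOld) Then fso.GetFile(sDir & sOld).Name = sNew & ext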
    End Sub
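
    If you'd rather skip IE entirely, here's a rough sketch of pulling the link text for one known file name straight out of the HTML text with VBScript's built-in RegExp object. LinkTextFor and EscapeRegex are names I made up for this example, and it assumes the href holds exactly the file name (like "08.pdf") with no path in front. Regex parsing of HTML is fragile, so treat it as a starting point rather than a robust parser.

    ```vbscript
    Option Explicit

    ' Hypothetical helper: returns the text of the first <a> whose href is
    ' exactly sFile, or "" when no such link is found.
    Function LinkTextFor(sHtml, ByVal sFile)
      Dim re, matches, sText
      Set re = New RegExp
      re.IgnoreCase = True
      re.Global = False
      ' [\s\S]*? matches lazily across line breaks, so link text that spans
      ' two or more lines is still captured.
      re.Pattern = "<a\b[^>]*\bhref\s*=\s*[""']?" & EscapeRegex(sFile) & _
                   "[""']?(?:\s[^>]*)?>([\s\S]*?)</a>"
      Set matches = re.Execute(sHtml)
      If matches.Count = 0 Then
        LinkTextFor = ""
        Exit Function
      End If
      sText = matches(0).SubMatches(0)

      re.Global = True
      re.Pattern = "<[^>]+>"   ' drop any nested tags such as <span>
      sText = re.Replace(sText, "")
      re.Pattern = "\s+"       ' collapse line breaks and runs of spaces
      sText = re.Replace(sText, " ")
      LinkTextFor = Trim(sText)
    End Function

    ' Hypothetical helper: backslash-escapes regex metacharacters so the
    ' file name is matched literally (the "." in "08.pdf", for instance).
    Function EscapeRegex(ByVal s)
      Dim c
      For Each c In Array("\", "^", "$", ".", "|", "?", "*", "+", "(", ")", "[", "]", "{", "}")
        s = Replace(s, c, "\" & c)
      Next
      EscapeRegex = s
    End Function

    ' Usage with a made-up snippet whose link text wraps onto a second line.
    Dim sample
    sample = "<p><a href=""08.pdf"">Servicing" & vbCrLf & "  Agreement</a></p>"
    WScript.Echo LinkTextFor(sample, "08.pdf")
    ```

    For a real file you'd read the HTML with fso.OpenTextFile(path).ReadAll and pass that in as sHtml. The returned text still needs the same invalid-character cleanup that RenameFile does before you use it as a file name.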


    message edited by Razor2.3
