[Question] Scrape data using Groovy from Ajax website

Hi

Not sure whether this is the right category.
Does anyone have any experience scraping an Ajax website using Groovy?

Requirements:

  1. To pull "bin collection day" information from an Ajax website (in my case https://www.banyule.vic.gov.au/Waste-environment/Bin-collection)

  2. To show the appropriate information on a Hubitat dashboard

The idea is similar to "BinDayCator Lets You Know When To Take Out The Trash" (Hackaday).

Thanks

PS: You can use this random address for testing: 1/1 Todman Street, WATSONIA

Possibly, but it's hard to say without seeing the actual results, and the page you posted needs an address before it shows any.

If you can provide an address for testing (doesn't need to be yours) then I'm sure someone will take a look.

Ran into a few challenges.

  1. To get the schedule you need a location ID that seems locally significant. Easy enough to get though.
  2. The red tape: they're using a bot detection service that is blocking the HTTP GET request from Hubitat.

Oh wow! You're a legend!

Need to learn stuff from you, for sure!

Ha...I wish.

The second thing was a non-starter. No matter what I tried, I couldn't get past the bot protection. Querying the site from PowerShell and Postman worked just fine. It's a little beyond my skills.

Maybe @thebearmay would be gracious enough to take a look.

https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices?geolocationid=804ecb18-478b-411b-b804-80f208d36740&ocsvclang=en-AU
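The URL above is the waste-services endpoint plus two query parameters: `geolocationid` (the location ID mentioned earlier) and `ocsvclang`. A minimal Groovy sketch of assembling it for another location ID follows; `buildWasteServicesUrl` is a hypothetical helper name, and it only builds the URL string. It does nothing about the bot protection.

```groovy
// Hypothetical helper: builds the waste-services URL for a given location ID,
// using the same two query parameters as the URL posted in this thread.
String buildWasteServicesUrl(String geolocationId, String lang = 'en-AU') {
    String base = 'https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices'
    // Groovy map literals keep insertion order, so the parameters stay in this order
    Map<String, String> query = [geolocationid: geolocationId, ocsvclang: lang]
    // URL-encode each key and value, then join them into a query string
    String qs = query.collect { k, v ->
        URLEncoder.encode(k, 'UTF-8') + '=' + URLEncoder.encode(v, 'UTF-8')
    }.join('&')
    return "${base}?${qs}"
}

println buildWasteServicesUrl('804ecb18-478b-411b-b804-80f208d36740')
```

Hyphens are left unencoded by `URLEncoder`, so for the ID above this reproduces the posted URL exactly.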

If you run the schedule once a week, would the bot protection still block you?
I don't need to query the website every day.

Thanks

Yes. My presumption is that the way the hub handles the GET request is what is triggering the bot protection. I couldn't get a single request to go through.

Perhaps it has to do with you being outside Australia? Maybe Postman and PowerShell were not sending something that HE (the Hubitat hub) is, which indicates your location?

How do you know where I'm at?

It works fine in a standard browser as well and I have location disabled in there.

If you've flown over here just to provide IT support, that's true commitment :slightly_smiling_face:

Perhaps I can try it from my end, considering I'm in AU? Could you please share your Groovy code?

This is driver code, so you'll need to install it manually, then create a virtual device that uses it. In the device preferences, enter this URL as the "On URI", then save:

https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices

A good bit of it is hard-coded, so if it works, it will need some changes before it's ready for actual use. After setup, hit the button and you should get some log entries from the driver. Screenshot those and post them here.

```groovy
/*
 * Http GET Switch
 *
 * Calls the "On URI" with HTTP GET when the runGet command is run
 * (adapted from the Hubitat "Http GET Switch" example driver)
 */
metadata {
    definition(name: "Http GET Switch", namespace: "community", author: "Community", importUrl: "https://raw.githubusercontent.com/hubitat/HubitatPublic/master/examples/drivers/httpGetSwitch.groovy") {
        capability "Actuator"

        attribute "response", "STRING"

        command "runGet"
    }
}

preferences {
    section("URIs") {
        input "onURI", "text", title: "On URI", required: false
        input name: "logEnable", type: "bool", title: "Enable debug logging", defaultValue: true
    }
}

def logsOff() {
    log.warn "debug logging disabled..."
    device.updateSetting("logEnable", [value: "false", type: "bool"])
}

def updated() {
    log.info "updated..."
    log.warn "debug logging is: ${logEnable == true}"
    if (logEnable) runIn(1800, logsOff)
}

def parse(String description) {
    log.debug(description)
}

def runGet() {
    def params = [
        uri: settings.onURI,
        // Hard-coded location ID for the test address; change it for another property
        query: ["geolocationid": "1fb7fa97-33e2-48e4-aff2-d2cfd9ec30e6", "ocsvclang": "en-AU"],
        // Browser-like headers (Chrome 112 on Android) so the request resembles a normal browser visit
        headers: [
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "max-age=0",
            "Connection": "keep-alive",
            "Host": "www.banyule.vic.gov.au",
            // Single-quoted strings keep the literal double quotes these header values require
            // (the original triple-quoted strings silently dropped the outer quotes)
            "sec-ch-ua": '"Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"',
            "sec-ch-ua-mobile": "?1",
            "sec-ch-ua-platform": '"Android"',
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-User": "?1",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36"
        ]
    ]
    log.debug "Sending GET request to ${params}"

    try {
        httpGet(params) { resp ->
            log.debug "Status: ${resp.status}"
            log.debug resp.data
            // For a JSON response, httpGet normally returns already-parsed data in resp.data
            //def json = resp.data

            //def binDay = json.responseContent.findAll { it.contains("Bin day") }.collect { it.replace('\n', '').trim() }.join(", ")
            //def fogo = json.responseContent.findAll { it.contains("FOGO") }.collect { it.replace('\n', '').trim() }.join(", ")
            //def recycling = json.responseContent.findAll { it.contains("Recycling") }.collect { it.replace('\n', '').trim() }.join(", ")
            //def rubbish = json.responseContent.findAll { it.contains("Rubbish") }.collect { it.replace('\n', '').trim() }.join(", ")

            //def response = [binDay, fogo, recycling, rubbish].join("; ")

            //log.debug "Response: ${response}"
            // Drivers publish attribute values with sendEvent
            //sendEvent(name: "response", value: response)
        }
    } catch (Exception e) {
        // Non-2xx responses (for example, a block by the bot protection) can surface here as exceptions
        log.error "GET request failed: ${e.message}"
    }
}
```
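The commented-out block in `runGet()` pulls out the entries mentioning "Bin day", "FOGO", "Recycling" and "Rubbish" and joins them into one string. The sketch below reproduces that idea as a standalone Groovy function, assuming (not confirmed, since no successful response was ever seen) that the useful content is plain text with one entry per line. If `responseContent` turns out to be a single string, calling `findAll` with a closure on it directly would iterate over individual characters, so the sketch splits it into lines first; if it is HTML, tags would also need stripping, which this does not do. `summariseBinInfo` is a hypothetical name and the sample text is invented.

```groovy
// Hypothetical helper mirroring the commented-out parsing in runGet().
// Assumes plain text with one entry per line (an assumption, not the site's confirmed format).
String summariseBinInfo(String text, List<String> labels = ['Bin day', 'FOGO', 'Recycling', 'Rubbish']) {
    // Split into lines, trim each one, and drop blank lines
    List<String> lines = text.readLines()*.trim().findAll { it }
    labels.collect { label ->
        // All lines mentioning this label, joined with ", "
        lines.findAll { it.contains(label) }.join(', ')
    }.findAll { it }   // skip labels with no matching line
     .join('; ')
}

// Invented sample text, not real output from the council site
def sample = '''
Bin day: Monday
FOGO: weekly
Recycling: fortnightly
Rubbish: fortnightly
'''
println summariseBinInfo(sample)
```

Inside a driver, the result could then be published with `sendEvent(name: "response", value: ...)` so a dashboard tile can show the `response` attribute.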

Thanks heaps!

The following is the setting page:

The following are the logs:

I ran it three times:

  • First: it was unsuccessful (Request unsuccessful. Incapsula incident ID)
  • Second: I opened the website manually in a browser and ran the script again; this time it returned a blank response
  • Third: I ran the script again, and it was unsuccessful again (Request unsuccessful. Incapsula incident ID)

Did you get a similar response from your end too?

Thanks heaps!

Correct. Incapsula is the bot protection.
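Since the blocked requests come back with the "Request unsuccessful. Incapsula incident ID" message rather than bin data, a driver can at least recognise that case and log a clear warning instead of trying to parse it. A minimal Groovy sketch, matching only the message text reported in the logs above; `looksLikeIncapsulaBlock` is a hypothetical name, and this detects the block page, it does not get around it.

```groovy
// Hypothetical check: does a response body look like the Incapsula block page?
// Matches the message text reported in this thread's logs.
boolean looksLikeIncapsulaBlock(String body) {
    body != null && body.contains('Request unsuccessful. Incapsula incident ID')
}

println looksLikeIncapsulaBlock('Request unsuccessful. Incapsula incident ID: 0-0')  // block page text
println looksLikeIncapsulaBlock('{"success":true}')                                  // normal JSON body
```

Depending on the response content type, `resp.data` in the driver may not be a plain string, so it may need converting to text before being passed in.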

Ah well, it seems there is no easy way. Thank you for kindly trying out this small project of mine.

I've noticed a couple of other people have been having difficulty getting past this bot protection system too.
