Scraping Multiple Webpages In Succession That Require Session


#1

I need to authenticate to a website that requires a GET to retrieve some cookies, a POST with those cookies to authenticate, and then the resulting headers and cookies on every call after that. I can call httpGet, and the headers appear to be in the response, but I don't see any cookies. Is there a workaround?

Is there a way I can use a session to group many calls?


#2

The integration linked below uses a similar technique: it captures the cookies, then sends them with the second GET.


[Release] Amazon Alexa Text to Speech (TTS) v0.5.1 - Direct Integration (USA, Canada, UK, & Italy)
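
A rough sketch of the capture-and-replay idea described above: take the name=value part of each Set-Cookie response header and join the pairs into a single Cookie request header for the next call. It is written in Python purely to illustrate the HTTP mechanics and is not Hubitat Groovy. The function names `cookie_pairs` and `cookie_header` are made up for this example, and the cookie values are placeholders.

```python
def cookie_pairs(set_cookie_values):
    """Extract name/value pairs from raw Set-Cookie header values.

    Attributes such as Path, Secure or HttpOnly follow the first ';'
    and are dropped, because they are never sent back to the server.
    """
    jar = {}
    for raw in set_cookie_values:
        pair = raw.split(";", 1)[0].strip()
        if "=" not in pair:
            continue  # not a usable name=value pair, skip it
        name, value = pair.split("=", 1)
        jar[name.strip()] = value.strip()
    return jar


def cookie_header(jar):
    """Build the value of the Cookie request header from a name/value dict."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Placeholder values; a real server would issue its own.
first_response_cookies = [
    "XSRF-TOKEN=abc123; path=/; secure",
    "_MyAccountWeb_session=xyz789; path=/; HttpOnly",
]
jar = cookie_pairs(first_response_cookies)
print(cookie_header(jar))  # XSRF-TOKEN=abc123; _MyAccountWeb_session=xyz789
```

The resulting string would go into the Cookie header of the second request. The sketch ignores cookie domain, path and expiry rules, which a real cookie jar would honor.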
#3

Well, I guess this just shows how much I don't know about the HTTP protocol and/or web development. I saw the headers from httpGet's response object but the cookies are not matching what is coming back from Postman.

For example, Postman's headers include two Set-Cookie entries that HE also shows. Additionally, Postman shows me an XSRF-TOKEN and a _MyAccountWeb_session cookie. I don't know where these come from. I can't find them in HE's httpGet response object.

I don't think Postman runs scripts from responses, so I have no idea where those other two cookies are coming from. (They don't show up in the Postman headers either. They do show in the Postman response "Cookies" tab, and they are required in the next request.)


#4

I finally had time to look at this again today.

@chuck.schwer I have a problem now that HttpBuilder is gone and I have to rely only on httpGet and httpPost. I have to make an unauthenticated call to a webpage that redirects me and includes some headers/cookies in the response. With HttpBuilder I could get the underlying HTTP client and set it to not follow redirects, so I could look at the response and get the headers. Unfortunately, I don't see a way to do that with httpGet. So I hit the page, it responds with a redirect and some cookies, and those cookies aren't sent on the next GET to the redirect target. All I can get out of the httpGet response are the cookies and response from the last call. How can I get around that?
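
To make the redirect problem concrete, here is a minimal sketch of what "don't follow redirects automatically" buys you: each hop is sent individually, the Set-Cookie headers of every response (including the 3xx ones) are collected, and the accumulated cookies are sent on the next hop. It is Python using only the standard library, not Hubitat Groovy, and it says nothing about what httpGet can or cannot be configured to do. The names `http_send` and `get_with_cookies` are hypothetical, and cookie domain/path scoping is deliberately ignored.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_send(url, cookie_header):
    """Send one GET; http.client never follows redirects on its own.

    Returns (status, list of (header, value) pairs, body bytes).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        headers = {"Cookie": cookie_header} if cookie_header else {}
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.getheaders(), resp.read()
    finally:
        conn.close()


def get_with_cookies(url, send=http_send, max_hops=5):
    """Follow redirects by hand, keeping cookies from every hop.

    `send` is injectable so the logic can be exercised without a network.
    Returns (final status, cookie dict, final body).
    """
    jar = {}
    for _ in range(max_hops):
        cookie_header = "; ".join(f"{n}={v}" for n, v in jar.items())
        status, headers, body = send(url, cookie_header)
        location = None
        for name, value in headers:
            lowered = name.lower()
            if lowered == "set-cookie":
                pair = value.split(";", 1)[0]  # drop Path, Secure, etc.
                if "=" in pair:
                    n, v = pair.split("=", 1)
                    jar[n.strip()] = v.strip()
            elif lowered == "location":
                location = value
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return status, jar, body
    raise RuntimeError("too many redirects")
```

A call such as `get_with_cookies("https://example.test/start")` (a placeholder URL) would return the final status, every cookie seen along the redirect chain, and the final body. A client with a cookie jar keeps cookies from intermediate responses in much the same way, which may be why cookies can show up in a tool's cookie list without appearing in the final response's headers.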