I'm posting this in Technical Questions, as it's not about investing, per se, but let me know if another category is better.
I need a simple program to download a web page, given a URL. I need a Windows PC "console app" (that is, command-line program). In the past I have used WGET, and now I'm trying HTTRACK, but it is not working; I don't get the same web page that is displayed in the browser (using the Brave browser).
The URL is
https://www.marketwatch.com/investing/fund/fmagx/download-data?mod=mw_quote_tab
That's historical NAV data for the Fidelity Magellan fund, for example.
But there are many funds, ETFs, and indexes that I want to fetch on a weekly basis, and then pluck out the data I need.
This used to work fine with the bigcharts version of marketwatch, but they've discontinued that.
Thanks for any tips you can give me.
Randy
Comments
It seems like these programs don't return the same web pages that a browser does, given the same URL. I suspect the web servers somehow detect that the request is coming from a program that's not a browser, and refuse to send the same data.
Is it possible to write a script with a list of URLs and output files, and have a web browser "execute" it by loading each URL and saving each page's HTML to its own file? Do any of Brave/Chrome/Edge/Firefox have this capability? Again, this is on a Windows PC.
Thanks!
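One way the "list of URLs and output files" idea could be sketched, in Python: Chrome and Edge accept `--headless --dump-dom <url>` on the command line, which prints the rendered page to stdout (Brave is Chromium-based, so it likely behaves the same, but I haven't confirmed that). The browser path, the job list, and the names `build_command` and `fetch_all` are all assumptions for illustration, and a site with bot checks may still refuse a headless browser.

```python
import subprocess
from pathlib import Path

# Assumed install location; change this to wherever your browser's .exe lives.
BROWSER = r"C:\Program Files\Google\Chrome\Application\chrome.exe"

# Hypothetical job list: (URL, output file) pairs.
JOBS = [
    ("https://www.marketwatch.com/investing/fund/fmagx/download-data?mod=mw_quote_tab",
     "fmagx.html"),
]

def build_command(browser, url):
    """Return the argument list for a headless DOM dump of one URL."""
    return [browser, "--headless", "--disable-gpu", "--dump-dom", url]

def fetch_all(jobs, browser=BROWSER, timeout=60):
    """Load each URL in a headless browser and save the rendered HTML."""
    for url, outfile in jobs:
        result = subprocess.run(
            build_command(browser, url),
            capture_output=True,  # collect the dumped DOM from stdout
            timeout=timeout,
        )
        Path(outfile).write_bytes(result.stdout)
```

Calling `fetch_all(JOBS)` would then write one file per URL. Because the browser actually runs the page's scripts, this may capture data that a plain download misses.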
Invoke-WebRequest -Uri "URL" -OutFile "FILE"
Invoke-WebRequest -Uri "https://www.marketwatch.com/investing/fund/fmagx/download-data?mod=mw_quote_tab" -OutFile marketwatch_data.html
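For comparison, a rough Python counterpart of the command above that also sends a browser-style User-Agent header, in case the server treats unknown clients differently. This is a sketch, not a guaranteed fix: the User-Agent string is just an example value, `build_request` and `download` are made-up helper names, and a site with an active bot check may still block it.

```python
import urllib.request

# Example browser-style User-Agent (an assumption; any current browser's string could be used).
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

def build_request(url):
    """Create a request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": USER_AGENT, "Accept": "text/html"},
    )

def download(url, outfile, timeout=30):
    """Fetch the raw HTML for url and write it to outfile."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        data = response.read()
    with open(outfile, "wb") as f:
        f.write(data)
```

Usage would look like `download("https://www.marketwatch.com/investing/fund/fmagx/download-data?mod=mw_quote_tab", "marketwatch_data.html")`. Note this only fetches the HTML the server sends; it does not run any JavaScript on the page.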
I downloaded PowerShell, and I'm trying:
invoke-webrequest -uri https://www.nasdaq.com/market-activity/mutual-fund/fmagx/historical -outfile tmp.htm >out.out
And it gives me output in tmp.htm, but it doesn't include the data table. When I load tmp.htm within Brave (browser) it appears as a skeleton of the page, with no data.
Any ideas why the web server is not sending the data?
BTW, when I try the same thing with marketwatch.com, a block of HTML comes to the console (not to tmp.htm or to the redirected stdout stream). It looks like a challenge to confirm I'm a human and not a bot. I'm not sure why this happens with Invoke-WebRequest but not when I visit the site in the Brave browser.
thx!
Randy
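One possible explanation for the skeleton page (a guess, I haven't inspected this site) is that the data table is filled in by JavaScript after the page loads, so the raw HTML never contains it. A quick way to check a saved file is to count its table rows. Below is a small Python sketch; `RowCounter` and `count_table_rows` are names made up for this example.

```python
from html.parser import HTMLParser

class RowCounter(HTMLParser):
    """Counts <tr> start tags seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so "tr" also matches <TR>.
        if tag == "tr":
            self.rows += 1

def count_table_rows(html):
    """Return the number of table rows present in the given HTML text."""
    counter = RowCounter()
    counter.feed(html)
    return counter.rows
```

Running `count_table_rows(open("tmp.htm", encoding="utf-8", errors="replace").read())` and getting zero or only a handful of rows would suggest the server sent the page shell without the data, rather than a truncated download.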