Date: Thu, 14 Jan 2021 22:36:22 +0100
From: Andreas Fink
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] Differences between wget and browser file retrieval?
Message-ID: <1MxYXF-1lxjzF26S9-00xYxm@smtp.web.de>

On Thu, 14 Jan 2021 16:10:09 -0500 Jack wrote:

> On 2021.01.14 15:49, Walter Dnes wrote:
> > I'm bored, so I do a regular daily report at the DSL Reports
> > "CanChat" sub-forum, on the Covid-19 case counts for Ontario, using
> > provincial data. I download 2 files daily as source data. One of
> > them is a PDF file, which is run through "pdftotext" and then parsed
> > by a bash script (don't ask). Today, the command...
> >
> > wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
> >
> > ...returns a zero-byte file. *BUT*, sticking the URL into the URL
> > bar of Pale Moon and Google Chrome (and I assume Firefox/etc) brings
> > up the PDF file just fine. Is "wget" being blocked? I have to do
> > extra steps to get from the browser-invoked PDF to get the PDF file
> > saved to the standard work area where my script expects it to be, so
> > it can work its magic and parse out the daily breakdown by PHU
> > (Public Health Unit). BTW, today's posts requiring the PDF file
> > are...
> > https://www.dslreports.com/forum/r33002718-
> > https://www.dslreports.com/forum/r33002752-
> >
> > I've tried setting --user-agent= with my browser's string as shown
> > by https://www.whatismybrowser.com/detect/what-is-my-user-agent but
> > no luck. Is there some way to get around this? I have not updated
> > this past week, so I don't think the problem is at my end.
>
> I just copy/pasted that wget command into my terminal, and it got me a
> 1.7M PDF doc. I'm in the US, but I have no idea if location/IP is an
> issue or not.
>
> Jack

I could download the file too with the wget command that you posted.
If you still have trouble, you could try using curl and pretend to be
Firefox:

    curl 'https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf' \
      -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0' \
      -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
      -H 'Accept-Language: en,de;q=0.7,en-US;q=0.3' \
      --compressed \
      -H 'DNT: 1' \
      -H 'Connection: keep-alive' \
      -H 'Upgrade-Insecure-Requests: 1' \
      -H 'Pragma: no-cache' \
      -H 'Cache-Control: no-cache' \
      > moh-covid-19-report-en-2021-01-14.pdf

Andreas
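
P.S. Since the symptom is a zero-byte file, the daily script could test for exactly that and fall back to curl only when wget comes back empty. What follows is a rough sketch, not something verified against the Ontario server: the function names is_empty_download and fetch_report are made up for illustration, and it assumes both wget and curl are installed.

```shell
#!/bin/sh
# Sketch only: is_empty_download and fetch_report are hypothetical helper
# names, not part of wget or curl.

# Succeeds (exit 0) when the file is missing or has zero bytes.
is_empty_download() {
    [ ! -s "$1" ]
}

# fetch_report URL OUTFILE
# Try wget first; if the result is missing or empty, retry with curl
# while sending a browser-like User-Agent header.
fetch_report() {
    url=$1
    out=$2
    wget -q -O "$out" "$url"
    if is_empty_download "$out"; then
        curl -sS -L --compressed \
            -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0' \
            -o "$out" "$url"
    fi
    # Report success only if something non-empty was actually saved.
    ! is_empty_download "$out"
}
```

The script would then call it along the lines of

    fetch_report "$url" "$workdir/report.pdf" || echo "download failed" >&2

where $url and $workdir are placeholders for whatever the script already uses. Checking the file size rather than the exit status matters here because wget -O creates the output file even when nothing is received.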