2011/01/03

Download All Podcasts from an HTML Page (Conversations with History)

(Cross-Post from main website)

I love the excellent UCTV program 'Conversations with History', with Harry Kreisler (podcast url). I like it so much that I want to listen to every episode. The problem is that I don't want to download them one by one to my mp3 player. The webpage also names the .mp3 files with inconvenient, non-consecutive numbers (e.g. 72365.mp3), which makes it even harder to tell whether I have already listened to one of the downloaded files.

This is a script that fetches the podcast page, downloads all the mp3s and names each file after a running number and the podcast's title (e.g. '0164_legislating for the people, with ronald v. dellums.mp3').

Run the Script


python3 download_cwh.py

Output


INFO: Starting at 2011-01-02 17:38
DEBUG: Fetching http://podcast.uctv.tv/mp3/20378.mp3, writing to: 0001_islam, identity, and globalization with tariq ramadan.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19846.mp3, writing to: 0002_henry kaplan and the story of hodgkin's disease.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19856.mp3, writing to: 0003_america's path to permanent war.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19488.mp3, writing to: 0004_reforming american health care.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19602.mp3, writing to: 0005_the bp disaster - lessons from the niger delta.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19332.mp3, writing to: 0006_political awakenings.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19331.mp3, writing to: 0007_science diplomacy and nuclear threats.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/19197.mp3, writing to: 0008_nuclear proliferation with ambassador gregory l. schulte.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18709.mp3, writing to: 0009_from salvation to spirituality.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18527.mp3, writing to: 0010_reflections on u.s.- canada relations.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18541.mp3, writing to: 0011_reflections on the university of california.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18526.mp3, writing to: 0012_islam and the secular state.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18336.mp3, writing to: 0013_the modern presidency and the national security state with garry wills.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18375.mp3, writing to: 0014_the making of a marine officer.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18192.mp3, writing to: 0015_american democracy, veterans, and higher education.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18152.mp3, writing to: 0016_what made california great.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/18124.mp3, writing to: 0017_what happens when other countries have the money.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17867.mp3, writing to: 0018_leadership in higher education with hanna holborn gray.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17979.mp3, writing to: 0019_the grand strategy of the byzantine empire with edward n. luttwak.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17870.mp3, writing to: 0020_the diaspora and israel.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17787.mp3, writing to: 0021_finding an authentic voice.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17602.mp3, writing to: 0022_nuclear weapons and international conflict.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17601.mp3, writing to: 0023_a life in science: a sense of wonder.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17429.mp3, writing to: 0024_u.s. policy toward iran: problems and prospects.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/17113.mp3, writing to: 0025_dealing with iran.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16935.mp3, writing to: 0026_reaching for the stars.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16749.mp3, writing to: 0027_social science and the public good.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16463.mp3, writing to: 0028_power, ideas and foreign policy in the 21st century.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16593.mp3, writing to: 0029_dignity, human rights, and torture.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16750.mp3, writing to: 0030_the red cross report, the torture memos, and political accountability with mark danner.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16456.mp3, writing to: 0031_building a multilateral international order.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16455.mp3, writing to: 0032_a microbiologist’s intellectual odyssey.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16462.mp3, writing to: 0033_judges and the rule of law.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16461.mp3, writing to: 0034_identity with john perry.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16278.mp3, writing to: 0035_the politics of the veil.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16279.mp3, writing to: 0036_nuclear power and the challenges of global climate change and nuclear proliferation.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16404.mp3, writing to: 0037_congress, globalization, and the economic crisis.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16412.mp3, writing to: 0038_your inner fish with neil shubin.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16283.mp3, writing to: 0039_identity, freedom, and revolution.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16083.mp3, writing to: 0040_lessons from fdr's new deal.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16225.mp3, writing to: 0041_causes and consequences of the global economic collapse.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15887.mp3, writing to: 0042_art and science.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16055.mp3, writing to: 0043_historical perspective on the global economic crisis.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/16057.mp3, writing to: 0044_understanding the global environmental crisis.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15882.mp3, writing to: 0045_the politics of food.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15783.mp3, writing to: 0046_islam in the west with jocelyn cesari.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15749.mp3, writing to: 0047_diplomacy with jeremy kinsman.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15744.mp3, writing to: 0048_terrorism, immigration and security since 9/11.mp3
DEBUG: Error (<class 'IOError'>): [Errno 2] No such file or directory: '0048_terrorism, immigration and security since 9/11.mp3'
DEBUG: Fetching http://podcast.uctv.tv/mp3/15426.mp3, writing to: 0049_communication as a tool for european democracy.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15607.mp3, writing to: 0050_global poverty, development, and social change.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15745.mp3, writing to: 0051_the rumsfeld memo and the betrayal of american values.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15591.mp3, writing to: 0052_natural capitalism.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15590.mp3, writing to: 0053_abraham lincoln as commander in chief.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15580.mp3, writing to: 0054_the ascent of money.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15530.mp3, writing to: 0055_charting the geopolitics of a new century.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15444.mp3, writing to: 0056_thinking about religion, secularism and politics.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15411.mp3, writing to: 0057_american foreign policy from the end of the cold war to 9/11.mp3
DEBUG: Error (<class 'IOError'>): [Errno 2] No such file or directory: '0057_american foreign policy from the end of the cold war to 9/11.mp3'
DEBUG: Fetching http://podcast.uctv.tv/mp3/15414.mp3, writing to: 0058_pakistan.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15413.mp3, writing to: 0059_china and the united states.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15394.mp3, writing to: 0060_reflections on the supreme court.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15229.mp3, writing to: 0061_how the war on terror turned into a war on american values.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/15135.mp3, writing to: 0062_descent into chaos.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14828.mp3, writing to: 0063_what does china think?.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14818.mp3, writing to: 0064_visualizing the relationship between structure and cellular activity.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14820.mp3, writing to: 0065_terror and consent: the wars for the twenty-first century.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14819.mp3, writing to: 0066_diplomacy and u.s. foreign policy.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14669.mp3, writing to: 0067_biblical insights into the problem of suffering.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14637.mp3, writing to: 0068_the power of words and the power over words.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14632.mp3, writing to: 0069_a surgeon&rsquo;s journey beyond science.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14865.mp3, writing to: 0070_addressing national security challenges in the post 911 world.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14670.mp3, writing to: 0071_reflections on a life as scholar,teacher,and policy advisor.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14633.mp3, writing to: 0072_capitalism, the environment, and crossing from crisis to sustainability.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14510.mp3, writing to: 0073_global competition and the rise of the second world.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14635.mp3, writing to: 0074_vice president cheney and america's response to 911.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14634.mp3, writing to: 0075_afghanistan and pakistan.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14500.mp3, writing to: 0076_u.s. foreign policy and the terrorist threat.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14490.mp3, writing to: 0077_the military in the post 911 world.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14478.mp3, writing to: 0078_the rise of asia and the decline of the west.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14477.mp3, writing to: 0079_chasing the flame.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14341.mp3, writing to: 0080_america&rsquo;s reckless response to terror.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14340.mp3, writing to: 0081_why market reform succeeded and democracy failed in russia.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14259.mp3, writing to: 0082_investigating military conduct at abu ghraib.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14257.mp3, writing to: 0083_the military and political development in egypt, algeria, and turkey.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/14234.mp3, writing to: 0084_the shaping of a legal response to 911.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13992.mp3, writing to: 0085_iran, israel, and the united states.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13974.mp3, writing to: 0086_national security and the rule of law.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13889.mp3, writing to: 0087_nuclear terrorism.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13833.mp3, writing to: 0088_science and history.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13832.mp3, writing to: 0089_science, government, and the university.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13830.mp3, writing to: 0090_the moment of empire.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13693.mp3, writing to: 0091_global capitalism, labor markets, and inequality.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13692.mp3, writing to: 0092_system change or more of the same.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13688.mp3, writing to: 0093_the imperial temptation of america.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13687.mp3, writing to: 0094_britain and america and the making of the modern world.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13662.mp3, writing to: 0095_economics, politics and public discourse.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13665.mp3, writing to: 0096_iran - domestic politics and foreign policy.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13555.mp3, writing to: 0097_what terrorists want.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13568.mp3, writing to: 0098_domestic politics and international relations.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13466.mp3, writing to: 0099_inside muslim militancy.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13433.mp3, writing to: 0100_wealth, empire, and the future of america.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13432.mp3, writing to: 0101_nationalism, cosmopolitanism and american national identity.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13327.mp3, writing to: 0102_truth, power, and the iraq debacle with mark danner.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13297.mp3, writing to: 0103_the jewish century with yuri slezkine.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13290.mp3, writing to: 0104_business, government and ethics in an era of globalization with david vogel.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13167.mp3, writing to: 0105_domestic politics and international behavior: the case of china and the u.s..mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12356.mp3, writing to: 0106_freedom of expression, tolerance, and human rights with t.m. scanlon.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/13013.mp3, writing to: 0107_how traders, preachers, adventurers, and warriors shaped globalization with nayan chanda.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12846.mp3, writing to: 0108_challenges for u.s. national security policy with general tony zinni.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12824.mp3, writing to: 0109_confronting global terrorism: the elements of a liberal grand strategy with tom farer.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12623.mp3, writing to: 0110_israel and the 1967 war with tom segev.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12599.mp3, writing to: 0111_america, europe, and the islamic world with mark steyn.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12490.mp3, writing to: 0112_law, politics, and the coming collapse of the middle class with elizabeth warren.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12489.mp3, writing to: 0113_the last days of the american republic with chalmers johnson.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12294.mp3, writing to: 0114_globalization and the conservative movement in the united states, with john micklethwait.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12295.mp3, writing to: 0115_intuition and rationality with daniel kahneman.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12243.mp3, writing to: 0116_globalization and islam, with olivier roy.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12297.mp3, writing to: 0117_the emergence of the new china with john pomfret.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12185.mp3, writing to: 0118_foreign correspondent - the middle east with robert  fisk.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12117.mp3, writing to: 0119_al-qaeda and the road to 9/11, with lawrence wright.mp3
DEBUG: Error (<class 'IOError'>): [Errno 2] No such file or directory: '0119_al-qaeda and the road to 9/11, with lawrence wright.mp3'
DEBUG: Fetching http://podcast.uctv.tv/mp3/12102.mp3, writing to: 0120_reflections on empire, nationalism and globalization, with kenneth d. kaunda.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12087.mp3, writing to: 0121_ethical realism and u.s. foreign policy, with anatole  lieven and john hulsman.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12103.mp3, writing to: 0122_revolutions in military affairs and the war on terror, with max boot.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12082.mp3, writing to: 0123_the war of the world, with niall ferguson.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/12061.mp3, writing to: 0124_a cosmologist&rsquo;s intellectual journey, with james e. peebles.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11984.mp3, writing to: 0125_women's rights, religious freedom, and liberal education, with martha c. nussbaum.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11335.mp3, writing to: 0126_meaning, relevance and the limits of technology, with hubert dreyfus.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11738.mp3, writing to: 0127_larry brilliant.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11794.mp3, writing to: 0128_the struggle for human rights in iran, with shirin  ebadi.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11722.mp3, writing to: 0129_journalism in the digital age, with michael kinsley.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11692.mp3, writing to: 0130_climate change and public policy, with lars-erik liljelund.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11495.mp3, writing to: 0131_military victory in the information age, with stephen d. biddle.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11493.mp3, writing to: 0132_thinking about the &ldquo;unthinkables&rdquo; in the post 911 world, with harold p smith, jr.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11496.mp3, writing to: 0133_europe and the world, with the right honorable lord patten of barnes ch.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11398.mp3, writing to: 0134_the transformation of american politics, with paul pierson.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11244.mp3, writing to: 0135_science and society, with dudley herschbach.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/9165.mp3, writing to: 0136_the peace movement in historical perspective, with linus pauling.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/9510.mp3, writing to: 0137_on theory, with amartya sen.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/9511.mp3, writing to: 0138_the pentagon's new map, with thomas p.m. barnett.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/9322.mp3, writing to: 0139_economic history, with robert william fogel.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/9171.mp3, writing to: 0140_islam and the state, with vali nasr.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8994.mp3, writing to: 0141_science and politics, with richard c. lewontin.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8991.mp3, writing to: 0142_theory and international institutions, with robert o. keohane.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8800.mp3, writing to: 0143_a geographer's perspective on the new american imperialism, with david harvey.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8639.mp3, writing to: 0144_the myths of globalization: markets, democracy, and ethnic hatred, with amy chua.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8634.mp3, writing to: 0145_occupation and terrorism, with amira hass.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8848.mp3, writing to: 0146_a diplomat's odyssey, with joseph wilson.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8642.mp3, writing to: 0147_a scientist's random walk, with steven chu.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/8641.mp3, writing to: 0148_militarism and the american empire, with chalmers johnson.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7894.mp3, writing to: 0149_islam, empire, and the left, with tariq ali.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7682.mp3, writing to: 0150_islam and the west, with john l. esposito.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7390.mp3, writing to: 0151_u.s. foreign policy and the american political tradition, with walter russell mead.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7386.mp3, writing to: 0152_theory, international politics, kenneth n. waltz.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7388.mp3, writing to: 0153_islam and state power in middle east and central asia, with vitaly naumkin.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7242.mp3, writing to: 0154_islamic societies, with ira lapidus.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6797.mp3, writing to: 0155_writing, theatre arts, and political activism, with wole soyinka.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6728.mp3, writing to: 0156_intelligence and national security in a democracy, jennifer e. sims.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6726.mp3, writing to: 0157_u.s. foreign policy and multilateral negotiations, with robert l. gallucci.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6604.mp3, writing to: 0158_the political imagination of islam, with olivier roy.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6570.mp3, writing to: 0159_pakistan &amp; islamic fundamentalism, with khaled ahmed.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6568.mp3, writing to: 0160_activism, anarchism, and power, with noam chomsky.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6569.mp3, writing to: 0161_the rise of militant islam, ahmed rashid.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6233.mp3, writing to: 0162_the case of trauma and recovery, psychological insight and political understanding, with judith herman.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/6046.mp3, writing to: 0163_adventures of a scientist, with charles w. townes.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/5217.mp3, writing to: 0164_legislating for the people, with ronald v. dellums.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/4975.mp3, writing to: 0165_art and healing, with kenzaburo oe.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/11289.mp3, writing to: 0166_ethics and foreign policy, with father j. bryan hehir.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7126.mp3, writing to: 0167_intellectual journey:  challenging the conventional wisdom, with john kenneth galbraith.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7127.mp3, writing to: 0168_reporting the story of  genocide, with philip gourevitch.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7131.mp3, writing to: 0169_a life in public service, with robert s. mcnamara.mp3
DEBUG: Fetching http://podcast.uctv.tv/mp3/7796.mp3, writing to: 0170_philosophy and the habits of critical thinking, with john r. searle.mp3
INFO: Finishing at 2011-01-02 22:49

Comment


5 hours and 4GB later, the 170 'Conversations with History' episodes have been downloaded.

As can be seen in the output, three files failed to be saved (IOError). This is simply because the filename was not validated (the '/' in '9/11' is illegal in a filename). This has been fixed by the substIllegalCharsInFilename() function. After introducing the function, I ran the script again: it downloaded the missing files while skipping the ones that had already been downloaded.
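
To make the failure concrete, here is a small sketch of the same whitelist idea applied to one of the filenames that failed above. It is a compact restatement for illustration, not an excerpt from the script: the names ALLOWED and sanitize are made up here, and only the set of allowed characters is copied from substIllegalCharsInFilename().

```python
# Whitelist copied from substIllegalCharsInFilename(); the names ALLOWED and
# sanitize are hypothetical and used only for this illustration.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-,?.': ")

def sanitize(filename):
    """Replace every character outside the whitelist with an underscore."""
    return "".join(c if c in ALLOWED else "_" for c in filename)

# One of the filenames that raised IOError in the log above.
name = "0048_terrorism, immigration and security since 9/11.mp3"
print(sanitize(name))
# prints: 0048_terrorism, immigration and security since 9_11.mp3
```

Because the '/' is no longer interpreted as a directory separator, the file can be created in the current directory.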

The Code! (Also Check Download Section)


#!/usr/bin/python3
 
import os.path
import urllib.request
import logging
import time
 
# constants
LOG_FILENAME = 'download_cwh.log'
CACHE_FILE = 'uctv_cwh_htmlcache.html'
SRC_WEBSITE_CWH = 'http://www.uctv.tv/cwh/'
 
class ScriptLogHandler(logging.FileHandler):
  """
  Save to file and output to screen
  """
  def emit(self, record):
    print("{0}: {1}".format(record.levelname, record.getMessage()))
    logging.FileHandler.emit(self, record)
 
# configure logger
logger = logging.getLogger("script_logger")
logger.setLevel(logging.DEBUG)
log_handler = ScriptLogHandler(LOG_FILENAME)
log_handler.setLevel(logging.DEBUG)
logger.addHandler(log_handler)
 
class Got404WhileAttemptingToDlPodcast(Exception):
  pass
 
def substIllegalCharsInFilename(filename):
  allowedChars = """abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-,?.': """
 
  lstOutChars = []
  for char in filename:
    if char in allowedChars:
      lstOutChars.append(char)
    else:
      lstOutChars.append('_')
  return "".join(lstOutChars)
 
def getTimeNow():
  # %M is minutes (%S would print seconds instead)
  strTime = time.strftime("%Y-%m-%d %H:%M", time.localtime())
  return strTime
 
def getHtml():
  sock =  urllib.request.urlopen(SRC_WEBSITE_CWH)
  htmlSource = sock.read()
  sock.close()
  return htmlSource.decode("utf-8")
 
def findTitleUrlTuples(htmlSource):
  curr_program_title = ''
  tag_beginTitle = 'Conversations With History:'.lower()
  tag_mp3 = 'Audio Podcast'.lower()
  lst_TitleUrlTuples = []
  for line in htmlSource.splitlines():
 
    # Ignore case
    line = line.lower()
 
    if line.find(tag_beginTitle) != -1:
      pre = line.split(tag_beginTitle, 1)[1]
      curr_program_title = pre.split('<',1)[0].strip()
 
    # str.find() returns -1 (which is truthy) when nothing is found, so compare explicitly
    if line.find(tag_mp3) != -1 and line.find('.mp3') != -1:
      mp3_url_posthttp          = line.split('http')[1]
      mp3_url_posthttp_premp3   = mp3_url_posthttp.split('.mp3', 1)[0]
      mp3_url = 'http' + mp3_url_posthttp_premp3 + '.mp3'
      lst_TitleUrlTuples.append((curr_program_title, mp3_url))
  return lst_TitleUrlTuples
 
def numTo4DigitsStr(iNum):
  strNum = str(iNum)
  while len(strNum) < 4:
    strNum = '0' + strNum
  return strNum
 
def downloadMp3sFromTitleUrlTuple(titleUrlTuple, fileNo):
  (title, mp3_url) = titleUrlTuple
  filename = "{0}_".format(numTo4DigitsStr(fileNo)) + title + '.mp3'
  filename = substIllegalCharsInFilename(filename)
 
  if not os.path.isfile(filename):
    logger.debug("Fetching {0}, writing to: {1}".format(mp3_url, filename))
 
    sock =  urllib.request.urlopen(mp3_url)
    mp3_bytes = sock.read()
    sock.close()
 
    fh = open(filename, 'wb')
    fh.write(mp3_bytes)
    fh.close()
  else:
    logger.debug("Skipping {0}, file already in directory.".format(filename))
 
def downloadMp3sFromTitleUrlTuples(lst_TitleUrlTuples):
  fileNo = 0
  for titleUrlTuple in lst_TitleUrlTuples:
    fileNo = fileNo + 1
    try:
      downloadMp3sFromTitleUrlTuple(titleUrlTuple, fileNo)
    except Got404WhileAttemptingToDlPodcast as inst:
      logger.debug("Error (" + str(type(inst)) + "): " + str(inst))
    except Exception as inst:
      logger.debug("Error (" + str(type(inst)) + "): " + str(inst))
 
def do():
 
  htmlSource = getHtml()  
 
  lst_TitleUrlTuples = findTitleUrlTuples(htmlSource)
  downloadMp3sFromTitleUrlTuples(lst_TitleUrlTuples)
 
if __name__ == '__main__':
  logger.info("Starting at {0}".format(getTimeNow()))
  do()
  logger.info("Finishing at {0}".format(getTimeNow()))

Download


download_cwh.zip

Afterword


It might have been faster to use the 'html.parser.HTMLParser' class and regexes in order to extract the mp3 links from the HTML page.
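
As a rough idea of what that could look like, here is a minimal sketch using html.parser.HTMLParser. It is not part of download_cwh.py: the class name Mp3LinkParser, the TITLE_MARKER constant and the sample markup are assumptions made for the example, and the real page layout may well differ.

```python
from html.parser import HTMLParser

# Assumption: episode titles appear in the page text after this marker,
# as the line-based script above also expects.
TITLE_MARKER = "conversations with history:"

class Mp3LinkParser(HTMLParser):
    """Collect (title, mp3_url) pairs (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.current_title = ""
        self.pairs = []

    def handle_data(self, data):
        # Remember the most recent text that looks like an episode title.
        lowered = data.lower()
        if TITLE_MARKER in lowered:
            self.current_title = lowered.split(TITLE_MARKER, 1)[1].strip()

    def handle_starttag(self, tag, attrs):
        # Every <a href="....mp3"> is paired with the last title seen.
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().endswith(".mp3"):
            self.pairs.append((self.current_title, href))

# Made-up markup for the example; the real page is not reproduced here.
SAMPLE_HTML = """
<div><b>Conversations With History: Pakistan</b>
<a href="http://podcast.uctv.tv/mp3/15414.mp3">Audio Podcast</a></div>
<div><b>Conversations With History: Natural Capitalism</b>
<a href="http://podcast.uctv.tv/mp3/15591.mp3">Audio Podcast</a></div>
"""

parser = Mp3LinkParser()
parser.feed(SAMPLE_HTML)
parser.close()
for title, url in parser.pairs:
    print(title, "->", url)
```

The list in parser.pairs has the same (title, url) shape that findTitleUrlTuples() returns, so in principle it could be handed to the existing download loop unchanged.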

2011/01/01

Website move notification


This website has moved.

The posts here will stay on, but will no longer be updated. The important projects have been moved to their respective pages at: http://david-web.appspot.com. (e.g. http://david-web.appspot.com/cnt/PEasyCrypt, http://david-web.appspot.com/cnt/FileSpaceAnalysis/, http://david-web.appspot.com/cnt/WeightWatch/, ...)

Thank you.

2010/12/26

Fixing Latex Quotes With a Script


Latex is great, but there are a lot of small quirks to account for. Fortunately, since Latex is like code that gets compiled, it is quite easy to fix some of the problems with text-processing scripts.


One of my pet peeves is that, in order to get nice English quotes, Latex requires the user to type non-standard quotes. This is not correct: 'quotes'; this is correct: `quotes'. That is quite annoying when you use a non-English keyboard layout.


I could not find a quick solution (e.g. a script that fixes the problem) on Stack Overflow.


What the script does is quite simple: it replaces the 'quotes' in your Latex document with `proper Latex quotes'. Here is an example of the input and output:


Input


This is intended to be a moderately 'hard' test for the 'fixLatexQuotes script'.
If all goes 'well', it's going to work, without making the file 'kaput'.
The scrip should 'work in most situations', however, it will not 'try to fix 'compicated' cases'.

Command Line


python3 fixLatexQuotes.py fixLatexQuotes_test.tex fixLatexQuotes_test_out.tex

Output (fixLatexQuotes_test_out.tex)


This is intended to be a moderately `hard' test for the `fixLatexQuotes script'.
If all goes `well', it's going to work, without making the file `kaput'.
The scrip should `work in most situations', however, it will not `try to fix 'compicated' cases'.

Note


It should fix the quotes properly, but it does not try to fix everything (the example contains one instance, the nested quote, that is not fixed). My reasoning is that when it is too hard to make a decision, the script should leave the source alone. For my reports it fixes 95% of the problems. :)


The Code!


#python3 script, put in 'fixLatexQuotes.py'.    

import sys
import os
import re

def do(filename_in, filename_out):

  fh = open(filename_in, 'r')
  fc = fh.read()
  fh.close()

  dst = ''
  for line in fc.splitlines():
    line_c = re.sub(r" '([ \w-]+)'", " `\\1'", line)
    if line != line_c:
      print("|{}| -> |{}|".format(line, line_c))
    dst = dst + line_c + '\n'

  if filename_out != '':
    fh = open(filename_out, 'w')
    fh.write(dst)
    fh.close()
  else:
    print('No output file specified, changes discarded.')

if __name__ == '__main__':
  filename_in = sys.argv[1]
  if len(sys.argv) > 2:
    filename_out = sys.argv[2]
  else:
    filename_out = ''

  do(filename_in, filename_out)

How it Works


It is all in the regex:
line_c = re.sub(r" '([ \w-]+)'", " `\\1'", line)


The script simply applies it to every line. Feel free to tune it and send me the improvements!
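
To see what the pattern matches, here is a small standalone sketch that runs it on two lines from the test file. The names QUOTE_RE and fix_quotes are made up for this example; the pattern itself is the one used in the script.

```python
import re

# Same pattern as in fixLatexQuotes.py: a space, an opening ', then a run of
# spaces, word characters or hyphens, then a closing '.
# QUOTE_RE and fix_quotes are hypothetical names used only in this sketch.
QUOTE_RE = re.compile(r" '([ \w-]+)'")

def fix_quotes(line):
    """Turn 'text' into `text' wherever the pattern matches."""
    return QUOTE_RE.sub(r" `\1'", line)

print(fix_quotes("If all goes 'well', it's going to work, without making the file 'kaput'."))
# prints: If all goes `well', it's going to work, without making the file `kaput'.

print(fix_quotes("it will not 'try to fix 'compicated' cases'."))
# prints: it will not `try to fix 'compicated' cases'.
```

The apostrophe in "it's" is left alone because the pattern requires a space before the opening quote. In the nested case the first match ends at the second quote, so only the outermost opening quote is changed and the inner quote is left as it was.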

2010/07/02

Analyse Which Files Take the Most Space; Linux Version

I was curious to learn what takes up space on a Linux distribution, so I decided to run the script mentioned in the previous post. Unlike on Windows, here it is more convenient to run the script from python (I did not bother to compile a binary on Linux).

You can download the script here. (The version in the last post was missing a small dependency.)

Here is the result on my distribution:


Total size: 3132993k
/usr/lib/libpython2.6.so.1                                   : 2340k
/usr/lib/libpython2.6.so.1.0                                 : 2340k
/usr/lib/python2.6/config/libpython2.6.so                    : 2340k
/usr/lib/dri/r128_dri.so                                     : 2342k
/usr/lib/dri/i810_dri.so                                     : 2350k
apt/archives/xserver-xorg-core_2%3a1.7.6-2ubuntu7.2_i386.deb : 2352k
/var/cache/apt/archives/evolution_2.28.3-0ubuntu10_i386.deb  : 2370k
/usr/lib/dri/savage_dri.so                                   : 2386k
/usr/lib/dri/tdfx_dri.so                                     : 2394k
/usr/bin/python3.1                                           : 2407k
/usr/lib/dri/sis_dri.so                                      : 2410k
/usr/lib/dri/mach64_dri.so                                   : 2414k
/usr/lib/openoffice/basis3.2/program/libchartcontrollerli.so : 2420k
/usr/lib/dri/r600_dri.so                                     : 2430k
/usr/lib/dri/mga_dri.so                                      : 2442k
/var/cache/apt/archives/libgtk2.0-0_2.20.1-0ubuntu1_i386.deb : 2455k
/usr/lib/dri/radeon_dri.so                                   : 2455k
/usr/lib/openoffice/basis3.2/program/libscfiltli.so          : 2467k
/usr/lib/dri/r200_dri.so                                     : 2489k
/usr/lib/dri/r300_dri.so                                     : 2491k
/usr/lib/openoffice/basis3.2/program/libfrmli.so             : 2502k
le-content/Ubuntu_Free_Culture_Showcase/UbuntuIsHumanity.ogv : 2503k
/usr/lib/mono/2.0/mscorlib.dll                               : 2508k
/usr/lib/vmware-tools/configurator/XOrg/7.6/vmwgfx_dri.so    : 2522k
/usr/lib/vmware-tools/configurator/XOrg/7.5/vmwgfx_dri.so    : 2535k
/var/lib/aspell/en-common.rws                                : 2589k
/usr/lib/aspell/en-common.rws                                : 2589k
/usr/lib/dri/i915_dri.so                                     : 2610k
/usr/lib/openoffice/basis3.2/program/libfwkli.so             : 2655k
/usr/lib/dri/i965_dri.so                                     : 2701k
/usr/lib/vmware-tools/configurator/XOrg/7.5_64/vmwgfx_dri.so : 2706k
/usr/lib/vmware-tools/configurator/XOrg/7.6_64/vmwgfx_dri.so : 2730k
/var/lib/defoma/gs.d/dirs/fonts/UnDotumBold.ttf              : 2744k
/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType/UnDotumBold.ttf : 2744k
/usr/share/fonts/truetype/unfonts/UnDotumBold.ttf            : 2744k
r/cache/apt/archives/libgl1-mesa-dri_7.7.1-1ubuntu3_i386.deb : 2799k
/usr/lib/openoffice/basis3.2/program/libtkli.so              : 2880k
usr/lib/vmware-tools/lib32/libconf/gtk-2.0/modules/libgail.a : 2894k
/boot/grub/unicode.pf2                                       : 2898k
/usr/share/grub/unicode.pf2                                  : 2898k
/usr/lib/libc.a                                              : 2960k
/var/cache/apt/archives/emacs23_23.1+1-4ubuntu7_i386.deb     : 2966k
/home/david/.dropbox-dist/libwx_gtk2ud_core-2.8.so.0         : 2968k
/usr/lib/xen/libc.a                                          : 3063k
ink/PxpMisc/CodeTests/c++/monitor_memory_allocation/test.ncb : 3147k
cache/apt/archives/evolution-common_2.28.3-0ubuntu10_all.deb : 3157k
us.archive.ubuntu.com_ubuntu_dists_lucid_main_source_Sources : 3169k
/usr/lib/gcc/i486-linux-gnu/4.4/libgcc.a                     : 3183k
/usr/lib/openoffice/basis3.2/program/libsvxli.so             : 3294k
/usr/lib/perl5/auto/Gtk2/Gtk2.so                             : 3323k
/usr/lib/libpython3.1.a                                      : 3361k
/apt/archives/gnome-do-plugins_0.8.2.1+dfsg-2ubuntu1_all.deb : 3371k
/usr/lib/libgtkmm-2.4.so.1.1.0                               : 3389k
/usr/lib/libgtkmm-2.4.so.1                                   : 3389k
/usr/lib/libgucharmap.so.7                                   : 3442k
/usr/lib/libgucharmap.so.7.0.0                               : 3442k
/var/cache/apt/archives/python3.1_3.1.2-0ubuntu2_i386.deb    : 3463k
/usr/share/hplip/data/pcl/colorcal1_450.pcl.gz               : 3506k
ives/openoffice.org-style-human_1%3a3.2.0-7ubuntu4.1_all.deb : 3584k
/var/lib/defoma/gs.d/dirs/fonts/UnBatang.ttf                 : 3592k
var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType/UnBatang.ttf : 3592k
/usr/share/fonts/truetype/unfonts/UnBatang.ttf               : 3592k
/var/cache/apt/archives/humanity-icon-theme_0.5.2.1_all.deb  : 3602k
/usr/lib/libflite_cmu_us_kal16.so.1                          : 3617k
/usr/lib/libflite_cmu_us_kal16.so.1.3                        : 3617k
/usr/lib/xen/libc_pic.a                                      : 3672k
/var/cache/apt/archives/libc6_2.11.1-0ubuntu7.1_i386.deb     : 3690k
/usr/lib/openoffice/basis3.2/program/libcuili.so             : 3886k
/cache/apt/archives/libavcodec52_4%3a0.5.1-1ubuntu1_i386.deb : 3904k
/usr/lib/libgtk-x11-2.0.so.0                                 : 3916k
/usr/lib/libgtk-x11-2.0.so.0.2000.1                          : 3916k
/usr/lib/libflite_cmu_time_awb.so.1                          : 3925k
/usr/lib/libflite_cmu_time_awb.so.1.3                        : 3925k
/vmlinuz.old                                                 : 3935k
/boot/vmlinuz-2.6.32-21-generic                              : 3935k
/boot/vmlinuz-2.6.32-22-generic                              : 3935k
/var/lib/defoma/gs.d/dirs/fonts/UnBatangBold.ttf             : 3977k
lib/defoma/x-ttcidfont-conf.d/dirs/TrueType/UnBatangBold.ttf : 3977k
/usr/share/fonts/truetype/unfonts/UnBatangBold.ttf           : 3977k
/var/lib/mlocate/mlocate.db                                  : 3981k
r/lib/vmware-tools/lib32/libgtkmm-2.4.so.1/libgtkmm-2.4.so.1 : 4057k
ache/apt/archives/gnome-screensaver_2.30.0-0ubuntu2_i386.deb : 4078k
/tmp/VMwareDnD/c3f93485/scripts/sa_utils/dist/library.zip    : 4115k
/usr/lib/openoffice/basis3.2/program/libsvtli.so             : 4124k
/usr/lib/openoffice/basis3.2/program/libsfxli.so             : 4257k
b/vmware-tools/lib32/libgtk-x11-2.0.so.0/libgtk-x11-2.0.so.0 : 4315k
/usr/lib/openoffice/basis3.2/program/libvclli.so             : 4336k
/usr/lib/vmware-tools/icu/icudt38l.dat                       : 4496k
t/archives/openoffice.org-calc_1%3a3.2.0-7ubuntu4.1_i386.deb : 4538k
/usr/lib/openoffice/basis3.2/program/libooxli.so             : 4627k
/usr/lib/openoffice/basis3.2/program/libxoli.so              : 4634k
b/vmware-tools/lib64/libgtk-x11-2.0.so.0/libgtk-x11-2.0.so.0 : 4673k
/usr/lib/libsmbclient.so.0                                   : 4698k
/var/cache/apt/archives/libc6-dev_2.11.1-0ubuntu7.1_i386.deb : 4726k
/var/cache/apt/archives/g++-4.4_4.4.3-4ubuntu5_i386.deb      : 4833k
/usr/lib/openoffice/basis3.2/program/libwriterfilterli.so    : 4839k
/usr/share/fonts/truetype/wqy/wqy-microhei.ttc               : 5056k
/usr/share/icons/hicolor/icon-theme.cache                    : 5181k
r/lib/vmware-tools/lib64/libgtkmm-2.4.so.1/libgtkmm-2.4.so.1 : 5247k
/usr/lib/i686/cmov/libavcodec.so.52                          : 5328k
/usr/lib/i686/cmov/libavcodec.so.52.20.1                     : 5328k
/usr/lib/libavcodec.so.52                                    : 5336k
/usr/lib/libavcodec.so.52.20.1                               : 5336k
/var/lib/openoffice/basis3.2/program/services.rdb            : 5408k
/usr/lib/openoffice/basis3.2/program/services.rdb            : 5408k
/cache/apt/archives/vim-runtime_2%3a7.2.330-1ubuntu3_all.deb : 5572k
/usr/share/openoffice/basis3.2/share/config/images_human.zip : 5827k
/usr/lib/openoffice/basis3.2/program/libsdli.so              : 5946k
/usr/bin/net.samba3                                          : 5949k
archives/openoffice.org-writer_1%3a3.2.0-7ubuntu4.1_i386.deb : 5983k
/etc/alternatives/ttf-japanese-gothic.ttf                    : 6088k
/usr/share/fonts/truetype/ttf-japanese-gothic.ttf            : 6088k
/usr/share/fonts/truetype/takao/TakaoPGothic.ttf             : 6088k
/usr/lib/openoffice/basis3.2/program/offapi.rdb              : 6368k
/usr/share/icons/gnome/icon-theme.cache                      : 7081k
ar/cache/apt/archives/libflite1_1.3-release-2build1_i386.deb : 7103k
aves_via_symlink/PxpMisc/CodeTests/c++/win_os_layer/test.ncb : 7243k
/initrd.img                                                  : 7778k
/boot/initrd.img-2.6.32-22-generic                           : 7778k
/initrd.img.old                                              : 7790k
/boot/initrd.img-2.6.32-21-generic                           : 7790k
/home/david/.dropbox-dist/library.zip                        : 7867k
hive.ubuntu.com_ubuntu_dists_lucid_main_binary-i386_Packages : 8396k
/usr/lib/libgs.so.8                                          : 8487k
/usr/lib/libgs.so.8.71                                       : 8487k
/usr/lib/openoffice/basis3.2/program/libsvxcoreli.so         : 9191k
/var/cache/cups/ppds.dat                                     : 9230k
/usr/lib/openoffice/basis3.2/program/libscli.so              : 9584k
he/apt/archives/linux-headers-2.6.32-22_2.6.32-22.36_all.deb : 9636k
/usr/lib/openoffice/basis3.2/program/libswli.so              : 11390k
/opt/google/chrome/libgcflashplayer.so                       : 11511k
avid/.mozilla/firefox/g9eit4ej.default/urlclassifier3.sqlite : 12472k
/usr/lib/xulrunner-1.9.2.3/libxul.so                         : 13332k
rchive.ubuntu.com_ubuntu_dists_lucid_universe_source_Sources : 13563k
/var/lib/apt-xapian-index/index.1/termlist.DB                : 13648k
/var/cache/apt/srcpkgcache.bin                               : 13956k
/var/cache/apt/pkgcache.bin                                  : 13976k
/usr/lib/firefox-3.6.3/libxul.so                             : 14113k
/usr/lib/libwebkit-1.0.so.2                                  : 14502k
/usr/lib/libwebkit-1.0.so.2.17.2                             : 14502k
/usr/lib/libicudata.so.42.1                                  : 15636k
/usr/lib/libicudata.so.42                                    : 15636k
/archives/openoffice.org-common_1%3a3.2.0-7ubuntu4.1_all.deb : 17854k
ar/cache/apt/archives/emacs23-common_23.1+1-4ubuntu7_all.deb : 20145k
.ubuntu.com_ubuntu_dists_lucid_universe_binary-i386_Packages : 26179k
t/archives/openoffice.org-core_1%3a3.2.0-7ubuntu4.1_i386.deb : 27214k
/var/cache/apt/archives/freepats_20060219-1_all.deb          : 28285k
archives/linux-image-2.6.32-22-generic_2.6.32-22.36_i386.deb : 30204k
pbox/misc_saves_via_symlink/PxpMisc/Backups/cygwin_backup.7z : 30643k
/var/lib/apt-xapian-index/index.1/postlist.DB                : 40624k


Not so bad, considering the biggest file is only about 40 MB. But then again, I seldom use this Linux machine (my Windows machine has much more wear and tear).

Analyse Which Files Take the Most Space

I often need to either free up space on a drive or shrink a project or set of notes before archiving it. One way is to browse folders blindly and delete big files as I come across them, but that is cumbersome. So I wrote a small Python script that lists the 150 biggest files found under the folder in which the script is executed, sorted from smallest to biggest.




#python 2.6, 3.1

import os
import misc.search_files

class FileInfo:
  
  __FullPathFilename = None
  __FileSize = None
  
  def __init__(self, FullPathFilename_, FileSize_):
    self.__FullPathFilename = FullPathFilename_
    self.__FileSize = FileSize_
  
  def __lt__(self, other):
    return (self.__FileSize < other.__FileSize)
  
  def ToRow_NameAndSize(self, SepareAt = 100):
    FullPathStr = self.GetFilename()
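    if len(FullPathStr) > SepareAt:
      # Keep only the tail of the path so the file name stays visible
      FullPathStr = FullPathStr[len(FullPathStr)-SepareAt:]
    # Pad with spaces so the size column lines up
    FullPathStr = FullPathStr.ljust(SepareAt)
    # Integer division: same output under Python 2 and Python 3
    return FullPathStr + " : " + str(self.GetFileSize()//1024) + 'k'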

  def GetFilename(self):
    return self.__FullPathFilename
  
  def GetFileSize(self):
    return self.__FileSize

def GetBiggestFileList(SepareAt = 110, MaxNumFilesInReport = 150):
  lAllFilesListIncludingSubDirs = misc.search_files.getAllFilesRecursively(['*.*'], '.')
  
  TotalDiskSpace = 0
  AllFileInfo = []
  
  for file in lAllFilesListIncludingSubDirs:
    try:
      lFileSize = os.path.getsize(file)
      TotalDiskSpace += lFileSize
      AllFileInfo.append( FileInfo(file, lFileSize) )
    # Report why the size could not be read (broken symlink, permissions, ...)
    except Exception as inst:
      print ("Error! " + file + " (" + str(inst) + ")")
  
  AllFileInfo.sort()
  
  if( len(AllFileInfo)>MaxNumFilesInReport ):
    subAllFileInfo = AllFileInfo[-MaxNumFilesInReport:]
  else:
    subAllFileInfo = AllFileInfo
  
  Report = ''
  Report+= 'Total size: ' + str(TotalDiskSpace//1024) + "k\n"
  for lFileInfo in subAllFileInfo:
    Report += str(lFileInfo.ToRow_NameAndSize(SepareAt)) + "\n"
  
  return Report

if __name__ == '__main__':
  print (GetBiggestFileList())
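
The script above relies on my own misc.search_files module, which is not shown here. For readers who want something self-contained, here is a minimal sketch of the same idea using only the standard library. The names iter_files and biggest_files are made up for this example, and it swaps the full sort for heapq.nlargest, so treat it as an approximation of the script above rather than a drop-in copy.

```python
import fnmatch
import heapq
import os


def iter_files(root, patterns=("*",)):
    """Yield the path of every file under root whose name matches a pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                yield os.path.join(dirpath, name)


def biggest_files(root=".", count=150):
    """Return (size_in_bytes, path) pairs for the largest files, smallest first."""
    sized = []
    for path in iter_files(root):
        try:
            sized.append((os.path.getsize(path), path))
        except OSError as err:
            # Broken symlinks and permission problems end up here.
            print("Error! %s (%s)" % (path, err))
    # nlargest avoids sorting the whole list; reverse to match the report order.
    return heapq.nlargest(count, sized)[::-1]
```

Looping over biggest_files() and printing each path with size // 1024 should give a listing similar to the reports in this post, minus the column alignment.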




For example, here is the result of launching the command in c:\windows to find out which files take the most space in the OS.



python get_space_hoggers_report.py | tee report.txt


And here is the result:

Total size: 18528928k
orms\9c6fe9d44d22834993e9aa23cc9dc272\System.Windows.Forms.ni.dll : 12139k
31bf3856ad364e35_6.0.6001.18000_none_c0a3fbb5ef29fe27\Mahjong.dll : 12261k
31bf3856ad364e35_6.0.6002.18005_none_c28f74c1ec4bc973\Mahjong.dll : 12261k
orms\17e020ae92d7fab33bcc1c98b25019d0\System.Windows.Forms.ni.dll : 12701k
Entity\642a7b3d47828fb0070a55cfeb58f42b\System.Data.Entity.ni.dll : 12962k
load\41bec7591f57a2b41248a2c1d4189ab0\Windows6.0-KB944036-x86.cab : 13073k
c:\Windows\Fonts\gulim.ttc                                        : 13207k
6ad364e35_6.0.6000.16386_none_4355a8715fa423d5_gulim.ttc_7c526737 : 13207k
m_31bf3856ad364e35_6.0.6000.16386_none_4355a8715fa423d5\gulim.ttc : 13207k
s\System32\DriverStore\FileRepository\nvdj.inf_d1096b58\nvcpl.dll : 13234k
s\System32\DriverStore\FileRepository\nvdj.inf_e166b159\nvcpl.dll : 13234k
s\System32\DriverStore\FileRepository\nvdj.inf_f4eaea07\nvcpl.dll : 13234k
a_31bf3856ad364e35_6.0.6001.18000_none_03ed68ae2c4994ef\dicjp.bin : 13259k
c:\Windows\System32\xlivefnt.dll                                  : 13322k
c:\Windows\System32\nvcpl.dll                                     : 13363k
c:\Windows\Fonts\simsun.ttc                                       : 13424k
ad364e35_6.0.6000.16386_none_f8d25d0e72c3c090_simsun.ttc_eba56c14 : 13424k
_31bf3856ad364e35_6.0.6000.16386_none_f8d25d0e72c3c090\simsun.ttc : 13424k
d_31bf3856ad364e35_6.0.6000.16386_none_770bd33f8d44346e\ehcir.ird : 13575k
ache$\Managed\00002109030000000000000000F01FEC\12.0.4518\OART.DLL : 13819k
c:\Windows\System32\xlive.dll                                     : 13976k
wo#\b89f584d5b315c16d4e57e747158cb69\PresentationFramework.ni.dll : 13992k
wo#\0832f9155d800cb802e70409447c1128\PresentationFramework.ni.dll : 13993k
0319_32\mscorlib\246f1a5abb686b9dcdf22d3505b08cea\mscorlib.ni.dll : 14078k
c:\Windows\Fonts\msjhbd.ttf                                       : 14169k
ad364e35_6.0.6000.16386_none_5c79d760afbbb312_msjhbd.ttf_176cee86 : 14169k
_31bf3856ad364e35_6.0.6000.16386_none_5c79d760afbbb312\msjhbd.ttf : 14169k
c:\Windows\Logs\CBS\CBS.log                                       : 14280k
e$\Managed\00002109030000000000000000F01FEC\12.0.4518\XL12CNV.EXE : 14330k
_31bf3856ad364e35_6.0.6000.16386_none_0c8ed16bb707d3be\msyhbd.ttf : 14341k
c:\Windows\Fonts\msyhbd.ttf                                       : 14343k
ad364e35_6.0.6002.18005_none_10b10c73b114afde_msyhbd.ttf_16e5cd4d : 14343k
_31bf3856ad364e35_6.0.6002.18005_none_10b10c73b114afde\msyhbd.ttf : 14343k
c:\Windows\Fonts\msjh.ttf                                         : 14368k
56ad364e35_6.0.6000.16386_none_6309f686e329e15f_msjh.ttf_ea675e5c : 14368k
ei_31bf3856ad364e35_6.0.6000.16386_none_6309f686e329e15f\msjh.ttf : 14368k
c:\Windows\Fonts\msyh.ttf                                         : 14691k
ei_31bf3856ad364e35_6.0.6000.16386_none_389c8034332e39c5\msyh.ttf : 14691k
c:\Windows\IME\IMEJP10\DICTS\IMJPST.DIC                           : 14726k
_31bf3856ad364e35_6.0.6000.16386_none_7e4e5681ddf0010b\IMJPST.DIC : 14726k
c:\Windows\Installer\1aac20a.msp                                  : 14834k
c:\Windows\System32\nvoglv32.dll                                  : 14878k
ystem32\DriverStore\FileRepository\nvdj.inf_59384ced\nvoglv32.dll : 14878k
c:\Windows\Fonts\simsunb.ttf                                      : 15045k
d364e35_6.0.6000.16386_none_8ec3c7fa1f04c342_simsunb.ttf_08f71e3f : 15045k
31bf3856ad364e35_6.0.6000.16386_none_8ec3c7fa1f04c342\simsunb.ttf : 15045k
c:\Windows\Installer\24fce6d.msp                                  : 15342k
c:\Windows\IME\IMETC10\DICTS\IMTCS.IMD                            : 15444k
y_31bf3856ad364e35_6.0.6000.16386_none_8c1c51f402c169d0\IMTCS.IMD : 15444k
c:\Windows\System32\imageres.dll                                  : 15450k
364e35_6.0.6000.16386_none_da86e136fafaf563_imageres.dll_44f44625 : 15450k
1bf3856ad364e35_6.0.6000.16386_none_da86e136fafaf563\imageres.dll : 15450k
1FEC\12.0.4518\msmdlocal.dll.5DF9D670_534C_4AB2_B0C6_FF0B0C448C29 : 15489k
93892-1000\65AE474ADBD51814280308A67426AEF7\6.2.7000\Combi.04.psi : 15611k
c:\Windows\Fonts\batang.ttc                                       : 15883k
ad364e35_6.0.6000.16386_none_b5b2ca1d695fce16_batang.ttc_949601ce : 15883k
_31bf3856ad364e35_6.0.6000.16386_none_b5b2ca1d695fce16\batang.ttc : 15883k
ndows\System32\spool\drivers\w32x86\PCC\prnhp001.inf_2ade4966.cab : 16103k
c:\Windows\ehome\ehcir.ird                                        : 16170k
d_31bf3856ad364e35_6.0.6000.16663_none_771e77eb8d36a7fc\ehcir.ird : 16170k
d_31bf3856ad364e35_6.0.6000.20804_none_77e9f66ea622b69e\ehcir.ird : 16170k
d_31bf3856ad364e35_6.0.6001.18043_none_791a56698a4d010b\ehcir.ird : 16170k
d_31bf3856ad364e35_6.0.6001.22147_none_79a7f45ca3670631\ehcir.ird : 16170k
d_31bf3856ad364e35_6.0.6002.18005_none_7b2e0e478751108e\ehcir.ird : 16170k
c:\Windows\Fonts\meiryo.ttc                                       : 16318k
ad364e35_6.0.6002.18130_none_76259f2c44aeed75_meiryo.ttc_ab0401d6 : 16318k
_31bf3856ad364e35_6.0.6000.16945_none_72531e3e4a65a4dd\meiryo.ttc : 16318k
_31bf3856ad364e35_6.0.6000.21148_none_72df94096380c3ee\meiryo.ttc : 16318k
_31bf3856ad364e35_6.0.6001.18349_none_743d5e0c47889b2a\meiryo.ttc : 16318k
_31bf3856ad364e35_6.0.6001.22550_none_74b32a3760b66ffd\meiryo.ttc : 16318k
_31bf3856ad364e35_6.0.6002.18130_none_76259f2c44aeed75\meiryo.ttc : 16318k
_31bf3856ad364e35_6.0.6002.22252_none_769b9cb35ddaf7cf\meiryo.ttc : 16318k
c:\Windows\System32\wbem\Logs\WMITracing.log                      : 16384k
c:\Windows\System32\config\COMPONENTS.SAV                         : 16452k
Cache$\Managed\00002109030000000000000000F01FEC\12.0.4518\MSO.DLL : 16475k
c:\Windows\Fonts\meiryob.ttc                                      : 16757k
d364e35_6.0.6002.18130_none_cf13a97974e4cf1c_meiryob.ttc_d9ebd964 : 16757k
31bf3856ad364e35_6.0.6000.16945_none_cb41288b7a9b8684\meiryob.ttc : 16757k
31bf3856ad364e35_6.0.6000.21148_none_cbcd9e5693b6a595\meiryob.ttc : 16757k
31bf3856ad364e35_6.0.6001.18349_none_cd2b685977be7cd1\meiryob.ttc : 16757k
31bf3856ad364e35_6.0.6001.22550_none_cda1348490ec51a4\meiryob.ttc : 16757k
31bf3856ad364e35_6.0.6002.18130_none_cf13a97974e4cf1c\meiryob.ttc : 16757k
31bf3856ad364e35_6.0.6002.22252_none_cf89a7008e10d976\meiryob.ttc : 16757k
000\65AE474ADBD51814280308A67426AEF7\6.2.7000\Le_Petit_Druide.psi : 16829k
Model\52cbaee4e94489731096be5ecc320958\System.ServiceModel.ni.dll : 16996k
che$\Managed\00002109030000000000000000F01FEC\12.0.4518\WWLIB.DLL : 17073k
wo#\7f91eecda3ff7ce478146b6458580c98\PresentationFramework.ni.dll : 17216k
che$\Managed\00002109030000000000000000F01FEC\12.0.4518\EXCEL.EXE : 17471k
Model\250b525aa8c17327216e102569c0d766\System.ServiceModel.ni.dll : 17499k
c:\Windows\Installer\fe5e2c.msi                                   : 17755k
c:\Windows\System32\WDI\LogFiles\BootCKCL.etl                     : 17792k
c:\Windows\System32\IME\IMETC10\applets\MSHWCHTR.dll              : 19522k
1bf3856ad364e35_6.0.6001.18000_none_fb2914a7fb7f05d4\MSHWCHTR.dll : 19522k
1bf3856ad364e35_6.0.6002.18005_none_fd148db3f8a0d120\MSHWCHTR.dll : 19522k
1bf3856ad364e35_6.0.6001.18000_none_fd48368c658afbaa\mshwchtr.dll : 19522k
ache$\Managed\68AB67CA7DA73301B7449A0300000010\9.3.0\AcroRd32.dll : 19957k
_msige52\program files\Google\Google Earth\client\googleearth.exe : 20428k
c:\Windows\System32\winevt\Logs\Security.evtx                     : 20484k
c:\Windows\System32\winevt\Logs\System.evtx                       : 20484k
c:\Windows\Installer\1aac08e.msp                                  : 20889k
c:\Windows\System32\IME\IMEJP10\APPLETS\mshwjpnr.dll              : 20959k
1bf3856ad364e35_6.0.6000.16386_none_29bd61de3dbf60e5\mshwjpnr.dll : 20959k
1bf3856ad364e35_6.0.6001.18000_none_03ed68ae2c4994ef\mshwjpnr.dll : 20959k
c:\Windows\System32\IME\imekr8\applets\mshwkorr.dll               : 21316k
1bf3856ad364e35_6.0.6000.16386_none_4e1eb5b4af3fbd40\mshwkorr.dll : 21316k
1bf3856ad364e35_6.0.6001.18000_none_03ed2a082c4a1514\mshwkorr.dll : 21316k
1bf3856ad364e35_6.0.6001.18000_none_fd484d54658ae209\mshwchsr.dll : 21448k
c:\Windows\System32\wbem\repository\OBJECTS.DATA                  : 22528k
c:\Windows\Fonts\ARIALUNI.TTF                                     : 22730k
c:\Windows\Speech\Engines\SR\en-US\t1033.ngr                      : 22858k
_31bf3856ad364e35_6.0.6000.16386_en-us_cbfb04a3abf30016\t1033.ngr : 22858k
e52\program files\Google\Google Earth\plugin\googleearth_free.dll : 22880k
naged\00002109F10090400000000000F01FEC\12.0.4518\NLSDATA.DLL_1033 : 23818k
0002109030000000000000000F01FEC\12.0.4518\INSTALLED_RESOURCES.XSS : 24288k
c:\Windows\inf\setupapi.dev.log                                   : 24433k
c:\Windows\System32\config\RegBack\SYSTEM.OLD                     : 25552k
c:\Windows\Fonts\mingliu.ttc                                      : 26851k
31bf3856ad364e35_6.0.6000.16386_none_b8e3a7d58b1249ca\mingliu.ttc : 26851k
c:\Windows\Speech\Engines\SR\en-US\l1033.ngr                      : 27833k
_31bf3856ad364e35_6.0.6000.16386_en-us_cbfb04a3abf30016\l1033.ngr : 27833k
3856ad364e35_6.0.6001.18000_none_062b7e7afe71e492\PurblePlace.dll : 27994k
3856ad364e35_6.0.6002.18005_none_0816f786fb93afde\PurblePlace.dll : 27994k
s_31bf3856ad364e35_6.0.6001.18000_none_74d4a1cd7e673a2e\Chess.dll : 28321k
s_31bf3856ad364e35_6.0.6002.18005_none_76c01ad97b89057a\Chess.dll : 28321k
1bf3856ad364e35_6.0.6000.16386_none_0d44c2d7a6e22754\M1033DSK.CSD : 29099k
c:\Windows\System32\config\RegBack\COMPONENTS.OLD                 : 31148k
c:\Windows\System32\mrt.exe                                       : 31710k
c:\Windows\Fonts\mingliub.ttc                                     : 32999k
364e35_6.0.6000.16386_none_c6eae5a23b4a0d1e_mingliub.ttc_b8743970 : 32999k
1bf3856ad364e35_6.0.6000.16386_none_c6eae5a23b4a0d1e\mingliub.ttc : 32999k
32\DriverStore\FileRepository\nvdj.inf_05fd020f\NvCplSetupInt.exe : 37308k
93892-1000\65AE474ADBD51814280308A67426AEF7\6.2.7000\Combi.01.psi : 38261k
32\DriverStore\FileRepository\nvdj.inf_59384ced\NvCplSetupInt.exe : 39343k
c:\Windows\ehome\en-US\Intro.wmv                                  : 45166k
_31bf3856ad364e35_6.0.6000.16386_en-us_35933539ffce9bad\Intro.wmv : 45166k
c:\Windows\System32\config\RegBack\SOFTWARE.OLD                   : 45532k
5_6.0.6000.16386_none_3264f7ee9b82c6e1\Jewels of Caribbean.dvr-ms : 45830k
856ad364e35_6.0.6000.16386_none_3264f7ee9b82c6e1\Apollo 13.dvr-ms : 48902k
c:\Windows\Installer\4dac3f.msp                                   : 49713k
f3856ad364e35_6.0.6000.16386_none_3264f7ee9b82c6e1\Vertigo.dvr-ms : 51846k
c:\Windows\Speech\Engines\SR\en-GB\l2057.ngr                      : 55999k
_31bf3856ad364e35_6.0.6000.16386_en-gb_857893b11436ae5f\l2057.ngr : 55999k
c:\Windows\IME\IMESC5\DICTS\PINTLGT.IMD                           : 65408k
31bf3856ad364e35_6.0.6000.16386_none_b4aaff4041e28397\PINTLGT.IMD : 65408k
c:\Windows\Logs\CBS\CBS.persist.log                               : 68341k
c:\Windows\SoftwareDistribution\DataStore\DataStore.edb           : 77832k
c:\Windows\Installer\13ea212.msp                                  : 99335k
crosoft.NET\Framework\v4.0.30319\SetupCache\Client\netfx_core.mzz : 113164k
c:\Windows\winsxs\ManifestCache\6.0.6002.18005_001c11ba_blobs.bin : 188770k
c:\Windows\Installer\1aac1e6.msp                                  : 335018k

As you can see, deleting the last two files (which seem to be leftover cache files that were never cleaned up, for whatever reason) would free about 500 MB.
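
To double-check that kind of estimate without a calculator, the saved report can be parsed back. This is only a sketch: parse_report_line and reclaimable_kib are names invented for this example, and it assumes every row has the "path : 12345k" shape shown above.

```python
def parse_report_line(line):
    """Split 'path : 12345k' into (path, size_in_kib)."""
    # rsplit from the right, because Windows paths contain ':' themselves.
    path, size = line.rsplit(" : ", 1)
    return path.strip(), int(size.strip().rstrip("k"))


def reclaimable_kib(report_lines, count):
    """Sum the sizes of the last `count` rows (the report is sorted ascending)."""
    rows = [parse_report_line(l) for l in report_lines if " : " in l]
    return sum(size for _path, size in rows[-count:])


tail = [
    r"c:\Windows\winsxs\ManifestCache\6.0.6002.18005_001c11ba_blobs.bin : 188770k",
    r"c:\Windows\Installer\1aac1e6.msp                                  : 335018k",
]
total = reclaimable_kib(tail, 2)
print("%dk, about %d MB" % (total, total // 1024))  # 523788k, about 511 MB
```

The "Total size:" header line is skipped automatically because it has no " : " separator.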

I also bundled the Python script as a compiled win32 binary. It is for my own use, for when I am on a Windows computer in a lab that does not have Python installed and I need to hunt down a few big files.

On a side note, http://www.py2exe.org is an amazing tool that works quite well. It takes about 5 minutes to download it, install it, and compile your script into a standalone Windows application. For reference, here is the small script that uses py2exe:

#Launch with: python26 make_bin.py py2exe
from distutils.core import setup
import py2exe

setup(
    console=['get_space_hoggers_report.py'],
    options={"py2exe":{"bundle_files":1}}    
    )

Happy hunting!