Winston Churchill. "To improve is to change, so to be perfect is to have changed often." "Every influence, every motive, that provokes the spirit of murder among men, impels these mountaineers to deeds of treachery and violence. The strong aboriginal propensity to kill, inherent in all human beings, has in these valleys been preserved in unexampled strength and vigour. That religion, which above all others was founded and propagated by the sword, the tenets and principles of which are instinct with incentives to slaughter and which in three continents has produced fighting breeds of men, stimulates a wild and merciless fanaticism. The love of plunder, always a characteristic of hill tribes, is fostered by the spectacle of opulence and luxury which, to their eyes, the cities and plains of the south display. A code of honour not less punctilious than that of old Spain is supported by vendettas as implacable as those of Corsica." The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter I. Describing the tribal areas of what is now Pakistan, commonly known as Waziristan. Downloadable etext version(s) of this book can be found online at Project Gutenberg. "It is, thank heaven, difficult if not impossible for the modern European to fully appreciate the force which fanaticism exercises among an ignorant, warlike and Oriental population. Several generations have elapsed since the nations of the West have drawn the sword in religious controversy, and the evil memories of the gloomy past have soon faded in the strong, clear light of Rationalism and human sympathy.
Indeed it is evident that Christianity, however degraded and distorted by cruelty and intolerance, must always exert a modifying influence on men's passions, and protect them from the more violent forms of fanatical fever, as we are protected from smallpox by vaccination. But the Mahommedan religion increases, instead of lessening, the fury of intolerance. It was originally propagated by the sword, and ever since, its votaries have been subject, above the people of all other creeds, to this form of madness. In a moment the fruits of patient toil, the prospects of material prosperity, the fear of death itself, are flung aside. The more emotional Pathans are powerless to resist. All rational considerations are forgotten. Seizing their weapons, they become Ghazis, as dangerous and as sensible as mad dogs: fit only to be treated as such. While the more generous spirits among the tribesmen become convulsed in an ecstasy of religious bloodthirstiness, poorer and more material souls derive additional impulses from the influence of others, the hopes of plunder and the joy of fighting. Thus whole nations are roused to arms. Thus the Turks repel their enemies, the Arabs of the Soudan break the British squares, and the rising on the Indian frontier spreads far and wide. In each case civilisation is confronted with militant Mahommedanism. The forces of progress clash with those of reaction. The religion of blood and war is face to face with that of peace. Luckily the religion of peace is usually the better armed." The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter III. "I pass with relief from the tossing sea of Cause and Theory to the firm ground of Result and Fact."
The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter III. "It is better to be making the news than taking it; to be an actor rather than a critic." The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter VIII. "Nothing in life is so exhilarating as to be shot at without result." The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter X. "How dreadful are the curses which Mohammedanism lays on its votaries! Besides the fanatical frenzy, which is as dangerous in a man as hydrophobia in a dog, there is this fearful fatalistic apathy. The effects are apparent in many countries. Improvident habits, slovenly systems of agriculture, sluggish methods of commerce, and insecurity of property exist wherever the followers of the Prophet rule or live. A degraded sensualism deprives this life of its grace and refinement; the next of its dignity and sanctity. The fact that in Mohammedan law every woman must belong to some man as his absolute property, either as a child, a wife, or a concubine, must delay the final extinction of slavery until the faith of Islam has ceased to be a great power among men. Individual Moslems may show splendid qualities. Thousands become the brave and loyal soldiers of the Queen; all know how to die; but the influence of the religion paralyses the social development of those who follow it. No stronger retrograde force exists in the world. Far from being moribund, Mohammedanism is a militant and proselytizing faith.
It has already spread throughout Central Africa, raising fearless warriors at every step; and were it not that Christianity is sheltered in the strong arms of science, the science against which it had vainly struggled, the civilisation of modern Europe might fall, as fell the civilisation of ancient Rome." The River War: An Historical Account of the Reconquest of the Soudan (1899), Volume II pp. 248-250. (This passage does not appear in the 1902 one-volume abridgment, the version posted by Project Gutenberg.) Downloadable etext version(s) of this book can be found online at Project Gutenberg. "It is the habit of the boa constrictor to besmear the body of his victim with a foul slime before he devours it; and there are many people in England, and perhaps elsewhere, who seem to be unable to contemplate military operations for clear political objects, unless they can cajole themselves into the belief that their enemy are utterly and hopelessly vile. To this end the Dervishes, from the Mahdi and the Khalifa downwards, have been loaded with every variety of abuse and charged with all conceivable crimes. This may be very comforting to philanthropic persons at home; but when an army in the field becomes imbued with the idea that the enemy are vermin who cumber the earth, instances of barbarity may easily be the outcome. This unmeasured condemnation is moreover as unjust as it is dangerous and unnecessary." The River War: An Historical Account of the Reconquest of the Soudan (1899), Volume II pp. 394-395. (This passage does not appear in the 1902 one-volume abridgment, the version posted by Project Gutenberg.)
"What is the true and original root of Dutch aversion to British rule? It is the abiding fear and hatred of the movement that seeks to place the native on a level with the white man, that the Kaffir is to be declared the brother of the European, to be constituted his legal equal, to be armed with political rights." On the Boer War, in London to Ladysmith via Pretoria (1900). "I think we shall have to take the Chinese in hand and regulate them. I believe that as civilized nations become more powerful they will get more ruthless, and the time will come when the world will impatiently bear the existence of great barbaric nations who may at any time arm themselves and menace civilized nations. I believe in the ultimate partition of China. I mean ultimate. I hope we shall not have to do it in our day. The Aryan stock is bound to triumph." Speech and interview at the University of Michigan, 1902. "In former days, when wars arose from individual causes, from the policy of a Minister or the passion of a King, when they were fought by small regular armies of professional soldiers, and when their course was retarded by the difficulties of communication and supply, and often suspended by the winter season, it was possible to limit the liabilities of the combatants. But now, when mighty populations are impelled on each other, each individual severally embittered and inflamed, when the resources of science and civilisation sweep away everything that might mitigate their fury, a European war can only end in the ruin of the vanquished and the scarcely less fatal commercial dislocation and exhaustion of the conquerors. Democracy is more vindictive than Cabinets. The wars of peoples will be more terrible than those of kings." House of Commons, 13 May 1901, Hansard vol. 93 col. 1572.
"The ability to foretell what is going to happen tomorrow, next week, next month, and next year. And to have the ability afterwards to explain why it didn't happen." Newspaper interview (1902), when asked what qualities a politician required; Halle, Kay, Irrepressible Churchill. Cleveland: World, 1966. Cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 489 ISBN 1586486381. "Governments create nothing and have nothing to give but what they have first taken away. You may put money in the pockets of one set of Englishmen, but it will be money taken from the pockets of another set of Englishmen, and the greater part will be spilled on the way. Every vote given for Protection is a vote to give Governments the right of robbing Peter to pay Paul and charging the public a handsome commission on the job." "Why I am a Free Trader", Chapter I in W. T. Stead's magazine Coming Men on Coming Questions (13 April 1905), bottom of p. 9. "The doctrines that by keeping out foreign goods more wealth, and consequently more employment, will be created at home, are either true or they are not true. We contend that they are not true. We contend that for a nation to try to tax itself into prosperity is like a man standing in a bucket and trying to lift himself up by the handle." [1]: 9. From "Why I am a Free Trader" (1905). Churchill revised this several times, the earliest recorded version coming from the speech on Free Trade at the Free Trade Hall, Manchester, on 19 February 1904: "It is the theory of the Protectionist that imports are an evil. He thinks that if you shut out the foreign imported manufactured goods you will make these goods yourselves, in addition to the goods which you make now, including those goods which we make to exchange for the foreign goods that come in.
If a man can believe that he can believe anything. (Laughter.) We Free-traders say it is not true. To think you can make a man richer by putting on a tax is like a man thinking that he can stand in a bucket and lift himself up by the handle. (Laughter and cheers.)" [2]: Vol. I: 261. "Politics are almost as exciting as war, and quite as dangerous. In war, you can only be killed once. But in politics many times." From a conversational exchange with Harold Begbie, as cited in Master Workers, Begbie, Methuen & Co. (1906), p. 177. "For my own part I have always felt that a politician is to be judged by the animosities which he excites among his opponents. I have always set myself not merely to relish but to deserve thoroughly their censure." 17 November 1906, Institute of Journalists Dinner, London; in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 392 ISBN 1586486381. "The conditions of the Transvaal ordinance under which Chinese labour is now being carried on do not, in my opinion, constitute a state of slavery. A labour contract into which men enter voluntarily for a limited and for a brief period, under which they are paid wages which they consider adequate, under which they are not bought or sold and from which they can obtain relief on payment of seventeen pounds ten shillings, the cost of their passage, may not be a healthy or proper contract, but it cannot in the opinion of His Majesty's Government be classified as slavery in the extreme acceptance of the word without some risk of terminological inexactitude." In the House of Commons, 22 February 1906, "King's Speech (Motion for an Address)", as Under-Secretary of the Colonial Office, repeating what he had said during the 1906 election campaign. This is the original context for "terminological inexactitude".
Used literally, whereas later the term took on the sense of a euphemism or circumlocution for a lie. As cited in Sayings of the Century (1984) by Nigel Rees. "I submit respectfully to the House as a general principle that our responsibility in this matter is directly proportionate to our power. Where there is great power there is great responsibility, where there is less power there is less responsibility, and where there is no power there can, I think, be no responsibility." In the House of Commons, 28 February 1906, speech "South African Native Races". "The Times is speechless, and takes three columns to express its speechlessness." Speech at Kinnaird Hall, Dundee, Scotland ("The Dundee Election"), 14 May 1908, in Liberalism and the Social Problem (1909), Churchill, BiblioBazaar (Second Edition, 2006), p. 148 ISBN 1426451989. "What is the use of living, if it be not to strive for noble causes and to make this muddled world a better place for those who will live in it after we are gone? How else can we put ourselves in harmonious relation with the great verities and consolations of the infinite and the eternal? And I avow my faith that we are marching towards better days. Humanity will not be cast down. We are going on swinging bravely forward along the grand high road and already behind the distant mountains is the promise of the sun." Speech at Kinnaird Hall, Dundee, Scotland ("Unemployment"), 10 October 1908, in Liberalism and the Social Problem (1909), Churchill, Echo Library (2007), p. 87 ISBN 1406845817. "The unnatural and increasingly rapid growth of the feeble-minded and insane classes, coupled as it is with a steady restriction among all the thrifty, energetic and superior stocks, constitutes a national and race danger which it is impossible to exaggerate.
I feel that the source from which the stream of madness is fed should be cut off and sealed up before another year has passed." (Home Secretary) Churchill to Prime Minister Asquith on compulsory sterilization of the feeble-minded and insane; cited as follows (excerpted from a longer note): "It is worth noting that eugenics was not a fringe movement of obscure scientists but often led and supported, in Britain and America, by some of the most prominent public figures of the day, across the political divide, such as Julian Huxley, Aldous Huxley, D. H. Lawrence, John Maynard Keynes and Theodore Roosevelt. Indeed, none other than Winston Churchill, whilst Home Secretary in 1910, made the following observation: [text of quote] (quoted in Jones, 1994: 9)." In 'Race', Sport and British Society (2001), Carrington & McDonald, Routledge, Introduction, Note 4, p. 20 ISBN 0415246296. "I propose that 100,000 degenerate Britons should be forcibly sterilized and others put in labour camps to halt the decline of the British race." As Home Secretary in 1910. The original document is in the collection of Asquith's papers at the Bodleian Library in Oxford. Also quoted in Clive Ponting, Churchill (Sinclair Stevenson, 1994). "Everything tends towards catastrophe and collapse. I am interested, geared up and happy. Is it not horrible to be made like this?" In a letter to his wife Clemmie, during the build-up to World War I. "Like chasing a quinine pill around a cow pasture." On playing golf. As cited in The Quote Verifier: Who Said What, Where, and When (2006), Keyes, Macmillan, p. 27 ISBN 0312340044. "Sure I am of this, that you have only to endure to conquer. You have only to persevere to save yourselves, and to save all those who rely upon you.
You have only to go right on, and at the end of the road, be it short or long, victory and honour will be found." Remarks at the Guildhall, 4 September 1914, after the first British naval victory of World War I, the sinking of three German cruisers in the Battle of Heligoland Bight. As cited in Churchill: A Life, Martin Gilbert, Macmillan (1992), p. 279. ISBN 0805023968. "I'm finished." On losing his position at the Admiralty in 1915. Said to Lord Riddell. As cited in Maxims and Reflections, Chapter I (On Himself), Churchill, Houghton Mifflin Company (1947). "The truth is incontrovertible. Panic may resent it, ignorance may deride it, malice may distort it, but there it is." Speech in the House of Commons, 17 May 1916, "Royal Assent". "I think a curse should rest on me, because I love this war. I know it's smashing and shattering the lives of thousands every moment, and yet I can't help it, I enjoy every second of it." A letter to a friend (1916). "No compromise on the main purpose; no peace till victory; no pact with unrepentant wrong - that is the Declaration of July 4th, 1918." At a joint Anglo-American rally in Westminster, 4 July 1918, speaking against calls for a negotiated truce with Germany. As printed in War Aims & Peace Ideals: Selections in Prose & Verse (1919), edited by Tucker Brooke & Henry Seidel Canby, Yale University Press, p. 138. "The Great War differed from all ancient wars in the immense power of the combatants and their fearful agencies of destruction, and from all modern wars in the utter ruthlessness with which it was fought. Europe and large parts of Asia and Africa became one vast battlefield on which after years of struggle not armies but nations broke and ran.
When all was over, Torture and Cannibalism were the only two expedients that the civilized, scientific, Christian States had been able to deny themselves: and they were of doubtful utility." From The World Crisis, 1911-1918, Chapter I (The Vials of Wrath), Churchill, Butterworth (1923). "One might as well legalise sodomy as recognise the Bolsheviks." Paris, 24 January 1919. Churchill: A Life. Gilbert, Martin (1992). New York: Holt, p. 408. ISBN 9780805023961. "I do not understand this squeamishness about the use of gas. We have definitely adopted the position at the Peace Conference of arguing in favour of the retention of gas as a permanent method of warfare. It is sheer affectation to lacerate a man with the poisonous fragment of a bursting shell and to boggle at making his eyes water by means of lachrymatory gas. I am strongly in favour of using poisoned gas against uncivilised tribes. The moral effect should be so good that the loss of life should be reduced to a minimum. It is not necessary to use only the most deadly gasses: gasses can be used which cause great inconvenience and would spread a lively terror and yet would leave no serious permanent effects on most of those affected. We cannot in any circumstances acquiesce in the non-utilisation of any weapons which are available to procure a speedy termination of the disorder which prevails on the frontier." Statement as President of the Air Council, War Office Departmental Minute (1919-05-12); Churchill Papers 16/16, Churchill Archives Centre, Cambridge.
Many argue that quotes from this passage are often taken out of context, because Churchill is distinguishing between non-lethal agents and the deadly gasses used in World War I and emphasizing the use of non-lethal weapons; however, Churchill does not clearly rule out the use of lethal gases, simply stating that it is not necessary to use only the most deadly. It is sometimes claimed that gas killed many young and elderly Kurds and Arabs when the RAF bombed rebelling villages in Iraq in 1920 during the British occupation. For more information, see Gas in Mesopotamia. "Lenin was sent into Russia by the Germans in the same way that you might send a phial containing a culture of typhoid or cholera to be poured into the water supply of a great city, and it worked with amazing accuracy." On Vladimir Ilyich Lenin. In the House of Commons, 5 November 1919, as cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 355 ISBN 1586486381. "First there are the Jews who, dwelling in every country throughout the world, identify themselves with that country, enter into its national life and, while adhering faithfully to their own religion, regard themselves as citizens in the fullest sense of the State which has received them. Such a Jew living in England would say, 'I am an Englishman practising the Jewish faith.' This is a worthy conception, and useful in the highest degree. We in Great Britain well know that during the great struggle the influence of what may be called the 'National Jews' in many lands was cast preponderatingly on the side of the Allies; and in our own Army Jewish soldiers have played a most distinguished part, some rising to the command of armies, others winning the Victoria Cross for valour.
There is no need to exaggerate the part played in the creation of Bolshevism and in the actual bringing about of the Russian Revolution by these international and for the most part atheistical Jews. It is certainly a very great one; it probably outweighs all others. With the notable exception of Lenin, the majority of the leading figures are Jews. Moreover, the principal inspiration and driving power comes from the Jewish leaders. Thus Tchitcherin, a pure Russian, is eclipsed by his nominal subordinate Litvinoff, and the influence of Russians like Bukharin or Lunacharski cannot be compared with the power of Trotsky, or of Zinovieff, the Dictator of the Red Citadel (Petrograd), or of Krassin or Radek - all Jews. In the Soviet institutions the predominance of Jews is even more astonishing. And the prominent, if not indeed the principal, part in the system of terrorism applied by the Extraordinary Commissions for Combating Counter-Revolution has been taken by Jews, and in some notable cases by Jewesses. The same evil prominence was obtained by Jews in the brief period of terror during which Bela Kun ruled in Hungary. The same phenomenon has been presented in Germany (especially in Bavaria), so far as this madness has been allowed to prey upon the temporary prostration of the German people. Although in all these countries there are many non-Jews every whit as bad as the worst of the Jewish revolutionaries, the part played by the latter in proportion to their numbers in the population is astonishing." "Zionism versus Bolshevism", Illustrated Sunday Herald (February 1920). (Note: Churchill viewed Bolshevism as a heavily Jewish phenomenon, and contrasted the Jewish role in the creation of Bolshevism with a more positive view of the role that Jews had played in England.) "The schemes of the International Jews.
The adherents of this sinister confederacy are mostly men reared up among the unhappy populations of countries where Jews are persecuted on account of their race. Most, if not all, of them have forsaken the faith of their forefathers, and divorced from their minds all spiritual hopes of the next world. This movement among the Jews is not new. From the days of Spartacus-Weishaupt to those of Karl Marx, and down to Trotsky (Russia), Bela Kun (Hungary), Rosa Luxembourg (Germany), and Emma Goldman (United States), this world-wide conspiracy for the overthrow of civilisation and for the reconstitution of society on the basis of arrested development, of envious malevolence, and impossible equality, has been steadily growing. It played, as a modern writer, Mrs. Webster, has so ably shown, a definitely recognisable part in the tragedy of the French Revolution. It has been the mainspring of every subversive movement during the nineteenth century; and now at last this band of extraordinary personalities from the underworld of the great cities of Europe and America have gripped the Russian people by the hair of their heads and have become practically the undisputed masters of that enormous empire." Rt. Hon. Winston Churchill, "Zionism versus Bolshevism: A Struggle for the Soul of the Jewish People", in the Illustrated Sunday Herald, 8 February 1920. "However we may dwell upon the difficulties of General Dyer during the Amritsar riots, upon the anxious and critical situation in the Punjab, upon the danger to Europeans throughout that province, one tremendous fact stands out: I mean the slaughter of nearly 400 persons and the wounding of probably three to four times as many, at the Jallian Wallah Bagh on 13th April. That is an episode which appears to me to be without precedent or parallel in the modern history of the British Empire.
It is an extraordinary event, a monstrous event, an event which stands in singular and sinister isolation." Speech in the House of Commons, 8 July 1920, "Amritsar"; at the time, Churchill was serving as Secretary of State for War under Prime Minister David Lloyd George. "Men who take up arms against the State must expect at any moment to be fired upon. Men who take up arms unlawfully cannot expect that the troops will wait until they are quite ready to begin the conflict." Speech in the House of Commons, 8 July 1920, "Amritsar"; at the time, Churchill was serving as Secretary of State for War under Prime Minister David Lloyd George. "Frightfulness is not a remedy known to the British pharmacopoeia." Speech in the House of Commons, 8 July 1920, "Amritsar"; at the time, Churchill was serving as Secretary of State for War under Prime Minister David Lloyd George. "I yield to no one in my detestation of Bolshevism, and of the revolutionary violence which precedes it. But my hatred of Bolshevism and Bolsheviks is not founded on their silly system of economics, or their absurd doctrine of an impossible equality. It arises from the bloody and devastating terrorism which they practise in every land into which they have broken, and by which alone their criminal regime can be maintained. Governments who have seized upon power by violence and by usurpation have often resorted to terrorism in their desperate efforts to keep what they have stolen, but the august and venerable structure of the British Empire does not need such aid. Such ideas are absolutely foreign to the British way of doing things." Speech in the House of Commons, 8 July 1920, "Amritsar". "Let me marshal the facts. The crowd was unarmed, except with bludgeons. It was not attacking anybody or anything. It was holding a seditious meeting. When fire had been opened upon it to disperse it, it tried to run away.
Pinned up in a narrow place considerably smaller than Trafalgar Square, with hardly any exits, and packed together so that one bullet would drive through three or four bodies, the people ran madly this way and the other. When the fire was directed upon the centre, they ran to the sides. The fire was then directed upon the sides. Many threw themselves down on the ground, and the fire was then directed on the ground. This was continued for 8 or 10 minutes. If the road had not been so narrow, the machine guns and the armoured cars would have joined in. Finally, when the ammunition had reached the point that only enough remained to allow for the safe return of the troops, and after 379 persons had been killed, and when most certainly 1,200 or more had been wounded, the troops, at whom not even a stone had been thrown, swung round and marched away. We have to make it absolutely clear that this is not the British way of doing business. Our reign, in India or anywhere else, has never stood on the basis of physical force alone, and it would be fatal to the British Empire if we were to try to base ourselves only upon it." Speech in the House of Commons, 8 July 1920, "Amritsar". "I cannot pretend to feel impartial about colours. I rejoice with the brilliant ones, and am genuinely sorry for the poor browns." In "Painting as a Pastime", first published in the Strand Magazine in two parts (December 1921 and January 1922), cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 456 ISBN 1586486381. "He ought to be lain bound hand and foot at the gates of Delhi, and then trampled on by an enormous elephant with the new Viceroy seated on its back." Referring to Mahatma Gandhi in conversation with Edwin Montagu, Secretary of State for India, 1921. [3][4] "Every day you may make progress. Every step may be fruitful.
Yet there will stretch out before you an ever-lengthening, ever-ascending, ever-improving path. You know you will never get to the end of the journey. But this, so far from discouraging, only adds to the joy and glory of the climb." In "Painting as a Pastime", the Strand Magazine (December 1921 and January 1922), cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 568 ISBN 1586486381. "I am very anxious that in dealing with matters which every Member knows are extremely delicate matters, I should not use any phrase or expression which would cause offence to our friends and Allies on the Continent or across the Atlantic Ocean." On inter-Allied debts, in the House of Commons (10 December 1924), reported in Parliamentary Debates (Commons) (1925), 5th series, vol. 179, col. 259. "The choice was clearly open: crush them with vain and unstinted force, or try to give them what they want. These were the only alternatives, and though each had ardent advocates, most people were unprepared for either. Here indeed was the Irish spectre, horrid and inexorcisable." The World Crisis, Volume V: The Aftermath (1929), Churchill, Butterworth (London). "No hour of life is lost that is spent in the saddle." My Early Life, 1874-1904 (1930), Churchill, Winston S., p. 45 (1996 Touchstone Edition), ISBN 0684823454. "Might not a bomb no bigger than an orange be found to possess a secret power to destroy a whole block of buildings, nay, to concentrate the force of a thousand tons of cordite and blast a township at a stroke?" Pall Mall Gazette (1924), on H. G. Wells's suggestion of an atomic bomb, in a BBC article. "Too often the strong, silent man is silent only because he does not know what to say, and is reputed strong only because he has remained silent." Winston S.
Churchill: His Complete Speeches (1974), Chelsea House, Volume IV: 1922-1928, p. 3462 ISBN 0835206939

I decline utterly to be impartial as between the fire brigade and the fire.
Speech in the House of Commons, 7 July 1926, "Emergency Services", in response to the criticism that he had edited the British Gazette in a biased manner during the General Strike. As cited in The Yale Book of Quotations (2006), ed. Fred R. Shapiro, Yale University Press, p. 152 ISBN 0300107986.

Make your minds perfectly clear that if ever you let loose upon us again a general strike, we will loose upon you another British Gazette.
Speech in the House of Commons, 7 July 1926, "Emergency Services". At this time Churchill was serving as Chancellor of the Exchequer under Prime Minister Stanley Baldwin. He was threatening the Labour Party and the trade union movement with a return of the government-published newspaper he had edited during that May's General Strike.

If I had been an Italian, I am sure I should have been wholeheartedly with you from the start to the finish in your triumphant struggle against the bestial appetites and passions of Leninism.
To Benito Mussolini at a press conference in Rome (January 1927), as cited in Churchill: A Life (1992) by Martin Gilbert.

A sheep in sheep's clothing.
On Ramsay MacDonald. This is often taken as referring to Clement Attlee, but the Scottish historian D. W. Brogan is quoted in Safire's Political Dictionary (2008), William Safire, Oxford University Press US, p. 352 ISBN 0195343344, as follows: "Sir Winston Churchill never said of Clement Attlee that he was a sheep in sheep's clothing. I have this on the excellent authority of Sir Winston himself. The phrase was totally inapplicable to Mr. Attlee. It was applicable, and applied, to J. Ramsay MacDonald, a very different kind of Labour leader."

To improve is to change, so to be perfect is to have changed often.
Winston Churchill, His Complete Speeches, 1897-1963, edited by Robert Rhodes James, Chelsea House ed., vol.
4 (1922-1928), p. 3706. During a debate with Philip Snowden, Chancellor of the Exchequer, on the customs duties on silk. Often misquoted as: To improve is to change, to be perfect is to change often.

An infected Russia, a plague-bearing Russia; a Russia of armed hordes not only smiting with bayonet and with cannon, but accompanied and preceded by swarms of typhus-bearing vermin which slew the bodies of men, and political doctrines which destroyed the health and even the souls of nations.
The Aftermath, by Winston Churchill (published 1929), p. 274.

My Early Life: A Roving Commission (1930)

She shone for me like the Evening Star. I loved her dearly but at a distance.
On his mother, Lady Randolph Churchill, Chapter 1 (Childhood).

Where my reason, imagination or interest were not engaged, I would not or I could not learn.
Chapter 1 (Childhood).

Thus I got into my bones the essential structure of the ordinary British sentence, which is a noble thing.
On studying English rather than Latin at school, Chapter 2 (Harrow).

Headmasters have powers at their disposal with which Prime Ministers have never yet been invested.
Chapter 2 (Harrow).

Mr. Gladstone read Homer for fun, which I thought served him right.
Chapter 2 (Harrow).

I then had one of the three or four long intimate conversations with him which are all I can boast.
On his father, Lord Randolph Churchill, Chapter 3 (Examinations).

In retrospect these years form not only the least agreeable, but the only barren and unhappy period of my life. I was happy as a child with my toys in my nursery. I have been happier every year since I became a man. But this interlude of school makes a sombre grey patch upon the chart of my journey. It was an unending spell of worries that did not then seem petty, of toil uncheered by fruition; a time of discomfort, restriction and purposeless monotony.
This train of thought must not lead me to exaggerate the character of my school days ... Harrow was a very good school ... Most of the boys were very happy ... I can only record the fact that, no doubt through my own shortcomings, I was an exception. I was on the whole considerably discouraged ... All my contemporaries and even younger boys seemed in every way better adapted to the conditions of our little world. They were far better both at the games and at the lessons. It is not pleasant to feel oneself so completely outclassed and left behind at the very beginning of the race.
Chapter 3 (Examinations).

Certainly the prolonged education indispensable to the progress of Society is not natural to mankind. It cuts against the grain. A boy would like to follow his father in pursuit of food or prey. He would like to be doing serviceable things so far as his utmost strength allowed. He would like to be earning wages however small to help to keep up the home. He would like to have some leisure of his own to use or misuse as he pleased. He would ask little more than the right to work or starve. And then perhaps in the evenings a real love of learning would come to those who are worthy (and why try to stuff in those who are not?) and knowledge and thought would open the magic casements of the mind.
Chapter 3 (Examinations).

I had a feeling once about Mathematics, that I saw it all. Depth beyond depth was revealed to me: the Byss and the Abyss. I saw, as one might see the transit of Venus, or even the Lord Mayor's Show, a quantity passing through infinity and changing its sign from plus to minus. I saw exactly how it happened and why the tergiversation was inevitable: and how the one step involved all the others. It was like politics. But it was after dinner and I let it go.
Chapter 3 (Examinations), p. 27.

Although always prepared for martyrdom, I preferred that it should be postponed.
Chapter 4 (Sandhurst), p. 72.
You will make all kinds of mistakes; but as long as you are generous and true, and also fierce, you cannot hurt the world or even seriously distress her.
Chapter 4 (Sandhurst).

I wonder whether any other generation has seen such astounding revolutions of data and values as those through which we have lived. Scarcely anything material or established which I was brought up to believe was permanent and vital, has lasted. Everything I was sure or taught to be sure was impossible, has happened.
Chapter 5 (The Fourth Hussars).

I have no doubt that the Romans planned the time-table of their days far better than we do. They rose before the sun at all seasons. Except in wartime we never see the dawn. Sometimes we see sunset. The message of sunset is sadness; the message of dawn is hope. The rest and the spell of sleep in the middle of the day refresh the human frame far more than a long night. We were not made by Nature to work, or even play, from eight o'clock in the morning till midnight. We throw a strain upon our system which is unfair and improvident. For every purpose of business or pleasure, mental or physical, we ought to break our days and our marches into two.
Chapter 6 (Cuba).

I do think unpunctuality is a vile habit, and all my life I have tried to break myself of it.
Chapter 7 (Hounslow).

I now began for the first time to envy those young cubs at the university who had fine scholars to tell them what was what; professors who had devoted their lives to mastering and focusing ideas in every branch of learning; who were eager to distribute the treasures they had gathered before they were overtaken by the night. But now I pity undergraduates, when I see what frivolous lives many of them lead in the midst of precious fleeting opportunity. After all, a man's Life must be nailed to a cross either of Thought or Action. Without work there is no play.
Chapter 9 (Education At Bangalore).
I accumulated in those years so fine a surplus in the Book of Observance that I have been drawing confidently upon it ever since.
Chapter 9 (Education At Bangalore).

It is a good thing for an uneducated man to read books of quotations. Bartlett's Familiar Quotations is an admirable work, and I studied it intently. The quotations when engraved upon the memory give you good thoughts. They also make you anxious to read the authors and look for more.
Chapter 9 (Education At Bangalore).

I had been brought up and trained to have the utmost contempt for people who got drunk, and I would have liked to have the boozing scholars of the Universities wheeled into line and properly chastised for their squalid misuse of what I must ever regard as a gift of the gods.
Chapter 10 (The Malakand Field Force).

Never, never, never believe any war will be smooth and easy, or that anyone who embarks on the strange voyage can measure the tides and hurricanes he will encounter. The statesman who yields to war fever must realise that once the signal is given, he is no longer the master of policy but the slave of unforeseeable and uncontrollable events. Antiquated War Offices, weak, incompetent, or arrogant Commanders, untrustworthy allies, hostile neutrals, malignant Fortune, ugly surprises, awful miscalculations, all take their seats at the Council Board on the morrow of a declaration of war. Always remember, however sure you are that you could easily win, that there would not be a war if the other man did not think he also had a chance.
Chapter 18 (With Buller To The Cape), p. 246. Quoted in "This Time It's Our War" (2003) by Leonard Fein in The Forward (July 25, 2003).

The 1930s

Poland is a greedy hyena of Europe.
After the annexation of Zaolzie (part of Czechoslovakia) by Poland in October 1938.

I remember, when I was a child, being taken to the celebrated Barnum's circus, which contained an exhibition of freaks and monstrosities.
But the exhibit on the programme which I most desired to see was the one described as "The Boneless Wonder". My parents judged that that spectacle would be too revolting and demoralising for my youthful eyes, and I have waited 50 years to see the boneless wonder sitting on the Treasury Bench.
A jibe at Prime Minister (and First Lord of the Treasury) Ramsay MacDonald during a speech in the House of Commons, January 28, 1931, "Trade Disputes and Trade Unions (Amendment) Bill".

India is a geographical term. It is no more a united nation than the equator.
Speech at Royal Albert Hall, London (18 March 1931).

It is alarming and also nauseating to see Mr. Gandhi, a seditious Middle Temple lawyer of the type well-known in the East, now posing as a fakir, striding half naked up the steps of the Viceregal palace to parley on equal terms with the representative of the King-Emperor.
Comment on Gandhi's meeting with the Viceroy of India, addressing the Council of the West Essex Unionist Association (23 February 1931), as quoted in "Mr Churchill on India" in The Times (24 February 1931).

We shall escape the absurdity of growing a whole chicken in order to eat the breast or wing, by growing these parts separately under a suitable medium.
"Fifty Years Hence", The Strand Magazine (December 1931).

We are stripped bare by the curse of plenty.
Lecture at Cleveland, Ohio (February 3, 1932), reported in Robert Rhodes James, ed., Winston S. Churchill: His Complete Speeches, 1897-1963 (1974), vol. 5, p. 5130, referring to the theory that over-production caused the Depression.

We know that he has, more than any other man, the gift of compressing the largest number of words into the smallest amount of thought.
A jibe directed at Ramsay MacDonald, during a speech in the House of Commons, March 23, 1933, "European Situation". This quote is similar to a remark ("He can compress the most words into the smallest ideas of any man I ever met") made by Abraham Lincoln.
Frederick Trevor Hill credits Lincoln with this remark in Lincoln the Lawyer (1906), adding that "History has considerately sheltered the identity of the victim."

One may dislike Hitler's system and yet admire his patriotic achievement. If our country were defeated, I hope we should find a champion as indomitable to restore our courage and lead us back to our place among the nations.
"Hitler and His Choice", The Strand Magazine (November 1935).

We cannot tell whether Hitler will be the man who will once again let loose upon the world another war in which civilisation will irretrievably succumb, or whether he will go down in history as the man who restored honour and peace of mind to the Great Germanic nation.
"Hitler and His Choice", The Strand Magazine (November 1935).

Mr. Gandhi has gone very high in my esteem since he stood up for the untouchables ... I do not care whether you are more or less loyal to Great Britain ... Tell Mr. Gandhi to use the powers that are offered and make the thing a success.
Letter to G. D. Birla (1935), published in Winston S. Churchill, Volume Five: The Coming of War 1922-1939 (1979) by Sir Martin Gilbert.

The world looks with some awe upon a man who appears unconcernedly indifferent to home, money, comfort, rank, or even power and fame. The world feels, not without a certain apprehension, that here is someone outside its jurisdiction; someone before whom its allurements may be spread in vain; someone strangely enfranchised, untamed, untrammelled by convention, moving independent of the ordinary currents of human action.
At an unveiling of a memorial to T. E. Lawrence at the Oxford High School for Boys (3 October 1936), as quoted in Lawrence of Arabia: The Authorized Biography of T. E. Lawrence (1989) by Jeremy M Wilson.

Occasionally he stumbled over the truth, but hastily picked himself up and hurried on as if nothing had happened.
On Stanley Baldwin, as cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p.
322 ISBN 1586486381. Also quoted by Kay Halle in Irrepressible Churchill: A Treasury of Winston Churchill's Wit (1966).

Anyone can see what the position is. The Government simply cannot make up their mind, or they cannot get the Prime Minister to make up his mind. So they go on in strange paradox, decided only to be undecided, resolved to be irresolute, adamant for drift, solid for fluidity, all powerful to be impotent. So we go on preparing more months and years, precious, perhaps vital to the greatness of Britain, for the locusts to eat.
Speech in the House of Commons, November 12, 1936, "Debate on the Address", criticizing the Government of Stanley Baldwin for its conciliatory stance toward Hitler.

The era of procrastination, of half-measures, of soothing and baffling expedients, of delays, is coming to its close. In its place we are entering a period of consequences.
Speech in the House of Commons, November 12, 1936, "Debate on the Address". Cited in Al Gore's documentary An Inconvenient Truth. This speech is also commonly known by the name "The Locust Years".

Courage is rightly esteemed the first of human qualities, because, as has been said, it is the quality which guarantees all others.
In Great Contemporaries, "Alfonso XIII" (1937).

The essence and foundation of House of Commons debating is formal conversation. The set speech, the harangue addressed to constituents, or to the wider public out of doors, has never succeeded much in our small wisely-built chamber. To do any good you have got to get down to grips with the subject and in human touch with the audience.
In Great Contemporaries, "Clemenceau" (1937).

Whatever one may think about democratic government, it is just as well to have practical experience of its rough and slatternly foundations. No part of the education of a politician is more indispensable than the fighting of elections.
In Great Contemporaries, "Lord Rosebery" (1937).
I do not agree that the dog in a manger has the final right to the manger even though he may have lain there for a very long time. I do not admit that right. I do not admit, for instance, that a great wrong has been done to the Red Indians of America or the black people of Australia. I do not admit that a wrong has been done to these people by the fact that a stronger race, a higher-grade race, a more worldly wise race, to put it that way, has come in and taken their place.
To the Peel Commission (1937) on a Jewish Homeland in Palestine.

Dictators ride to and fro on tigers from which they dare not dismount. And the tigers are getting hungry.
"Armistice - or Peace", published in The Evening Standard (11 November 1937).

For five years I have talked to the House on these matters, not with very great success. I have watched this famous island descending incontinently, fecklessly, the stairway which leads to a dark gulf. It is a fine broad stairway at the beginning, but after a bit the carpet ends. A little farther on there are only flagstones, and a little farther on still these break beneath your feet. Look back upon the last five years, since, that is to say, Germany began to rearm in earnest and openly to seek revenge ... historians a thousand years hence will still be baffled by the mystery of our affairs. They will never understand how it was that a victorious nation, with everything in hand, suffered themselves to be brought low, and to cast away all that they had gained by measureless sacrifice and absolute victory, gone with the wind. Now the victors are the vanquished, and those who threw down their arms in the field and sued for an armistice are striding on to world mastery. That is the position; that is the terrible transformation that has taken place bit by bit.
Speech in the House of Commons (24 March 1938), "Foreign Affairs and Rearmament", 12 days after the Anschluss (the Nazi annexation of Austria).

Our loyal, brave people should know the truth.
... they should know that we have sustained a defeat without a war, and that the terrible words have for the time being been pronounced against the Western democracies: "Thou art weighed in the balance and found wanting." And do not suppose that this is the end. This is only the beginning of the reckoning. This is only the first sip, the first foretaste of a bitter cup which will be proffered to us year by year unless by a supreme recovery of moral health and martial vigour, we arise again and take our stand for freedom as in the olden time.
Speech in the House of Commons (5 October 1938), "Policy of His Majesty's Government", a week after the announcement of the Munich Accords.

The stations of uncensored expression are closing down; the lights are going out; but there is still time for those to whom freedom and parliamentary government mean something, to consult together. Let me, then, speak in truth and earnestness while time remains.
Winston Churchill, in "The Defence of Freedom and Peace (The Lights are Going Out)", radio broadcast to the United States and to London (16 October 1938).

People say we ought not to allow ourselves to be drawn into a theoretical antagonism between Nazidom and democracy; but the antagonism is here now. It is this very conflict of spiritual and moral ideas which gives the free countries a great part of their strength. You see these dictators on their pedestals, surrounded by the bayonets of their soldiers and the truncheons of their police. On all sides they are guarded by masses of armed men, cannons, aeroplanes, fortifications, and the like; they boast and vaunt themselves before the world, yet in their hearts there is unspoken fear. They are afraid of words and thoughts: words spoken abroad, thoughts stirring at home, all the more powerful because forbidden, terrify them. A little mouse of thought appears in the room, and even the mightiest potentates are thrown into panic.
They make frantic efforts to bar our thoughts and words; they are afraid of the workings of the human mind. Cannons, airplanes, they can manufacture in large quantities; but how are they to quell the natural promptings of human nature, which after all these centuries of trial and progress has inherited a whole armoury of potent and indestructible knowledge?
Winston Churchill, in "The Defence of Freedom and Peace (The Lights are Going Out)", radio broadcast to the United States and to London (16 October 1938).

I have always said that if Great Britain were defeated in war I hoped we should find a Hitler to lead us back to our rightful position among the nations. I am sorry, however, that he has not been mellowed by the great success that has attended him. The whole world would rejoice to see the Hitler of peace and tolerance, and nothing would adorn his name in world history so much as acts of magnanimity and of mercy and of pity to the forlorn and friendless, to the weak and poor. Let this great man search his own heart and conscience before he accuses anyone of being a warmonger.
"Mr. Churchill's Reply" in The Times (7 November 1938).

Britain and France had to choose between war and dishonour. They chose dishonour. They will have war.
To Neville Chamberlain in the House of Commons, after the Munich accords (1938).

The Second World War (1939-1945)

Winston Churchill addressing a joint session of the United States Congress, May 1943.

I cannot forecast to you the action of Russia. It is a riddle wrapped in a mystery inside an enigma; but perhaps there is a key. That key is Russian national interest.
BBC broadcast ("The Russian Enigma"), London, October 1, 1939, from the "First Month of War" speech.

First, Poland has been again overrun by two of the great powers which held her in bondage for 150 years but were unable to quench the spirit of the Polish nation.
The heroic defense of Warsaw shows that the soul of Poland is indestructible, and that she will rise again like a rock which may for a spell be submerged by a tidal wave but which remains a rock.
BBC broadcast ("The Russian Enigma"), London, October 1, 1939, excerpt from the "First Month of War" speech.

I would say to the House, as I said to those who have joined this Government: I have nothing to offer but blood, toil, tears, and sweat. We have before us an ordeal of the most grievous kind. We have before us many, many long months of struggle and of suffering. You ask, what is our policy? I will say: It is to wage war, by sea, land and air, with all our might and with all the strength that God can give us: to wage war against a monstrous tyranny, never surpassed in the dark, lamentable catalogue of human crime. That is our policy. You ask, what is our aim? I can answer in one word: It is victory, victory at all costs, victory in spite of all terror, victory, however long and hard the road may be; for without victory, there is no survival.
Speech in the House of Commons, after taking office as Prime Minister (13 May 1940). This has often been misquoted in the form: "I have nothing to offer but blood, sweat and tears." The Official Report, House of Commons (5th Series), 13 May 1940, vol. 360, c. 1502. Audio recordings of the speech omit the "It is" before "victory" at the beginning of the victory passage.

Side by side the British and French peoples have advanced to rescue mankind from the foulest and most soul-destroying tyranny which has ever darkened and stained the pages of history. Behind them gather a group of shattered States and bludgeoned races: the Czechs, the Poles, the Norwegians, the Danes, the Dutch, the Belgians; upon all of whom the long night of barbarism will descend, unbroken even by a star of hope, unless we conquer, as conquer we must; as conquer we shall.
Radio broadcast, "Be Ye Men of Valour", May 19, 1940.
Every morn brought forth a noble chance, and every chance brought forth a noble knight.
Speech in the House of Commons, June 4, 1940; passage praising the airmen of the Royal Air Force and their efforts during the evacuation of Dunkirk. This is a close paraphrase of Tennyson: "When every morning brought a noble chance, And every chance brought out a noble knight." Alfred Tennyson, Morte d'Arthur, stanza 23 (1842), and the expanded The Passing of Arthur, stanza 36, in Idylls of the King (1856-1885).

We shall not flag or fail. We shall go on to the end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air, we shall defend our Island, whatever the cost may be, we shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender, and even if, which I do not for a moment believe, this Island or a large part of it were subjugated and starving, then our Empire beyond the seas, armed and guarded by the British Fleet, would carry on the struggle, until, in God's good time, the New World, with all its power and might, steps forth to the rescue and the liberation of the Old.
Speech in the House of Commons (4 June 1940).

Bearing ourselves humbly before God we await undismayed the impending assault ... be the ordeal sharp or long, or both, we shall seek no terms, we shall tolerate no parley; we may show mercy; we shall ask for none.
BBC Broadcast, London, July 14, 1940, "War of the Unknown Warriors".

Of this I am quite sure, that if we open a quarrel between the past and the present, we shall find that we have lost the future.
Speech in the House of Commons, June 18, 1940, "War Situation".

Upon this battle depends the survival of Christian civilisation. Upon it depends our own British life and the long continuity of our institutions and our Empire.
The whole fury and might of the enemy must very soon be turned on us now. Hitler knows that he will have to break us in this island or lose the war. If we can stand up to him, all Europe may be free and the life of the world may move forward into broad, sunlit uplands. But if we fail, then the whole world, including the United States, including all that we have known and cared for, will sink into the abyss of a new Dark Age, made more sinister, and perhaps more protracted, by the lights of perverted science. Let us therefore brace ourselves to our duties, and so bear ourselves that, if the British Empire and its Commonwealth last for a thousand years, men will still say, "This was their finest hour."
Speech in the House of Commons, June 18, 1940, "War Situation".

The gratitude of every home in our Island, in our Empire, and indeed throughout the world, except in the abodes of the guilty, goes out to the British airmen who, undaunted by odds, unwearied in their constant challenge and mortal danger, are turning the tide of the World War by their prowess and by their devotion. Never in the field of human conflict was so much owed by so many to so few. All hearts go out to the fighter pilots, whose brilliant actions we see with our own eyes day after day; but we must never forget that all the time, night after night, month after month, our bomber squadrons travel far into Germany, find their targets in the darkness by the highest navigational skill, aim their attacks, often under the heaviest fire, often with serious loss, with deliberate careful discrimination, and inflict shattering blows upon the whole of the technical and war-making structure of the Nazi power.
Speech in the House of Commons, also known as "The Few", made on 20 August 1940.
However, Churchill first made his comment, "Never in the field of human conflict was so much owed by so many to so few", to General Hastings Ismay as they got into their car to leave RAF Uxbridge on 16 August 1940 after monitoring the battle from the Operations Room. ("Farewell to RAF Uxbridge", Global Aviation Resource (6 April 2010), retrieved on 12 September 2010; Crozier, Hazel, RAF Uxbridge 90th Anniversary 1917-2007, RAF High Wycombe: Air Command Media Services.) Churchill repeated the quote in a speech to Parliament four days later, complimenting the pilots in the Royal Air Force during the Battle of Britain. The speech in the House of Commons is often incorrectly cited as the origin of the popular phrase "never was so much owed by so many to so few". In her speech to the Polish Parliament on 26 March 1996, Queen Elizabeth II said that Churchill's words "so few" were spoken about the unforgettable and brave Polish pilots of the Battle of Britain.

We are waiting for the long-promised invasion. So are the fishes.
Radio broadcast, London, "Dieu Protège La France" (God protect France), October 21, 1940.

Goodnight then: sleep to gather strength for the morning. For the morning will come. Brightly will it shine on the brave and true, kindly upon all who suffer for the cause, glorious upon the tombs of heroes. Thus will shine the dawn. Vive la France! Long live also the forward march of the common people in all the lands towards their just and true inheritance, and towards the broader and fuller age.
Radio broadcast, London, "Dieu Protège La France" (God protect France), October 21, 1940.

These cruel, wanton, indiscriminate bombings of London are, of course, a part of Hitler's invasion plans. He hopes, by killing large numbers of civilians, and women and children, that he will terrorise and cow the people of this mighty imperial city ... Little does he know the spirit of the British nation, or the tough fibre of the Londoners.
Radio broadcast during the London Blitz, September 11, 1940. Quoted by Martin Gilbert in Churchill: A Life, Macmillan (1992), p. 675 ISBN 0805023968.

We do not covet anything from any nation except their respect.
Radio broadcast to German-occupied, Vichy, and Free France (21 October 1940).

The hour has come; kill the Hun.
How Churchill said he would end his speech if Germany invaded Britain (John Colville's diary entry for January 25, 1941). In The Churchill War Papers, 1941 (1993), ed. Gilbert, W. W. Norton, pp. 132-133 ISBN 0393019594.

Here is the answer which I will give to President Roosevelt: Put your confidence in us. We shall not fail or falter; we shall not weaken or tire. Neither the sudden shock of battle, nor the long-drawn trials of vigilance and exertion will wear us down. Give us the tools and we will finish the job.
BBC radio broadcast, February 9, 1941. In The Churchill War Papers, 1941 (1993), ed. Gilbert, W. W. Norton, pp. 199-200 ISBN 0393019594.

I must point out that the British nation is unique in this respect. They are the only people who like to be told how bad things are, who like to be told the worst, and like to be told that they are very likely to get much worse in the future and must prepare themselves for further reverses.
Speech in the House of Commons, June 10, 1941, "Defence of Crete", in The Churchill War Papers, 1941 (1993), Churchill/Gilbert, Norton, p. 785 ISBN 0393019594.

If Hitler invaded Hell, I would make at least a favourable reference to the devil in the House of Commons.
To his personal secretary John Colville the evening before Operation Barbarossa, the German invasion of the Soviet Union. As quoted by Andrew Nagorski in The Greatest Battle (2007), Simon & Schuster, pp. 150-151 ISBN 0743281101.

Hitler is a monster of wickedness, insatiable in his lust for blood and plunder.
Not content with having all Europe under his heel, or else terrorised into various forms of abject submission, he must now carry his work of butchery and desolation among the vast multitudes of Russia and of Asia. The terrible military machine which we and the rest of the civilised world so foolishly, so supinely, so insensately allowed the Nazi gangsters to build up year by year from almost nothing cannot stand idle lest it rust or fall to pieces. So now this bloodthirsty guttersnipe must launch his mechanized armies upon new fields of slaughter, pillage and devastation.
Radio broadcast on the German invasion of Russia, June 22, 1941. In The Churchill War Papers, 1941 (1993), W. W. Norton, pp. 835-836 ISBN 0393019594.

We ask no favours of the enemy. We seek from them no compunction. On the contrary, if tonight the people of London were asked to cast their votes as to whether a convention should be entered into to stop the bombing of all cities, an overwhelming majority would cry, "No, we will mete out to the Germans the measure, and more than the measure, they have meted out to us." The people of London with one voice would say to Hitler: "You have committed every crime under the sun. Where you have been the least resisted there you have been the most brutal. It was you who began the indiscriminate bombing. We remember Warsaw in the first few days of the war. We remember Rotterdam. We have been newly reminded of your habits by the hideous massacre in Belgrade. We know too well the bestial assaults you're making upon the Russian people, to whom our hearts go out in their valiant struggle. We will have no truce or parley with you, or the grisly gang who work your wicked will. You do your worst and we will do our best." Perhaps it may be our turn soon. Perhaps it may be our turn now.
July 14, 1941, in a speech before the London County Council.
The original can be found in Churchill's The Unrelenting Struggle (English edition 187; American edition 182) or in the Complete Speeches VI:6448. Never give in. Never, never, never, never, in nothing great or small, large or petty, never give in except to convictions of honour and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy. Speech given at Harrow School, Harrow, England, October 29, 1941. Quoted in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 23, ISBN 1586486381. We have not journeyed all this way across the centuries, across the oceans, across the mountains, across the prairies, because we are made of sugar candy. Speech before Joint Session of the Canadian Parliament, Ottawa (December 30, 1941). The Yale Book of Quotations, ed. Fred R. Shapiro, Yale University Press (2006), p. 153, ISBN 0300107986. When we consider the resources of the United States and the British Empire compared to those of Japan, when we remember those of China, which has so long and valiantly withstood invasion, and when also we observe the Russian menace which hangs over Japan, it becomes still more difficult to reconcile Japanese action with prudence or even with sanity. What kind of a people do they think we are? Is it possible they do not realise that we shall never cease to persevere against them until they have been taught a lesson which they and the world will never forget? Members of the Senate and members of the House of Representatives, I turn for one moment more from the turmoil and convulsions of the present to the broader basis of the future. Here we are together facing a group of mighty foes who seek our ruin; here we are together defending all that to free men is dear. Twice in a single generation the catastrophe of world war has fallen upon us; twice in our lifetime has the long arm of fate reached across the ocean to bring the United States into the forefront of the battle.
If we had kept together after the last War, if we had taken common measures for our safety, this renewal of the curse need never have fallen upon us. Do we not owe it to ourselves, to our children, to mankind tormented, to make sure that these catastrophes shall not engulf us for the third time? Speech to a joint session of the United States Congress, Washington, D.C. (26 December 1941). It is not given to us to peer into the mysteries of the future. Still, I avow my hope and faith, sure and inviolate, that in the days to come the British and American peoples will for their own safety and for the good of all walk together side by side in majesty, in justice, and in peace. Ending of the speech to a joint session of the United States Congress, Washington, D.C. (26 December 1941), reported in Winston S. Churchill: His Complete Speeches, 1897-1963, ed. Robert Rhodes James (1974), vol. 6, p. 6541. The Congressional Record reports that this speech was followed by "Prolonged applause, the Members of the Senate and their guests rising": Congressional Record, Vol. 87, p. 10119. When I warned them that Britain would fight on alone whatever they did, their generals told their Prime Minister and his divided Cabinet, "In three weeks England will have her neck wrung like a chicken." Some chicken! Some neck! Reference to the French government; speech before Joint Session of the Canadian Parliament, Ottawa (December 30, 1941). The Yale Book of Quotations, ed. Fred R. Shapiro, Yale University Press (2006), p. 153, ISBN 0300107986. The most dangerous moment of the War, and the one which caused me the greatest alarm, was when the Japanese Fleet was heading for Ceylon and the naval base there. The capture of Ceylon, the consequent control of the Indian Ocean, and the possibility at the same time of a German conquest of Egypt would have closed the ring and the future would have been black. Quote about the Easter Sunday Raid (April 5, 1942) on Colombo, Ceylon (Sri Lanka).
From a conversation at the British Embassy, Washington, D.C., as described by Leonard Birchall, RCAF, in Battle for the Skies (2004), Michael Paterson, David & Charles, ISBN 0715318152. It was an experience of great interest to me to meet Premier Stalin. It is very fortunate for Russia in her agony to have this great rugged war chief at her head. He is a man of massive outstanding personality, suited to the sombre and stormy times in which his life has been cast; a man of inexhaustible courage and will-power and a man direct and even blunt in speech, which, having been brought up in the House of Commons, I do not mind at all, especially when I have something to say of my own. Above all, he is a man with that saving sense of humour which is of high importance to all men and all nations, but particularly to great men and great nations. Stalin also left upon me the impression of a deep, cool wisdom and a complete absence of illusions of any kind. I believe I made him feel that we were good and faithful comrades in this war; but that, after all, is a matter which deeds not words will prove. Speech in the House of Commons, September 8, 1942, "War Situation". I hate Indians. They are a beastly people with a beastly religion. In conversation with Leo Amery, Secretary of State for India. This quotation is widely cited as written in a letter to Leo Amery (e.g. in "Jolly Good Fellows and Their Nasty Ways" by Vinay Lal in Times of India (15 January 2007)), but it is actually attributed to Churchill as a remark, in an entry for September 1942 in Leo Amery, Diaries (1988), edited by John Barnes and David Nicholson, p. 832: "During my talk with Winston he burst out with: I hate Indians. They are a beastly people with a beastly religion." Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning. Speech at Lord Mayor's Luncheon, Mansion House, London, November 10, 1942.
Referring to the British victory over the German Afrika Korps at the Second Battle of El Alamein in Egypt. The problems of victory are more agreeable than those of defeat, but they are no less difficult. Speech in the House of Commons, November 11, 1942, "Debate on the Address". I have not become the King's First Minister in order to preside over the liquidation of the British Empire. Speech at Lord Mayor's Luncheon, Mansion House, London, November 10, 1942. The Yale Book of Quotations, ed. Fred R. Shapiro, Yale University Press (2006), p. 153, ISBN 0300107986. Before Alamein we never had a victory. After Alamein, we never had a defeat. The Second World War, Volume IV: The Hinge of Fate (1951), Chapter 33 (The Battle of Alamein). BBC News story on the 60th anniversary of Alamein. The maxim "Nothing avails but perfection" may be spelt shorter: "Paralysis". Minute (brief note) to General Ismay, December 6, 1942, on proposed improvements to landing-craft. In The Second World War, Volume IV: The Hinge of Fate (1951), Appendix C. I am sure it would be sensible to restrict as much as possible the work of these gentlemen, who are capable of doing an immense amount of harm with what may very easily degenerate into charlatanry. The tightest hand should be kept over them, and they should not be allowed to quarter themselves in large numbers among Fighting Services at the public expense. On psychiatrists, in a letter to John Anderson, Lord President of the Council (December 19, 1942). In The Second World War, Volume IV: The Hinge of Fate (1951), Appendix C. There is no finer investment for any community than putting milk into babies. Radio broadcast (March 21, 1943), cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 21, ISBN 1586486381. By its sudden collapse, the proud German army has once again proved the truth of the saying, "The Hun is always either at your throat or at your feet." Speech before a Joint Session of Congress (May 19, 1943), Washington, D.
C., in Never Give In: The Best of Winston Churchill's Speeches (2003), Hyperion, p. 352, ISBN 1401300561. The empires of the future are the empires of the mind. Speech at Harvard University, September 6, 1943, in The Oxford Dictionary of Quotations (1999), Knowles & Partington, Oxford University Press, p. 215, ISBN 0198601735. To achieve the extirpation of Nazi tyranny there are no lengths of violence to which we will not go. Speech to Parliament, September 21, 1943. Quoted in Churchill, Hitler, and the Unnecessary War (2008) by Patrick J. Buchanan, p. 396. I have nothing to add to the reply which has already been sent. Response to Dundee Council after refusing to expand on his reasons for not accepting the Freedom of the City. Memo (October 27, 1943). I hate nobody except Hitler, and that is professional. Churchill to John Colville during WWII, quoted by Colville in his book The Churchillians (1981), ISBN 0297779095. Everyone is in favour of free speech. Hardly a day passes without its being extolled, but some people's idea of it is that they are free to say what they like, but if anyone says anything back, that is an outrage. "The Coalmining Situation", speech to the House of Commons (October 13, 1943). We shape our buildings, and afterwards our buildings shape us. Speech to the House of Commons (October 28, 1943), on plans for the rebuilding of the Chamber (destroyed by an enemy bomb May 10, 1941), in Never Give In: The Best of Winston Churchill's Speeches (2003), Hyperion, p. 358, ISBN 1401300561. The essence of good House of Commons speaking is the conversational style, the facility for quick, informal interruptions and interchanges. Harangues from a rostrum would be a bad substitute for the conversational style in which so much of our business is done. But the conversational style requires a fairly small space, and there should be on great occasions a sense of crowd and urgency.
There should be a sense of the importance of much that is said and a sense that great matters are being decided, there and then, by the House. It has a collective personality which enjoys the regard of the public, and which imposes itself upon the conduct not only of individual Members but of parties. Speech in the House of Commons, October 28, 1943, "House of Commons Rebuilding". The House of Commons has lifted our affairs above the mechanical sphere into the human sphere. It thrives on criticism, it is perfectly impervious to newspaper abuse or taunts from any quarter, and it is capable of digesting almost anything or almost any body of gentlemen, whatever be the views with which they arrive. There is no situation to which it cannot address itself with vigour and ingenuity. It is the citadel of British liberty; it is the foundation of our laws; its traditions and its privileges are as lively today as when it broke the arbitrary power of the Crown and substituted that Constitutional Monarchy under which we have enjoyed so many blessings. Speech in the House of Commons, October 28, 1943, "House of Commons Rebuilding". You might however consider whether you should not unfold as a background the great privilege of habeas corpus and trial by jury, which are the supreme protection invented by the English people for ordinary individuals against the state. The power of the Executive to cast a man in prison without formulating any charge known to the law, and particularly to deny him the judgment of his peers, is in the highest degree odious and is the foundation of all totalitarian government, whether Nazi or Communist. In a telegram (November 21, 1942) by Churchill from Cairo, Egypt, to Home Secretary Herbert Morrison, cited in In the Highest Degree Odious (1992), Simpson, Clarendon Press, p. 391, ISBN 0198257759. When I make a statement of facts within my knowledge I expect it to be accepted.
To Joseph Stalin in 1944, on the fact that there had been no plot between Britain and Germany to invade the Soviet Union. The Grand Alliance, Winston S. Churchill. The object of presenting medals, stars, and ribbons is to give pride and pleasure to those who have deserved them. At the same time a distinction is something which everybody does not possess. If all have it, it is of less value. A medal glitters, but it also casts a shadow. Speech in the House of Commons, March 22, 1944, "War Decorations". I have left the obvious, essential fact to this point, namely, that it is the Russian Armies who have done the main work in tearing the guts out of the German army. In the air and on the oceans we could maintain our place, but there was no force in the world which could have been called into being, except after several more years, that would have been able to maul and break the German army unless it had been subjected to the terrible slaughter and manhandling that has fallen to it through the strength of the Russian Soviet Armies. Speech in the House of Commons, August 2, 1944, "War Situation". The Russians will sweep through your country and your people will be liquidated. You are on the verge of annihilation. To Stanisław Mikołajczyk in Moscow, October 14, 1944. Quoted in Churchill, Hitler, and the Unnecessary War (2008) by Patrick J. Buchanan, p. 380. A love of tradition has never weakened a nation, indeed it has strengthened nations in their hour of peril; but the new view must come, the world must roll forward. Let us have no fear of the future. Speech in the House of Commons, November 29, 1944, "Debate on the Address". It seems to me that the moment has come when the question of bombing of German cities simply for the sake of increasing the terror, though under other pretexts, should be reviewed. After the devastation of Dresden by aerial bombing, and the resulting fire storm (February 1945). Quoted in Where the Right Went Wrong (2004) by Patrick J. Buchanan, p.
119, ISBN 0312341156. It is a mistake to look too far ahead. Only one link in the chain of destiny can be handled at a time. Speech in the House of Commons, February 27, 1945, "Crimea Conference"; in The Second World War, Volume VI: Triumph and Tragedy (1954), Chapter XXIII, "Yalta: Finale". I am going to tell you something you must not tell to any human being. We have split the atom. The report of the great experiment has just come in. A bomb was let off in some wild spot in New Mexico. It was only a thirteen-pound bomb, but it made a crater half a mile across. People ten miles away lay with their feet towards the bomb; when it went off they rolled over and tried to look at the sky. But even with the darkest glasses it was impossible. It was the middle of the night, but it was as if seven suns had lit the earth; two hundred miles away the light could be seen. The bomb sent up smoke into the stratosphere. It is the Second Coming. The secret has been wrested from nature. Fire was the first discovery; this is the second. Churchill on the atom bomb in conversation with his doctor, Lord Moran, on 23 July 1945 (Lord Moran, Winston Churchill: The Struggle for Survival, 1940-1965 (London: Sphere, 1968), p. 305).

The Gathering Storm

In the Second World War every bond between man and man was to perish. Crimes were committed by the Germans under the Hitlerite domination to which they allowed themselves to be subjected which find no equal in scale and wickedness with any that have darkened the human record. The wholesale massacre by systematised processes of six or seven millions of men, women, and children in the German execution camps exceeds in horror the rough-and-ready butcheries of Genghis Khan, and in scale reduces them to pigmy proportions. Deliberate extermination of whole populations was contemplated and pursued by both Germany and Russia in the Eastern war.
We have at length emerged from a scene of material ruin and moral havoc the like of which had never darkened the imagination of former centuries. The Foreign Secretary has a special position in a British Cabinet. He is treated with marked respect in his high and responsible office, but he usually conducts his affairs under the continuous scrutiny, if not of the whole Cabinet, at least of its principal members. He is under an obligation to keep them informed. He circulates to his colleagues, as a matter of custom and routine, all his executive telegrams, the reports from our embassies abroad, the records of his interviews with foreign Ambassadors or other notables. At least this has been the case during my experience of Cabinet life. This supervision is, of course, especially maintained by the Prime Minister, who personally or through his Cabinet is responsible for controlling, and has the power to control, the main course of foreign policy. From him at least there must be no secrets. No Foreign Secretary can do his work unless he is supported constantly by his chief. To make things go smoothly, there must not only be agreement between them on fundamentals, but also a harmony of outlook and even to some extent of temperament. This is all the more important if the Prime Minister himself devotes special attention to foreign affairs. Intro to ch. 14, "Mr. Eden at the Foreign Office: His Resignation", The Gathering Storm, Volume I of The Second World War, by Winston S. Churchill.

Post-war years (1945-1955)

We must all turn our backs upon the horrors of the past. We must look to the future. We cannot afford to drag forward across the years that are to come the hatreds and revenges which have sprung from the injuries of the past.

Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require.
Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.

There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class includes parametric models including the BTL and Thurstone models as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models provide poor fits. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models.
On the other hand, unlike in the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations.

We show how any binary pairwise model may be uprooted to a fully symmetric model, wherein original singleton potentials are transformed to potentials on edges to an added variable, and then rerooted to a new model on the original number of variables. The new model is essentially equivalent to the original model, with the same partition function and allowing recovery of the original marginals or a MAP configuration, yet may have very different computational properties that allow much more efficient inference. This meta-approach deepens our understanding, may be applied to any existing algorithm to yield improved methods in practice, generalizes earlier theoretical results, and reveals a remarkable interpretation of the triplet-consistent polytope.

We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is equivalent to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can be instead estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply an RBM-based Deep Neural Net (DNN).
Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.

Revisiting Semi-Supervised Learning with Graph Embeddings
Zhilin Yang (Carnegie Mellon University), William Cohen (CMU), Ruslan Salakhutdinov (U. of Toronto)
Abstract: We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems.
To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.

In learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and shrink model size without sacrificing modeling power. Various studies have been done to "diversify" an LVM, which aim to learn a diverse set of latent components in LVMs. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches that have complementary advantages. One is to define diversity-promoting mutual angular priors, which assign larger density to components with larger mutual angles based on a Bayesian network and the von Mises-Fisher distribution, and to use these priors to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. These two methods are applied to the Bayesian mixture of experts model to encourage the "experts" to be diverse, and experimental results demonstrate the effectiveness and efficiency of our methods.
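The notion of "mutual angles" between latent components in the abstract above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names `mutual_angles` and `mean_mutual_angle` are hypothetical, components are assumed to be plain non-zero vectors, and the mean pairwise angle is just one simple summary of diversity. It is not the prior or the regularizer defined in the paper.

```python
import math
from itertools import combinations


def mutual_angles(components):
    """Return the non-obtuse angle (in radians) between every pair of vectors.

    Assumption: each component is a non-zero sequence of floats.
    """
    angles = []
    for u, v in combinations(components, 2):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        # abs() folds the angle into [0, pi/2]; min() guards against rounding.
        cosine = min(1.0, abs(dot) / (norm_u * norm_v))
        angles.append(math.acos(cosine))
    return angles


def mean_mutual_angle(components):
    """Hypothetical diversity summary: the average pairwise angle."""
    angles = mutual_angles(components)
    return sum(angles) / len(angles)


orthogonal = [[1.0, 0.0], [0.0, 1.0]]
nearly_parallel = [[1.0, 0.0], [1.0, 0.1]]
print(mean_mutual_angle(orthogonal) > mean_mutual_angle(nearly_parallel))
```

Under this summary, a set of orthogonal components scores higher than a set of nearly parallel ones, which is the direction in which a diversity-promoting prior would push the model.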
High dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially in dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of first order, which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models, which often have large variance, and first order additive models, which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose SALSA, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. SALSA minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on 15 real datasets, we show that our method is competitive against 21 other alternatives.

We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process.
A modular inference procedure consisting of a combination of Gibbs and Metropolis-Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.

Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference. To reduce the computational complexity of learning the global ranking, a common practice is to use rank-breaking. Individuals' preferences are broken into pairwise comparisons and then applied to efficient algorithms tailored for independent pairwise comparisons. However, due to the ignored dependencies, naive rank-breaking approaches can result in inconsistent estimates. The key idea to produce unbiased and accurate estimates is to treat the pairwise comparison outcomes unequally, depending on the topology of the collected data. In this paper, we provide the optimal rank-breaking estimator, which not only achieves consistency but also achieves the best error bound. This allows us to characterize the fundamental tradeoff between accuracy and complexity in some canonical scenarios. Further, we identify how the accuracy depends on the spectral gap of a corresponding comparison graph.

Dropout distillation
Samuel Rota Bulò (FBK), Lorenzo Porzi (FBK), Peter Kontschieder (Microsoft Research Cambridge)
Abstract: Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows one to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization.
This simple rule, called 'standard dropout', is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined 'dropout distillation', that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping under control its computational efficiency. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.

Metadata-conscious anonymous messaging
Giulia Fanti (UIUC), Peter Kairouz (UIUC), Sewoong Oh (UIUC), Kannan Ramchandran (UC Berkeley), Pramod Viswanath (UIUC)
Abstract: Anonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e.g. a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have revealed that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al., 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and significantly outperforms standard diffusion. We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks.

The Teaching Dimension of Linear Learners
Ji Liu (University of Rochester),
Xiaojin Zhu (University of Wisconsin), Hrag Ohannessian (University of Wisconsin-Madison)
Abstract: Teaching dimension is a learning theoretic quantity that specifies the minimum training set size to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimension for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners.

Truthful Univariate Estimators
Ioannis Caragiannis (University of Patras), Ariel Procaccia (Carnegie Mellon University), Nisarg Shah (Carnegie Mellon University)
Abstract: We revisit the classic problem of estimating the population mean of an unknown single-dimensional distribution from samples, taking a game-theoretic viewpoint. In our setting, samples are supplied by strategic agents, who wish to pull the estimate as close as possible to their own value. In this setting, the sample mean gives rise to manipulation opportunities, whereas the sample median does not. Our key question is whether the sample median is the best (in terms of mean squared error) truthful estimator of the population mean. We show that when the underlying distribution is symmetric, there are truthful estimators that dominate the median. Our main result is a characterization of worst-case optimal truthful estimators, which provably outperform the median, for possibly asymmetric distributions with bounded support.

Why Regularized Auto-Encoders learn Sparse Representation
Devansh Arpit (SUNY Buffalo), Yingbo Zhou (SUNY Buffalo), Hung Ngo (SUNY Buffalo),
Venu Govindaraju (SUNY Buffalo)
Sparse distributed representation is the key to learning useful features in deep learning algorithms, because not only is it an efficient mode of data representation but, more importantly, it captures the generation process of most real-world data. While a number of regularized auto-encoders (AE) enforce sparsity explicitly in their learned representation and others don't, there has been little formal analysis of what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both regularization and activation functions that encourage sparsity. We show that multiple popular models (e.g. de-noising and contractive auto-encoders) and activations (e.g. rectified linear and sigmoid) satisfy these conditions; thus, our conditions help explain sparsity in their learned representation. Our theoretical and empirical analysis together shed light on the properties of regularization/activation that are conducive to sparsity and unify a number of existing auto-encoder models and activation functions under the same analytical framework.

k-variates++: more pluses in the k-means++
Richard Nock (NICTA & ANU), Raphael Canyasse (Ecole Polytechnique and The Technion), Roksana Boreli (Data61), Frank Nielsen (Ecole Polytechnique and Sony CS Labs Inc.)
k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalisation of the well-known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of an approximation bound of the optimum.
This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential, actually approaching the statistical lower bound. We show that k-variates++ reduces to efficient (biased seeding) clustering algorithms tailored to specific frameworks; these include distributed, streaming and on-line clustering, with approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For either the specific frameworks considered here, or for the differential privacy setting, there are few to no prior results on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed-form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performance against the state of the art.

Multi-Player Bandits - a Musical Chairs Approach
Jonathan Rosenski (Weizmann Institute of Science), Ohad Shamir (Weizmann Institute of Science), Liran Szlak (Weizmann Institute of Science)
We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game.
Moreover, neither algorithm requires prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.

The Information Sieve
Greg Ver Steeg (Information Sciences Institute), Aram Galstyan (Information Sciences Institute)
We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and remainder information consisting of independent noise. We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning including independent component analysis, lossy and lossless compression, and predicting missing values in data.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, JingDong Chen, Mike Chrzanowski (Baidu USA, Inc.), Adam Coates, Greg Diamos (Baidu USA, Inc.), Erich Elsen (Baidu USA, Inc.), Jesse Engel, Linxi Fan, Christopher Fougner, Awni Hannun (Baidu USA, Inc.), Billy Jun, Tony Han, Patrick LeGresley, Xiangang Li (Baidu), Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Sheng Qian (Baidu), Jonathan Raiman, Sanjeev Satheesh (Baidu SVAIL), David Seetapun, Shubho Sengupta, Chong Wang, Yi Wang, Zhiqian Wang, Bo Xiao, Yan Xie (Baidu), Dani Yogatama, Jun Zhan,
Zhenyao Zhu
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

An important question in feature selection is whether a selection strategy recovers the "true" set of features, given enough data. We study this question in the context of the popular Least Absolute Shrinkage and Selection Operator (Lasso) feature selection strategy. In particular, we consider the scenario in which the model is misspecified, so that the learned model is linear while the underlying real target is nonlinear. Surprisingly, we prove that under certain conditions, Lasso is still able to recover the correct features in this case. We also carry out numerical studies to empirically verify the theoretical results and explore the necessity of the conditions under which the proof holds.

We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES).
However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While ES and MRS perform similarly in most cases empirically, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem and for a simulated multi-task robotic control problem.

CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy
Ran Gilad-Bachrach (Microsoft Research), Nathan Dowlin (Princeton), Kim Laine (Microsoft Research), Kristin Lauter (Microsoft Research), Michael Naehrig (Microsoft Research), John Wernsing (Microsoft Research)
Applying machine learning to a problem which involves medical, financial, or other types of sensitive data requires not only accurate predictions but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks. In this work, we present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in an encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form. These encrypted predictions can be sent back to the owner of the secret key, who can decrypt them. Therefore, the cloud service does not gain any information about the raw data nor about the prediction it made. We demonstrate CryptoNets on the MNIST optical character recognition tasks.
CryptoNets achieve 99% accuracy and can make around 59000 predictions per hour on a single PC. Therefore, they allow high-throughput, accurate, and private predictions.

Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, but achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and quality of the resulting solution.

As a widely used non-linear activation, the Rectified Linear Unit (ReLU) separates noise and signal in a feature map by learning a threshold or bias. However, we argue that the classification of noise and signal depends not only on the magnitude of responses, but also on the context of how the feature responses would be used to detect more abstract patterns in higher layers. In order to output multiple response maps with magnitude in different ranges for a particular visual pattern, existing networks employing ReLU and its variants have to learn a large number of redundant filters. In this paper, we propose a multi-bias non-linear activation (MBA) layer to explore the information hidden in the magnitudes of responses. It is placed after the convolution layer to decouple the responses to a convolution kernel into multiple maps by multi-thresholding magnitudes, thus generating more patterns in the feature space at a low computational cost.
It provides great flexibility in selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers. Such a simple yet effective scheme achieves state-of-the-art performance on several benchmarks.

We propose a novel multi-task learning method that can minimize the effect of negative transfer by allowing asymmetric transfer between the tasks based on task relatedness as well as the amount of individual task losses, which we refer to as Asymmetric Multi-task Learning (AMTL). To tackle this problem, we couple multiple tasks via a sparse, directed regularization graph that enforces each task parameter to be reconstructed as a sparse combination of other tasks, which are selected based on the task-wise loss. We present two different algorithms to solve this joint learning of the task predictors and the regularization graph. The first algorithm solves for the original learning objective using alternative optimization, and the second algorithm solves an approximation of it using a curriculum learning strategy that learns one task at a time. We perform experiments on multiple datasets for classification and regression, on which we obtain significant improvements in performance over the single-task learning and symmetric multi-task learning baselines.

This paper illustrates a novel approach to the estimation of the generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently.
Extensive experimental results show that the proposed error estimate is superior to the well-known K-fold cross validation methods in terms of robustness and accuracy. Moreover, it is orders of magnitude more efficient than cross validation methods.

We study the convergence properties of the VR-PCA algorithm introduced by [citation] for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the analysis, and what the convexity and non-convexity properties of the underlying optimization problem are.

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in $\mathbb{R}^d$. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in [citation]. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

Dealbreaker: A Nonlinear Latent Variable Model for Educational Data
Andrew Lan (Rice University), Tom Goldstein (University of Maryland), Richard Baraniuk (Rice University),
Christoph Studer (Cornell University)
Statistical models of student responses on assessment questions, such as those in homework and exams, enable educators and computer-based personalized learning systems to gain insights into students' knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student's success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves prediction performance comparable to or better than affine models on real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable: they provide key insights into which concepts are critical (i.e. the dealbreaker) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.

We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity and the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high-dimensional distributions, even for those with computationally intractable normalization constants.
Both theoretical and empirical properties of our methods are studied thoroughly.

Variable Elimination in the Fourier Domain
Yexiang Xue (Cornell University), Stefano Ermon, Ronan Le Bras (Cornell University), Carla, Bart
The ability to represent complex high-dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.

Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, which is incomplete and noisy, introduces challenges to algorithm stability: small changes in the training data may significantly change the models. As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations.
We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy than both state-of-the-art low-rank matrix approximation methods and ensemble methods in the recommendation task.

Given samples from two densities $p$ and $q$, density ratio estimation (DRE) is the problem of estimating the ratio $p/q$. Two popular discriminative approaches to DRE are KL importance estimation (KLIEP) and least squares importance fitting (LSIF). In this paper, we show that KLIEP and LSIF both employ class-probability estimation (CPE) losses. Motivated by this, we formally relate DRE and CPE, and demonstrate the viability of using existing losses from one problem for the other. For the DRE problem, we show that essentially any CPE loss (e.g. logistic, exponential) can be used, as this equivalently minimises a Bregman divergence to the true density ratio. We show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE. For the CPE problem, we argue that the LSIF loss is useful in the regime where one wishes to rank instances with maximal accuracy at the head of the ranking. In the course of our analysis, we establish a Bregman divergence identity that may be of independent interest.

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity.
In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.

Hierarchical Variational Models
Rajesh Ranganath, Dustin Tran (Columbia University), David Blei (Columbia)
Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.

The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space.
We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance relative to a hierarchical pairwise CRF.

Binary embeddings with structured hashed projections
Anna Choromanska (Courant Institute, NYU), Krzysztof Choromanski (Google Research NYC), Mariusz Bojarski (NVIDIA), Tony Jebara (Columbia), Sanjiv Kumar, Yann
We consider the hashing mechanism for constructing binary embeddings, which involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudo-random projection is described by a matrix in which not all entries are independent random variables; instead, a fixed budget of randomness is distributed across the matrix. Such matrices can be efficiently stored in sub-quadratic or even linear space, reduce randomness usage (i.e. the number of required random values), and very often lead to computational speed-ups. We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors. To the best of our knowledge, these results are the first that give theoretical ground for the use of general structured matrices in the nonlinear setting. In particular, they generalize previous extensions of the Johnson-Lindenstrauss lemma and prove the plausibility of the approach that was so far only heuristically confirmed for some special structured matrices. Consequently, we show that many structured matrices can be used as an efficient information compression mechanism.
Our findings build a better understanding of certain deep architectures, which contain randomly weighted and untrained layers, and yet achieve high performance on different learning tasks. We empirically verify our theoretical findings and show the dependence of learning via structured hashed projections on the performance of neural networks as well as nearest-neighbor classifiers.

A Variational Analysis of Stochastic Gradient Algorithms
Stephan Mandt (Columbia University), Matthew Hoffman (Adobe Research), David Blei (Columbia)
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD so as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.

This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g. class labels) to improve convergence.
Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: a) any distribution that is set a priori is independent of how the optimization progresses, and b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.

Tensor regression has been shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on it or suffer from a memory bottleneck. In this paper, we introduce subsampled tensor projected gradient to solve the problem. Our algorithm is impressively simple and efficient. It is built upon the projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration. Theoretical analysis shows that our algorithm converges to the correct solution in a fixed number of iterations. The memory requirement grows linearly with the size of the problem. We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications.

This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data.
To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.

Online Stochastic Linear Optimization under One-bit Feedback
Lijun Zhang (Nanjing University), Tianbao Yang (University of Iowa), Rong Jin (Alibaba Group), Yichi Xiao (Nanjing University), Zhi-hua Zhou
In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandits can be applied to our problem, the high computational cost makes it impractical for real-world applications. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of $O(d\sqrt{T})$, which matches the optimal result of stochastic linear bandits.
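
As a rough illustration of the one-bit feedback abstract above, the following Python sketch simulates binary feedback drawn from a logit model and estimates the unknown parameter with a Newton-style update on the logistic loss. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the target vector `w_star`, the step size, the random choice of actions, and the Euclidean projection (used in place of the norm-weighted projection of online Newton step) are all hypothetical simplifications, and the confidence region and action selection described in the abstract are omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def one_bit_feedback(x, w_star, rng):
    """Draw y in {-1, +1} with P(y = +1 | x) = sigmoid(<x, w_star>)."""
    return 1 if rng.random() < sigmoid(dot(x, w_star)) else -1


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y <w, x>)) with respect to w."""
    coeff = -y * sigmoid(-y * dot(w, x))
    return [coeff * xi for xi in x]


def newton_step(w, a_inv, grad, step=1.0, radius=1.0):
    """One Newton-style update; a_inv is modified in place.

    a_inv holds the inverse of A = I + sum of g g^T and is refreshed
    with the Sherman-Morrison formula. The projection is Euclidean,
    which is an assumption made to keep the sketch short.
    """
    d = len(w)
    ag = [dot(a_inv[i], grad) for i in range(d)]
    denom = 1.0 + dot(grad, ag)
    for i in range(d):
        for j in range(d):
            a_inv[i][j] -= ag[i] * ag[j] / denom
    direction = [dot(a_inv[i], grad) for i in range(d)]
    new_w = [w[i] - step * direction[i] for i in range(d)]
    norm = math.sqrt(dot(new_w, new_w))
    if norm > radius:
        new_w = [radius * v / norm for v in new_w]
    return new_w


def simulate(rounds=2000, seed=0):
    """Run the estimator on randomly drawn actions (hypothetical setup)."""
    rng = random.Random(seed)
    w_star = [0.6, -0.8]  # hypothetical unknown parameter
    d = len(w_star)
    w = [0.0] * d
    a_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(rounds):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = one_bit_feedback(x, w_star, rng)
        w = newton_step(w, a_inv, logistic_grad(w, x, y))
    return w, w_star


if __name__ == "__main__":
    w_hat, w_star = simulate()
    print("estimate:", w_hat, "target:", w_star)
```

Maintaining the inverse matrix incrementally keeps each round at quadratic cost in the dimension, which is the kind of per-round efficiency the abstract contrasts with generic generalized linear bandit methods.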
We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds $T$, but can be violated in intermediate rounds. For some user-defined trade-off parameter $\beta \in (0, 1)$, the proposed algorithm achieves cumulative regret bounds of $O(T^{\max\{\beta, 1-\beta\}})$ and $O(T^{1-\beta/2})$, respectively, for the loss and the constraint violations. Our results hold for convex losses, can handle arbitrary convex constraints and rely on a single computationally efficient algorithm. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively $O(T^{1/2})$ and $O(T^{3/4})$ for general convex domains, and respectively $O(T^{2/3})$ and $O(T^{2/3})$ when the domain is further restricted to be a polyhedral set. We supplement the analysis with experiments validating the performance of our algorithm in practice.

Motivated by an application of eliciting users' preferences, we investigate the problem of learning hemimetrics, i.e. pairwise distances among a set of $n$ items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item by another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting structural constraints of the hemimetric polytope, learning the distances between each pair of items requires $\Theta(n^2)$ queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably optimal sample complexity for various instances of the task.
For example, when the items are embedded into K tight clusters, the sample complexity of our algorithm reduces to $O(nK)$. Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis.

We present an approach for learning simple algorithms such as copying, multi-digit addition and single-digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their ability to abstract the underlying algorithm from training instances and generalize to test examples with many thousands of digits. The controller is trained using Q-learning with several enhancements, and we show that the bottleneck is in the capabilities of the controller rather than in the search incurred by Q-learning.

Learning Physical Intuition of Block Towers by Example
Adam Lerer (Facebook AI Research), Sam Gross (Facebook AI Research), Rob Fergus (Facebook AI Research)
Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the trajectories of the blocks. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g., towers with an additional block, and (ii) to images of real wooden blocks, where they obtain a performance comparable to human subjects.

Structure Learning of Partitioned Markov Networks
Song Liu (The Inst. of Stats. Mathe.),
Taiji Suzuki, Masashi Sugiyama (University of Tokyo), Kenji Fukumizu (The Institute of Statistical Mathematics)
We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between two groups directly may be a preferable option. We introduce a novel concept called the partitioned ratio, whose factorization directly associates with the Markovian properties of random variables across two groups. A simple one-shot convex optimization procedure is proposed for learning the factorizations of the partitioned ratio, and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with the state-of-the-art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in the US Congress and pairwise DNA/time-series alignments are also reported.

This work focuses on dynamic regret of online convex optimization that compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e., the minimizers change slowly), we present several improved variation-based upper bounds of the dynamic regret under the true and noisy gradient feedback, which are optimal in light of the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, which we refer to as path variation. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret.
Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches what is achieved with full information.

Beyond CCA: Moment Matching for Multi-View Models
Anastasia Podosinnikova (INRIA - ENS), Francis Bach (Inria), Simon Lacoste-Julien (INRIA)
We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework, and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method or orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants. We demonstrate the performance of the proposed models and estimation techniques on experiments with both synthetic and real datasets.

We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector on the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions.
The second ingredient employs stochastic trace estimators to compute the rank of this wanted eigen-projector, which yields the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap.

Unsupervised Deep Embedding for Clustering Analysis
Junyuan Xie (University of Washington), Ross Girshick (Facebook), Ali Farhadi (University of Washington)
Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.

Dimensionality reduction is a popular approach for dealing with high-dimensional data that leads to substantial computational savings. Random projections are a simple and effective method for universal dimensionality reduction with rigorous theoretical guarantees. In this paper, we theoretically study the problem of differentially private empirical risk minimization in the projected subspace (compressed domain). Empirical risk minimization (ERM) is a fundamental technique in statistical machine learning that forms the basis for various learning algorithms. Starting from the results of Chaudhuri et al.
(NIPS 2009, JMLR 2011), there is a long line of work in designing differentially private algorithms for empirical risk minimization problems that operate in the original data space. We ask: is it possible to design differentially private algorithms with small excess risk given access to only projected data? In this paper, we answer this question in the affirmative, by showing that for the class of generalized linear functions, we can obtain excess risk bounds of O(w(Theta) n ) under eps-differential privacy, and O((w(Theta)n) ) under (eps, delta)-differential privacy, given only the projected data and the projection matrix. Here n is the sample size and w(Theta) is the Gaussian width of the parameter space that we optimize over. Our strategy is based on adding noise for privacy in the projected subspace and then lifting the solution to the original space by using high-dimensional estimation techniques. A simple consequence of these results is that, for a large class of ERM problems, in the traditional setting (i.e., with access to the original data), under eps-differential privacy, we improve the worst-case risk bounds of Bassily et al. (FOCS 2014).

We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items. We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g., when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model.
For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we found that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets.

Large-Margin Softmax Loss for Convolutional Neural Networks
Weiyang Liu (Peking University), Yandong Wen (South China University of Technology), Zhiding Yu (Carnegie Mellon University), Meng Yang (Shenzhen University)
Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.

A Random Matrix Approach to Echo-State Neural Networks
Romain Couillet (CentraleSupelec), Gilles Wainrib (ENS Ulm, Paris, France), Hafiz Tiomoko Ali (CentraleSupelec, Gif-sur-Yvette, France), Harry Sevi (ENS Lyon, Lyon, Paris)
Recurrent neural networks, especially in their linear version, have provided many qualitative insights on their performance under different configurations.
This article provides, through a novel random matrix framework, the quantitative counterpart of these performance results, specifically in the case of echo-state networks. Beyond mere insights, our approach conveys a deeper understanding of the core mechanism at play for both training and testing.

One-hot CNN (convolutional neural network) has been shown to be effective for text categorization (Johnson & Zhang, 2015). We view it as a special case of a general framework which jointly trains a linear model with a non-linear feature generator consisting of 'text region embedding + pooling'. Under this framework, we explore a more sophisticated region embedding method using Long Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly large) sizes, whereas the region size needs to be fixed in a CNN. We seek effective and efficient use of LSTM for this purpose in the supervised and semi-supervised settings. The best results were obtained by combining region embeddings in the form of LSTM and convolution layers trained on unlabeled data. The results indicate that on this task, embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation. We report performances exceeding the previous best results on four benchmark datasets.

Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even non-paid) workers. We study the problem of recovering the true labels from noisy crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap under a simple but canonical scenario where each worker is assigned at most two tasks. In particular, we introduce a tighter lower bound on the fundamental limit and prove that Belief Propagation (BP) exactly matches this lower bound.
The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. In the general setting, when more than two tasks are assigned to each worker, we establish the dominance result on BP that it outperforms other existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal for all regimes considered, while existing state-of-the-art algorithms exhibit suboptimal performances.

Learning control has become an appealing alternative to the derivation of control laws based on classic control theory. However, a major shortcoming of learning control is the lack of performance guarantees, which prevents its application in many real-world scenarios. As a step in this direction, we provide a stability analysis tool for controllers acting on dynamics represented by Gaussian processes (GPs). We consider arbitrary Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For the first case, our tool finds a state space region where the closed-loop system is provably stable. In the second case, it is well known that infinite horizon stability guarantees cannot exist. Instead, our tool analyzes finite time stability. Empirical evaluations on simulated benchmark problems support our theoretical results.

Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party's private data? We propose to transfer the knowledge of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then training a global differentially private classifier.
We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(epsilon M ). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.

Network Morphism
Tao Wei (University at Buffalo), Changhu Wang (Microsoft Research), Yong Rui (Microsoft Research), Chang Wen Chen
We present a systematic study on how to morph a well-trained neural network to a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also has the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.
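To make the function-preservation requirement in the Network Morphism abstract above concrete, here is a minimal sketch of a depth change in the simplest possible setting: a single fully connected linear layer with no activation. The matrices `G`, `F1`, `F2` and the helper names `parent` and `child` are assumptions introduced for illustration; the convolutional and non-linear cases that the abstract mentions need more machinery and are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parent network: one linear layer y = G x (no activation, for simplicity).
G = rng.normal(size=(4, 6))

def parent(x):
    return G @ x

# Depth morphing: replace G by two layers F2 and F1 such that F2 @ F1 == G.
# F1 is a random square matrix shifted toward the identity so that it is
# invertible in practice; F2 is then determined by the constraint.
F1 = rng.normal(size=(6, 6)) + 6.0 * np.eye(6)
F2 = G @ np.linalg.inv(F1)

def child(x):
    return F2 @ (F1 @ x)

# The deeper child computes the same function as the parent (up to rounding).
x = rng.normal(size=6)
max_gap = np.max(np.abs(parent(x) - child(x)))
```

Choosing `F1` as the identity would also satisfy the constraint; a non-trivial `F1` simply shows that the factorization is not unique, which leaves room for the child network to keep training from a function-preserving start.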
Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting.

Budget constrained optimal design of experiments is a classical problem in statistics. Although the optimal design literature is very mature, few efficient strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high-dimensional machine learning and statistics. In this work, we study experimental design for the setting where the underlying regression model is characterized by an $\ell_1$-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature.
We obtain tractable algorithms for this problem, which also apply to a more general class of sparse linear models. We perform an extensive set of experiments, on benchmarks and a large multi-site neuroscience study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the short-to-medium term future.

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs
Anton Osokin, Jean-Baptiste Alayrac (ENS), Isabella Lukasewitz (INRIA), Puneet Dokania (INRIA and Ecole Centrale Paris), Simon Lacoste-Julien (INRIA)
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.

Exact Exponent in Optimal Rates for Crowdsourcing
Chao Gao (Yale University), Yu Lu (Yale University), Dengyong Zhou (Microsoft Research)
Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers.
Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where m is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)} \log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.

Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large amount of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details. Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed "what-where" autoencoder that uses the encoder pooling switches, to study the importance of the architecture design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin.
Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from $O(n^2)$ to $O(pd)$, with p being the ambient dimension and d being some estimated rank (d [...] 20% reduction in the model size without any loss in accuracy on CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of 6.78% error-rate on CIFAR-10 benchmark.

Provable Algorithms for Inference in Topic Models
Sanjeev Arora (Princeton University), Rong Ge, Frederic Koehler (Princeton University), Tengyu Ma (Princeton University), Ankur Moitra
Recently, there has been considerable progress on designing algorithms with provable guarantees (typically using linear algebraic methods) for parameter learning in latent variable models. Designing provable algorithms for inference has proved more difficult. Here we take a first step towards provable inference in topic models. We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We show lower bounds that for shorter documents it can be information-theoretically impossible to find the hidden topics. Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models.
It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling.

This paper develops an approach for efficiently solving general convex optimization problems specified as disciplined convex programs (DCP), a common general-purpose modeling framework. Specifically we develop an algorithm based upon fast epigraph projections, projections onto the epigraph of a convex function, an approach closely linked to proximal operator methods. We show that by using these operators, we can solve any disciplined convex program without transforming the problem to a standard cone form, as is done by current DCP libraries. We then develop a large library of efficient epigraph projection operators, mirroring and extending work on fast proximal algorithms, for many common convex functions. Finally, we evaluate the performance of the algorithm, and show it often achieves order of magnitude speedups over existing general-purpose optimization solvers.

We study the fixed design segmented regression problem: Given noisy samples from a piecewise linear function f, we want to recover f up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new sample near-linear time algorithms for the problem that, while not being minimax optimal, achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude.

Energetic Natural Gradient Descent
Philip Thomas (CMU), Bruno Castro da Silva, Christoph Dann (Carnegie Mellon University),
Emma
We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient.

Partition Functions from Rao-Blackwellized Tempered Sampling
David Carlson (Columbia University), Patrick Stinson (Columbia University), Ari Pakman (Columbia University), Liam
Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost.
In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any $k \geq 2$, the mixture of k Plackett-Luce models for no more than 2k-1 alternatives is non-identifiable and this bound is tight for k=2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is generically identifiable if kleqlfloorfrac 2rfloor. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm by Gormley & Murphy (2008), while achieving competitive statistical efficiency.

The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.

Power of Ordered Hypothesis Testing
Lihua Lei (Lihua), William Fithian (UC Berkeley, Department of Statistics)
Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising.
We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candes (2015), is much less sensitive to the quality of the ordering. We compare the power of these procedures in different regimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey's improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEO-Query data set analyzed by Li & Barber (2015) and find Adaptive SeqStep has favorable performance for both good and bad prior orderings.

PHOG: Probabilistic Model for Code
Pavol Bielik (ETH Zurich), Veselin Raychev (ETH Zurich), Martin Vechev (ETH Zurich)
We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context-free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while being similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.

We consider the problem of online prediction in changing environments. In this framework the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors.
Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror descent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems.

Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach for tuning regularization hyperparameters and find that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions. The additional computational cost depends on how frequently the hyperparameters are trained, but the tested scheme adds only 30% computational overhead regardless of the model size. Since the method is significantly less computationally demanding compared to similar gradient-based approaches to hyperparameter optimization, and consistently finds good hyperparameter values, it can be a useful tool for training neural network models.

Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form.
Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.

Due to its numerous applications, rank aggregation has become a problem of major interest across many fields of the computer science literature. In the vast majority of situations, Kemeny consensus(es) are considered as the ideal solutions. It is however well known that their computation is NP-hard. Many contributions have thus established various results to apprehend this complexity. In this paper we introduce a practical method to predict, for a ranking and a dataset, how close the Kemeny consensus(es) are to this ranking. A major strength of this method is its generality: it does not require any assumption on the dataset nor the ranking. Furthermore, it relies on a new geometric interpretation of Kemeny aggregation that, we believe, could lead to many other results.

Horizontally Scalable Submodular Maximization
Mario Lucic (ETH Zurich), Olivier Bachem (ETH Zurich), Morteza Zadimoghaddam (Google Research), Andreas Krause
A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: the capacity (the number of instances that can fit in memory) must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under fixed capacity. The proposed framework applies to a broad class of algorithms and constraints and provides theoretical guarantees on the approximation factor for any available capacity.
We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution.

Group Equivariant Convolutional Networks
Taco Cohen (University of Amsterdam), Max Welling (University of Amsterdam, CIFAR)
We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state of the art results on CIFAR10 and rotated MNIST.

The partition function is fundamental for probabilistic graphical models: it is required for inference, parameter estimation, and model selection. Evaluating this function corresponds to discrete integration, namely a weighted sum over an exponentially large set. This task quickly becomes intractable as the dimensionality of the problem increases. We propose an approximation scheme that, for any discrete graphical model whose parameter vector has bounded norm, estimates the partition function with arbitrarily small error. Our algorithm relies on a near minimax optimal polynomial approximation to the potential function and a Clenshaw-Curtis style quadrature. Furthermore, we show that this algorithm can be randomized to split the computation into a high-complexity part and a low-complexity part, where the latter may be carried out on small computational devices.
Experiments confirm that the new randomized algorithm is highly accurate if the parameter norm is small, and is otherwise comparable to methods with unbounded error.

Correcting Forecasts with Multifactor Neural Attention
Matthew Riemer (IBM), Aditya Vempaty (IBM), Flavio Calmon (IBM), Fenno Heath (IBM), Richard Hull (IBM), Elham Khabiri (IBM)
Automatic forecasting of time series data is a challenging problem in many industries. Current forecast models adopted by businesses do not provide adequate means for including data representing external factors that may have a significant impact on the time series, such as weather, national events, local events, social media trends, promotions, etc. This paper introduces a novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work. We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America's largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees. Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model.

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning.
In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel from a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of the data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data.

We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference.
The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.

Slice Sampling on Hamiltonian Trajectories
Benjamin Bloem-Reddy (Columbia University), John Cunningham (Columbia University)
Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models.

Noisy Activation Functions
Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio (U. of Montreal)
Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of this. We propose to exploit the injection of appropriate noise so that the gradients may flow easily, even if the noiseless application of the activation function would yield zero gradients. Large noise will dominate the noise-free gradient and allow stochastic gradient descent to explore more.
By adding noise only to the problematic parts of the activation function, we allow the optimization procedure to explore the boundary between the degenerate (saturating) and the well-behaved parts of the activation function. We also establish connections to simulated annealing, when the amount of noise is annealed down, making it easier to optimize hard objective functions. We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and tasks, especially when training seems to be the most difficult, e.g. when curriculum learning is necessary to obtain good results.

PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification
Ian En-Hsu Yen (University of Texas at Austin), Xiangru Huang (UT Austin), Pradeep Ravikumar (UT Austin), Kai Zhong (ICES department, University of Texas at Austin), Inderjit
We consider Multiclass and Multilabel classification with an extremely large number of classes, of which only a few are assigned to each instance. In such a setting, standard methods whose training and prediction costs are linear in the number of classes become intractable. State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels, under the assumption that the similarity between labels can be captured by structures such as a low-rank matrix or a balanced tree. However, as the diversity of labels increases in the feature space, the structural assumption can be easily violated, which degrades the testing performance. In this work, we show that a margin-maximizing loss with an l1 penalty, in the case of Extreme Classification, yields an extremely sparse solution both in primal and in dual without sacrificing the expressive power of the predictor.
We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve a complexity sublinear in the number of primal and dual variables. A bi-stochastic search method is proposed to further improve the efficiency. In our experiments on both Multiclass and Multilabel problems, the proposed method achieves significantly higher accuracy than existing approaches to Extreme Classification, with very competitive training and prediction time.
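
To make the "margin-maximizing loss with l1 penalty" in the PD-Sparse abstract more concrete, here is a minimal sketch in Python. It is not the FC-BCFW algorithm and is not taken from the paper: the particular hinge form (gap between the strongest negative label and the weakest positive label) and the helper names `max_margin_loss`, `l1_regularized_objective` and `soft_threshold` are assumptions chosen for illustration. The soft-thresholding step is the standard proximal operator of the l1 norm, shown only to illustrate why an l1 penalty produces exact zeros in the weights.

```python
import numpy as np


def max_margin_loss(scores, positives):
    """Hinge loss on the gap between the strongest negative and weakest positive label.

    Assumed form for illustration; the abstract only says "margin-maximizing loss".
    """
    scores = np.asarray(scores, dtype=float)
    is_pos = np.zeros(scores.shape[0], dtype=bool)
    is_pos[list(positives)] = True
    weakest_pos = scores[is_pos].min()
    strongest_neg = scores[~is_pos].max()
    # Only these two labels determine the loss, whatever the number of classes.
    return max(0.0, 1.0 + strongest_neg - weakest_pos)


def l1_regularized_objective(W, x, positives, lam):
    """Loss on one instance plus lam * ||W||_1, with W of shape (classes, features)."""
    scores = W @ x
    return max_margin_loss(scores, positives) + lam * np.abs(W).sum()


def soft_threshold(W, tau):
    """Proximal operator of tau * ||.||_1: shrinks entries and zeroes the small ones."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)


if __name__ == "__main__":
    W = np.array([[0.9, -0.05], [0.02, 0.4], [-0.3, 0.01]])
    x = np.array([1.0, 1.0])
    print(l1_regularized_objective(W, x, positives={0}, lam=0.1))
    print(soft_threshold(W, 0.1))  # entries with magnitude below 0.1 become zero
```

In this toy example the loss depends on only two of the three labels, and soft-thresholding zeroes half of the weight entries, which hints at the dual and primal sparsity the abstract refers to.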