{"id":396,"date":"2017-09-21T18:55:44","date_gmt":"2017-09-21T15:55:44","guid":{"rendered":"https:\/\/plus.google.com\/+JaniUusitalo\/posts\/9pGj2qWyqHz"},"modified":"2017-09-21T18:55:44","modified_gmt":"2017-09-21T15:55:44","slug":"in-pathak-and-agrawals-machine-learning-version-of-this-surprise-driven-curiosity-the-ai-first-mathematically-represents-what-the-current-video","status":"publish","type":"post","link":"https:\/\/mummila.net\/kettinki\/linkki\/396","title":{"rendered":"<svg class=\"svg-icon svg-icon-link\" aria-labelledby=\"title-6a6df5d4901c7\" role=\"img\"><title id=\"title-6a6df5d4901c7\">Link<\/title> <use href=\"#link\" xlink:href=\"#link\"><\/use> <\/svg>396"},"content":{"rendered":"<p>&quot;In Pathak and Agrawal\u2019s machine-learning version of this surprise-driven curiosity, the AI first mathematically represents what the current video frame of Super Mario Bros. looks like. Then it predicts what the game will look like several frames hence. Such a feat is well within the powers of current deep-learning systems. But then Pathak and Agrawal\u2019s ICM does something more. It generates an intrinsic reward signal defined by how wrong this prediction model turns out to be. The higher the error rate \u2014 that is, the more surprised it is \u2014 the higher the value of its intrinsic reward function. In other words, if a surprise is equivalent to noticing when something doesn\u2019t turn out as expected \u2014 that is, to being wrong \u2014 then Pathak and Agrawal\u2019s system gets rewarded for being surprised.<\/p>\n<p>This internally generated signal draws the agent toward unexplored states in the game: informally speaking, it gets curious about what it doesn\u2019t yet know. And as the agent learns \u2014 that is, as its prediction model becomes less and less wrong \u2014 its reward signal from the ICM decreases, freeing the agent up to maximize the reward signal by exploring other, more surprising situations. \u201cIt\u2019s a way to make exploration go faster,\u201d Pathak said.<\/p>\n<p>This feedback loop also allows the AI to quickly bootstrap itself out of a nearly blank-slate state of ignorance. At first, the agent is curious about any basic movement available to its onscreen body: Pressing right nudges Mario to the right, and then he stops; pressing right several times in a row makes Mario move without immediately stopping; pressing up makes him spring into the air, and then come down again; pressing down has no effect. This simulated motor babbling quickly converges on useful actions that move the agent forward into the game, even though the agent doesn\u2019t know it.&quot;<br \/>\n<br \/>\n<a href=\"https:\/\/www.quantamagazine.org\/clever-machines-learn-how-to-be-curious-20170919\/\"><img decoding=\"async\" src=\"https:\/\/d2r55xnwy6nx47.cloudfront.net\/uploads\/2017\/09\/CuriosityCode_520x292.jpg\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&quot;In Pathak and Agrawal\u2019s machine-learning version of this surprise-driven curiosity, the AI first mathematically represents what the current video frame of Super Mario Bros. looks like. Then it predicts what the game will look like several frames hence. Such a feat is well within the powers of current deep-learning systems. But then Pathak and Agrawal\u2019s [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14097],"tags":[],"class_list":["post-396","post","type-post","status-publish","format-standard","hentry","category-in-english"],"_links":{"self":[{"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/posts\/396","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/comments?post=396"}],"version-history":[{"count":0,"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/posts\/396\/revisions"}],"wp:attachment":[{"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/media?parent=396"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/categories?post=396"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mummila.net\/kettinki\/wp-json\/wp\/v2\/tags?post=396"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}