Published: 2021-01-01


Our ability to collect data far outpaces our ability to fully utilize it, yet those data may hold the key to solving some of the biggest global challenges facing us today.

Take, for instance, the frequent outbreaks of waterborne illnesses as a consequence of war or natural disasters. The most recent example can be found in Yemen, where roughly 10,000 new suspected cases of cholera are reported each week, and history is riddled with similar stories. What if we could better understand the environmental factors that contributed to the disease, predict which communities are at higher risk, and put in place protective measures to stem the spread?

Answers to these questions and others like them could potentially help us avert catastrophe.

We already collect data related to virtually everything, from birth and death rates to crop yields and traffic flows. IBM estimates that each day, 2.5 quintillion bytes of data are generated. To put that in perspective: that's the equivalent of all the data in the Library of Congress being produced more than 166,000 times per 24-hour period. Yet we don't really harness the power of all this information. It's time that changed, and thanks to recent advances in data analytics and computational services, we finally have the tools to do it.
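As a quick sanity check on that comparison, the two quoted figures can be divided to see what collection size they imply. This is only arithmetic on the numbers in the text; the variable names are illustrative, and the result is an implied value, not an official measurement of the Library of Congress.

```python
# Figures quoted in the text.
BYTES_PER_DAY = 2.5e18       # 2.5 quintillion bytes generated each day
LIBRARY_MULTIPLE = 166_000   # "more than 166,000 times" per 24-hour period

# Size of one "Library of Congress" implied by those two figures.
implied_library_bytes = BYTES_PER_DAY / LIBRARY_MULTIPLE
implied_library_terabytes = implied_library_bytes / 1e12  # decimal terabytes

print(f"Implied collection size: about {implied_library_terabytes:.1f} TB")
```

Under these figures, the comparison treats the collection as holding roughly 15 terabytes of data.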


As a data scientist for Los Alamos National Laboratory, I study data from wide-ranging, public sources to identify patterns, in the hope of predicting trends that could be a threat to global security. Multiple data streams are critical because the ground-truth data (such as surveys) that we collect are often delayed, biased, sparse, incorrect or, sometimes, nonexistent.

For example, knowing mosquito incidence in communities would help us predict the risk of mosquito-transmitted diseases such as dengue, a leading cause of illness and death in the tropics. However, mosquito data at a global (and even national) scale are not available.

To address this gap, we're using other sources such as satellite imagery, climate data and demographic information to estimate dengue risk. Specifically, we had success predicting the spread of dengue in Brazil at the regional, state and municipality level using these data streams as well as clinical surveillance data and Google search queries that used terms related to the disease. While our predictions aren't perfect, they show promise. Our goal is to combine information from each data stream to further refine our models and improve their predictive power.


Similarly, to forecast the flu season, we have found that Wikipedia and Google searches can complement clinical data. Because the rate of people searching the internet for flu symptoms often increases during their onset, we can predict a spike in cases where clinical data lag.
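One simple way to see whether search activity leads clinical reports is to correlate the two weekly series at different time offsets. The sketch below is an illustration only, not the models used in the research described here: the function names are hypothetical, and the weekly numbers are made up, with the case counts deliberately constructed to trail the searches by two weeks.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(searches, cases, lag):
    """Correlate searches in week t with reported cases in week t + lag."""
    if lag < 0 or lag > len(searches) - 2:
        raise ValueError("lag must leave at least two overlapping weeks")
    return pearson(searches[: len(searches) - lag], cases[lag:])

def best_lead(searches, cases, max_lag):
    """Return the lag with the highest correlation, plus all the scores."""
    scores = {lag: lagged_correlation(searches, cases, lag)
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores

# Synthetic weekly values (NOT real data): cases follow searches by 2 weeks.
searches = [10, 12, 15, 30, 55, 80, 60, 40, 25, 15, 12, 10]
cases = [4, 5, 5, 6, 7, 15, 27, 40, 30, 20, 12, 7]

lead_weeks, scores = best_lead(searches, cases, max_lag=4)
print(f"Searches best match cases reported {lead_weeks} week(s) later")
```

If the strongest correlation appears at a positive lag, the search series is moving ahead of the clinical series, which is what makes it useful as an early signal when official counts arrive late.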

We're using these same concepts to expand our research beyond disease prediction to better understand public sentiment. In partnership with the University of California, we're conducting a three-year study using disparate data streams to understand whether opinions expressed on social media map to opinions expressed in surveys.

For example, in Colombia, we are conducting a study to see whether social media posts about the peace process between the government and FARC, the socialist guerrilla movement, can be ground-truthed with survey data. A University of California, Berkeley researcher is conducting on-the-ground surveys throughout Colombia, including in isolated rural areas, to poll citizens about the peace process. Meanwhile, at Los Alamos, we're analyzing social media data and news sources from the same areas to determine if they align with the survey data.
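To make "align with the survey data" concrete, one could compare, region by region, the share of favorable social media posts with the share of favorable survey responses. The sketch below assumes both sources have already been reduced to a single share per region; the function name, region labels, tolerance and numbers are all hypothetical and are not results from the study.

```python
def alignment_report(social_share, survey_share, tolerance=0.10):
    """Compare per-region favorable shares from two sources.

    Returns the mean absolute gap across shared regions and the list of
    regions whose gap exceeds the tolerance.
    """
    shared = sorted(set(social_share) & set(survey_share))
    if not shared:
        raise ValueError("no regions in common")
    gaps = {r: abs(social_share[r] - survey_share[r]) for r in shared}
    mean_gap = sum(gaps.values()) / len(gaps)
    outliers = [r for r in shared if gaps[r] > tolerance]
    return mean_gap, outliers

# Made-up shares of favorable opinion (NOT real data).
social = {"region_a": 0.62, "region_b": 0.48, "region_c": 0.30}
survey = {"region_a": 0.58, "region_b": 0.51, "region_c": 0.55}

mean_gap, outliers = alignment_report(social, survey)
print(f"Mean gap: {mean_gap:.2f}; regions that disagree: {outliers}")
```

A region flagged here would be one where online discussion does not reflect what surveyed residents say, for instance an isolated rural area with little social media use.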

If we can demonstrate that social media accurately captures a population's sentiment, it could be a more affordable, accessible and timely alternative to what are otherwise expensive and logistically challenging surveys. In the case of disease forecasting, if social media posts did indeed serve as a predictive tool for outbreaks, those data could be used in educational campaigns to inform citizens of the risk of an outbreak (due to vaccine exemptions, for example) and ultimately reduce that risk by promoting protective behaviors (such as washing hands, wearing masks and remaining indoors).

All of this illustrates the potential for big data to solve big problems. Los Alamos and other national laboratories that are home to some of the world's largest supercomputers have the computational power, augmented by machine learning and data analysis, to take this information and shape it into a story that tells us not only about one state or even nation, but the world as a whole. The information is there; now it's time to use it.