Sequential Bayesian updating for Big Data

Abstract

The velocity, volume, and variety of big data present both challenges and opportunities for cognitive science. We introduce sequential Bayesian updating as a tool to address these three core properties. In the Bayesian approach, we summarize the current state of knowledge regarding parameters in terms of their posterior distributions, and use these as prior distributions when new data become available. Crucially, we construct posterior distributions in such a way that we avoid having to recompute the likelihood of old data as new data become available, allowing information to be propagated without great computational demand. As a result, these Bayesian methods allow continuous inference on voluminous information streams in a timely manner. We illustrate the advantages of sequential Bayesian updating with data from the MindCrowd project, in which crowd-sourced data are used to study Alzheimer's dementia. We fit an extended Linear Approach to Threshold with Ergodic Rate (LATER) model to reaction time data from the project in order to separate two distinct aspects of cognitive functioning: speed of information accumulation and caution.
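The updating scheme the abstract describes can be illustrated with a minimal sketch. The Python snippet below implements sequential conjugate updating of a Normal mean with known noise variance: the posterior from each batch of data becomes the prior for the next, so the likelihood of earlier batches is never recomputed. All names here are illustrative, and this toy conjugate model merely stands in for the chapter's actual LATER-type reaction time model, which is not reproduced here.

    import numpy as np

    def sequential_update(prior_mean, prior_var, batch, noise_var=1.0):
        # Conjugate Normal-Normal update: combine the prior (i.e., the
        # previous posterior) with the current batch only; earlier data
        # are never revisited.
        n = len(batch)
        post_var = 1.0 / (1.0 / prior_var + n / noise_var)
        post_mean = post_var * (prior_mean / prior_var + batch.sum() / noise_var)
        return post_mean, post_var

    # Simulated stream of data batches with true mean 2.0 (illustrative only)
    rng = np.random.default_rng(0)
    mean, var = 0.0, 100.0  # diffuse initial prior
    for _ in range(5):
        batch = rng.normal(loc=2.0, scale=1.0, size=1000)
        mean, var = sequential_update(mean, var, batch)  # posterior becomes prior
        print(f"posterior mean = {mean:.3f}, posterior variance = {var:.2e}")

For a conjugate model such as this one, the posterior after the final batch is identical to what a single analysis of the pooled data would yield, which is the property that makes the streaming approach exact rather than approximate.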

Citation

Oravecz, Z., Huentelman, M., & Vandekerckhove, J. (2016). Sequential Bayesian updating for Big Data. Big Data in Cognitive Science: From Methods to Insights, 13–33.

BibTeX

@incollection{oravecz_etal:2016:Sequential,
    title     = {{S}equential {B}ayesian updating for {B}ig {D}ata},
    author    = {Oravecz, Zita and Huentelman, Matt and Vandekerckhove, Joachim},
    year      = {2016},
    booktitle = {Big Data in Cognitive Science: From Methods to Insights},
    pages     = {13--33}
}