Big Data: Tools and Statistical Methods - Virtual Classroom

Date: Tuesday 18 May 2021, 9.30AM
Location: Online
CPD: 12.0 hours
RSS Training



Level: Intermediate (I)


The emergence of Big Data as a recognised and sought-after technological capability can be traced to three factors: the general recognition that data is ubiquitous and is an asset from which organisations can derive business value; the efficient interconnectivity of sensors, devices, networks, services and consumers, which allows data to be transported with relative ease; and the emergence of middleware processing platforms, such as Hadoop, InfoSphere Streams, Accumulo, Storm, Spark and Elasticsearch, which enable developers to create distributed, fault-tolerant applications that execute statistical analytics at scale.
 


In order to promote the use of advanced statistical methods within a Big Data environment -- an essential requirement if correct conclusions are to be reached -- statisticians and data scientists must use Big Data tools when supporting or performing data analysis.

The objective of this two-day virtual course is to train statistically minded practitioners in the use of common Big Data tools, with an emphasis on the use of advanced statistical methods for analysis. The course will focus on the application of statistical methods on the processing platforms Hadoop and Spark and will highlight how these can be used to analyse data at scale.


Learning Outcomes

Following this course, attendees will:

  • Gain an understanding of the Big Data platforms Hadoop and Spark
  • Develop hands-on experience of using these platforms to analyse data
  • Gain an understanding of the classes of statistical methods used on these platforms                          


Topics Covered

  • The Big Data landscape
  • Hadoop
  • MapReduce
  • Python and Hadoop
  • An introduction to functional programming and Spark
  • Statistical operations in Spark
  • Anomaly detection in network data


Target Audience

Statisticians and data scientists wishing to use emerging computing platforms (Hadoop and Spark) to perform statistical inference across large datasets.


Knowledge Assumed

Familiarity with at least one of the programming languages mentioned.

Delegates are expected to have a laptop with the latest version of PuTTY installed.

 

Mark Briers

Mark Briers has over fifteen years of experience working in statistical research, leading multi-million-pound research contracts to develop inference methodologies to solve statistical problems. He is an Honorary Senior Lecturer at Imperial College London, Secretary for the Emerging Applications Section, and a member of the RSS Conference organising committee. Mark completed his PhD at Cambridge University in 2007 as an Industrial Fellow of the 1851 Royal Commission. He has over 20 publications in statistics and engineering journals and conferences.

 

Fees

                                      Registration before   Registration on/after
                                      18 April 2021         18 April 2021

Non Member                            £611.00 + VAT         £680.00 + VAT
RSS Fellow                            £520.00 + VAT         £577.00 + VAT
RSS CStat: also MIS, FIS & GradStat   £490.00 + VAT         £543.00 + VAT