Big Data
Wikipedia: Big data
Data produced every two days is the equivalent of all data available up to 2003 — Google’s Schmidt
Global data volume doubles every two years
90% of data in the world was produced in the last two years
Is There Hope for Small Firms, the Have-Nots in the World of Big Data?
Generally
- Data (crude oil) versus actionable insights
- Not a Panacea
- Good for increasing engagement with current customers
- Increases lock-in
- Useful when the past is a good predictor of future events (How big data lets us see a little further into the unknown)
- Real-time insights (Beyond Big Data: Prepare for Real-Time Insights)
- Less useful for changing a brand, launching a new product etc.
- A revolution?
Data Sources
- Sensors: Internet of Things (cameras in metro areas, structures. 200+ sensors in cars)
- Connected devices
- Social Media, data exhaust (Free space)
- Clickstream data
- Geo data
- Loyalty cards
- Transactional data
- Structured versus unstructured data
How can you determine who would be most popular in the lead role in 50 Shades?
Three aspects to Big Data
- Storage
- Analytics
- Visualization
Attributes of Big data
- Volume
- Velocity (Fast data, actionable now)
- Variety
- Veracity
Changes in analysis
- samples, statistical significance
- correlations, to design algorithms
- The End of Theory: The Data Deluge Makes the Scientific Method Obsolete?
- patterns
- What versus Why?
- Causation versus Correlation
- Limitation: we know what, but we don’t know why?
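The "what versus why" limitation above can be illustrated with a small simulation. This is a minimal sketch, not a real dataset: the series names (temperature, ice cream sales, sunburn cases) and the noise levels are assumptions chosen for illustration. Two series that never influence each other still correlate strongly because both depend on a hidden third factor; once that factor is removed, the correlation largely disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden driver (e.g. standardized daily temperature).
temperature = [rng.gauss(0, 1) for _ in range(5000)]

# Both observed series depend on temperature, not on each other.
ice_cream_sales = [t + rng.gauss(0, 0.3) for t in temperature]
sunburn_cases = [t + rng.gauss(0, 0.3) for t in temperature]

r_raw = pearson(ice_cream_sales, sunburn_cases)

# Subtract the hidden driver's contribution and correlate what is left.
resid_sales = [s - t for s, t in zip(ice_cream_sales, temperature)]
resid_burns = [b - t for b, t in zip(sunburn_cases, temperature)]
r_controlled = pearson(resid_sales, resid_burns)

print(f"raw correlation: {r_raw:.2f}")
print(f"after controlling for temperature: {r_controlled:.2f}")
```

The pattern (the "what") is real and usable for prediction, but acting on it as if ice cream caused sunburn (the "why") would be a mistake.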
Hadoop:
- Makes Big data more manageable
- Wikipedia: Apache Hadoop
- MapReduce (origins, Google, Yahoo!)
- Apache Hadoop
- Hadoop: What it is, how it works, and what it can do
- Hadoop Marketplace (more permissive license than Linux):
- Cloudera
- Hortonworks
- new architecture for storage and processing
- entirely scalable with cheap hardware
- redundancy
- open source
- developing ecosystem
- Amazon And IBM vs. Open Source Hadoop: Bigness May Not Beat Quality
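The MapReduce idea listed above can be sketched in a few lines. This is a single-process illustration of the programming model only, assuming a word-count task; it does not use the Hadoop API, and the function names (map_phase, shuffle_phase, reduce_phase) are hypothetical. On a real cluster the map and reduce calls would run in parallel on many cheap machines, with the framework handling the shuffle and the redundancy.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # On a cluster, each document (or block) would be mapped on a different node.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data big insights", "data beats intuition"])
print(counts)
```

Because each map call sees only its own document and each reduce call sees only one key, the work splits cleanly across machines, which is what makes the approach scale with added hardware.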
Algorithm
- Algorithm designed from the data and other sources. Then individual data mapped against the algorithm(s) to present results. Adapt accordingly. It’s a repeating set of rules.
- Algorithmic business model based on Big data analytics: “if this, then that …”
- Algorithms predict the future, based on Big data (the past)
- Algorithms are a set of rules, which are created by understanding a business’ data (big data analysis) and from external sources.
- Input data is unique, fixed rules defined by algorithm(s), output unique
- Replaces human intuition
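The "if this, then that" idea above can be sketched as a fixed rule set applied to each customer's unique data. Everything here is hypothetical: the field names, thresholds, and offer labels are assumptions for illustration, whereas real rules would be derived from analysis of the business's own data and external sources.

```python
def recommend_offer(customer):
    """Apply fixed if-this-then-that rules to one customer's data."""
    # Thresholds are invented for illustration, not taken from any real system.
    if customer["visits_last_30_days"] == 0:
        return "win-back discount"
    if customer["avg_basket"] >= 100:
        return "premium loyalty tier"
    if customer["visits_last_30_days"] >= 8:
        return "frequent-shopper reward"
    return "standard newsletter"

# Unique input data per customer (hypothetical records).
customers = [
    {"id": "c1", "visits_last_30_days": 0, "avg_basket": 40},
    {"id": "c2", "visits_last_30_days": 3, "avg_basket": 150},
    {"id": "c3", "visits_last_30_days": 10, "avg_basket": 25},
    {"id": "c4", "visits_last_30_days": 2, "avg_basket": 30},
]

# Same rules for everyone, different output for each individual.
offers = {c["id"]: recommend_offer(c) for c in customers}
print(offers)
```

The "adapt accordingly" step would correspond to revising the thresholds as new data arrives, which is where machine learning takes over from hand-written rules.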
The Filter Bubble: Algorithm vs. Curator & the Value of Serendipity
What determines your filter bubble?
Machine Learning
- Increasing returns on data, more data, smarter results
- Google (Hummingbird), Amazon, Facebook, Netflix, Walmart, Siri, etc.
Old World, New World
- Old world, Analogue: same content, by vehicle / medium / retail store etc.
- New world, Digital: content driven by algorithms, for example:
Examples of algorithmic driven content to increase engagement and lock-in:
- Netflix recommendations (Cinematch), adding metadata
- Amazon recommendations, versus human editors
- Facebook: origin of newsfeed, design of newsfeed (algorithmic filtering: EdgeRank and EdgeRank Checker; The Hard Truth About How The Facebook News Feed Works Now)
- Facebook display advertisements and native ads
- Google search results: PageRank (better results, more use, Michael Jackson)
- online dating service
- newspaper stories
- Target
- Walmart
- Movie production: Epagogix
- Wall Street computer trading, up to 70% of trades (Black-Scholes)
- Facial recognition, algorithms to read expressions
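As one concrete case from the list above, the core idea behind PageRank can be sketched with power iteration: a page is important if important pages link to it. This is a simplified textbook version, not Google's production algorithm; the damping factor of 0.85 is the commonly cited default, and the four-page web below is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for other in pages:
                    new_ranks[other] += damping * ranks[page] / n
        ranks = new_ranks
    return ranks

# Hypothetical four-page web: three pages link to "home".
web = {
    "home": ["news", "shop"],
    "news": ["home"],
    "shop": ["home"],
    "blog": ["home"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "home" ranks highest because every other page links to it, while "blog" ranks lowest because nothing links to it.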
Thought: Fake tweet after AP was hacked (Obama injured) caused an immediate dip of about 140 points in the stock market, which recovered just as quickly: False White House explosion tweet rattles market (Veracity)
Question: How can a mobile app augment the old world (shop) with the new world (content specific to the individual shopper)? Dumb retail to smart retail environment.
Additional Trends
- Moore’s Law, cost of processing
- Cost of Storage
- Cloud Services: variable cost versus fixed cost investment
- Syndicated data (Nielsen scanner data)
- growth in Software as a Service (SaaS) Industry providing Big Data solutions
- Continuous Analytics
- Predictive Analytics
- Sentiment Analytics
Additional issues
- Primary data collection versus secondary use (sometimes unimagined)
- Additional Uses
- design better offers
- target better consumers
- listen
- innovation
- increased agility
- first degree price discrimination: willingness to pay
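The last item above, first degree price discrimination, can be made concrete with a small worked example. It is a sketch under simplifying assumptions: the willingness-to-pay figures are invented, each customer buys one unit, and marginal cost is zero. It compares the best single price for everyone with charging each customer exactly what they are willing to pay.

```python
def best_uniform_price(willingness_to_pay):
    """Single price that maximizes revenue when everyone pays the same."""
    best_price, best_revenue = 0, 0
    # The optimal uniform price is always one of the customers' own valuations.
    for price in sorted(set(willingness_to_pay)):
        buyers = sum(1 for wtp in willingness_to_pay if wtp >= price)
        revenue = price * buyers
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue

def personalized_revenue(willingness_to_pay):
    """First degree price discrimination: each customer pays their own valuation."""
    return sum(willingness_to_pay)

# Hypothetical willingness to pay for four customers.
wtp = [10, 8, 6, 4]
uniform_price, uniform_revenue = best_uniform_price(wtp)
personal = personalized_revenue(wtp)

print(f"best uniform price {uniform_price} earns {uniform_revenue}")
print(f"personalized pricing earns {personal}")
```

With these numbers the best single price is 6, earning 18 from three buyers, while personalized pricing earns 28 from all four. The gap is the incentive for firms to estimate individual willingness to pay from data.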
Cloud Computing
Privacy
- Law (keeping up with evolution of Big Data)
- Three Big Privacy Changes to Plan for in 2014
- terms of service, opt in (changes), e.g. Instagram: Instagram says it won’t sell your photos to advertisers
- Google and Facebook changes (facial recognition software?)
- privacy policy statement, binary choice.
- privacy policy changes are a business decision, for example to better target advertising
- scope of reach regarding privacy: Facebook (Facebook Tops 127 Of 137 Countries In June 2013 World Map Of Social Networks), Google etc.
- tension between marketers and privacy, better data = better offers ?
- creep factor
- abuse
- knowledge of data collection
- knowledge of use of data (sometimes unimagined)
- problems with anonymization
- lack of obscurity
- “opt in” versus “opt out”
- reselling data
- secondary use of data
- China versus USA (NSA snooping allegations)
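The "problems with anonymization" item above can be illustrated with a k-anonymity check. This is a minimal sketch on invented records: the field names and values are hypothetical, and k-anonymity is only one of several ways to measure re-identification risk. Even with names removed, a combination of ordinary attributes (quasi-identifiers) can single out an individual.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def k_anonymity(records, quasi_identifiers):
    """Smallest group size; k = 1 means at least one person is uniquely identifiable."""
    return min(group_sizes(records, quasi_identifiers).values())

def unique_records(records, quasi_identifiers):
    """Records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]

# Hypothetical "anonymized" loyalty-card data: names removed, attributes kept.
records = [
    {"zip": "02138", "birth_year": 1975, "gender": "F", "purchase": "vitamins"},
    {"zip": "02138", "birth_year": 1975, "gender": "F", "purchase": "diapers"},
    {"zip": "02139", "birth_year": 1982, "gender": "M", "purchase": "razors"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "purchase": "coffee"},
]

k_full = k_anonymity(records, ["zip", "birth_year", "gender"])
k_zip_only = k_anonymity(records, ["zip"])
exposed = unique_records(records, ["zip", "birth_year", "gender"])

print(f"k with zip + birth year + gender: {k_full}")
print(f"k with zip only: {k_zip_only}")
print(f"uniquely identifiable records: {len(exposed)}")
```

Anyone holding a second dataset with the same three attributes plus names (a secondary use that was never imagined at collection time) could re-identify the two exposed records.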
Hackers
‘Worst breach in history’ puts data-security pressure on retail industry: need to move to EMV Standard (high switching costs)