Big data hyper-hypo-hyperbole or reality?

Is “BIG DATA” hype? Have we over-sugared ourselves with too much big data candy? I’ll dodge the answer and instead present you with four interesting resources addressing this issue.

First up is a great Big Data intro video shown at the 2012 SAS Analytics conference. What I like about this video (even though it could have been cut by 30 seconds) is that it really frames the issues well.

Second is an excellent article on big data recently published by the Harvard Business Review. This article points out that big data will make an impact, but not in the traditional sense. “Traditional” big data analytics focuses on prediction, but in the future big data will have a more transformative impact on areas such as mobile-location analytics, personalized medicine, and artificial intelligence.

Third, in this blog post on big data, Jason Rushin notes that

In this era of digital everything, nearly every marketer has access to more data than they can reasonably handle. A single web visit by a single customer can result in thousands of data points across items viewed, locations, durations, browser, referral, clickstream, frequency, etc.  Couple that with device, payment methods, demographic data, product attributes, not to mention data across your other channels, and any retailer is quickly drowning in data.

Rushin points out that, regardless of the size of your data set, what matters is whether you are able to act on it. He advises you to look for solutions that can readily supply BI value and insights.

Finally, I encourage you to spend 40 minutes and watch this video presentation by Jim Stogdill on how corporations will evolve leveraging big data (tasty tidbit: hear how a corporation is compared to a nematode).

Research on visual data mining for use in sentiment analysis

Below is some recent research on visual data mining.

Mining emoticons to assess sentiment

Detection of genuine reviews of products or services using visual data mining

Using visual data mining techniques to generate interactive news flow visualization across social media streams

Proposed UI to better visualize and analyze sentiment

Visual analytics in interactive and security systems

Innovation in social analytics

Data analysis is the new plastics. Remember this scene from the movie The Graduate?

Below is a curated list of articles from this week of innovative social analytics and business intelligence initiatives.

In this article from O’Reilly Radar, we learn that social network analysis is an amalgamation of social science disciplines such as sociology, political science, psychology, and anthropology, combined with traditional mathematical measurements. At its core, social network analysis measures relationships between people and organizations. But cutting-edge research is also looking at ways to leverage social network analysis as a form of early warning system for natural disasters. Much social network analysis has been regressive in nature; the future will focus more on real-time analysis.

And speaking of real-time analytics, this article from the Washington Post makes the argument that real-time results may have a significant influence on the upcoming 2012 elections.

“Perry is done,” came a Twitter posting from a viewer called (at)PatMcPsu, even while the Texas governor struggled to name the third of three federal agencies he said he would eliminate as president. Another, called (at)sfiorini, messaged, “Whoa? Seriously, Rick Perry? He can’t even name the agencies he wants to abolish. Wow. Just wow.”

The key point to remember is that the “real time citizen” is no longer content to remain passive. Additionally, will the “real time citizen” quietly wait for polling stations and vote counts to close in other states before announcing the results of his/her own state? It will be interesting to watch how quietly or loudly Mr. and Mrs. Real Time Citizen react in 2012.

Finally, social app analytics start-up Kontagent snagged $12 million in a Series B round. According to an interview with Kontagent’s founder, what makes Kontagent unique is that it does not perform “traditional” social analytics functions (such as conversation monitoring, tabulating likes, etc.) but instead performs deep analytics, with a focus on teasing out profitability KPIs. It also has a team of data analytics and data visualization scientists working to help clients understand, interpret, and make informed business decisions based on Kontagent’s proprietary data visualization techniques.

 

Research on social proximity

In response to a request by @Gahlord to research the concept of “social proximity”, I have found nine articles that broadly sketch the primary issues and principles related to “social proximity”.

In Towards Design Guidelines for Portable Digital Proximities A Case study with Social Net and Social Proximity (.pdf), the authors apparently introduced the concept of social proximity, which they define as:

[T]he relationships between people in space, within social networks, and through time.

In Life in the network: the coming age of computational social science, the authors discuss the rapidly changing pace of computational social science.

In To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles (.pdf) the authors discuss privacy issues related to social media and the natural tension between “public” and “private” information (see also my earlier article relating to this topic).

In Inferring friendship network structure by using mobile phone data, the authors found that it’s possible to infer friendships with 95% accuracy based on mobile data.

In Bridging the Gap Between Physical Location and Online Social Networks (.pdf), the authors demonstrate how to predict friendship between two users using their respective location trails.

In Social distance, heterogeneity, and social interactions (.pdf, and I hope you’re good at mathematics if you want to understand this article), the authors propose a new model to analyze peer group interactions.

In Connectivity Does Not Ensure Community: On Social Capital, Networks and Communities of Place (.pdf), the author proposes that the strongest online communities are those that create a sense of social ownership within the community.

In Semantic Grounding of Tag Relatedness in Social Bookmarking Systems (.pdf) the authors discuss how collaborative tagging systems can be used to derive a global tagging relatedness structure from an uncontrolled tagging folksonomy.

In The anatomy of a large-scale social search engine (.pdf) the authors present Aardvark, a social search engine.
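As a rough illustration of the location-trail idea in the Bridging the Gap article above, here is a minimal sketch that counts how often two users show up at the same place at around the same time and flags frequent pairs as likely friends. This is not the authors' model: the function names, the one-hour window, the two-visit threshold, and the check-in data are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def colocation_counts(checkins, window_seconds=3600):
    """Count, per pair of users, visits to the same place within the window.

    `checkins` is an iterable of (user, place, unix_timestamp) tuples.
    The data layout and the default window are assumptions for this sketch.
    """
    by_place = defaultdict(list)
    for user, place, ts in checkins:
        by_place[place].append((ts, user))

    counts = defaultdict(int)
    for visits in by_place.values():
        # Compare every pair of visits to the same place.
        for (ts_a, user_a), (ts_b, user_b) in combinations(sorted(visits), 2):
            if user_a != user_b and abs(ts_a - ts_b) <= window_seconds:
                counts[tuple(sorted((user_a, user_b)))] += 1
    return dict(counts)


def likely_friends(checkins, min_colocations=2, window_seconds=3600):
    """Return user pairs that were co-located at least `min_colocations` times."""
    counts = colocation_counts(checkins, window_seconds)
    return sorted(pair for pair, n in counts.items() if n >= min_colocations)


# Made-up check-ins: ana and ben overlap twice, cam never overlaps with anyone.
checkins = [
    ("ana", "cafe", 1000), ("ben", "cafe", 1500),
    ("ana", "park", 90000), ("ben", "park", 90600),
    ("cam", "cafe", 50000),
]
print(likely_friends(checkins))
```

A real system would need far more than raw co-location counts (popular places produce many coincidental overlaps), so treat the threshold as a placeholder rather than a recommendation.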

Curating content and user experience on the Web

This blog article on “controlled serendipity” spurred me to conduct a little content curating of my own, resulting in this gem of a research paper that documents how the BBC utilizes Linked Data technologies to make it easier for BBC users to navigate its vast programming database.

The first article discusses how the Web collective–the user commons if you will–is benefiting from individual efforts at curating content, done largely as a free service driven by a spirit to share.

Sharing has become a reflex action when people find an interesting video, link or story. Great content going viral isn’t new. But the sharing mentality is no longer confined to the occasional gems. It’s for everything we consume online, large or small.

I think anyone engaged in the social Web would readily agree with this sentiment. It’s what makes participating in this distributed forum so fun. The article also points out, however, that the vast content mines that exist can be somewhat difficult to navigate to find true gems. Thus, the implication is that content providers need to step up to the plate and deliver content systems that make it easier on Web “content curators”.

The research paper referenced above describes how the BBC used a technique called Named Entity Recognition (NER) to extract concepts from textual input. This allowed for more efficient human editorial input to ensure that these concepts were accurate. Once approved, these “concepts” were transformed into links appearing on a Web page. This process then allowed the BBC to use the “concept links” to create user journeys through its site. All this is based on semantic web principles. The future looks bright, indeed, for those of us who constantly scour the Web for salient content.
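To make the extract, approve, and link flow a little more concrete, here is a toy sketch. It is not the BBC's system and it is not statistical NER: it swaps in a plain dictionary lookup against a list of editor-approved concepts, then turns each match into a link. The concept names, the URL paths, and the function names are all hypothetical.

```python
import re

# Hypothetical editor-approved concepts mapped to hypothetical link targets.
APPROVED_CONCEPTS = {
    "Doctor Who": "/concepts/doctor-who",
    "David Attenborough": "/concepts/david-attenborough",
    "Cardiff": "/concepts/cardiff",
}


def find_concepts(text, concepts=APPROVED_CONCEPTS):
    """Return sorted (start, end, name) tuples for approved concepts in text.

    Dictionary lookup only; overlapping concept names are not handled.
    """
    matches = []
    for name in concepts:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            matches.append((m.start(), m.end(), name))
    return sorted(matches)


def link_concepts(text, concepts=APPROVED_CONCEPTS):
    """Wrap each found concept in an HTML link.

    Replacements run right to left so earlier offsets stay valid.
    """
    for start, end, name in reversed(find_concepts(text, concepts)):
        link = f'<a href="{concepts[name]}">{name}</a>'
        text = text[:start] + link + text[end:]
    return text


sample = "Doctor Who is filmed in Cardiff."
print(link_concepts(sample))
```

The approved-concepts dictionary stands in for the human editorial step described in the paper: only concepts an editor has signed off on ever become links.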

Using text analytics to increase customer engagement and loyalty

I love it when research/theory manifests in application/practicality. In 2007, I wrote about research being conducted on semantic analysis related to social media and blogs, and now there are companies using products stemming from this type of research.

Information Week covered text analytics, describing how JetBlue uses it to understand customer sentiment from email messages, which informed how the airline drafted its customer bill of rights. And KMWorld discusses how the burgeoning field of “customer experience analysis” uses text analytics to increase customer engagement and loyalty.

Customers today aren’t just customers–they’re influencers and social networkers. Across the Web at any hour, they’re sharing observations about your company’s products and services, and those of your competitors…These new modes of customer behavior make it essential for companies to move beyond traditional ways of gathering, analyzing, and acting on customer information – Information Week

For a long time, text analytics was a technology in search of a business need. Now, thanks to social media, the need is there; the question is whether the technology can ramp up fast enough to be commercial – KMWorld

Where social media in real estate sometimes has the floor manners of a dog’s breakfast, it’ll become increasingly important for real estate firms to engage in text-sentiment analysis as part of their overall CRM and customer experience efforts. Here’s a list of companies that offer text-sentiment analysis services:

Photo credit: mnapoleon

Future of search and search engines

Here’s an article that details some interesting issues relative to search, recapping an Xconomy Forum on the Future of Search and Information Discovery panel recently held in Seattle. On the dais were Microsoft, Google, and a couple of University of Washington professors. Here are some salient takeaways:

  • It’s still unresolved whether vertical search will significantly impact general search
  • The nexus between real-time search, consumer intent, and semantic search is where the search gold resides
  • Hurricane Katrina taught Google a lesson about relevance and real time results
  • Opportunities to compete with Google and Bing exist, but only on the edge or fringe such as applications that bypass search engines, employ automated content discovery mechanisms, use semantic search, or perfect mobile geo-search

Interesting quotes:

Google is like smoking cigarettes, it’s a habit that’s going to be difficult to give up. So what can you do? You have to think about the problem space. Google’s approach is to get people in and out of search engine quickly with their result. Not the right way to think about it. Right way to think about it is to think about minimizing time of completing a task, not minimize the amount of time to match a query with a url.

[O]rganize the information in a way that synthesizes the task that you want to accomplish.

Mobile is huge. Apple is the big fish at the moment. Android coming on strong. Won’t hold my breath on Microsoft.

Two things which potentially threaten us. [1] As we become bigger and older, it could become more difficult for Google to innovate…[2] Also worry about diminishment of sense of entrepreneurship.

List of social Web resources 07-02-2009

Chris Brogan interview
Excellent interview with Chris Brogan on how he’d run an airline and implement some social web karma; great insights, well worth the 9:58 investment of your time. The interviewer, Shashank Nigam, CEO, SimpliFlying, asks some really good questions. My comment after listening to the interview: That was seriously cool.

Semantic Web
This post re-confirms to me that the semantic web (i.e., Web 3.0) is still a ways out from being widely deployed, yet absolutely filled with so much promise and visionary thinking.

Dunkin’ Donuts
Insightful post on how Dunkin’ Donuts uses the social web to extend its brand engagement. Dunkin’ Donuts’ recently released Dunkin’ Run app is a nice, simple deployment of a social app that has a built-in ROI component: buying doughnuts.

Vyoom
Interesting TechCrunch profile of Vyoom, which is a social networking site that gives you redeemable points for your participation. The more points you accumulate, the more stuff you can buy. Not sure whether this will work as a stand-alone application/concept, but could certainly see this applied in a rewards program under a major brand (e.g., Southwest’s Rapid Rewards program).

Twitter
Interesting ideas on why Gen Y may not “get” Twitter.

List of social Web resources 06-19-2009

Social media is social what?
A call for dropping the term “media” from the phrase “social media”. Compelling argument to drop the fascination with the platforms and concentrate on the quality of the content and product.

Public relations social web tactics
Long list of new products and services pitched to a Kentucky-based director of social media (two of the brands he reps: Maker’s Mark and Knob Creek bourbons). Very interesting list of social media “newness” and implicit insight into public relations 2.0 tactics.

Interviews with semantic search pioneers
Summary of interviews with key semantic web players from Google, Ask, Hakia, Microsoft, Yahoo, and True Knowledge. Some topics: shift from “popularity” based search results to “credibility” based search results.

Crowdsourcing with Rob Hahn

Crowdsourcing is an important concept in the viability, pertinence, and relevancy of the social web.

A recent crowdsourcing search odyssey of mine (really a two-hour drop down the Google search rabbit hole) began with a fairly innocuous @robhahn tweet:

I read recently that a 2-person combat team is four times as effective as a single shooter… anyone have any references to study of this?

This tweet intrigued me, as I thought it likely had something to do with Mr. Hahn’s insurgent marketing in real estate series. @PatrickHealy immediately stepped up to the plate:

@robhahn this should give you what you need: http://bit.ly/15eqQ4

Shortly thereafter I weighed in with this research article. But alas, Mr. Hahn was not satisfied:

@PatrickHealy close… but i’m looking for research showing 2 man team vs. 1 man ops

@ericbryn actually, wanted to see just how much more effective a 2-man fireteam is vs. solo shooter; maybe applies to agents…

Thus, inspired, I began a more substantive series of searches, which yielded these tasty tidbits, but nothing directly on point:

Discussion of information needs assessment and power of teams in edge organizations – Relevant to the insurgency series because the article discusses the shift from top-down command and control decision making to empowering teams and individuals to make relevant decisions based on timely and accurate information. Edge organizations promote a structure comprised of agile distributed networked units, which favors insurgent marketers.

How the information age has affected command decisions in USAF from Desert Storm to 2005 – Relevant to the insurgency series because the author analyzes the USAF shift from centralized to decentralized decision making. Decentralized decision making is key to enabling insurgent marketers to exploit the command and control decision making process that’s sometimes endemic to larger competitors.

Theories about net centric warfare – Relevant to the insurgency series because the article discusses how shared information resources contribute to cohesive mental models of the battlefield that result in increased combat effectiveness. Shared knowledge shared quickly enables insurgent marketers to exploit weaknesses in larger competitors’ information flow.

Discussion of basis for combat operations going to a STRYKER protocol – Relevant to the insurgency series because the report discusses how STRYKER forces are geared to respond anywhere in the world within 96 hours, stressing tactical mobility, lethality, and survivability. Insurgent marketers must strike quickly and with precision to weaken their competitors.

Uses of misinformation in war gaming operations – Relevant to the insurgency series because this article touches on how too much information causes humans to focus on the technical aspects of how the information is delivered rather than the context of the information and how this phenomenon leads to misinformation. An insurgent marketer can exploit this nuance in the sense of releasing highly relevant, highly targeted communications that are in direct contrast to a competitor that focuses on broadcast messaging. Here’s a nice quote from this article:

The gold lies in human thought—assisted by modern communication and computers, not distracted by them.

I’ve detailed this search odyssey because I think it’s an interesting exercise in crowdsourcing and thought leadership. Mr. Hahn is a thought-leader in the real estate industry (recently securing a columnist slot within the Inman tribe). But this, in and of itself, is not enough to motivate me to spend a couple of hours helping Mr. Hahn. So what did? Yes, my motivation was driven partly by friendship. But it also has to do with sharing in the learning experience. That is, I enjoy the way he thinks through issues, the cogent arguments he makes for whatever position into which he plows his sword. Part of the way to enrich this experience–a more personal experience with his thought-leadership–is to participate in the germination of an idea. And that, I think, is at the heart of crowdsourcing–the act of helping give birth to a new idea. The core of crowdsourcing is, essentially, the core of the social web: willingly sharing knowledge, participating in the expansion and distribution of this knowledge, and taking leaps forward together as change agents and innovation artists. Rob, happy reading.

Photo credit: rp72

List of social web resources 6-5-2009

Semantic technology and artificial intelligence

There’s lots of discussion lately about the semantic web and well-deserved praise over applications like Wolfram Alpha that employ semantic web theories to deliver relevant search results. In 2002, a short article discussed the concept of the “wisdom web” and highlighted many of the innovative concepts we’re seeing applied today. Future applications will likely employ intelligent agents to accomplish much of the “secretarial” type functions manually input today by humans into search engines, social networks, and other Web applications and platforms (here’s a great summary of intelligent agents in the evolution of Web applications).

Reinvigorating MLS information

Let’s assume a situation where intellectual property and licensing issues are properly resolved and set with respect to granting outside developers access to MLS content and data.

If you’ve heard of an MLS (or a broker with a VOW) that has engaged a group of skilled programmers similar to what Washington D.C. did with its content and data, please let me know. Don’t you think something wonderful could happen with real estate search similar to what’s about to happen with bioinformatics?

Dialogue between bioinformaticists and semantic Web developers has been steadily increasing for a number of years now as widespread data integration problems have clearly begun to impede the progress of research.

This is not to say that challenges don’t exist,

[I]f you’re talking about traversing [information and data] computationally, then it’s much more challenging to make sure everything means the same thing and that the object that you’re getting to on the next path has the same persistence, quality, and structure that you’re expecting to operate on.

Nevertheless, the vision for a more collaborative and effective future is vibrant,

Ultimately, what the semantic Web community hopes to have are applications that will make the complexity of the technology as invisible as possible.

The real estate industry has an existing data standard: RETS. It seems to me that an MLS (or broker VOW) could provide great value to its public and real estate industry stakeholders by adopting the RETS standard (thus, at some level, solving the data standardization issue raised above) while opening its data pantry to a group of developers, similar to what Washington D.C. did with its Apps for Democracy contest held last year (according to the Apps for Democracy website, the city realized a $2,300,000 value, not to mention the fact that the public now has some nifty tools),

The first-prize winner in the organization category was a site called D.C. Historic Tours, developed by Internet marketing company Boalt. The information about area attractions came from the city, but Boalt developers decided how to present it…The site uses Google Maps as the basis for enabling users to build their own walking tours of the city. It pulls information from Wikipedia, the Flickr photo-sharing service and a list of historic buildings.

Imagine a pool of widgets, desktop apps, apps for iPhones, BlackBerrys, etc., that slice and dice real estate content and data in novel ways. The public would obviously benefit by accessing real estate information in ways that are most meaningful to them. The content/data provider benefits by engaging the public in a deeper, more relevant, and more effective manner. And real estate agents ultimately benefit because a more satisfied, more qualified, and more engaged buyer or seller equates to increased business opportunities.

Photo credits: ducks (SleepingBear), tightrope walker (tallkev)

List of social web resources 5-8-2009

Semantic coolness
I stumbled across the Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) program at MIT. Rather than try to summarize what they’re doing, here are some examples: Music Composer Research Database, click a composer’s name to see what happens; UK Traffic, click a blue dot on the map to see what happens.

Web 2.0 coolness
Excellent interviews of Tim O’Reilly by HubSpot CEO Brian Halligan. Discusses baseline concepts of what it means to “be Web 2.0”; change in thinking and corporate ethos and individual creed.

Art
Wonderful missive on the nexus between art and Web 2.0. I especially enjoyed the author’s discussion of what “avant-garde” means–as originally put forth in this essay–in the 21st century. Both are meaningful reads because each author broaches core issues relating to a wide cultural shift in collaboration across different societal strata.

Engagement and consumer value propositions

Here’s another recent article on the changing consumer landscape regarding brand affinity and marketing. It parallels themes from my Crowds, Hives, Mobs, Swarms post.

The contemporary savvy consumer is seen as someone who combines areas of competency (particularly technological sophistication, network competency and marketing/advertising literacy) with empowerment (especially self-confidence and self-efficacy).

The paper points out that consumers are focused on value in their online interactions: value-for-time, value-for-attention, and value-for-access for their personal information. In searching for this value, consumers have become self-confident in utilizing new technologies to filter and control brand-centric messaging. Additionally, consumers are by and large comfortable tinkering with new technologies on a trial and error basis as opposed to following a script or reading a manual, which has resulted in mega-brands like Google, iPhone, etc. As other brands attempt to match the success of these mega-brands, ad spends are increasing in places like social networks as these brands go for consumer “engagement gold”. But there is a downside.

Organisations that serve consumers, employees and citizens in the world of person-centric commerce will be beneficiaries…but along the way there will be losers and casualties, including some businesses that over-estimate the desire of their consumers for engagement at the expense of offering basic value-for-money.

Accordingly, brands need to account for the differences among consumers and their attendant needs regarding value. These differences fall largely along generational lines, but even these lines are blurred as older consumers learn to adopt new technologies and adapt to novel ways of socializing and networking. In conclusion, the paper posits that despite a brand’s overt focus on highly customized, highly relevant, and highly emotional appeals, these efforts may not be enough to get customers “involved” with the brand because the consumer landscape is too fragmented and unstable.

Reality mining in real estate services

As always, I am grateful to Owyang for lending his insight and foresight. Here’s another excellent missive on the “Intelligent Web”. In summary, he posits that machines will begin extrapolating relationships and driving recommendations for connections from the juxtapositions and nexus between “our behaviors, context, and preferences”. Sounds a bit like the semantic web. Spinning through the comments on this post brought me to the Innovation Insight blog, where Guy Hagen explores MIT research related to “reality mining”, which you can find more about on the MIT Web site. And this research paper out of UC Davis demonstrates how the MIT Reality Mining data set was utilized in tracking behavior via mobile phones.

Imagine an iPhone application overlaid on a real estate firm’s listing data set, where the iPhone reports back over time thousands of users’ mobile browsing habits (i.e., driving around looking at homes for sale or rent). Having such data would allow firms to target advertising and Web site promotions, and would give them predictive insight over their competitors with respect to fluctuating markets (e.g., patterns will emerge over time that tell a firm which neighborhoods, etc., are capturing consumer interest, thus enabling the firm to deploy marketing and agent resources towards these locations ahead of its competition).

Blog Sentiment and Traditional Media

eMarketer reports that a recent study on traditional media’s use of blogs shows that 57.7% of respondents–US journalists–use blogs to measure sentiment.

Identifying Expressions of Emotion in Text (core study found here, registration required) is an intriguing “blog sentiment” study that identified targeted, emotion-laden words–“seed words”–and retrieved 173 blog posts from the Web that contained them. What the researchers found was that you can algorithmically determine the “emotive pulse” of a blog, or of individual blog entries.

Once a product is developed around this method, one practical application is that you could quickly cull blog posts that express a certain emotion about a brand so as to determine positive/neutral/negative sentiment relative to that brand. This is particularly useful for brands concerned about maintaining a “real time” knowledge base as to their status in the blogosphere, thus enabling brand managers to anticipate/preempt potential public relations crises.
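For the curious, the seed-word idea can be sketched in a few lines. This is only an illustration under loud assumptions: the seed list, the brand name “Acme”, the sample posts, and the function names are invented here, and simple word counting is far cruder than what the study itself did.

```python
import re
from collections import Counter

# Hypothetical seed words; the study's actual lexicon is not reproduced here.
SEED_WORDS = {
    "happy": "positive", "love": "positive", "delighted": "positive",
    "angry": "negative", "hate": "negative", "disappointed": "negative",
}


def emotive_pulse(post, seeds=SEED_WORDS):
    """Count how many positive and negative seed words appear in a post."""
    words = re.findall(r"[a-z']+", post.lower())
    return Counter(seeds[w] for w in words if w in seeds)


def classify(post, seeds=SEED_WORDS):
    """Label a post positive, negative, or neutral by majority of seed words."""
    pulse = emotive_pulse(post, seeds)
    if pulse["positive"] > pulse["negative"]:
        return "positive"
    if pulse["negative"] > pulse["positive"]:
        return "negative"
    return "neutral"


def cull(posts, brand, sentiment):
    """Return the posts that mention the brand and carry the given sentiment."""
    return [p for p in posts
            if brand.lower() in p.lower() and classify(p) == sentiment]


# Made-up posts about a made-up brand.
posts = [
    "I love my new Acme blender, so happy with it.",
    "Angry and disappointed: Acme support never called back.",
    "Acme opened a store downtown.",
    "I hate waiting in line.",
]
print(cull(posts, "Acme", "negative"))
```

Counting seed words ignores negation and sarcasm (“not happy” still counts as positive), which is exactly the sort of gap a commercial product would have to close.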