Predicting Outcomes via Social Media Monitoring


I recently had a chance to interview Dr. Marc Teerlink, Global Strategist & Data Scientist at IBM.

He’s a fascinating guy, and we spoke about using social media monitoring tools to separate the signal from the noise, how to use past conversations to try to predict the future, why teens are the most difficult audience to monitor, and the dangers of relying too heavily on sentiment when predicting outcomes.

You can listen to a recording of our conversation or read it below.

Eric:  You’re floating around somewhere in the ocean, yeah?

Marc:  Yeah. I do this once or twice a year. I still try to take on a serious race -- a serious crossing, actually -- together with some buddies I’ve known for quite some time. We go back 20-plus years. Those are the good moments in life, I can tell you. Just to be away from it all.

One of the best things about being on the ocean is that the moment you really leave the shore, you have a satellite telephone for emergencies, but nothing else, which means that your team really has to show that it’s as great as you hoped when you hired them.

Eric:  Charting a course in a boat is a very precise exercise. One or two degrees off over the long haul and you’re in the wrong country, yes?

Marc:  Yeah, in the wrong country or as Columbus said, “The wrong continent even.”

One of the things I find is that one of the best management experiences is to be part of a sailing crew. You learn that you are actually the most important member of the team. I don’t want that line to be quoted by itself, but if I fail, everybody else fails.

The team is going to be as good as I can make myself as an individual, and as the next individual can. The first part is, I really need to focus on optimizing myself on the boat, whether I’m the captain or just one of the people pulling ropes and hoisting sail.

We as a team are as good as each individual wants to be for the team. Can I optimize myself for the team? Do I trust my fellow team members that, when we do a handover, the moment I shout, they will hear me and pick up?

Do I feel comfortable with the decisions that the person who does the navigation, or the captain, makes? The more you become that navigator, that captain yourself, the question becomes: at what moments will I listen to my team members, and at what moments do I say, “Guys, when I say tack, we will tack. We can evaluate later whether it was good or bad, but that’s what we do.”

I think the essence of a good strategy, whether it’s for Big Data or for your company or even for your career or your life, is saying, “I am here. I want to be at point X on the horizon. I know what my destination is, or at least what the points are where I’m going to tack. What are really the most important points that I will be heading to, that I need to capitalize on to make it to the next step?”

When I coach people who work for me, I always ask them, “What are the points in your life you’ve been able to capitalize on so far? Forget about your job experience. Let’s just look at who you are. What can you capitalize on, and what’s the next point you want to reach?”

I have the same when we do projects, when we do analytics. What are the core questions you want to answer? I understand, we all want to have, in the words of Bob Crandall, “an unfair, sustainable advantage.” [laughs] But what is it you are going to do in the meanwhile? What are the key points you can attack, that you want to capitalize on, to get there in your market, to make a difference to your customers, to your employees, your suppliers or the stock market -- whatever is important for you.

Eric:  When you think about solving a problem like charting a course for a ship, you’re using maps and you’re using a sextant, and you’re using numbers to figure out your destination. But you’re considering all that information against your knowledge of the atmosphere and the ship, and the crew and the wind.

You’re able to model that information which is a lot of complex information, in your head. Then you think about this idea of Big Data and converting Big Data technologies into business insights. If you’re three degrees off there, you’re going to come away with an insight that is false. In effect, the more data you put into the hopper, the more potential for misinformation there is. How do you balance that precision against, I guess, the imprecise nature particularly of human language?

Marc:  That’s a real good point, and I’m going to take that in three pieces. Question number one is how do you deal with a large amount of data that you cannot review? I really find it’s important to realize that facts are more important than gut feeling.

That doesn’t mean you need to disregard your gut feeling, but when you have a series of facts around you -- how the weather is going to be, what the atmosphere is, which ships have been on this course, what your past experiences have been -- that’s something you can’t disregard. You can still, especially in races, have a hunch. You take a Hail Mary, but at least then you know the facts, and you build on your facts and put a hunch on top of them.

I don’t think you can ever have a 100 percent complete set of data. That brings us to the next point. How do you deal with pollution, or noise? One of the things I like about people like Nate Silver is that they made it so clear for data scientists in our profession that separating the signal from the noise gets much harder when you have more data.

Then the point is how I’m going to use the data, whether it’s sailing or our business. I’m not just taking historic data to extrapolate. I think a lot of things go wrong with companies that keep extrapolating, companies that build predictive models based on classic analytics.

For instance, when you do business intelligence, you do phenomenal reporting, sophisticated reporting, looking back at the past. Nothing wrong with that; it gives you a great explanation of where things happened.

The next point is what’s going to happen. You can extrapolate. You can say: all the cars standing in front of the city of New York trying to enter Manhattan at 8:00 -- if I extrapolate, the city will be filled with cars by 12:00, and they will be stacking cars on top of each other by 3:00. But at some moment it tops off, and how do you recognize the top-off moment? How do you know that it’s a [indecipherable 0:07:33] pattern?
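The traffic example can be sketched in a few lines of Python. This is purely illustrative and not from the interview: all the numbers are invented, and the logistic curve stands in for any capacity-bounded model.

```python
import math

def linear_extrapolate(counts, steps):
    """Extend a series by repeating its last observed increment forever."""
    rate = counts[-1] - counts[-2]
    return [counts[-1] + rate * (i + 1) for i in range(steps)]

def logistic(t, capacity, midpoint, growth):
    """A capacity-bounded curve: growth slows as the limit is approached."""
    return capacity / (1 + math.exp(-growth * (t - midpoint)))

observed = [2000, 4000, 8000, 16000]  # hourly car counts, invented

# Pure extrapolation keeps climbing, 8,000 more cars every hour...
print(linear_extrapolate(observed, 3))

# ...while the bounded model tops off near its capacity.
print([round(logistic(t, capacity=30000, midpoint=4, growth=1.0))
       for t in range(4, 10)])
```

The point of the contrast: a model fit only on the rising part of the curve cannot tell you where the top-off moment is; you need an assumption about capacity.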

I think the biggest thing, in those three degrees you mention, is that a lot of people who have had some statistics or did some business intelligence start to do data science the way they used to do things. They try to make models that are explained by past data and past datasets, and you’re never ever going to predict with that. You’re going to stay within the safe margin of error, and that’s OK.

Columbus, to go to a new metaphor, really was convinced that if the earth was round and he just went westward, he’d make it to India -- not taking on, with a lot of other data, the assumption that there might be something in between the coast of Spain and the Japanese east coast.

Eric:  When you think about Watson, Watson was 2011. In 2011, Watson winds up beating the two top “Jeopardy!” champions, and that’s natural language processing. The thought when we saw that was, “Oh my god, we’re finally going to be able to call our cell phone provider and tell their voice recognition system what’s wrong, and they’re going to route us in the right direction.”

That really honestly has not been the result. A lot of these voice recognition systems that you talk to still get things wrong. What’s holding us back from really realizing the power of this type of computing?

Marc:  Wow, that’s a really good question. I think it’s maturity, to be honest. When people saw Watson on “Jeopardy!” in 2011 -- a beautiful Valentine’s Day victory -- the reality was that Watson took natural language and asked, “Do I really understand the question? Do I have enough information to have context?”

If you were talking about Hilton, Paris, and I was talking about Paris Hilton, are we both talking about the socialite or is one of us talking about a hotel in the capital of France? You needed some context, and in “Jeopardy!” you actually got that context.

Those were the clues. “Chicks dig me” is actually a category. The moment I get the first question, I know it’s about female archaeologists. Or when I say the word bats -- there are a lot of bats around here -- am I talking about the animal? Am I talking about the American baseball bat? Am I talking about the British cricket bat? Again, I need a little context that says, “The category is American sports.”
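The “bat” example can be made concrete with a toy disambiguator. This is an illustration of the idea of context, not how Watson actually works; the sense inventory and cue words are invented.

```python
# Invented sense inventory: each sense of "bat" has a few cue words.
SENSES = {
    "bat": {
        "animal": {"cave", "wings", "nocturnal"},
        "baseball bat": {"american", "sports", "baseball", "swing"},
        "cricket bat": {"british", "sports", "cricket", "wicket"},
    }
}

def disambiguate(word, context_words):
    """Pick the sense whose cue words overlap most with the context."""
    context = {w.lower() for w in context_words}
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & context))

print(disambiguate("bat", "The category is American sports".split()))
```

With the category “American sports” as context, the overlap count selects the baseball sense; without any context, every sense ties and the choice is arbitrary, which is exactly the problem Marc describes.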

IVR and speech recognition systems are unprepared for that context. They have to deal with accent and intonation. I would really recommend to a lot of people: let’s just start with text. Let’s not even start with spoken words, because you can get “write,” “Mrs. Wright,” “right now” kinds of confusion. Let’s start with text. Let’s then put text in a context. Can I understand that context? Can I then apply machine learning, so every time there’s an interaction, I learn what works and what doesn’t work?

You don’t program it with Q&As. Watson did machine learning and inference: can I give answers based on confidence? Then you come with Watson, and we’re two years down the road, and we do phenomenally impressive things in health care. I think the work with Memorial Sloan-Kettering is groundbreaking and profound. And yet, the moment I come to the next company -- and that’s the reason we opened up the ecosystem -- I realize it’s not the external data that’s going to be the issue. It’s going to be understanding their internal data.

The good thing about all these medical companies, these health care providers, is that they have cases. They have hundreds, thousands of cases about when somebody comes in with a headache and a temperature, because they use those cases to teach prospective doctors. You go to a company and ask: what is the training case for somebody in the call center when a caller says, “I just bought a phenomenal product for my new laptop and I can’t get it to work”? How do you recognize words in context?

Speech recognition and text recognition require so much context that you need to train the machine. If you have the cases to train the machine, you have the first piece. The second is that you need examples and data from the past.

I’ve done a few projects in the past two years, and I can tell you a lot of companies still have problems unleashing their internal data. What are the five things stopping you before you go to cognitive?

One, can I truly unleash the big data? It’s a little bit like Michelangelo: can I see the angel hidden in the marble and carve it out? Can I start with the core business question and say, this is the context that I’m going to start with? I’m not going to try to play “Jeopardy!” again. At a bank, it’s going to be about a mortgage or a college advisor. Or I’m a company that sells trekking and skiing equipment, and I’m going to help people get advice when they go into an obscure [indecipherable 0:12:46] situation. What is the gear they’re going to need?

That’s a pretty defined domain, a defined context, and the questions and the cases to learn from are smaller. Then you can say: what kind of data do I need to connect? I need external data to reuse. I need social sentiment -- not necessarily social networks, but sentiment. I need to combine it with my internal product tables, and I need to combine it with my internal point-of-sales, or the point-of-sales of my partners. That’s where things go wrong.

All that internal stuff wasn’t aligned. Most people didn’t know where it came from. They were historical legacy systems. There wasn’t [indecipherable 0:13:26] data management, and there was no common language.

If I come into a company and ask, “Hey, in sales report number 12, the word sales -- is it with or without returns?” and five people give me six answers, how do you expect the machine, cognitive technology like Watson, to deal with it?

My recommendation is do it in steps. Don’t focus directly on cognitive. Get yourself on the journey to this. One, unleash some big data, internal and external, and get some small cases running.

Here’s an example, and we did this with a global retailer: can I take social sentiment and reviews about fast-moving electronic goods, combine them with my internal point-of-sales, and actually do a better forecast? And yes, we did. We were 24 percent closer to actual, and this was one of the leading retailers in forecasting. Taking their external data, their partner data, their normal forecast data, the sentiment and the locations, we were able to predict it. That’s great. Now, the problem is, how are you going to visualize that?
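The shape of that idea -- blending a baseline forecast with a sentiment signal and measuring whether the blend lands closer to actuals -- can be sketched as below. All numbers are invented for illustration; the interview’s 24 percent figure is not reproduced here, and the hand-picked weight would in practice be fit on historical data.

```python
def mean_abs_error(forecast, actual):
    """Average absolute gap between forecast and what really sold."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

baseline = [100, 110, 120, 130]    # units, classic extrapolation (invented)
sentiment = [0.2, -0.1, 0.4, 0.3]  # net review sentiment, -1..1 (invented)
actual = [108, 106, 135, 141]      # what actually sold (invented)

# Adjust the baseline by a sentiment weight chosen by hand here.
weight = 40
blended = [b + weight * s for b, s in zip(baseline, sentiment)]

print(mean_abs_error(baseline, actual))  # baseline forecast error
print(mean_abs_error(blended, actual))   # smaller error with sentiment
```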

Let’s take an example here, Eric. Do you have an iPad or an iPod, or one of those devices you play music on?

Eric:  I do.

Marc:  So do I. I’m very happy with my iPhone, and I have 24,630 songs on it. It took all kinds of unleashing of big data. I took MP3s, I took CDs, I took vinyl -- vinyl records from my dad. They’re all on the phone. It took me an enormous amount of time to tag them and to get the cover art, et cetera, on there. Now, I want to make a playlist for a party.

Five years ago, when I had a party, I took five or six CDs and had my playlist ready. Now, I’m spending two hours with my daughter in Spotify before I have the playlist.

So, having the data unleashed doesn’t mean you have it accessible and visualized. My advice for companies is: focus on your business case. What is your most important key question? What is the capability that you want to have? Is it a capability that you want to share? Great. How are you going to visualize it? [indecipherable 0:15:28] make it collaborative, because when Marc finds something, how is he going to give it to Eric and back again? How is Eric going to trust that my data is good? How is Eric going to feel comfortable?

Take an example: hey, recently there was a flood in Thailand. I’m a merchandiser, you’re a merchandiser. The question is, what is the impact of the flood in Thailand? Then you say, oh, I already did some work. How do I find it? How do we share that together? How do we find and share our analytics?

Then the next step is visualize, unleash, collaborate. How are we going to make it predictive? We say, something like this is happening. We need to rapidly get X, Y, and Z in inventory. That’s the moment you have enough opened up that you say, now we can give this to a cognitive system. The cognitive system can learn from our collaboration and our dialogs. It can learn from everything we opened up in our data. It can learn from the way that visualization is effective, and what’s predicted.

Watson isn’t predictive. Whatever everybody says, cognitive technology isn’t predictive. When you ask cognitive technology, “What’s going to be the best-selling toy this Christmas?” it has no clue. When you ask, “What were the patterns that you saw the last two years around the beginning of November that gave you an indication of what the best-selling toy was?” it says, “I saw those patterns. They give me 67 percent confidence that Iron Man and the rest of the Avengers set are going to be the best-selling toys for Christmas, especially in the Lego category.” That’s great.

Can I now take those patterns and put them in a predictive engine and get the outcome of the predictive engine in a collaborative platform? That’s more where we’re going. How many companies do we know that really have an internal collaborative platform, an internal Google+, or an internal Facebook? Most people still send data out over spreadsheets and email. What’s blocking us? We haven’t unleashed it. That’s why startups are doing so much better than larger companies.

Eric:  Let’s talk about that for a minute. When it comes to big data, is bigger better? Does more data equal smarter outcomes? Or, is more data potentially a recipe for misinformation and confusion?

Marc:  This is a really good question. Is more data better, or is more data a recipe for misinformation? It depends on the quality of the data. One of the things going into “Jeopardy!” was that we had pretty good confidence in the different sources. We knew that if the “National Enquirer” was the source, the fact-finding had a different standard than if, for instance, the “New York Times” had been the source.

If Wikipedia had been the source, we knew there was an issue that people could have been updating it, and therefore not all the data was true. Whereas if we took Reddit information and past census data, we knew the factual reliability was pretty high.

If I have big data, it’s not important whether it’s really big. What’s important is how much I can trust it. It doesn’t matter if it’s a little bit of sensor data and a little bit of text and a little bit of point of sale. The point is, can I combine them? Is the quality good enough for me? It doesn’t have to be perfect, but is the trust factor consistent enough?

I find the advantage of a really big dataset, that I can really find a needle basically in a stack of needles. What I need to do, coming back to the previous point, is separate signals from the noise and say, “What are the really more important signals?”

If I went back 20 years in time and wanted to start a coffee shop -- or let’s say today I want to start the coffee chain that was actually founded 20 years ago -- I wouldn’t be able to find in my data any statement where somebody said, “Hey, I’m willing to pay five bucks for a cup of coffee instead of 50 cents.”

I would find a lot of people expressing the need to have a place to hang out, to have something a little more upscale than fast food. Somewhere they can hang out and, if they’re independent contractors or between client locations, do one or two hours of work -- and, indeed, the need for a decent cup of coffee.

With that, the concept for a coffee chain is found, but I wouldn’t have found it in the data. I would have found the need for that experience in the signals between the data. It doesn’t matter how big the set is. What’s more important is: can I rely on and trust the quality of the set that I get?

Eric:  You’re looking at the connections between the data, not necessarily the data itself?

Marc:  Right. I’m looking at the connections, or I’m looking at what’s missing. One of the things I really like about what we did with Watson in the beginning, with the medical applications, is that we not only looked at what somebody says during the intake with the nurse or the nurse practitioner, but at what they haven’t said.

They haven’t mentioned backache. They haven’t mentioned headache. They haven’t mentioned diarrhea. Therefore, these things we can rule out -- or, for these things, you need to ask a check question like, “Are you sure you don’t have that?”

It’s the same with data. What’s in the data is as interesting as what’s in the gaps -- what’s missing.
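The mechanics of “what’s missing” are simple set arithmetic. The sketch below is a toy illustration of the intake idea, with an invented checklist; it is not Watson’s actual logic.

```python
# Invented checklist of symptoms an intake is expected to cover.
EXPECTED = {"headache", "backache", "fever", "diarrhea", "nausea"}

def missing_items(mentioned):
    """Return checklist items that were never brought up."""
    return sorted(EXPECTED - {m.lower() for m in mentioned})

intake = ["fever", "nausea"]
for symptom in missing_items(intake):
    print(f"Check question: are you sure you don't have {symptom}?")
```

The interesting output is the complement of what was said, which is exactly the part a naive keyword search never surfaces.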

Eric:  It seems like the biggest gap and the biggest area for confusion is language, because language is so imprecise. You think about the different types of data you would analyze: machine data and transactional data.

Then when you get down to social data -- and I would include sentiment and social media, and even, I would say, in our day and age, news media, because you have so many enthusiasts creating blog posts for brands like Forbes -- it would seem to me that those areas have the biggest potential for misinformation.

Marc:  Oh, absolutely.

Eric:  I don’t know how you get your arms around that without being a human being.

Marc:  One of the pleasures I have is that I do some teaching as well -- academically, and also, for instance, at a high school, the international school where my children go. Filtering the emotions, even with language, is not enough, because language can have sarcasm. “Yeah, right, that’s phenomenal,” or, “I really love that,” and the tone of voice is enough to say, “Yeah... not.”

Teenagers are phenomenal, by the way, in creating new versions of language, so context is really important.

Let’s take your smartphone, let’s say. It doesn’t matter if you have an Android or you have Apple with Siri. Take the Google Voice or Siri and say, “Text my wife I love her.”

Your partner is going to get a text message saying, “I love her,” and you have to explain, “No, no, no. I love you, not her,” because she’s going to ask who “her” is. The context is so important. When we get sentiment statements on the web, in the digital world, how do we find what’s important? Actually, you need to try to combine it with behavior.

For example, when I’m talking about a Maserati, do the words that I use give you more confidence that I’m talking about the joy of ownership? Do I talk about the intention to buy? Do I talk about my personal point of view on the latest model, or am I just spreading admiration for it?

The first thing is, can I get the sentiment around it, the intent? That’s actually pretty easy. You can do a lot of word tagging and word weighting. When you weight that, you see, “OK.” That’s just using the language.
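Word tagging and weighting at its most basic looks something like the sketch below. The lexicon and weights are invented for illustration, and, as Marc notes elsewhere, a real system also needs negation and sarcasm handling.

```python
# Invented mini-lexicon: each word carries a sentiment weight.
WEIGHTS = {
    "love": 2.0, "joy": 1.5, "great": 1.0, "admire": 1.0,
    "hate": -2.0, "disappointing": -1.5, "broken": -1.0,
}

def sentiment_score(text):
    """Sum the weights of lexicon words found in the text."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(WEIGHTS.get(w, 0.0) for w in words)

print(sentiment_score("I love the joy of ownership"))        # positive
print(sentiment_score("The latest model is disappointing"))  # negative
```

The score separates the Maserati examples above (joy of ownership versus criticism of the latest model), but it is blind to sarcasm -- “Yeah, right, that’s phenomenal” would score positive.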

The second part is, how much do I still trust the source? I need to start ranking how much I trust this source. Is it somebody just blogging on the Internet? Is this a source that actually does fact-finding?

What we know from social media analytics and from social sentiment analytics is that in Eastern European countries, a lot of the English printed media tend to be a little more biased in a positive way than, for instance, the national-language media. I really need some human being to double-check how much confidence I have in this kind of source.

The next thing I need to look at, whether I have confidence in the source or not, is the influence of the source. If Eric Schwartzman, for instance, is talking about something that he really finds important, how many people does he reach, and how many people follow up with a change in anticipated behavior?

When you go to football, it’s not the first person who stands up to make the wave that counts. It’s the second person, because that person makes the rest stand up to do the wave. I need to know, in a dialogue, who that second person is. Who do they influence?

I’m not interested in your privacy, in your name, your address. I need to know what waves are created, so I can say that someone like you has created waves in the past. Your wave, even if it comes from a blog post that I have some questions about, is going to be important because you have impact. Whether the wave is fair or not, you have impact.
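The “second person in the wave” idea can be sketched as ranking users by the follow-on activity their share preceded, rather than by who posted first. This is a toy illustration with an invented share log and anonymized IDs, consistent with the privacy point above.

```python
# Invented share log: (minute, anonymized user ID), in spread order.
shares = [
    (0, "u1"), (5, "u2"), (6, "u3"), (6, "u4"), (7, "u5"), (8, "u6"),
]

def follow_on_counts(log, window=3):
    """Per user, count how many shares landed within `window` minutes after theirs."""
    return {user: sum(1 for t2, _ in log if t < t2 <= t + window)
            for t, user in log}

counts = follow_on_counts(shares)
print(max(counts, key=counts.get))  # the wave-maker, not the first poster
```

Here the first poster triggers nothing for five minutes; it is the second share whose appearance precedes the burst, so it ranks highest -- the stadium-wave effect in miniature.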

I take impact, I take influence, I take the sentiment and the intent around it, and I get some kind of confidence about what’s being said. Then I look at real facts. What do people really do? I look at a point-of-sale table. I look at tweets. I look at census data, 40-50 taps. Those two I take together to make a model.

I still say, if I am not sure about the sentiment, I look at the facts. Because you and I can stand outside a big supermarket chain and ask people, “Would you pay a buck more for a more environmentally friendly detergent?” Most people will tell us, straight in the eyes, “Yeah, I would.” Then you go to the point-of-sales table afterwards, and yeah, most don’t. You still need some facts to balance those sentiments.

Eric:  I want to ask you a very basic question. I wouldn’t even call this a big data question, but I’d be very curious on your guidance in this area. A lot of our listeners use tools to analyze social data and news media.

The popular tools in this category that a lot of our listeners use might be a simple free tool like Feedly -- I don’t know if you’re familiar with it -- an easy news monitoring, social media monitoring tool. There’s another by a French company called Netvibes, and then some of the premium services a lot of our listeners use, like Radian6, actually recently purchased by Salesforce.com.

They use a product called Sysomos. Some of them use Meltwater. Some of them use a product called Recorded Future, and they’re all easy to use. You don’t have to be a technical person. There’s a graphical user interface, and you can try to make sense and get business insight from mainstream news media as well as social media. Well, social media where the social media user allows that information to be public.

From my standpoint, it’s an outgrowth of the time when we used to have what was called news clipping services. These were services where you would give them the name of your company or the name of your competitor or the name of your brand, and they would actually have a team of people who would read magazines and newspapers. Two or three weeks after the story came out, they would clip it out and send it to you in the mail, pre‑Internet.

That’s obviously been replaced now with these online news clipping services. Then for the folks in marketing and PR, that’s often grown into, “Hey, how do we analyze what’s being said about us on social media, or our competitors on social media and the news media, and try to pull some meaningful business intelligence from that to show a return on investment for what we do from a marketing and PR standpoint?”

If someone was in that boat and they’re using a tool like one of the ones I mentioned, what would be the best type of metrics for them to look at to prove value?

Marc:  Yeah, I love that question. I really, really love that question. I think it’s such an important question. First, a tool is only as good as you set it up. What’s good isn’t free. How do I use the tool? How does it deliver value, and what are the metrics?

First, what are you looking for in the tool? For example, when people only look at their brands with a tool, they might do a pretty good job and say, “Hey, some people still talk positively about our brand.” By the way, the number of friends and fans I have on Facebook, or growing on Google+, might also be pretty good, I would say. Nevertheless, that’s wrong.

Now, the second part is: let’s say I am a company that makes bread. Bread is pretty much a commodity. I will find zilch serious mentions of my brand. Do you really come home and say, “Oh honey, I bought a loaf of Brand X”? You say, “I bought rye bread,” “I bought brown bread,” whatever. Especially in the US, the offering is growing rapidly compared to what’s available in Europe. So, you need to look again at what people are talking about.

For instance, people might not be talking about my brand of bread. They might actually be talking about what they like, so I need to use my tool differently. What is it that people are talking about in the context of bread? They’re talking about wanting it more salty, or less salty. They’re talking about how they combine it, how they use it.

The first thing I would recommend to everybody is: if you really want to use those tools well, do not just look at your brands, because you will get limited data and you’ll always be looking at the past. Look at the experience -- the words and the experience. Take the output of one question -- what are the experiences people are talking about? -- and seed it into your next set of questions. In that relationship, does it work, and how does it apply? That’s a pretty powerful approach.

Having said that, we come to the next part: what are the metrics I’m going to use? These are the typical kinds of sentiments. You take those experiences and turn them into different kinds of metrics. They are not important for your executives. They are important for your marketing operations, for your trade promotion, for your practitioners. Then you roll them up. Then you get to, say, market share. Let me give an example of how one of my clients uses this today.

Originally, what they used to do is have a dashboard with their normal sales and their normal distribution parameters. They’d say: how much did we sell? What are our numbers? What’s in the pipeline with our distributors?

They did beverages, alcoholic and non-alcoholic. Based on that: this is our sales, this is the amount. But then they’re like, we have 10 percent more, 10 percent less -- they didn’t know if they did well. They would buy external data from the typical kinds of companies that provide it -- [indecipherable 0:31:56] , Nielsen, and those kinds of consolidators -- that would come with market share. They’d say: six weeks after the fact, I know that my sales are up five percent, and the market also went up five percent, so my market share is still the same. That’s weeks later, after the fact.

Now, they’ve started doing two things. Take sentiment analytics out of forms, out of transactions, net promoter scores with my business partners, my distributors, so I can see how happy my distributors and business partners are.

I’ll add that to my normal numbers, at point-of-sales directly. I get a better picture from that point-of-sales data of how I’m doing in particular stores versus others.

I take the sentiment, and when I take the different changes in sentiment, those metrics together will actually give me an indication of what’s happening in the market. Am I doing better or worse than my competition?

Sentiment by itself is worthless. So people like Eric Schwartzman’s podcast -- so what? Do they like it more than something else? Are they actually moving to it in large numbers? Does it gain bigger influence? Will it actually help you attract more sponsorship or advertising, or whatever? That’s what really counts.

The metrics are not static metrics, but what I call key performance predictors. What does it mean for me that the sentiment is changing? Will it lead to a change of portfolio? Are people moving from product A to product B? Are people talking more about my service? Are people talking more about my online versus my offline channel? Those are the strategic things that matter.
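One way to read “key performance predictor” is to track the change in sentiment rather than its level. The sketch below is a toy illustration with invented weekly scores and hypothetical product names, not a client metric from the interview.

```python
# Invented weekly net-sentiment scores per product.
weekly_sentiment = {
    "product_A": [0.6, 0.5, 0.3, 0.1],
    "product_B": [0.1, 0.2, 0.4, 0.6],
}

def trend(scores):
    """Average week-over-week change in sentiment."""
    deltas = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return sum(deltas) / len(deltas)

for product, scores in weekly_sentiment.items():
    direction = "gaining" if trend(scores) > 0 else "losing"
    print(f"{product} is {direction} ground ({trend(scores):+.2f}/week)")
```

Both products can have identical average sentiment over the window; only the direction of the change signals that people are moving from A to B.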

There is no one answer I can give in a call like this. It is very particular to a company. I can say, in the example of this CPG company, this beverage company: we took roughly the brand perception, how often it came back, the association with other products, with other experiences, with going out -- any day of the week, any area of the country -- and those gave a pretty good indication of whether we should do more trade promotion, whether we should get more bar and entertainment venues to stock a little bit more, those kinds of things. In every part of business, you listen to the signals and you make decisions.

Eric:  One of the things you mentioned earlier was the connections between the data, and the data that’s not there. With the exception of Recorded Future, none of the platforms I mentioned to you really look at that. My social graph can be expressed numerically based on who I interact with regularly, who I don’t interact with, who I’m connected with that I exchange messages with versus public posts.

So, there’s a lot of data there that can express how I interact with whom, and quantify and describe who I am in the context of others in a social network. Yet, there are really very few tools for analyzing that. I know of one called Tracker -- a people search engine that allows you to find experts on a given topic based on their social interactions, based on the actions people take when they share.

Have you done any projects where you’ve looked at the social graph and tried to bring that into a big data model to help express a more accurate outcome?

Marc:  I mentioned one example earlier. What we did there is use IBM technology, which in my case is quite obvious. We used IBM’s BigInsights, which has some social media analytics in it.

The good thing is, I keep my data raw. I get all my data in and keep it raw. While I can visualize it and make all kinds of graphs, keeping it raw allows me to cross-reference, or cross-fertilize, those data. I can take the content of people’s social streams and link them to my point-of-sale data. That’s really where the power is. I can see what people talk about, what they’re not talking about, and what happens.
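Marc’s idea of keeping social data raw and joining it to point-of-sale records by a shared field can be sketched very simply. This is a toy illustration, not IBM’s actual pipeline; the field names and data are hypothetical, and location stands in for whatever key the two datasets share:

```python
from collections import Counter

# Hypothetical raw social mentions and point-of-sale records,
# both carrying a location field we can join on.
mentions = [
    {"location": "Austin", "text": "trying product B tonight"},
    {"location": "Austin", "text": "product B at the bar"},
    {"location": "Denver", "text": "still drinking product A"},
]
sales = {"Austin": 1200, "Denver": 400}  # units sold per location

# Count conversation volume per location from the raw stream.
mention_counts = Counter(m["location"] for m in mentions)

# Cross-reference: conversation volume next to actual sales per location.
combined = {
    loc: {"mentions": mention_counts.get(loc, 0), "units_sold": units}
    for loc, units in sales.items()
}
```

Because the raw stream is preserved, the same records can later be re-joined on a different key (product, channel, time of day) without re-collecting anything.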

If I isolate, for instance, locations ‑‑ location because people put it as part of their social stream, location because people actually talk about the place where they live. The moment I get the data in, I don’t know the persons. I don’t care. I don’t need to know their names or their user IDs. What I need is just a unique identifier, so that I know that the 12 views, the 12 tweets, are from the same user. I’m not interested in their name, and that’s not how I want the data coming in.
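The "unique identifier instead of a name" idea Marc describes is commonly done by replacing raw user IDs with a salted hash at ingestion time. A minimal sketch, with a made-up salt and toy records (in practice the salt would be a secret kept out of the dataset):

```python
import hashlib

# Hypothetical salt -- in a real pipeline this is a managed secret.
SALT = b"example-salt"

def pseudonymize(user_id: str) -> str:
    """Replace a raw user ID with a stable, non-reversible token."""
    return hashlib.sha256(SALT + user_id.encode("utf-8")).hexdigest()[:16]

tweets = [
    {"user": "alice_92", "text": "Love this drink"},
    {"user": "alice_92", "text": "Back at the same bar"},
    {"user": "bob_77", "text": "Flight delayed again"},
]

# The same user always maps to the same token, so their tweets can
# still be linked -- but the name itself never enters the analysis.
anonymized = [{"uid": pseudonymize(t["user"]), "text": t["text"]} for t in tweets]
```

The stable token preserves exactly the property Marc needs (knowing that multiple posts come from the same person) while dropping the property he explicitly does not want (knowing who that person is).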

Privacy and ethics are so incredibly important when you use this data. You need to look not only at what’s legally allowed, but also at what’s morally and ethically right when you take the data.

Now I have the data in my hands. Can I cross-reference it with other data that I have, to see if those gaps are bigger or smaller? As for graphing it at the beginning ‑‑ there is nothing wrong with that; taking a data snapshot is phenomenal if you want to share it. But it doesn’t help you to explore.

To explore, I need to actually start connecting: build connections, build them into the model, run that model from different angles, look at the confidence I get out of some [indecipherable 0:37:40], and then I visualize.

Nothing wrong with those tools. I think those tools are incredibly good small pocket knives with a few options on them. I’d rather work with a larger toolkit when I do these things.

Eric:  Tell me if you agree. My thought is, with any of the tools I just mentioned, if you can’t take that data and compare it to machine data, transactional data, other datasets, you’re not looking at the full picture.

Marc:  Correct. You’re not looking at the full picture. Again, some people don’t want that. That’s OK, right? Some people don’t want to look at the group. They just want a snapshot, like how many more people like my campaign on Facebook. That’s great for that. How many people are complaining that their flight has been delayed, per airport? Great! That’s actually very valuable.

Knowing how upset people are about your service per location is a phenomenal insight already. Let’s not downplay that. Those tools are great for that. But if you really want insight and discovery, you need to get the raw data those tools use and mix and match it with other data, internally or externally.

Eric:  Marc, I don’t want you to get too far off course in your boat, wherever you’re sailing, so I’m going to give you my final question. Obviously, we’ve had a number of reports of US companies losing deals to European competitors partly because of the intrusive US government snooping leaked by Edward Snowden. This has resulted in fear, particularly among multinational corporations, that if they go with a US vendor, they’re essentially giving their data over to the NSA and being scrutinized by the PRISM program.

Are you hearing anything about that? What’s your prediction with that? Has the National Security Agency potentially damaged the prospects of US companies by overreaching?

Marc:  That’s a very complex question. It’s a little bit above my pay grade as well. I spend half my time in Europe at the moment. One of the things I notice is, of course, any kind of breach of data, any kind of security incident, any kind of hacking and snooping ‑‑ all of those things have impact. The moment it starts to be done by national agencies, it definitely has impact and it will do something. Today, it’s the NSA. Tomorrow, it might be somewhere else.

The hard part is that we need to turn this into a very positive opportunity. We need to say ‑‑ by country, for the European Union, maybe for a bunch of countries, and for companies ‑‑ how do I deal with the accountability and ownership of my data? Is the data mine? Is it the consumer’s? How do I deal with security? How do I deal with governance?

European law already says if the data is created in Europe, you need to do a lot of work to get it outside of the EU. You can’t just bring it in and outsource it to India. You can’t just host it in the US. That’s the reason why companies like Apple and Microsoft have been rapidly creating data centers in Europe to deal with that legislation long before Snowden and the NSA.

Does any kind of negative event like this cause harm, whether it’s the NSA or one of the larger gaming platforms being hacked? It makes everybody uncomfortable ‑‑ everything from worries about the safety of my data to conspiracy theories. The human psyche is full of worry about these things.

It’s a phenomenal opportunity for us, now that this has happened and is in the open, to actually debate ownership, accountability, privacy, the need for a chief data officer, and the need for companies to define what accountability and ownership of their data means, and to deal with it. Absolutely, I think so.

I’ve been yelling for the last 10 years that data is becoming a serious asset. Social data and external data are now part of your production chain already.

Data is not just the new gold or the new oil. You need to refine it and store each asset according to its value. If you have a lot of crude oil, or a lot of gold, you’re not just leaving it outside in your yard where everybody can come by and touch it. You protect it. You take care of it. It’s the same with data. We need companies to be more aware of one of the most important assets driving our Western society.
