Author Topic: (Decentralized) Big Data Methodology: Discussion of White Paper & More  (Read 2275 times)

rhinomonkey

  • Jr. Member
  • **
  • Posts: 64
    • View Profile
Hi all,

I took the liberty of opening this discussion so that we can research and develop ideas regarding the methodology for the future big data market(s). I hope to be active in this discussion and look forward to reading more from others who are probably far more experienced and knowledgeable in this arena than I am.

For people new to Spreadcoin, it is important to note that the project centers on the value of decentralization. As Georgem has mentioned before, the big data market(s) must remain true to the principles of decentralization and privacy, so please let that guide the ideas and research posted here.

rhinomonkey

Cross posting - Here is a quote from Coins101 in BTCT (pg. 164: March 2, 2016) with an attached document that is a work in progress

"For my day job, I actually deal with big data on pretty much a daily basis. I also have to stand up in front of senior executives and explain to them why stuff costs so much and why they need to increase a budget to get better quality data and data analysis tools.  I put the attached together initially for my benefit (I sort of need this format to think things through as though I am about to stand up in front of people to explain stuff) and then for others once some of the details were worked out....so, peer into my head to see work in progress:"

https://docs.google.com/presentation/d/1qyExM5c1Ws7eUEj3J_gJ76zFe2WQaLdqJHMbJIJ_jng/edit?usp=sharing

georgem

  • Tech Admin
  • ******
  • Posts: 880
    • View Profile
Yes, coins101 has excellent material.
There have been so many interesting quotes and posts spread all over the forum (and slack), it'll be great to gather them and discuss them here.

georgem

One important factor of big data will be the conversion of IPs into more privacy-friendly geolocation data.

I have spun off a new thread to discuss IP-to-geolocation specifically:

http://spreadcointalk.org/index.php?topic=750.0
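
To make "privacy-friendly" a bit more concrete, here is a minimal sketch of one possible approach: throw away precision before anything is stored. The function names, the /24 and /48 prefix lengths, and the 1-degree grid are assumptions chosen for illustration only; nothing here has been decided for Spreadcoin.

```python
import ipaddress
import math


def coarsen_ip(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Reduce an IP address to the network it belongs to.

    The prefix lengths are illustrative assumptions, not project settings.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)


def coarsen_coordinates(lat: float, lon: float, cell_degrees: float = 1.0):
    """Snap a latitude/longitude pair to the south-west corner of a grid cell."""
    return (
        math.floor(lat / cell_degrees) * cell_degrees,
        math.floor(lon / cell_degrees) * cell_degrees,
    )


if __name__ == "__main__":
    print(coarsen_ip("203.0.113.77"))
    print(coarsen_coordinates(52.52, 13.405))
```

The idea is that a servicenode would only ever keep the coarse value, so the exact address or position cannot be recovered later from the stored data.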

rhinomonkey

This topic is very important and we definitely need to begin some serious discussion. I'm sure working on this while Georgem is working on servicenodes (SNs) will set us up for a smoother and faster transition into future services. There is very little traffic through the Spreadcoin forums, but I am going to do my best to continue this discussion here, even if I talk to myself for a while. Over the weekend and the coming week I will be digging further into the white paper (WP) / big data proposal and researching various things associated with it.

Additionally, I may open up a discussion on Bitcointalk - It will undoubtedly get more traffic. More people - more ideas - better research! Crowd sourcing will be a great resource. I will see what the best way of beginning this conversation in BTCT is (and where to put it), and additionally, how best to facilitate conversation here!

Coins101 is the expert in this area, so I will be keeping an eye out for anything he posts in the main thread to cross post here. He does this often, but it will certainly be helpful to have it in one place.

If I get time, I will post other previous quotes on this topic that I find in Bitcoin talk in order to produce a more complete picture of what has happened and what direction we are headed!

rhinomonkey

Sup Folks,

After having a few conversations last night I began to realize that trying to facilitate a conversation about "big data methodology" may be easier if it were known precisely what that is. It is a rather vague combination of two buzzwords to people who don't know specifically what it means. So I thought I would break it down into some smaller questions (though not exhaustive) that we would like to focus on. I will definitely make different subtopics with these specific questions later on but I figured getting started with a few here would not hurt.

"Big data methodology" as I understand it, involves a few separate facets that need to be explored. These questions have been discussed to some degree by Georgem and Coins101. But to make this more accessible I have enumerated a few questions below.

What information do we gather?

What data is valuable to end users?

How do we ensure the accuracy of said data collection?

And most importantly: How do we gather information in a way that respects privacy and the principles of decentralization?

More on this later. Please feel free to add some questions that are important to this topic that I have not listed.

georgem

Quote from: rhinomonkey
After having a few conversations last night I began to realize that trying to facilitate a conversation about "big data methodology" may be easier if it were known precisely what that is.

Yes exactly.

The thing is, the way coins101/stonehedge define "big data" in their proposal is already very oriented towards fintech specifically, but "big data" is a much more general term.

For example, search engines (like Google) represent "big data" in a much more general way.
And let's not forget that one main "field of interest" that has motivated me since I joined Spreadcoin is the creation of a decentralized search engine.

So, when I say "big data" I always also mean a new kind of decentralized, crypto-focused search engine that will run on a network of servicenodes.

Just keep that in mind when talking about "big data". This term is HUGE!

So basically, you have "big data" wherever you ...

a) ...get access to enormous amounts of data,
b) ...have a digestion process where those large amounts of data are prepared for user consumption.

Daft Like Jack

  • Newbie
  • *
  • Posts: 1
    • View Profile
One question that I have is centered around the viability of collecting "Big Data" from bitcoin transactions.  Since bitcoin is a digital currency it can be spent from any location to purchase goods at any other location.  Thus how does knowing the location of the user bring any added value?  Brick and mortar establishments cannot then cater to the goods that an individual is purchasing online, because the very nature of doing business online cuts out the middlemen.  Targeted advertising in this vein does not seem to make sense. 

Perhaps meta advertising, such as campaigns for a specific region, might make sense, but the idea that individual store locations are going to greatly benefit from the data sounds dubious.  People using bitcoin as a mode of payment are leaps and bounds ahead of the typical grocery/retail goods shopper, and thus it might not be a great leap to infer that they are doing primarily online shopping, which in turn means that the marketing data would need to be utilized in an online space and not by companies expecting paying customers to walk into their store.

If these assumptions and inferences have any merit it would seem that the data we can collect would be most useful to companies that have large online presence and advertising money to spend on internet campaigns.  I am curious what you think about the more targeted approach coins mentioned in a previous post.

rhinomonkey

Thanks for the great question! I will look into this matter and have posted about it in the Spreadcoin Bitcointalk Forum. Georgem has as well. We will get someone over here to provide some enlightenment!

Stay tuned.  ;D

rhinomonkey

Daft - Coins was able to respond to you on BTCT. Here is his response:

Quote from: coins101

Quote
One question that I have is centered around the viability of collecting "Big Data" from bitcoin transactions.  Since bitcoin is a digital currency it can be spent from any location to purchase goods at any other location.  Thus how does knowing the location of the user bring any added value?

It's a good question.

First, you need to know how consumers consume payment systems. To do that, all you have to do is look at how many mobile Bitcoin wallets there are.

Second, you need to know how consumers consume communications systems. To do that, all you have to do is look at how many smartphones there are.

Mobile is by far the most popular everything.

Lastly, you need to know what Big Data is: exponentially growing user generated data. Every 10 minutes, the bitcoin blockchain is different, and bigger than it was.

So, combine these things. Users transacting using mobile payments systems, powered by smartphones and generating lots of data.

People out in areas where there are lots of shops make big bitcoin data valuable to a whole host of commercial organisations. None of them particularly care who you are. They can get all the information about you that they need from the types of places you shop, the areas where you spend a lot of money, and the time of day you do stuff.

You can get tons of useful information from data aggregation - add up everything and divide by users, by location, by time, etc.

So, our big bitcoin data market will generate heat maps of purchasing activity. 
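
As a rough illustration of the "add up everything and divide by users, by location, by time" idea, here is a minimal sketch. The event format (user, region, hour, amount) and the function name are assumptions made up for this example; real inputs would depend on whatever the servicenodes actually collect.

```python
from collections import defaultdict


def build_heat_map(events):
    """Aggregate (user_id, region, hour, amount) events into per-cell totals.

    The output keeps only counts and sums per (region, hour) cell;
    individual user ids are used for counting and then discarded.
    """
    totals = defaultdict(float)
    users = defaultdict(set)
    for user_id, region, hour, amount in events:
        key = (region, hour)
        totals[key] += amount
        users[key].add(user_id)

    return {
        key: {
            "total": totals[key],
            "users": len(users[key]),
            "avg_per_user": totals[key] / len(users[key]),
        }
        for key in totals
    }


if __name__ == "__main__":
    sample = [
        ("u1", "north", 9, 2.0),
        ("u2", "north", 9, 4.0),
        ("u1", "north", 9, 6.0),
        ("u3", "south", 18, 5.0),
    ]
    for cell, stats in sorted(build_heat_map(sample).items()):
        print(cell, stats)
```

Each cell of the result is one "pixel" of a heat map: who the users were is gone, only how much activity happened where and when remains.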

People sitting at home buying stuff on Overstock. Meh.  What it might show is pockets of urban sprawl and what time of day or week people shop online. Apart from that, it's not very attractive.

Mobile data. That's where the action is. Best thing is that it's hard to get if you're a small business, so we'll change that and make it easy.

rhinomonkey

Re: (Decentralized) Big Data Methodology: Discussion of White Paper & More
« Reply #10 on: March 11, 2016, 09:42:15 pm »
Georgem, if Big Data is that broad, then the methodology doesn't have to be limited to bitcoin transactions, correct?

Also, what in particular is possible from a coding perspective? At the most basic level, where will service nodes be able to pull data from?

georgem

Re: (Decentralized) Big Data Methodology: Discussion of White Paper & More
« Reply #11 on: March 11, 2016, 09:58:47 pm »
Yes, the most atomic thing in a cryptocurrency is a transaction.

But basically the whole internet is where we can pull our data from.

More specifically: network traffic and blockchains. But we could also scan websites for interesting data.

More generally: everything we already have access to (whatever can be openly accessed on the internet), plus maybe also what other people are willing to give us voluntarily (coins101's geolocation GPS data falls in that category)

From a coding perspective very much is possible.

To just gather/record data is easy.

To guarantee data integrity over the whole servicenode network is much harder.

Plus once we have a steady data stream, how do we digest/reduce it to something we can offer back as a service?

The end result must always be a digest (something smaller deduced from the giant dataset we dissected), so that it can be queried in a search request and given back as a result.
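
To illustrate the two problems above (reducing raw data to a digest, and checking that servicenodes agree on it), here is a minimal sketch. It is not a proposal for the actual protocol: the record format, the "category" field, the function names, and the simple majority rule are all assumptions for the sake of the example.

```python
import hashlib
import json
from collections import Counter


def make_digest(records):
    """Reduce raw records to something small: a count per category."""
    return dict(Counter(r["category"] for r in records))


def fingerprint(digest):
    """Hash a digest in canonical form so equal digests give equal hashes."""
    canonical = json.dumps(digest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def agreed_digest(node_digests, quorum=0.5):
    """Return the digest reported by more than `quorum` of nodes, else None."""
    if not node_digests:
        return None
    votes = Counter(fingerprint(d) for d in node_digests)
    winner, count = votes.most_common(1)[0]
    if count / len(node_digests) <= quorum:
        return None
    for d in node_digests:
        if fingerprint(d) == winner:
            return d


if __name__ == "__main__":
    raw = [{"category": "exchange"}, {"category": "shop"}, {"category": "shop"}]
    honest = make_digest(raw)
    reports = [honest, {"shop": 2, "exchange": 1}, {"shop": 99}]
    print(agreed_digest(reports))
```

A plain majority vote like this does nothing against someone running many fake nodes, so a real design would need something stronger. The point is only that nodes can compare small hashes instead of shipping the whole dataset around.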



rhinomonkey

Re: (Decentralized) Big Data Methodology: Discussion of White Paper & More
« Reply #12 on: March 11, 2016, 10:26:13 pm »
This is fascinating, and has a grander magnitude of possibilities than I originally imagined.

Would dark web traffic be possible to monitor?

Or mining databases for their data?

I mean, this opens up a ton of options - at least ideas off the top of my head. And some of these things have fewer privacy concerns than transactional data.

Possible to monitor cyber attacks?

I'm just throwing out ideas that are coming to mind, so take them with a few grains of salt (or a salt mine).

georgem

Re: (Decentralized) Big Data Methodology: Discussion of White Paper & More
« Reply #13 on: March 12, 2016, 01:34:15 pm »
Quote from: rhinomonkey
Or mining databases for their data?

Sure, if those databases are open to the public.

We can also gather trading data from all the exchange APIs out there, and archive them into the biggest historical tick dataset that ever existed.
But crypto exchanges come and go, so we would need a system where servicenode operators get to choose which exchange they want to scan, etc...
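
A minimal sketch of what such an archive could look like, assuming each operator has a list of exchanges they chose to scan and that every trade can be normalized into a common record. The `Tick` fields, the exchange names, and the `merge_ticks` helper are hypothetical; no real exchange API is used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """One normalized trade; the field names are illustrative assumptions."""
    exchange: str
    trade_id: str
    timestamp: float
    price: float
    amount: float


def merge_ticks(archive, incoming, enabled_exchanges):
    """Add new ticks to the archive, skipping duplicates and disabled exchanges.

    `archive` maps (exchange, trade_id) to a Tick, so a trade reported twice
    (for example by two operators scanning the same exchange) is stored once.
    Returns the archive contents ordered by time.
    """
    for tick in incoming:
        if tick.exchange not in enabled_exchanges:
            continue
        archive.setdefault((tick.exchange, tick.trade_id), tick)
    return sorted(
        archive.values(),
        key=lambda t: (t.timestamp, t.exchange, t.trade_id),
    )


if __name__ == "__main__":
    archive = {}
    operator_choice = {"exchange_a", "exchange_b"}  # hypothetical names
    batch = [
        Tick("exchange_a", "1", 100.0, 0.0005, 10.0),
        Tick("exchange_a", "1", 100.0, 0.0005, 10.0),  # duplicate report
        Tick("exchange_c", "7", 101.0, 0.0006, 3.0),   # not scanned by this node
        Tick("exchange_b", "9", 99.0, 0.0004, 1.0),
    ]
    for tick in merge_ticks(archive, batch, operator_choice):
        print(tick)
```

Because the archive is keyed by exchange and trade id, an exchange disappearing later does not affect the history already stored for it.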

Quote from: rhinomonkey
I mean, this opens up a ton of options - at least ideas off the top of my head. And some of these things have fewer privacy concerns than transactional data.

Yes exactly, it makes more sense to first make a few experiments with less concerning things that don't compromise anyone's privacy.

rhinomonkey

Re: (Decentralized) Big Data Methodology: Discussion of White Paper & More
« Reply #14 on: March 29, 2016, 11:28:37 pm »
Here are the main references to types of data that could be obtained from BTC within the whitepaper:

"Data generated by the block chain or relayed by full nodes could include auditing and accounting data for businesses using Bitcoin: in depth reviews of smart contracts; logistics processing of orders placed using Bitcoin; consumer transaction trends; smart phone geolocation based transaction trends; and any number of data points from new services yet to be created using Bitcoin." (pg. 47)

Examples of other possible data acquisitions from BTC listed elsewhere in the WP (pg. 52):

- Global Transaction Trends Data
- Regional Transaction Trends Data
- Contracts Data
- Contracts Entered Into
- Contracts Completed
- Industry Specific Data
- Data on Transactions With Major Exchange Hubs
- Data on Transactions With Popular Services



Coins101 is currently looking into smart phone geolocation trends, as I understand it. We will be looking over the rest of these and determining / elaborating on viable methodologies.