Ask Slashdot: How to set up a big data/data science project portfolio?

Anonymous Coward writes | about 2 years ago

An anonymous reader writes "I am a mid-career IT professional in the middle of a transition from IT to a domain within the biological sciences. My planned academic route to the new domain will take at least 3-5 years to finish. In the interim, I want to work in (and earn from) the IT domain of Big Data/Data Science, since that is more aligned with the skills I need in my target domain: data analysis, visualization, signal processing, imaging, simulation, etc. The problem is that, apart from early-career stints, I have very little, and only surface-level, experience with these topics. So I want to ask Slashdot for suggestions on the tasks I've set myself to accomplish this transition. Specifically:

  1. What are the foundational topics I need to learn? What parts of math, statistics, machine learning, text analysis, scientific programming...?
  2. What books to read?
  3. What courses (preferably open/online) to take?
  4. I want to set up an online portfolio of big-data projects that I work on to showcase the skills that I acquire in this domain. What are some of the more challenging, topical and novel application areas and open problems to showcase in a portfolio, so that it is distinctive and interesting? E.g., consumer behavior, neuro-/bio-informatics, socio-economic trends ...
  5. How do I find sources of open/non-proprietary data sets to use for my portfolio projects?
  6. What hosting resources do I need to set up a portfolio of big-data projects? Any suggestions on specific hosting providers?
  7. What tools should I strive to learn (preferably FOSS): E.g., Hadoop, R, Octave, Python ...?
  8. What are the industry and trade bodies that cater to big-data professionals?
  9. How do I acquire mentor(s)/guide(s) who can informally guide me through the above skill acquisition and portfolio creation tasks?
  10. Any other Data Science-related wisdom?
"

1 comment

Start Small (1)

jda104 (1652769) | about 2 years ago | (#40531157)

I'm in Computational Biology, and I'd say that the most valuable skills you should learn (and the ones most often seen in this field) are more mathematical and/or statistical than "big data." Understanding how to properly normalize your data or calculate a p-value will take you much further than being able to spin up a 100-node Hadoop instance in most labs.
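
To make "normalize your data" and "calculate a p-value" a little more concrete, here is a minimal sketch in Python using only the standard library. It is one possible reading of those two skills, not a recommended pipeline: z-score standardization is just one kind of normalization, and the permutation test is just one way to get a p-value. The function names and the measurements are made up for illustration.

```python
import random
import statistics


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up measurements, purely illustrative.
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
treated = [4.9, 5.1, 4.7, 5.3, 5.0, 4.8]

print("standardized control:", [round(z, 2) for z in z_scores(control)])
print("permutation p-value:", permutation_p_value(control, treated))
```

The permutation test is used here because it needs no distributional assumptions and no extra libraries; in practice a t-test or similar from a statistics package may be the more usual choice.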

I think you should spend the first year on your home PC. Download RStudio and work through a few R tutorials, then find some data/questions that interest you and poke around. Post your results to a blog so that you'll have something to show for the time you spend, and release the code on GitHub so that it's open to future employers.

I'd say get comfortable with a data analysis language (R will probably serve you best currently), and a data manipulation language (Python, Perl, etc.) and start asking questions of data that's around you (your email archives, a log of Internet sites you visit, your spending records, etc.). Once you've found that well-designed algorithms can't handle some dataset you're looking at, then look at Hadoop and other "big-data" projects.
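
As a rough illustration of "asking questions of data that's around you," the sketch below totals spending by category from a CSV export. The column names (`date`, `category`, `amount`) and the sample rows are assumptions invented for this example; a real bank or budgeting export will almost certainly use a different layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spending records; stands in for a file you would export yourself.
SAMPLE_CSV = """date,category,amount
2012-06-01,groceries,54.20
2012-06-03,transport,12.00
2012-06-07,groceries,31.80
2012-06-09,books,25.00
"""


def total_by_category(csv_file):
    """Sum the 'amount' column for each value of the 'category' column."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["category"]] += float(row["amount"])
    return dict(totals)


totals = total_by_category(io.StringIO(SAMPLE_CSV))

# Largest categories first.
for category, amount in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {amount:.2f}")
```

To run it on a real file, pass an open file object (for example `open("spending.csv", newline="")`, where the filename is again hypothetical) instead of the `io.StringIO` wrapper.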

When you're ready, I'd steer you towards Next-Generation Sequencing (NGS) data. Most of the bioinformatics questions being asked (and funded) now have at least some interaction with NGS, and analysts capable of working with that data are highly valuable. Check out the 1000 Genomes Project when you're ready to start playing with free sequencing data.