Psst! Wanna look at some COVID-19 data?

Maybe you’ve heard enough about this lately. But if you’re trying to learn data science, whether with Python or R, this is a chance to do it with something very timely.

The European Centre for Disease Prevention and Control publishes new data daily here in CSV, XML and JSON formats. It also includes some R code for reading it.

This site has some basic Python code to help you get the data and put it into a data frame with pandas. If you don’t know any pandas, it’s a great way to work with tabular data in Python.
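Here is a rough sketch of the idea, assuming the data comes as a CSV file. The pandas function `read_csv` accepts a URL, a local path or a file-like object, so the same helper works for a download link or a saved file. The helper name `load_cases`, the column names and the two rows below are made-up placeholders so the example runs offline; they are not real ECDC figures, and the real file's layout may differ.

```python
import io

import pandas as pd

# Made-up placeholder rows (NOT real ECDC data); real column names may differ.
SAMPLE_CSV = """date,country,cases
2020-03-01,Exampleland,10
2020-03-02,Exampleland,15
"""


def load_cases(source):
    """Read CSV data from a URL, path or file-like object into a DataFrame."""
    return pd.read_csv(source, parse_dates=["date"])


# Swap the StringIO object for the real download URL or a local file path.
df = load_cases(io.StringIO(SAMPLE_CSV))
total_cases = int(df["cases"].sum())
print(df.head())
print("total cases in sample:", total_cases)
```

Once the data is in a data frame, things like `df.groupby("country")` or `df.describe()` are good next steps to explore.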

So knock yourself out, and let me know if you find something interesting.

More cheap books, from Packt

Packt is offering two ways to get your cheap book fix.

Mega-Bundles consist of books selected around a theme, 15 for $50.

Build your own bundles are 10 books for $40. There is a great variety to choose from, including everything this blog discusses and much more besides.

Some of the books are shortish – 100 pages or so. You can see the page counts for the individual books. Of course length and quality are two different things.

Time may be short, so check these out soon. If you recommend particular books, leave a comment.

No Starch Press books in new Humble Bundle

Get a bunch of ebooks cheap, in PDF, .mobi and .epub formats. But hustle – the promotion ends about 20 days from now, just before Christmas.

The books cover R, Python, JavaScript, SQL – even F# and Haskell for you functional programming lovers. One promises Bayesian statistics “the fun way” with Star Wars, rubber ducks and LEGO – how can you possibly resist that? And for those of you who don’t foul up statistics enough on your own, there’s “Statistics Done Wrong” to help you out.

No ereader, no problem! If you’re reading this you can get one fast. The books are available as PDFs so you’re definitely covered on any machine that can read a PDF, and that’s about anything. The free Kindle app will handle the books too, with more conveniences and less obnoxious scrolling.

I usually read my ebooks on a Kindle or Nook. This gives me a nice legible screen right next to my main monitor to help me work. Kindles will read .mobi format, and Nooks like .epub. Not sure about iPads or Google devices – haven’t tried them.

You might have to jump through some hoops to get the books loaded on the ereaders, but it can be done. I’ve done it by “sideloading”, and Kindles will let you email books below a certain size to your Kindle email account, assuming you have one.

An app like Calibre will let you convert formats and also serve as a reader on a PC or Mac, maybe even on Linux. This also makes it easier to grab code straight from the books. Yep, it’s free.

No excuses – get started!

Cheap Python book deal from Humble Bundle

Several books can be had and you can name your own price. All titles are from No Starch Press, including the “Automate the Boring Stuff with Python” book that has a corresponding Udemy class.

Keep an eye on Humble Bundle because this isn’t the first time they’ve offered interesting Python bundles, and they have others too. You have a bit more than a week, so pounce right here!

Free Python help online via browser

Just starting Python and having problems? The answer may be Python Tutor. Watch the video at the link for more info, then click the “Visualize your code and get live help now” link to proceed.
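If you want something to try in a visualizer, a short snippet like this one is a reasonable candidate. It is only an illustration (the variable names are made up), showing a classic beginner surprise: two names pointing at the same list. Stepping through it line by line makes it easier to see why the first list changes.

```python
# Two names bound to the SAME list object: a change made through
# one name is visible through the other.
original = [1, 2, 3]
alias = original
alias.append(4)

# list(...) builds a separate list, so changing it leaves the original alone.
separate = list(original)
separate.append(5)

print(original)
print(alias is original)
print(separate)
```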

Also check out Pyfiddle. It lets you use Python 2.6 or Python 3.6 – make sure to pick the right one. Load your code and run it right there, at least for beginner-level work. It allegedly supports several other languages too. You can save your code and send a link to others so they can take a look.

As I write this both of the above are free.

If you have collaboration software then your tutor can share screens with you and maybe even control your machine. That’s a post for another time. If you’re on a campus be sure to check if you have this or other help options available.

And you’re going to remember all this generous help from friendly strangers and pay it forward, right?

Practice, practice!

Knowing language syntax is one thing. Really knowing a language means taking on challenges. But what challenges?

Here are some ideas. Presuming that you’re interested in data science, you need to be good at procuring, preparing and analyzing data. Let’s look at the procuring part.

What to work on?

Do you know your “why”? Are you looking at data science because it’s been called “sexy”? Do you just want to make a living? Or are you really animated by the possibilities? Know yourself – don’t start a career that might make you miserable just because it is hyped.

Let’s see if any of the following interest you.

Sports

It’s baseball season – how about some sabermetrics? Yes, baseball statistics are so well studied they even have their own name, and there are books and movies about it.

Maybe you’re a football fan (not soccer, or metric football as some of us call it). How about some player stats for your fantasy league? You can bet that NFL teams are using statistics to improve their results.

Alright soccer fans, here’s something for you. There may well be far better sources – I just found these from some casual googling with “download <sport> stats”.

Social issues

Maybe you’re more interested in human welfare and poverty. How about Gapminder? The late Hans Rosling did some terrific work, like the best stats you’ve ever seen. His recent book is well regarded by some very influential people.

What about crime?

More sites: government data, tuberculosis, the Centers for Disease Control and Prevention, the Guardian

All of the above are free currently. If you can pay, or meet various eligibility criteria as a legitimate researcher, many more are available.

Finance, entertainment…

Some financial information on Quandl is free.

Do you like movies? Here are movie reviews from Amazon.

I wonder – what did Facebook use to train its system to recognize pictures to filter out?

More

Other sites have assembled lists of good data sources. The most extensive one may well be at KDNuggets, which is terrific for all sorts of data science issues and is a permanent link here.

This post could go on forever, and eventually there will be a dedicated data page here on this site. But the point of this particular post is to get you something to work on to develop your coding and analysis skills. So let’s work on it.

What to do with the data?

A lot of data science work is nothing but reading, cleaning and manipulating data. You might not know what to do with data yet, but you can prep it for the people who do. Get good at this and you can apprentice with the people who do the advanced analyses, developing your SAS, R, Python, command line and other coding skills along the way.

Specifically, you want to know how to:

  • Find and download data sources.
  • Read whatever data you find in whatever form.
  • Automate these processes and deal with problems that come up.
  • Process, filter and join the read data into forms tidy enough to support further analysis.
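To make the last few steps concrete, here is a minimal sketch using only Python's standard library. It reads two small CSV sources, drops an incomplete row, joins the tables and filters the result. The data, the column names and the helper names (`read_rows`, `clean_players`, `join_teams`) are all made up for illustration; real files will be messier and need their own rules.

```python
import csv
import io

# Made-up placeholder data standing in for two downloaded files.
TEAMS_CSV = "team_id,team\n1,Hawks\n2,Owls\n"
PLAYERS_CSV = "player,team_id,games\nAnn,1,12\nBob,2,\nCy,1,3\nDee,9,8\n"


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_players(rows):
    """Drop rows with a missing games count and convert the rest to int."""
    cleaned = []
    for row in rows:
        if not row["games"].strip():
            continue  # skip incomplete rows rather than guessing a value
        cleaned.append({**row, "games": int(row["games"])})
    return cleaned


def join_teams(players, teams):
    """Inner join: keep players whose team_id appears in the teams table."""
    team_names = {t["team_id"]: t["team"] for t in teams}
    return [
        {"player": p["player"], "team": team_names[p["team_id"]], "games": p["games"]}
        for p in players
        if p["team_id"] in team_names
    ]


players = clean_players(read_rows(PLAYERS_CSV))
tidy = join_teams(players, read_rows(TEAMS_CSV))
regulars = [row for row in tidy if row["games"] >= 5]  # simple filter step
print(tidy)
print(regulars)
```

Skipping incomplete rows and dropping unmatched team ids are just choices made for this example. In real work you would decide, and write down, how each kind of problem gets handled.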

If you don’t have your own ideas…

Here are courses in R, Python and SAS from Coursera that can help.

To learn and practice R, try the Johns Hopkins data science program via Coursera. You’ll be installing and learning R and also learning other skills you’ll be using regularly.

For Python, check out this program from UCSD via Coursera. It assumes that you already know a little Python – if you don’t, look here. It uses Python 3.

For SAS, Coursera offers a class for beginners and one with more advanced statistics. You access SAS either by setting up a virtual machine (requiring a local installation) or by using the SAS Academic Edition (via a browser). The courses are here. These are not as extensive as the R and Python courses above, but SAS has only recently begun on Coursera and I think more is coming.

Last I knew you could take the courses above for free, but expect to pay if you want documented certifications and grading. Incidentally, I have no commercial tie to Coursera and in fact pay for their services (although they’re welcome to give promotional consideration…). There are other sources; I’m just not as familiar with them.

Enough reading – let’s practice our code!

Learning Python

Now there’s an evergreen topic. This post will probably be updated periodically. Yes, there’s lots of room for disagreement, but we have to start somewhere – suggestions are welcome. We probably can agree that as one of my favorite cliches goes, you don’t have to be great to start, but you have to start to be great!

First off, know that there’s Python 2 and Python 3. For learning, I don’t know of any good reason to start with anything but Python 3 – by now most current courses teach it anyway, and 2 will reach “end of life” at the beginning of 2020. There’s a lot of older media for Python 2 though – pay attention to what you’re getting.
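If you are not sure which Python you are running, or whether an older tutorial was written for Python 2, a quick check like the following can help. It is only a sketch, and the helper name `running_python3` is made up for this example. It also shows two of the differences you are most likely to hit in older material.

```python
import sys


def running_python3():
    """Return True when the interpreter running this script is Python 3."""
    return sys.version_info[0] == 3


print("Python version:", sys.version.split()[0])
print("Python 3?", running_python3())

# Two differences that often trip up readers of older Python 2 material:
# print is a function in Python 3 (Python 2 used a print statement), and
# "/" between two integers gives a float in Python 3.
half = 7 / 2         # true division
floor_half = 7 // 2  # floor division, for when you want a whole number
print(half, floor_half)
```

A tutorial that writes `print "hello"` without parentheses is a good sign it was written for Python 2.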

I like Zed Shaw’s “Learn Python the Hard Way” book. He’s very opinionated and has his detractors, but if you do it his way you’ll start building Python into your head and fingers. He covers Windows, Macs and Linux. He’ll have you installing Python if you don’t already have it, so be prepared to do that. He’ll also have you doing your work with a simple text editor and the command line, which everybody has access to. He also has a lot of videos for this and his other Learn Code the Hard Way projects.

You might also like “Automate the Boring Stuff With Python”. The author suggests that you use the IDLE editor that usually comes with Python installations. The book is also available as an online course on Udemy. Incidentally, just about any course on Udemy will go on sale from time to time.

Another approach is Dr. Chuck’s “Python for Everybody” via Coursera. Last I looked the book and course were available gratis. He’ll have you doing some cool things before it’s over. As I recall the early parts won’t require you to have a Python installation, but to finish the specialization you’ll need it.

You can’t install Python on your machine? There are options like PythonAnywhere. Or you can use the likes of DataCamp or Dataquest to get started with learning syntax, but at some point you’ll want access to a full installation to understand how to use Python in real applications.

Maybe it’s not for you, but for your kids. OK, there are a number of books about Python for kids, and places to turn them into code ninjas.

But get started, and do the work (unless you think greats like Michael Jordan got their skills from reading books and blogs). Do not cut and paste – type the code, make and fix the inevitable mistakes, keep trying. If you don’t like doing that maybe data science isn’t for you. It takes time for some things to internalize to the point where you’re productive enough to get things done.

Learn how to find help on Google, Stack Overflow or elsewhere. Get used to it – the world doesn’t have time to babysit you. In the early learning stages at least, just about everyone has made the same mistakes you’re making, and chances are that somebody has already documented the answer for you, so go looking for the answers and build research skills.

Enough for now. Time to get started!