MongoDB Queries

Connect to MongoDB

Connect via Mongo Shell to the database mongodbqueries on a DBI5 cluster. mongosh mongodb+srv://5hif:5hif@dbi5.4e2bjau.mongodb.net/ --username 5hif

Password is 5hif

use mongodbqueries

Vienna Museums (collection museum)

1.1) How many museums are listed in the collection?

db.museum.countDocuments()

count documents

1.2) Which museums can be found in the 4th district of Vienna (only address information of interest)?

db.museum.find({ "properties.BEZIRK": 4 }, { "properties.ADRESSE": 1, _id: 0 })
  • to only show specific fields, use projection (2nd parameter of find)
  • set _id: 0 to hide the _id field (shown by default)
  • use 1 to show a field, 0 to hide a field

museum addresses in 4th district

1.3) What are the first three museums sorted with alphabetical order by name? Make sure to list just the names and the ids

db.museum.find({}, { "properties.NAME": 1 }).sort({ "properties.NAME": 1 }).limit(3)
  • sorting in ascending order is 1, descending order is -1
  • limit(n) limits the number of documents returned to n

museum names and ids

1.4) Which museums are located in the 13th or 14th district? Sort the result first by district, then by name ascending

db.museum.find({ "properties.BEZIRK": { $in: [13, 14] } }).sort({ "properties.BEZIRK": 1, "properties.NAME": 1 })

or with $or operator:

db.museum.find({ $or: [ { "properties.BEZIRK": 13 }, { "properties.BEZIRK": 14 } ] }).sort({ "properties.BEZIRK": 1, "properties.NAME": 1 })

museum addresses in 4th district

1.5) How many museums are located outside the 1st district?

db.museum.countDocuments({ "properties.BEZIRK": { $ne: 1 } })
  • $ne matches all values that are not equal to a specified value

museum addresses outside 1st district

1.6) How much is the number of museums outside the inner districts 1 to 9?

db.museum.countDocuments({ "properties.BEZIRK": { $gt: 9 } })

museum addresses outside inner districts

1.7) In which districts do you find specific museums (“Bezirksmuseen”, district numbers ONLY)?

db.museum.distinct("properties.BEZIRK", { "properties.NAME": { $regex: "Bezirksmuseum" } })

museum districts for Bezirksmuseen

1.8) Which museums have an address where the house number is 5?

db.museum.find({ "properties.ADRESSE": { $regex: " 5$" } }, { "properties.NAME": 1, "properties.ADRESSE": 1, _id: 0 })
  • $regex provides regular expression capabilities for pattern matching strings in queries
  • $ matches the end of a string

museum addresses with house number 5

Which districts, in ascending order, have 5 or more, and how many, museums?

db.museum.aggregate([
  { $group: { _id: "$properties.BEZIRK", museumCount: { $sum: 1 } } },
  { $match: { museumCount: { $gte: 5 } } },
  { $sort: { _id: 1 } }
])
  • $group groups input documents by the specified _id expression and applies the accumulator expressions, properties.BEZIRKisthefieldtogroupby(theproperties.BEZIRK is the field to group by (the indicates a field)
  • $sum calculates the sum of numeric values
  • $match filters the documents to pass only the documents that match the specified condition(s)
  • $sort sorts the documents based on the specified field(s) (1 for ascending, -1 for descending)

districts with 5 or more museums - result

Finding the top 3 districts with the most museums

db.museum.aggregate([
  { $group: { _id: "$properties.BEZIRK", museumCount: { $sum: 1 } } },
  { $sort: { museumCount: -1 }},
  { $limit: 3 }
])
  • $limit limits the number of documents returned to n

districts with top 3 museums - result


Twitter (collection tweets)

2.1) How many tweets in the collection origin from users with friends’ count in 80,90?

db.tweets.countDocuments({ "user.friends_count": { $gte: 80, $lte: 90 } })
  • $gte matches values that are greater than or equal to a specified value
  • $lte matches values that are less than or equal to a specified value

other filter operators:

  • $in matches any of the values specified in an array
  • $ne matches all values that are not equal to a specified value

tweets from users with friends count in 80,90

2.2) Are there any documents that miss the “source” field?

db.tweets.countDocuments({ "source": { $exists: false } })
  • $exists matches documents that have the specified field

tweets missing source field

2.3) How many tweets have exactly three hashtags?

db.tweets.countDocuments({ "entities.hashtags": { $size: 3 } })
  • $size matches any array with the specified number of elements

tweets with exactly three hashtags

2.4) Give me the tweets contain a hashtag “love” (case-sensitive)?

db.tweets.find({ "entities.hashtags.text": "love" }, { "text": 1, "entities.hashtags": 1 })

tweets with hashtag love (case-sensitive)

2.5) How many tweets contain a hashtag “love” (case-insensitive)?

db.tweets.countDocuments({ "entities.hashtags.text": { $regex: "love", $options: "i" } })
  • i makes the search case-insensitive (default is case-sensitive)

tweets with hashtag love (case-insensitive)

2.6) What is the number of tweets per user language in descending order?

db.tweets.aggregate([
  { $group: { _id: "$user.lang", tweetCount: { $sum: 1 } } },
  { $sort: { tweetCount: -1 } }
])

tweets per user language - result

2.7) Extend 2.6 and include the average number of followers and the size in bytes of the longest tweet text per user language

db.tweets.aggregate([
  { $group: { _id: "$user.lang", tweetCount: { $sum: 1 }, avgFollowers: { $avg: "$user.followers_count" }, 
    longestTweetSize: { $max: { $strLenBytes: { $ifNull: ["$text", ""] } } } }},
  { $sort: { tweetCount: -1 } }
])
  • $avg calculates the average of numeric values
  • $max returns the maximum value
  • $strLenBytes returns the number of bytes in a string

tweets per user language with avg followers and longest tweet size - result

Navigation